Using LAMMPS on AMD GPUs

Hi Axel,

I understand that the OpenCL version of the GPU package is no longer supported by the developers.
But perhaps somebody has already encountered (or even solved) the memory leak problem during GPU-OpenCL execution?

I am testing LAMMPS (11Aug17) built with gcc 4.3.3 and MPICH 3.0.4, with either the CUDA or the OpenCL version of the GPU package.
For the tests I have used the input file from examples/melt, replicated 4x4x4 (tests with other models show the same problem).

If I build LAMMPS with the CUDA version of the GPU package everything works well.
However, the OpenCL version shows a gradual memory leak during the MD run that eventually ends with a segfault-type error once 100% of the memory is used.

I have tested the OpenCL version with GTX 1070 and FirePro S9150 cards.
In both cases the memory leak occurs,
so it does not appear to be a GPU driver problem.

What could be the reason for such erroneous behaviour of the OpenCL version?
Any hints on where in the code I should look for this problem?

Kind regards,
Vladimir

Hi Axel,

I understand that the OpenCL version of the GPU package is no longer supported by the developers.

if you look through the git logs, you should see that this is not the case. there have been repeated updates to the OpenCL compilation setup, it is currently the only option for using GPUs on windows, and there are people using that feature.

But perhaps somebody has already encountered (or even solved) the memory leak problem during GPU-OpenCL execution?

I am testing LAMMPS (11Aug17) built with gcc 4.3.3 and MPICH 3.0.4, with either the CUDA or the OpenCL version of the GPU package.
For the tests I have used the input file from examples/melt, replicated 4x4x4 (tests with other models show the same problem).

please check with the latest development version. perhaps you are seeing an issue that has already been solved.

If I build LAMMPS with the CUDA version of the GPU package everything works well.
However, the OpenCL version shows a gradual memory leak during the MD run that eventually ends with a segfault-type error once 100% of the memory is used.

I have tested the OpenCL version with GTX 1070 and FirePro S9150 cards.
In both cases the memory leak occurs,
so it does not appear to be a GPU driver problem.

What could be the reason for such erroneous behaviour of the OpenCL version?

impossible to say in such general terms. i suggest running with valgrind or some other tool for tracking down memory leaks, to get hints about which part of the code is causing the issue.
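for example, an invocation along these lines would log definitely-lost allocations (a sketch only; the binary name and input are taken from the test case above, and the run should be kept short since valgrind slows things down a lot):

```shell
# collect a full leak report for a short serial run of the melt example
valgrind --leak-check=full --show-leak-kinds=definite \
    ../../src/lmp_serial -sf gpu -pk gpu 1 -in in.melt 2> valgrind.log
```

the stack traces under "definitely lost" in valgrind.log then point at the allocation sites to inspect.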

axel.

Axel, thank you for the feedback!

I have looked through the git logs for the issues concerning the gpu_package
https://github.com/lammps/lammps/issues?q=label%3Agpu_package+is%3Aclosed

Unfortunately, I have not found any mention of the memory leak issue.

However, the problem persists even in the latest LAMMPS version (29Jun18).

LAMMPS building options:
gcc 4.3.3
IntelMPI 2017
OCL_TUNE = -DGENERIC_OCL

The test case (just a generic LJ system):
examples/melt
mpirun -np 6 ../../src/lmp_g++_mpich -sf gpu -pk gpu 1 -in in.melt
or
../../src/lmp_serial -sf gpu -pk gpu 1 -in in.melt

In both cases, the memory consumed by LAMMPS grows steadily (visible via top or free).

I can reproduce this problem on GTX 1070 and FirePro S9150 cards (with the software stack otherwise identical, apart from the GPU drivers and SDK).
My colleagues have seen the same problem with EAM and OPLS models.

Vladimir

Axel, thank you for the feedback!

I have looked through the git logs for the issues concerning the gpu_package
https://github.com/lammps/lammps/issues?q=label%3Agpu_package+is%3Aclosed

those are not the "git logs" but reported and closed issues. most issues fixed in LAMMPS are not reported on github, and they are not always explicitly mentioned in commit messages. i mentioned the git logs (i.e. in this case the output of "git log" in the lib/gpu folder of a checked-out repository) simply to confirm that there is still activity and that updates are going on.

Unfortunately, I have not found any mention of the memory leak issue.

However, the problem persists even in the latest LAMMPS version (29Jun18).

LAMMPS building options:
gcc 4.3.3
IntelMPI 2017
OCL_TUNE = -DGENERIC_OCL

please try a different tuned setting (e.g. kepler or fermi), which should be more suitable for GPUs. "generic" is more for CPUs and vectorization.
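for reference, a sketch of what that change looks like in a lib/gpu machine makefile (the variable name matches the OCL_TUNE setting quoted above; the exact makefile and the set of available tunings depend on your LAMMPS version, so treat the commented alternatives as illustrative):

```make
# in lib/gpu/Makefile.linux_opencl (or whichever machine makefile you build with)
# pick ONE tuned setting, then rebuild lib/gpu and relink LAMMPS:
# OCL_TUNE = -DGENERIC_OCL   # generic / CPU-oriented (the setting used above)
# OCL_TUNE = -DFERMI_OCL     # NVIDIA Fermi-class GPUs
OCL_TUNE = -DKEPLER_OCL      # NVIDIA Kepler-class GPUs
```

after editing, a clean rebuild of lib/gpu is needed for the new tuning to take effect.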

The test case (just a generic LJ system):
examples/melt
mpirun -np 6 ../../src/lmp_g++_mpich -sf gpu -pk gpu 1 -in in.melt
or
../../src/lmp_serial -sf gpu -pk gpu 1 -in in.melt

In both cases, the memory consumed by LAMMPS grows steadily (visible via top or free).

I can reproduce this problem on GTX 1070 and FirePro S9150 cards (with the software stack otherwise identical, apart from the GPU drivers and SDK).
My colleagues have seen the same problem with EAM and OPLS models.

FWIW, some AMD GPU hardware is apparently now also supported by KOKKOS (provided you use the latest version). you'll have to use a newer C++ compiler for that, though.
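for completeness, a sketch of the kind of command line that would involve, assuming a LAMMPS binary built with the KOKKOS package and GPU support (the binary name lmp_kokkos is a placeholder; the -k/-sf flags are the standard KOKKOS package switches):

```shell
# run the melt example through the KOKKOS package on one GPU
mpirun -np 1 lmp_kokkos -k on g 1 -sf kk -in in.melt
```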

axel.

Hi Axel!

Thank you for the information!

My colleagues have made additional tests on another system:
LAMMPS ver.29Jun18

OpenSUSE 42.3, GCC 4.8.5,
OCL_TUNE=-DFERMI_OCL,
NVidia driver 396.26

OpenMPI 1.10.6,
MAKE/Makefile.mpi

example/melt

The tests show a memory leak of about 20 MB per minute.
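(Such a leak rate can also be quantified without watching top/free by hand, e.g. by sampling the resident set size of the running lmp process from /proc and fitting a slope. This is a minimal Linux-only sketch, not part of LAMMPS; the PID argument would be that of the leaking run.)

```python
import time

def rss_mb(pid):
    """Read the resident set size of a process in MB from /proc (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1]) / 1024.0  # value is in kB
    raise RuntimeError("VmRSS not found")

def leak_rate_mb_per_min(pid, samples=5, interval=1.0):
    """Estimate memory growth by a least-squares slope over a few RSS samples."""
    t0 = time.time()
    pts = []
    for _ in range(samples):
        pts.append((time.time() - t0, rss_mb(pid)))
        time.sleep(interval)
    n = len(pts)
    mt = sum(t for t, _ in pts) / n
    mr = sum(r for _, r in pts) / n
    slope = sum((t - mt) * (r - mr) for t, r in pts) / \
            sum((t - mt) ** 2 for t, r in pts)
    return slope * 60.0  # MB/s -> MB/min

if __name__ == "__main__":
    import os, sys
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    print(f"PID {pid}: RSS {rss_mb(pid):.1f} MB, "
          f"growing at {leak_rate_mb_per_min(pid):.1f} MB/min")
```

Pointing it at a leaking run for a few minutes gives a rate directly comparable to the ~20 MB/min figure above.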

So I am 100% confident that there is a bug in the OpenCL variant of the gpu_package.

We have a small cluster equipped with FirePro S9150 cards.
That is why solving this issue is quite important for us.

Please let me know how to proceed.

Is it possible to inform the gpu_package developers by filing a new bug on GitHub?
Or should we rather try to find the bug ourselves?

Vladimir

P.S.

FWIW, some AMD GPU hardware is apparently now also supported by KOKKOS (provided you use the latest version).

I have seen that KOKKOS is being ported to the new AMD HIP framework.
However, we have not yet installed the open-source ROCm driver that supports HIP.
It is not clear whether this ROCm driver will work on our rather old FirePro S9150 cards…

Maybe Trung has an idea about this (CC'd).

Steve

please consider reporting a summary of this, including the command
lines used, on the LAMMPS github project issue tracker.
e-mails have a way of fading out of sight, while the visibility of
issues is much better, and thus the chance that somebody will look
into addressing it is improved.

thanks,
    axel.

Just submitted the issue to the github tracker!

Thank you!

Vladimir

Multiple fixes for running the GPU package with OpenCL and CUDA were
just merged to the github master branch.
If you haven't already cloned the repo, you can download a snapshot
from https://github.com/lammps/lammps/archive/master.tar.gz and check
it out.
... or wait for the next patch.

axel.

Hi Axel!

My student and collaborator Evgeny Kuznetsov has ported the CUDA-backend of the geryon library to ROCm HIP.
Everything seems to be working: the tests from the "examples" folder pass.

The performance of this HIP backend is better than that of the OpenCL backend:
10.7 ns/day vs 8.0 ns/day on a Radeon VII with an 8-core Epyc CPU and ROCm ver. 2.2.

Evgeny plans to file the corresponding pull request following these instructions: https://lammps.sandia.gov/doc/Howto_github.html

Please let us know which branch we should use for the pull request: lammps-master or lammps-unstable?

Vladimir

pull requests are merged into master; unstable only gets updated from master when a patch release is made.

before submitting a pull request for this, please wait until #1430 is merged, which will be after the upcoming stable release, i.e. within a week or two.
it may be a good idea to make a branch, merge #1430 into it, and test it with ROCm as well. that way, the effort of merging your changes will be smaller and possible merge conflicts can be avoided or reduced.
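a sketch of those steps, using github's read-only pull-request refs (the local branch names here are placeholders):

```shell
# fetch PR #1430 into a local branch and merge it into a feature branch
git clone https://github.com/lammps/lammps.git
cd lammps
git checkout -b hip-backend master            # branch carrying the HIP port
git fetch origin pull/1430/head:pr-1430       # github exposes PRs under refs/pull/
git merge pr-1430
# ... then rebuild lib/gpu with the HIP backend and rerun the examples with ROCm
```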

axel.