[lammps-users] Lost atoms when using GPU package

Hello Everyone,

I am trying to check the LAMMPS installation on a system that has NVIDIA Pascal GPUs, with NVIDIA toolkit version 11.0 and NVIDIA driver version 450.51.

vt-gpu.cmake (1.07 KB)

vt-gpu_nolib.cmake (475 Bytes)

in.lj (518 Bytes)

Hello Everyone,

The lost-atoms problem that I encountered with the GPU package install, reported earlier in the thread,

“ERROR: Lost atoms: original 32000 current 26415 (src/gitLammps/src/thermo.cpp:427)
Last command: run $t”

went away after I configured and built LAMMPS with the ‘-DCUDPP_OPT=no’ option (note: the default is ‘yes’) using:

“cmake -C …/cmake/presets/vt-gpu.cmake -C …/cmake/presets/vt-gpu_nolib.cmake …/cmake -DPKG_GPU=on -DGPU_PREC=double -DGPU_API=cuda -DCUDPP_OPT=no”

I am not sure how turning OFF the CUDA Performance Primitives optimization affects performance.
Also, the issue is most probably associated with the versions of the device driver and the nvidia-cuda-toolkit used.

I would appreciate it if anyone with knowledge of this could enlighten me…

Thanks.

Warm regards
Vaibhav.

FYI, the CUDPP option has been set to OFF by default recently. It should be OFF for recent GPU hardware as it does not offer any benefits there.
There also have been significant updates and improvements to the GPU package recently, albeit with some known problems in the CUDA backend that are still being worked on. Hopefully these will be resolved by the time we release the next stable version.

axel.

Hi Axel,

Thanks a lot for the confirmation and for the information on the upcoming updates to the GPU package.

“the CUDPP option has been set to OFF by default recently”

I updated everything from the git repository (stable branch) just Thursday or Friday last week. Even when I did not set the CUDPP option at all for the configuration and the compile, I still got the “lost atoms” error. So, from my experience, it appears that the CUDPP option has not been turned OFF by default just yet, at least in the stable branch.
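
One way to check which value a particular build actually picked up is to read it back from the CMakeCache.txt file in the build directory, where CMake records each option as NAME:TYPE=VALUE. A minimal sketch, assuming a POSIX shell; the helper name cmake_cache_value is made up for illustration and is not part of LAMMPS or CMake:

```shell
# Hypothetical helper: print the cached value of one CMake option.
# Usage: cmake_cache_value <build-dir> <option-name>
cmake_cache_value() {
    build_dir=$1
    option=$2
    # Cache entries look like NAME:TYPE=VALUE (e.g. CUDPP_OPT:BOOL=no);
    # strip the "NAME:TYPE=" prefix and print only the value.
    sed -n "s/^${option}:[A-Z]*=//p" "${build_dir}/CMakeCache.txt"
}
```

Calling it as cmake_cache_value <build-dir> CUDPP_OPT prints whatever value is stored in the cache. Empty output means the option is not in the cache at all.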

Additionally, if the option is turned off by default then the online documentation needs to be updated since the change is not currently reflected in it.
https://lammps.sandia.gov/doc/Build_extras.html#gpu

However, I perfectly understand that there may be some lag before updates are reflected in the online documentation since the changes, as you mention, are very recent.

I will look forward to the next stable release…

Thanks once again…

Warm regards,
Vaibhav.

I wasn’t talking about the “stable” branch. That is updated only 2-3 times a year.

The first set of updates is in the recent 10 March 2021 patch release, which you would get when you follow the “unstable” branch.

The change I was referring to, however, is not yet included there either, but only in the development head or “master” branch. The commit can be seen here:

https://github.com/lammps/lammps/commit/d1b4af60a35b4489b9f182f3e3728073b15b2c3a

The online docs at Sandia follow only patch releases, i.e. the “unstable” branch.
For the “master” branch there are online docs here: https://docs.lammps.org/Build_extras.html#gpu

In short, disabling CUDPP is the right choice for your hardware, and it should work with either the stable or unstable branch.
The unstable branch has some known issues, though, that may or may not affect you (difficult to tell). The “master” branch has some additional workarounds, but the actual fixes are still pending. At the moment, for “master” and “unstable”, it appears that OpenCL is the backend with the fewest problems.

For the next stable release we plan to include GPU testing for multiple platforms in our regular automated testing, so you can expect that the overall stability and reliability of the GPU package will improve. Just adding GPU support to our test tools has exposed several issues that happen in corner cases that are valid uses but not very common.

axel.

Perfect! Makes sense…

I wasn’t even aware that there were two sets of online documentation available for lammps.

Thanks for pointing this out and for taking the time to explain the key differences between the three branches.

Warm regards,
Vaibhav.

“Perfect! Makes sense… I wasn’t even aware that there were two sets of online documentation available for lammps.”

docs.lammps.org is an experiment. It is updated automatically whenever new changes are merged into the development head.
The documentation pages at Sandia are updated manually. A major issue we have with the pages at Sandia is that the Sandia network bandwidth for external access is significantly throttled (downloads of snapshots from GitHub are much faster than from the Sandia download page), while the docs.lammps.org pages are hosted at Temple University in Philadelphia without bandwidth limitations and thus load much more smoothly.

“Thanks for pointing this out and for taking the time to explain the key differences between the three branches.”

This is also explained in the LAMMPS manual: https://docs.lammps.org/Install_git.html
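
For reference, switching an existing git checkout between the three branches named in this thread could look roughly like the following. This is only a sketch: the function name switch_lammps_branch is hypothetical, and the branch names are the ones used above (“stable”, “unstable”, “master”), which may not match what the repository uses at a later date.

```shell
# Hypothetical helper: check out one of the three branches discussed here.
# Usage: switch_lammps_branch <path-to-lammps-checkout> <branch>
switch_lammps_branch() {
    repo_dir=$1
    branch=$2
    # Accept only the branch names mentioned in this thread (an assumption).
    case "$branch" in
        stable|unstable|master) ;;
        *) echo "unknown branch: $branch" >&2; return 1 ;;
    esac
    # git -C runs the command inside the given checkout directory.
    git -C "$repo_dir" checkout "$branch"
}
```

After switching, a “git pull” in the checkout fetches the latest commits for that branch. The build directory should then be reconfigured, since option defaults can differ between branches.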

axel.