wrong potential energy for atoms with eam/alloy/cuda

Hello people,

The issue of wrong potential energies written to dump files with eam/alloy/cuda has gone from being just my personal problem to a suspected bug. Another cuda user on the list here, James Almeida, ran a test for me and got the same errors: trajectories of atoms are all calculated exactly the same as on cpus, but the potential energies of individual atoms written in the dump file are ridiculous, around -5000 eV.

If anyone else wants to look at it to confirm or rule out the idea that there might be a bug, the files are attached.

The run is a small, simple one: create a block of W crystal of 6 million atoms and anneal it for 200 steps. It writes dump files (given that it's 6M atoms, they get a bit large). These all seem fine, except for the column with the potential energy of each atom when calculated with cuda. With cpus, these are well behaved, with potential energies after 200 steps around -8 eV, as they should be:

ITEM: TIMESTEP
200
ITEM: NUMBER OF ATOMS
6000000
ITEM: BOX BOUNDS pp pp pp
0 317
0 951
0 317
ITEM: ATOMS id type x y z vx vy vz fx fy fz c_ke c_pe
1 1 0.0806297 950.941 316.871 -2.19527 1.22234 -0.983909 -1.58293 2.42537 1.69776 0.0693686 -8.67786
2 1 1.42518 1.62101 1.52561 3.59783 1.72984 2.31545 1.40752 0.732235 -0.471684 0.202903 -8.77869
3 1 2.96678 0.0975968 316.899 1.51189 -1.5602 -2.52895 0.330369 0.151668 0.0605063 0.105897 -8.72994
4 1 4.50532 1.57068 1.53133 1.81658 -1.07678 -2.41491 -0.264497 3.36835 1.53948 0.0980426 -8.68915
5 1 5.9894 0.206289 316.961 -3.95444 0.946948 6.71987 3.20465 -1.30805 -1.60915 0.587719 -8.67464

With cuda, the potential energies after 200 steps can be ~-5000 eV for some atoms, which is wrong by orders of magnitude:

ITEM: TIMESTEP
200
ITEM: NUMBER OF ATOMS
6000000
ITEM: BOX BOUNDS pp pp pp
0 317
0 951
0 317
ITEM: ATOMS id type x y z vx vy vz fx fy fz c_ke c_pe
1 1 0.0806297 950.941 316.871 -2.19527 1.22234 -0.983909 -1.58293 2.42537 1.69776 0.0693686 -6.40983
2 1 1.42518 1.62101 1.52561 3.59783 1.72984 2.31545 1.40752 0.732235 -0.471684 0.202903 -5389.8
3 1 2.96678 0.0975968 316.899 1.51189 -1.5602 -2.52895 0.330369 0.151668 0.0605063 0.105897 -5391.57
4 1 4.50532 1.57068 1.53133 1.81658 -1.07678 -2.41491 -0.264497 3.36835 1.53948 0.0980426 -5414.76
5 1 5.9894 0.206289 316.961 -3.95444 0.946948 6.71987 3.20465 -1.30805 -1.60915 0.587719 -6.73543
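For anyone who wants to check their own dumps, here is a minimal Python sketch that compares the c_pe column of two dump files atom by atom. The function names and the 0.5 eV tolerance are my own assumptions for illustration, and the sample data is just the five rows shown above (the columns other than c_pe are identical in both excerpts, so they are shared here).

```python
import io

def read_dump_column(stream, column):
    """Map atom id -> value of `column` from a text dump (a later frame overwrites an earlier one)."""
    values = {}
    id_idx = col_idx = None
    for line in stream:
        if line.startswith("ITEM: ATOMS"):
            # Column names follow the two words "ITEM:" and "ATOMS".
            names = line.split()[2:]
            id_idx, col_idx = names.index("id"), names.index(column)
        elif line.startswith("ITEM:"):
            # Any other section (TIMESTEP, BOX BOUNDS, ...) ends the atom rows.
            id_idx = col_idx = None
        elif id_idx is not None and line.strip():
            fields = line.split()
            values[int(fields[id_idx])] = float(fields[col_idx])
    return values

def compare_pe(cpu, gpu, tol=0.5):
    """Return {id: (cpu_pe, gpu_pe)} for atoms differing by more than tol (eV, assumed)."""
    return {i: (cpu[i], gpu[i])
            for i in cpu
            if i in gpu and abs(cpu[i] - gpu[i]) > tol}

HEADER = (
    "ITEM: TIMESTEP\n200\n"
    "ITEM: NUMBER OF ATOMS\n6000000\n"
    "ITEM: BOX BOUNDS pp pp pp\n0 317\n0 951\n0 317\n"
    "ITEM: ATOMS id type x y z vx vy vz fx fy fz c_ke c_pe\n"
)
# Everything up to c_ke, copied from the excerpts above.
ROWS = [
    "1 1 0.0806297 950.941 316.871 -2.19527 1.22234 -0.983909 -1.58293 2.42537 1.69776 0.0693686",
    "2 1 1.42518 1.62101 1.52561 3.59783 1.72984 2.31545 1.40752 0.732235 -0.471684 0.202903",
    "3 1 2.96678 0.0975968 316.899 1.51189 -1.5602 -2.52895 0.330369 0.151668 0.0605063 0.105897",
    "4 1 4.50532 1.57068 1.53133 1.81658 -1.07678 -2.41491 -0.264497 3.36835 1.53948 0.0980426",
    "5 1 5.9894 0.206289 316.961 -3.95444 0.946948 6.71987 3.20465 -1.30805 -1.60915 0.587719",
]
CPU_PE = ["-8.67786", "-8.77869", "-8.72994", "-8.68915", "-8.67464"]
CUDA_PE = ["-6.40983", "-5389.8", "-5391.57", "-5414.76", "-6.73543"]

def make_dump(pe_values):
    """Rebuild the dump excerpt with the given c_pe column."""
    return HEADER + "".join(f"{row} {pe}\n" for row, pe in zip(ROWS, pe_values))

cpu = read_dump_column(io.StringIO(make_dump(CPU_PE)), "c_pe")
cuda = read_dump_column(io.StringIO(make_dump(CUDA_PE)), "c_pe")
mismatch = compare_pe(cpu, cuda)
for atom_id, (e_cpu, e_cuda) in sorted(mismatch.items()):
    print(atom_id, e_cpu, e_cuda)
```

For real files, pass open("dump.cpu") and open("dump.cuda") (hypothetical file names) instead of the io.StringIO objects. A 6M-atom dump fits in a dict on most machines, but it will not be fast.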

In the thermo file, the potential energy is written correctly. And the atom trajectories over 200 steps are identical. So it seems to be just a pesky little problem in how the potential energy is reported in dump files. But it is a problem that James Almeida also got with his cuda executable.
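One way to cross-check a dump against the thermo output, sketched in Python: as far as I understand, the per-atom energies from compute pe/atom should add up to the total potential energy, so summing the c_pe column and comparing it with the thermo value shows whether the dump column is off. The function name is made up, the sample is only the five cpu rows from above, and whether the thermo value is a total or a per-atom average depends on the thermo_modify norm setting, so treat this as a sketch.

```python
import io
import math

def sum_dump_column(stream, column="c_pe"):
    """Return (sum, count) of one per-atom column over all atom rows in a text dump."""
    values = []
    col_idx = None
    for line in stream:
        if line.startswith("ITEM: ATOMS"):
            # Column names follow the two words "ITEM:" and "ATOMS".
            col_idx = line.split()[2:].index(column)
        elif line.startswith("ITEM:"):
            col_idx = None
        elif col_idx is not None and line.strip():
            values.append(float(line.split()[col_idx]))
    # fsum keeps rounding error small when adding millions of values.
    return math.fsum(values), len(values)

# The five cpu rows from the excerpt above (not a complete dump).
SAMPLE = """ITEM: ATOMS id type x y z vx vy vz fx fy fz c_ke c_pe
1 1 0.0806297 950.941 316.871 -2.19527 1.22234 -0.983909 -1.58293 2.42537 1.69776 0.0693686 -8.67786
2 1 1.42518 1.62101 1.52561 3.59783 1.72984 2.31545 1.40752 0.732235 -0.471684 0.202903 -8.77869
3 1 2.96678 0.0975968 316.899 1.51189 -1.5602 -2.52895 0.330369 0.151668 0.0605063 0.105897 -8.72994
4 1 4.50532 1.57068 1.53133 1.81658 -1.07678 -2.41491 -0.264497 3.36835 1.53948 0.0980426 -8.68915
5 1 5.9894 0.206289 316.961 -3.95444 0.946948 6.71987 3.20465 -1.30805 -1.60915 0.587719 -8.67464
"""

total, count = sum_dump_column(io.StringIO(SAMPLE))
print(count, "atoms, summed c_pe =", total, "mean =", total / count)
```

On a full single-frame dump, the summed value (or the mean, if the thermo output is normalized per atom) is the number to hold against the pe column in the thermo file. With several frames in one file the sum runs over all of them, so split the file first.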

greets,
Peter

files.tar (770 KB)

peter,

please note that the developer of USER-CUDA has essentially abandoned
it and is now working on the KOKKOS package. thus it is highly
unlikely that somebody will debug it, particularly since your
reference input seems to be using a very large number of atoms. there
are some known problems with pppm/cuda for example that have been
around for quite a while without somebody wanting to look into it.

have you tried the GPU package?

axel.

Hi Axel,

In light of what you said about cuda development and bug fixing being in very low gear, I definitely will try the openCL version.

From your suggestion I take it that the openCL version is still supported?

greets,
Peter

what has this to do with OpenCL?

Forgive my possibly very deep-running ignorance, but the alternative to cuda to make use of gpus in lammps is through OpenCL-optimised code, isn't it?

i am asking you to try the GPU *package*, which works *very* well with
CUDA. now it *is* possible to also compile the GPU package with
OpenCL, since its implementation of the GPU support is done with some
very smart scripting and preprocessing so that it can be compiled for
*both*. but using the GPU package by no means forces you to use
OpenCL.

please have a look at the relevant LAMMPS documentation on this subject.

axel.