Weird results with the GPU package

Hello lammps-users,

I have LAMMPS compiled with the GPU package running on a cluster node with two Tesla C2050 cards. Now I was playing around with the crack example provided with the LAMMPS source. When I run the example normally, the result is as expected - the lattice is pulled apart and a crack opens.

Then I added

suffix gpu
package gpu force/neigh 1 1 1.0 # device 0 is usually busy with calculations by someone else
newton off

to my input script and ran it on the cluster with GPU. What happens now is the following:

  • no crack develops
  • the thermodynamic data dumped into the log file is pretty different from the data I got without the GPU (the GPU data differs by up to 50% compared to the data without GPU)
  • the group leftupper contains 841 atoms instead of 820

Now when I use

package gpu force 1 1 1.0

the data in my logfile seems OK, but still too many atoms are allocated to the leftupper group and there is no crack.

Did I forget to set something in my input file or is this a bug?

Thanks,

Nikita

> Then I added
>
> suffix gpu
> package gpu force/neigh 1 1 1.0
> newton off
>
> to my input script and ran it on the cluster with GPU. What happens
> now is the following:
> - no crack develops
> - the thermodynamic data dumped into the log file is pretty different
> from the data I got without GPU (it differs by up to 50%)

that is because when you use /gpu pair styles with neighbor list
builds on the GPU, you don't have support for any fancy gimmicks
related to them. the crack example uses

neigh_modify exclude type 2 3

and that does not work with the GPU neighbor list build.
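if you need the exclusions, a minimal sketch of a workaround
(untested with this example) is to build the neighbor lists on the
host and offload only the force computation:

suffix gpu
package gpu force 1 1 1.0      # no "/neigh": neighbor lists stay on the CPU
newton off
neigh_modify exclude type 2 3  # honored, since the CPU builds the lists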

> - the group leftupper contains 841 atoms instead of 820

have you compared to the reference output? it has 841 in leftupper.
in this example the region boundaries overlap with the atom lattice,
and whether an atom is "in" or "out" may depend on rounding. most
compilers relax IEEE-754 rounding rules at higher optimization levels
for better performance, and that can result in differences between
executables compiled with different compilers or flags.
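here is a toy illustration of that kind of boundary sensitivity (the
numbers are made up and have nothing to do with the crack geometry;
this is just standard IEEE-754 behavior):

#include <cstdio>

int main() {
  double spacing = 0.1;
  double x  = 3 * spacing;  // atom position: exactly 0.3 on paper
  double lo = 0.3;          // region boundary: also 0.3 on paper
  // in binary floating point the two expressions round differently,
  // so which side of the boundary the atom lands on is pure rounding:
  std::printf("x - lo = %.17g\n", x - lo);  // ~5.6e-17, not zero
  std::printf("atom is %s the region\n", x >= lo ? "inside" : "outside");
  return 0;
}

with relaxed floating-point flags or a different compiler such
borderline comparisons can come out the other way, which is enough to
move a handful of lattice atoms from one group to another.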

> Now when I use
>
> package gpu force 1 1 1.0
>
> the data in my logfile seems ok, but still too many atoms are
> allocated to the leftupper group and there is no crack.

i cannot reproduce this. this setup is supposed to work.
are you sure you didn't mix up trajectory files?

axel.

Hi

you might try the USER-CUDA package. In theory the neigh_modify exclude thing should work with it, but it is definitely not one of the most well-tested features. If you want to try it out, let me know whether it works or not (compile for double precision first, so you should get basically the same output as with the CPU). If it doesn't work, I can probably fix it within a day or so.
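for reference, roughly the steps that involves - a sketch from memory
of the 12Oct11 tree, so check lib/cuda/Makefile.common and the
USER-CUDA docs for the exact variable names:

cd lib/cuda
make precision=2 arch=20   # 2 = double precision; arch 20 = Fermi (C2050)
cd ../../src
make yes-user-cuda
make your_machine          # your usual target makefile

and then in the input script, instead of the gpu lines:

package cuda gpu/node 1
suffix cuda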

Cheers
Christian


Thank you for all your replies; they clarified the situation for me.

Axel,

thank you for the hint with neigh_modify. Is there a list of commands I can/cannot use when running my simulations on GPUs? I wasn't able to find much information on the restrictions of the GPU package in general.

As for the
package gpu force 1 1 1.0
command - you were right, I mixed up the trajectories. I'm sorry for providing the wrong information.

Christian,

also thank you for the USER-CUDA package hint. This is still on my to-do list; I wanted to try the GPU package first and then take a look at USER-CUDA. In the 12Oct11 distribution of LAMMPS there seems to be a flaw in both packages: in some files the variable PI is not defined (in GPU I think it was pppm_gpu.cpp). I resolved it the quick and dirty way by simply defining it myself; somehow

#include "math_const.h"

and then changing PI to MY_PI didn't work (I'm not a C programmer though, so maybe I just forgot something). In GPU this error only appeared in one file, so it was easy to patch. I don't know how many files are affected in USER-CUDA yet. I hope I will manage to get my hands on USER-CUDA this week; then I will tell you my results.

Thank you all again,

Nikita

Hi Nikita

yeah, the constants were forgotten when we tried to clean up the mess of many local constants. Your idea of including math_const.h and changing PI to MY_PI was correct, but you also have to put in a "using namespace MathConst;".
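spelled out, the whole fix is roughly this at the top of the affected
file (exact placement in pppm_gpu.cpp may differ, and the last line
is just a made-up example of a use):

#include "math_const.h"    // defines MathConst::MY_PI and friends

using namespace LAMMPS_NS; // the .cpp file most likely has this already
using namespace MathConst; // this is the missing line

// then each bare PI in the file becomes MY_PI, for example:
static double circle_area(double r) { return MY_PI*r*r; }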

Cheers
Christian


Hi Christian,

No, I assumed that simply including the header would do it :) I usually only do scripting in Python, where some things are a bit easier than in C.
Thanks, I'll do that when I attempt to compile the package. I think it would make sense to use math_const.h because it also defines PI/2, etc., and pppm_gpu uses PI/2 anyway, so one would get rid of one multiplication in that module.
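i.e. something like this (assuming the pi/2 constant in math_const.h
is spelled MY_PI2 - I haven't checked):

// was: double g = PI/2.0*x;  (one division per use)
double g = MY_PI2*x;          // MY_PI2 is the precomputed pi/2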

Greetings,
Nikita

Yeah, that was the idea (getting rid of multiple definitions of PI and so on); it was just forgotten, but it will come with the next patch as soon as Steve finds the time to do it.

Christian


> Is there a list of commands I can/can not use when running my
> simulations on GPUs?

Hi Nikita - You cannot use the force/neigh option with a triclinic box or with neigh_modify exclude. I believe an error is generated for the former, and one definitely should be for the latter; I will add this. Everything else should hopefully be documented or checked. - Mike
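The check would presumably sit next to the other consistency checks in the package setup; a hypothetical sketch (flag names made up, not the actual patch):

// hypothetical: "exclusions are defined" && "neighbor lists built on GPU"
if (neighbor->exclude && gpu_neighbor_build)
  error->all(FLERR,"Cannot use neigh_modify exclude with GPU neighbor builds");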

nikita aigner wrote:

Hello Michael,

There is no error thrown when I use package gpu force/neigh 1 1 1.0 together with neigh_modify exclude - neither in the log nor in the output file.

logfile is here: http://pastebin.com/zRuHyYCw
output file is here: http://pastebin.com/p6UuMcax

the input file I use: http://pastebin.com/5guYP5Hy

The only thing in there is a warning saying that the thermodynamic trajectory output is not defined for the group all.

Regards,

Nikita