Hi all,

Hopefully there is a simple answer to this. I tried to include as much information as possible in this email, just in case it’s useful.

I built three different versions of LAMMPS, the first with the default packages, the second with the default + GPU package, and the third with the default + USER-CUDA package.

When I run the melt.2.5 GPU example simulation using the first build (running on CPU only), the output value for the *total # of neighbours* is 10,039,927. When I run the simulation again using the second build (using the GPU package with the “package gpu force 0 1 1” command), the output value for the *total # of neighbours* is 19,190,086.

If I use the CUDA package (the third build), the *# of neighbour* results is the same as with the GPU package (which is 19,190,086).

The *total # of neighbours* using the GPU or CUDA package is always around 2x the value returned when using only the CPU (when the number of neighbours is low, it’s usually exactly 2x). All of the other results in the output are the same (the data table, etc) for both builds. The only difference is the *# of neighbours* values. This does not just happen for the melt simulation, but for all of the simulations I’ve tried so far.

Another thing to note is that if I run the CUDA build with CUDA turned off (“-c off” option), or the GPU build without the “-sf gpu” option, it returns the same number of neighbours (10,039,927) as the CPU build (which is expected, but I figured I’d write it anyways).

There is a simple answer. CPU pair styles typically use a half neighbor list, where the I,J pair appears once. GPU pair styles typically use a full neighbor list, where I,J is stored both with atom I and atom J. Hence the factor of 2.
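The factor of 2 can be seen with a toy sketch (plain Python, not LAMMPS code): count neighbours of the same points within the same cutoff with both list conventions.

```python
from itertools import combinations

def dist2(p, q):
    """Squared distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def half_neighbors(points, cutoff):
    """Half list: each i,j pair appears once, under the lower index only."""
    c2 = cutoff * cutoff
    return [(i, j) for i, j in combinations(range(len(points)), 2)
            if dist2(points[i], points[j]) < c2]

def full_neighbors(points, cutoff):
    """Full list: the same pair is stored twice, once under i and once under j."""
    c2 = cutoff * cutoff
    return [(i, j) for i in range(len(points)) for j in range(len(points))
            if i != j and dist2(points[i], points[j]) < c2]

pts = [(0.0, 0.0), (1.0, 0.0), (0.5, 0.5), (3.0, 3.0)]
half = half_neighbors(pts, 1.2)
full = full_neighbors(pts, 1.2)
print(len(half), len(full))  # 3 6 -- the full list is exactly twice the half list
```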

Steve

Hi all,

Hopefully there is a simple answer to this. I tried to include as much information as possible in this email, just in case it’s useful.

[...]

Another thing to note is that if I run the CUDA build with CUDA turned off (“-c off” option), or the GPU build without the “-sf gpu” option, it returns the same number of neighbours (10,039,927) as the CPU build (which is expected, but I figured I’d write it anyways).

yes, because if you compile the GPU or USER-CUDA package into LAMMPS, but are not using the corresponding styles, you will *still* run on the CPU and use the code in exactly the same way as if you had not included any of those packages. LAMMPS is structured so that you can include multiple variants of the same thing and then select at run time which of them you want to use.
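As a sketch of that run-time selection (assuming a binary named lmp_mpi built with both accelerator packages; the flags below are standard LAMMPS command-line switches):

```shell
lmp_mpi -in in.melt.2.5             # plain CPU pair styles
lmp_mpi -sf gpu -in in.melt.2.5     # same binary, gpu-suffixed styles
lmp_mpi -c off -in in.melt.2.5      # USER-CUDA build with CUDA disabled
```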

[...]

These are my thoughts so far:

1) One noticeable difference is the “Neighs” VS “FullNghs”, but from what I understand, it’s only a difference between a half neighbour list and a full neighbour list. I wouldn’t think this would make a difference for the total number of neighbours since it’s only a change in list structure.

here is where you are wrong. the half vs. full neighbor list makes all the difference. in a full neighbor list, all pairs are listed twice, once for each atom that constitutes the pair. in a half neighbor list, those pairs are distributed, so that you have in total only half the neighbors.

2) I tried remaking both the packages and LAMMPS a few times, and tried making the GPU and CUDA packages with both single and double precision (just to try it).

that has no impact at all.

3) I looked at the pair style used in the example (lj/cut), and the docs say “Styles with a *cuda*, *gpu*, *omp*, or *opt* suffix are functionally the same as the corresponding style without the suffix”, so I can’t see this being the problem.

functionally the same means they implement the same potential, but they do it differently. and one of the differences is the choice of whether you apply newton's third law or not.

axel.


Thanks Axel and Steve for your quick replies. I misunderstood what exactly the total number of neighbours represented; I figured that the total number of neighbours was the total number of distinct neighbour pairs, not the size of the neighbour list.

My only question now is why the number of neighbours in the full list is not *exactly* 2 times the number of neighbours in the half list. From the examples I have run, it is usually somewhere between 1.74x and 2x (those are the two extreme values I have found so far). If it was around 1.98x, it might be caused by rounding errors when building the neighbour list, but I wouldn’t think rounding errors would cause the difference between 2x and 1.74x. Is there a simple explanation as to why the full list is sometimes notably less than double the size of the half list, or could there be many different factors involved?

The main reason for this question is that I’m trying to figure out whether or not there is potential for the simulation running on the GPU to return different results than when it runs on the CPU (ignoring rounding errors and order of operation errors).

Thanks,

Steve

Thanks Axel and Steve for your quick replies. I misunderstood what exactly the total number of neighbours represented; I figured that the total number of neighbours was the total number of distinct neighbour pairs, not the size of the neighbour list. My only question now is why the number of neighbours in the full list is not *exactly* 2 times the number of neighbours in the half list. From the examples I have run, it is usually somewhere between 1.74x and 2x (those are the two extreme values I have found so far). If it was around 1.98x, it might be caused by rounding errors when building the neighbour list, but I wouldn’t think rounding errors would cause the difference between 2x and 1.74x. Is there a simple explanation as to why the full list is sometimes notably less than double the size of the half list, or could there be many different factors involved? The main reason for this question is that I’m trying to figure out

there are two factors involved here: 1) whether the pair style requests a half or a full neighbor list, 2) whether "newton off" or "newton on" is used. the newton keyword does not apply to what the pair style itself requests, but to how to handle the situation where a pair is split between two subdomains. even with a half neighbor list, you may have "newton off", and then you have more neighbors than with "newton on".
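The newton effect can be illustrated with a toy sketch (plain Python with made-up positions and a made-up ownership rule, not LAMMPS internals): with "newton off", a pair whose atoms are owned by different subdomains is stored on both of them, so the half-list total grows and the full-to-half ratio drops below exactly 2.

```python
# Toy sketch, not LAMMPS internals: two subdomains split at x = 1.0.
atoms = {0: (0.8, 0.0), 1: (1.2, 0.0), 2: (0.2, 0.0), 3: (0.9, 0.2)}
cutoff = 0.6

def owner(aid):
    """Made-up ownership rule: the subdomain is decided by the x coordinate."""
    return 0 if atoms[aid][0] < 1.0 else 1

def close(i, j):
    (xi, yi), (xj, yj) = atoms[i], atoms[j]
    return (xi - xj) ** 2 + (yi - yj) ** 2 < cutoff ** 2

# distinct pairs within the cutoff
pairs = [(i, j) for i in atoms for j in atoms if i < j and close(i, j)]

# "newton on": a pair crossing the subdomain boundary is stored on one
# rank only, so every pair contributes exactly one half-list entry
newton_on = len(pairs)

# "newton off": each rank stores the pair for the atom it owns, so a
# pair split between two subdomains is stored twice
newton_off = sum(2 if owner(i) != owner(j) else 1 for i, j in pairs)

print(newton_on, newton_off)  # 3 5 -- more half-list neighbors with newton off
```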

whether or not there is potential for the simulation running on the GPU to return different results than when it runs on the CPU (ignoring rounding errors and order of operation errors).

no. there should be no significant differences. the only way that i know of for neighbor lists to become a problem in this context is when you do some fancy manipulations with exclusions.

axel.

Thanks again for the quick reply, that was exactly what I was looking for. I appreciate your help!

Regards,

Steve