Problem when running LAMMPS on 2 nodes

Hi all

I can run small models (512 atoms and 1000 atoms) on 2 or more nodes with a pair style I wrote; each of these nodes has 12 processors. However, I get a segmentation fault when I try a larger model (such as 1280 atoms). The larger model only runs on one node.

Here is part of the error message in the log file:

Setting up run …

That message isn't very helpful, unfortunately. All it says is that mpirun killed off your code because it noticed that one of the ranks had a segfault. These are typically caused by trying to dereference an unallocated pointer, or indexing past the end of an array.
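As a generic illustration of the second cause (this is not the poster's code, which was never shared), here is a minimal C++ sketch. The names `accumulate`, `count_overruns`, `nlocal` and `nghost` are hypothetical and chosen only for this example. The assumed scenario is a per-atom array sized for the locally owned atoms only, while neighbor indices can also point at ghost atoms stored after them. Using `std::vector::at()` turns the silent overrun into a `std::out_of_range` exception that can be detected instead of a segfault.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: add `value` to a per-atom array at index j.
// Returns false instead of corrupting memory when j is out of bounds.
bool accumulate(std::vector<double> &per_atom, std::size_t j, double value)
{
  try {
    per_atom.at(j) += value;  // bounds-checked access, throws std::out_of_range
    return true;
  } catch (const std::out_of_range &) {
    return false;
  }
}

// Hypothetical helper: count how many neighbor indices would run past
// the end of a per-atom array holding `array_size` entries.
std::size_t count_overruns(std::size_t array_size,
                           const std::vector<std::size_t> &neighbors)
{
  std::vector<double> per_atom(array_size, 0.0);
  std::size_t bad = 0;
  for (std::size_t j : neighbors) {
    if (!accumulate(per_atom, j, 1.0)) ++bad;
  }
  return bad;
}
```

With, say, `nlocal = 4` owned atoms, `nghost = 2` ghost atoms and neighbor indices `{0, 3, 4, 5}`, an array of size `nlocal` is overrun twice, while an array of size `nlocal + nghost` is not overrun at all. Whether this particular pattern applies to the pair style in question cannot be known without seeing it.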

Hi all

I can run small models (512 atoms and 1000 atoms) on 2 or more nodes with
a pair style I wrote; each of these nodes has 12 processors. However, I
get a segmentation fault when I try a larger model (such as 1280 atoms).
The larger model only runs on one node.

Here is part of the error message in the log file:

Setting up run ...
--------------------------------------------------------------------------
mpirun noticed that process rank 14 with PID 9357 on node comp02 exited on
signal 11 (Segmentation fault).
--------------------------------------------------------------------------

What might be the cause of this problem?

one or more bugs in your code.

Is there any code in LAMMPS that handles communication between nodes?

if you write a regular pair style, you should not need to worry about
communication. if your pair style is more complex, all bets are off
and nobody will be able to suggest anything unless you provide the
code. as has been explained repeatedly, none of the developers or
subscribers to lammps-users owns a crystal ball or has psychic
abilities.

there are plenty of pair styles in LAMMPS, many simple, some more
complex, some rather advanced. the best way to understand how LAMMPS
works is to read that code and see what applies to your setup or not.

the only other alternative i see would be to offer a contract to
somebody to debug and complete your work.

axel.

Hi Axel

Thank you for your reply. I know my question is confusing; however, I am sorry that I cannot provide my code for some reason.

I just cannot understand what kind of bug would cause an error across nodes but have no effect when running on a single node.

Thanks again.

Yilian

Hi Axel

Thank you for your reply. I know my question is confusing; however, I am
sorry that I cannot provide my code for some reason.

well, then you are on your own for the same reason.

I just cannot understand what kind of bug would cause an error across nodes
but have no effect when running on a single node.

i have seen so many different ways in which people can mess up their code
that it is impossible to make even a guess without seeing the code. most
likely you just don't understand how LAMMPS is parallelized.

axel.