Problem using compute force/tally on multiple processors

Hello Lammps users,

I have two groups of atoms in my system and I am trying to find the force experienced by each atom in one group due to all the atoms in the other group, using compute force/tally. The interaction between the two groups is modelled by a 12-6 LJ potential.

I am using the latest lammps-master version from GitHub. My code runs fine in serial. However, when I run it in parallel on the cluster, it does not output anything after the first thermo output. The simulation seems to hang and nothing happens until my job runs out of time. I don’t have this problem when I remove the compute commands.

I am not sure why I am having this problem. Can you help me fix this?

Below are the last few lines of the output I get:
"
  (4) neighbor class addition, perpetual
      attributes: full, newton on
      pair build: full/bin
      stencil: full/bin/3d
      bin: standard
  (5) neighbor class addition, perpetual, half/full from (4)
      attributes: half, newton on
      pair build: halffull/newton
      stencil: none
      bin: none
Setting up Verlet run ...
  Unit style : metal
  Current step : 30562
  Time step : 0.001
Per MPI rank memory allocation (min/avg/max) = 2.038 | 3.723 | 9.552 Mbytes
Step TotEng E_pair Temp Press Volume c_GrT c_MoS2T v_delT
30562 -27032.714 -27249.212 341.88517 239.88708 2323377 163.01504 528.35384 -365.3388
"

Also, when I run in serial I get only 2 lines of the following warning, corresponding to the 2 incompatible force fields (tersoff, sw).

WARNING: Compute force/tally used with incompatible pair style (../compute_force_tally.cpp:73)
WARNING: Compute force/tally used with incompatible pair style (../compute_force_tally.cpp:73)

However, when I run in parallel on 128 processors I get 256 lines of the same warning message.

I can provide the code and data files if you require.

I have attached my code below.

read_restart nve.restart

pair_style hybrid sw tersoff lj/cut 10
pair_coeff * * sw mos2.sw NULL Mo S S NULL
pair_coeff * * tersoff Linsay_Broido_optimized.tersoff C NULL NULL NULL C
pair_coeff 1 3 lj/cut 0.00395 3.625
pair_coeff 1 2 none
pair_coeff 2 5 none
pair_coeff 4 5 lj/cut 0.00395 3.625
pair_coeff 1 4 none
pair_coeff 3 5 none
pair_coeff 1 5 lj/cut 0.00239668 3.414776173

pair_modify pair lj/cut compute/tally yes
pair_modify pair tersoff compute/tally no
pair_modify pair sw compute/tally no

group C type 1
group Mo type 2
group S1 type 3
group S2 type 4
group MoS2 type 2 3 4

timestep 0.001

compute GrT C temp
compute MoS2T MoS2 temp

compute F_C_M C force/tally MoS2
compute F_M_C MoS2 force/tally C

variable delT equal c_GrT-c_MoS2T

thermo_style custom step etotal epair temp press vol c_GrT c_MoS2T v_delT

log log.spectral_k

thermo 100

fix 1 all nve

dump 1 C custom 10 C.atom id type vx vy vz c_F_C_M[1] c_F_C_M[2] c_F_C_M[3]
dump_modify 1 sort id
dump 2 MoS2 custom 10 MoS2.atom id type vx vy vz c_F_M_C[1] c_F_M_C[2] c_F_M_C[3]
dump_modify 2 sort id
run 100000
unfix 1

Thanks for your help.


This is a tricky issue caused by a conflict between collecting information
for the dump file and compute force/tally requiring a reverse communication.
It seems that whenever a subdomain ends up with no atoms, the reverse
communication is not consistently called, and the run stalls.

Does your system have a significant amount of vacuum? If so, a possible
workaround is to change the processor grid so that all MPI ranks own atoms,
and/or to use the load-balancing options to redistribute atoms more evenly
between processors. Depending on the geometry of your system, that may
result in better overall performance as well.
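This workaround can be sketched as a couple of input-script lines (the
values here are illustrative, not taken from the script above; note that
the processors command must come before the simulation box is defined,
e.g. before read_restart):

```
# keep a single processor layer along z, so no MPI rank owns only vacuum
processors * * 1
# one-time rebalance: shift the x/y division planes (up to 10 iterations)
# until the max/avg atom imbalance drops below 1.1
balance 1.1 shift xy 10 1.1
```

For a system where atoms migrate over time, a fix balance command with the
same shift style can rebalance periodically during the run instead.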

I am exploring some ideas for a more general solution to this, but that
will require some time and programming, as well as a discussion with the
other LAMMPS developers.


Confirmed. This is due to an oversight when adding these warning messages.
I have submitted a patch for inclusion in the next LAMMPS release:

https://github.com/lammps/lammps/pull/510/commits/221572a100d74902562130faa705053fa0f7ca45

Axel

OK, it looks like I have found a way to address the hanging issue and have submitted a pull request with the proposed changes at: https://github.com/lammps/lammps/pull/525

This requires a couple of (small) changes to the Compute and Pair classes, so it is not guaranteed that it will be accepted into the next release without changes.

In case you want to try it out now, you can access this modified version as a branch in my personal LAMMPS fork: https://github.com/akohlmey/lammps/tree/user-tally-refactor

Good luck,
Axel


Dr. Kohlmeyer,

Thanks a lot for your help! It works and gets the job done for now.


You are absolutely right! My system consists of 2D materials aligned along
the xy plane. The z dimension is kept large to prevent interaction with
the periodic image along that direction, so I do have vacuum. Setting Pz
to 1 with the processors command ensures that every processor owns at
least one atom. In the present master version, setting Pz to 1 also seems
to work and gives better performance, as you mentioned.

Again, thanks for your help!
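For reference, the fix described above amounts to adding something like
this at the very top of the input (before read_restart, since the
processors command must precede the box definition):

```
processors * * 1          # single processor layer along z (Pz = 1)
read_restart nve.restart
```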