Custom fix doesn't work in parallel

Dear all,

I am simulating a system containing three types of atoms (polymer, solvent and solid) using dissipative particle dynamics (DPD). The solid beads are frozen in place. I am trying to implement a custom fix (called fix_test_bc1) in LAMMPS that modifies atom velocities at the end of every time-step. To do this, I need to determine the force exerted on each polymer and solvent bead by all the solid beads. I have extended pair_dpd to calculate this force, which is stored in the per-atom arrays fsx, fsy and fsz defined via fix property/atom. Please find attached the modified pair_dpd code as well as the code for my custom fix. I have also included the input and data files (test.in and test.dat).
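
For context, a custom fix usually reaches such arrays through Atom::find_custom() and atom->dvector. The snippet below is only a minimal sketch of that lookup, not the attached code; the class name FixTestBC1 is a placeholder, and older LAMMPS versions use the two-argument find_custom(name, flag):

#include "atom.h"
#include "error.h"

void FixTestBC1::end_of_step()
{
  int flag, cols;   // recent LAMMPS; older versions use find_custom(name, flag)
  int ifsx = atom->find_custom("fsx", flag, cols);
  int ifsy = atom->find_custom("fsy", flag, cols);
  int ifsz = atom->find_custom("fsz", flag, cols);
  if (ifsx < 0 || ifsy < 0 || ifsz < 0)
    error->all(FLERR, "fix property/atom arrays fsx/fsy/fsz not found");

  double *fsx = atom->dvector[ifsx];   // per-bead force from the solid beads,
  double *fsy = atom->dvector[ifsy];   // accumulated by the modified pair_dpd
  double *fsz = atom->dvector[ifsz];

  double **v = atom->v;
  int *mask = atom->mask;
  int nlocal = atom->nlocal;

  for (int i = 0; i < nlocal; i++)
    if (mask[i] & groupbit) {
      // velocity modification using v[i] and fsx[i], fsy[i], fsz[i] goes here
    }
}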

When I compiled LAMMPS and ran a serial simulation under gdb, the code worked fine and produced reasonable results. However, when I run in parallel, the code stalls before a single time-step completes. I have implemented pack_reverse_comm and unpack_reverse_comm in pair_dpd, and I have set the ghost flag to yes for the fix property/atom command in the input script. May I know what could be the cause of the problem?
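
For reference, the reverse-communication hooks for three extra per-atom doubles generally follow the pattern sketched below (PairDPDExt is a placeholder name for the modified pair style, and fsx/fsy/fsz are assumed to be member pointers into the corresponding atom->dvector entries, not the attached code). The constructor also needs comm_reverse = 3, and compute() has to trigger the exchange when newton_pair is on, e.g. via comm->reverse_comm_pair(this) (comm->reverse_comm(this) in recent LAMMPS):

int PairDPDExt::pack_reverse_comm(int n, int first, double *buf)
{
  int m = 0;
  int last = first + n;
  for (int i = first; i < last; i++) {
    buf[m++] = fsx[i];
    buf[m++] = fsy[i];
    buf[m++] = fsz[i];
  }
  return m;
}

void PairDPDExt::unpack_reverse_comm(int n, int *list, double *buf)
{
  int m = 0;
  for (int i = 0; i < n; i++) {
    int j = list[i];
    fsx[j] += buf[m++];   // add ghost-atom contributions back onto owned atoms
    fsy[j] += buf[m++];
    fsz[j] += buf[m++];
  }
}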

Best,
Karthik.

you must be making a mistake somewhere.

nobody here has the time to debug other people's code, so you need to
figure out on your own where and why your code gets stuck.
you may find some help if you google for "attach gdb to running
process", so you can determine what each MPI rank is currently doing
by obtaining a stack trace and inspecting the values of specific
variables.

axel.

fuck you

Perhaps you mistook the LAMMPS mailing list for the YouTube comment section?

Dear Sir,

Thank you for the useful suggestion. I debugged the code using gdb and inspected variables such as atom positions, forces and velocities on each MPI rank. The code is working as I expect it to. However, even with 2 MPI processes, the program spends a substantial amount of time in the pack_reverse_comm and unpack_reverse_comm functions that I added to pair_dpd. When I ran a simulation with newton off, so as to avoid reverse communication, the code worked fine and the temperature, pressure, energy, etc. were similar to the results previously obtained in serial. It therefore seems that the communication overhead makes parallel runs infeasible.

Of course, there is the option of calculating the pairwise forces between specific atom types separately in a custom fix that requests a full neighbour list, so as to avoid reverse communication. The problem, however, is that the random forces will change if the force calculation in pair_dpd is repeated at the end of the time-step. A similar problem arises if newton is turned off for pairwise forces. Does this mean that I am restricted to serial runs?
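
For what it is worth, a fix can own its own full neighbour list by making a request in init() and storing the list in init_list(). The sketch below uses the older request API (recent LAMMPS replaces it with a single neighbor->add_request(this, NeighConst::REQ_FULL) call) and assumes the fix declares a NeighList *list member; it is only an illustration, not the attached code:

#include "neighbor.h"
#include "neigh_request.h"
#include "neigh_list.h"

void FixTestBC1::init()
{
  // request a full neighbour list owned by this fix rather than by the pair style
  int irequest = neighbor->request(this);
  neighbor->requests[irequest]->pair = 0;
  neighbor->requests[irequest]->fix = 1;
  neighbor->requests[irequest]->half = 0;
  neighbor->requests[irequest]->full = 1;
}

void FixTestBC1::init_list(int /*id*/, NeighList *ptr)
{
  list = ptr;   // stored for use when the fix runs its own pairwise loop
}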

Best,
Karthik.

nobody can answer this without studying your code in detail. possibly
you are doing something in an inefficient manner, or there is
something conceptually not working, or it only works by accident when
running without subdomains.
you are trying to do something complex, so don't expect it to be
simple to do and simple to get right on your first attempt. some
things just take time and require you to go back, start over and
review everything. i always tell people that good code gets written
at least 3 times: first to do it at all, then to do it right, and
finally to do it well. it looks to me like you are still in stage 1.

cheers,
     axel.