[lammps-users] dump_modify pbc yes leads to 'mpirun......exited on signal 11 (Segmentation Fault)'

Greetings,
lammps-users

I have been trying to shrink the box size of my system. A plain fix npt works like a charm for shrinking the box, but the atoms migrate outside the box, as seen when visualized with VMD. To avoid this, being an amateur at LAMMPS, I resorted to 'dump_modify pbc yes' to remap the atoms into the box, but I always ran into the following error:

‘mpirun noticed that process rank 6 with PID 0 on node node22 exited on signal 11 (Segmentation fault).’

The simulation is running on one of the nodes of a cluster. I understand I might not be able to achieve my end goal with dump_modify, given my minuscule knowledge of LAMMPS, but my question is: why does the use of this particular command lead to this specific error?

Running the simulation without that particular command in the script works and the simulation is terminated normally.

I am using the LAMMPS 29Oct20 stable release.

The input script is as follows -

####VARIABLES####

variable datf index relax_HDPE12K.data
variable jid index HDPE12K

####Initialize####

dimension 3
atom_style full
boundary p p p
units real
log log.${jid}

####Potential_Initialize####

pair_style lj/cut 11.5
bond_style harmonic
angle_style harmonic
dihedral_style harmonic
improper_style none

####Data_Input####

read_data ${datf}

####Interatomic_Potential####

pair_coeff 1 1 0.023775 3.8983
pair_coeff 2 2 0.0038 3.195
pair_coeff 1 2 0.009505 3.5292

bond_coeff 1 350 1.33
bond_coeff 2 350 0.99

angle_coeff 1 33 109.47
angle_coeff 2 63 112.4
angle_coeff 3 50 109.5

dihedral_coeff 1 -2.5 1 3
dihedral_coeff 2 -2.0 1 3
dihedral_coeff 3 -3.4 1 3

####Shrink####

timestep 0.1
dump 1 all custom 1000 shrink_{jid}.lammpstrj id type x y z ix iy iz dump_modify 1 pbc yes fix 1 all npt temp 1 1 (100.0*dt) tchain 1 iso 1 1 500
run 1000000

write_data shrink_${jid}.data pair ij

Two comments.

  • the pbc yes option is not what you want if you don’t want to see atoms move outside the box. this is caused by including the image flags (ix iy iz) in the output which will reconstruct the unwrapped positions upon reading the trajectory in VMD. the pbc option has a very different purpose as explained in the docs: it enforces that the regular coordinates (x y z) are strictly inside the simulation cell; for efficiency reasons, this is usually only enforced (implicitly) on timesteps where the neighbor lists are rebuilt.

  • your batch submission script requests 48 OpenMP threads. That would be extremely inefficient if you were using OpenMP-accelerated pair styles, since you are already using the same 48 processors for MPI parallelization and would then effectively be running on 48x48 = 2304 processors. Even without OpenMP-accelerated styles it can still cause problems, due to the increased stack space requirements of that many threads: some parts of the code still have OpenMP enabled, e.g. in the domain class, which is likely triggered by your use of pbc yes.
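Following the first point, a sketch of how the dump line from the script above could be written without the image-flag columns (reusing the original dump ID and filename), so that only wrapped coordinates are stored and VMD shows the atoms inside the box:

dump 1 all custom 1000 shrink_${jid}.lammpstrj id type x y z

With the ix iy iz columns omitted, there is nothing for the VMD molfile plugin to unwrap, and dump_modify pbc yes becomes unnecessary for this purpose.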

Greetings,
Dr. Axel Kohlmeyer

I would like to thank you for your prompt advice, which has been a huge help in increasing the computational efficiency of my calculations. I revised my batch submission script to make it work efficiently and use 1 OpenMP thread per task, thus using just 48 processors, matching the node capacity. Also, sorry for not sending a prompt reply; I got busy working out and testing your suggestions, which again were on point and amazing.

Excluding the image flags from the dump command didn't work as expected, though. The molecules still migrated outside the box when visualized with VMD; when visualized with OVITO, however, the molecules appear to stay within the box. I have yet to figure that out.

Shifting focus from the molecules traveling outside the box back to the dump_modify pbc yes command: as you rightly pointed out, the script might be causing problems related to increased stack space requirements when using 48 OpenMP threads per task, i.e. 48x48 = 2304 processors. However, I am running into the same problem even after using 1 OpenMP thread per task, i.e. 1x48 = 48 processors.

[…]

Excluding the image flags from the dump command didn't work as expected, though. The molecules still migrated outside the box when visualized with VMD; when visualized with OVITO, however, the molecules appear to stay within the box. I have yet to figure that out.

what you say is impossible. please provide a minimal input file that demonstrates that. I implemented support for image flags and other improvements into the VMD molfile plugin that reads LAMMPS trajectories and I know quite a bit about how VMD and LAMMPS work internally. :wink:
thus please provide a minimal input script that reproduces the behavior. I suspect that it does not do what you say it does.

Shifting focus from the molecules traveling outside the box back to the dump_modify pbc yes command: as you rightly pointed out, the script might be causing problems related to increased stack space requirements when using 48 OpenMP threads per task, i.e. 48x48 = 2304 processors. However, I am running into the same problem even after using 1 OpenMP thread per task, i.e. 1x48 = 48 processors.

Please also provide the data file, so that I can run this input deck myself with several debugging tools and instrumentation enabled to check for what may be going on. You have a very large box with a rather moderate number of atoms, so there are some potential issues due to parallelization that can be triggered by such an input deck when running with many MPI ranks.

Axel.

Greetings,
Dr. Axel Kohlmeyer

please provide a minimal input file that demonstrates that.

Please find attached to the mail the files titled in.minimal_input and minimal_input.data. Also attached is an .mpg file named demonstration.mpg, visualizing the trajectory file based on the aforementioned inputs.

ah, I see now. well, you cannot tell the difference from the visualization, but this is not really the atoms moving outside the box; it is the origin of the box changing as the box shrinks. VMD was originally conceived to visualize DCD-format trajectory files from CHARMM and NAMD, and those files store only the box sizes, not the origin (that value is implied). Thus VMD does not store this information internally either, and the visualization done by the pbctools package cannot draw the box correctly when the origin is changing. So what you see is less a problem of the dump than of the box visualization in pbctools. You could add the flag -center com or -center bb to the pbc box command, and the location of the box origin should then be updated. https://www.ks.uiuc.edu/Research/vmd/plugins/pbctools/
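For instance, in the VMD Tcl console (a sketch; this assumes the pbctools plugin is loaded and the trajectory is the top molecule):

pbc box -center bb

This redraws the periodic box centered on the bounding box of the atoms, so the moving origin of the shrinking box is tracked instead of the box being drawn at a fixed, implied origin.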

Please also provide the data file, so that I can run this input deck myself with several debugging tools and instrumentation enabled to check for what may be going on.

Concerning this, attached to the mail are the files in.pbc_yes_error and pbc_yes_error.data respectively.

thanks, I will check them out.

[…]

Please also provide the data file, so that I can run this input deck myself with several debugging tools and instrumentation enabled to check for what may be going on.

Concerning this, attached to the mail are the files in.pbc_yes_error and pbc_yes_error.data respectively.

thanks, I will check them out.

I have identified the bug that is causing the segfault. It is triggered by having a very sparse system, so that some processors end up with no atoms on them.
there will be a fix in an upcoming version of LAMMPS.

but please note that running with as many CPUs as you are using is not very efficient for such a small and sparse system.
even with a small number of CPUs this setup is inefficient.

with the following commands you can switch to a tiled domain decomposition and use recursive bisectioning for load balancing.
that should speed up your calculation substantially and also, as a side effect, avoid the segfault, since every CPU will now always have atoms on it.

comm_style tiled
balance 1.0 rcb
fix lb all balance 1000 1.0 rcb
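For context, a sketch of where these lines could go in the Shrink section of the input script posted earlier (the surrounding commands are from that script, with the image flags dropped from the dump as discussed above):

####Shrink####

comm_style tiled
balance 1.0 rcb
fix lb all balance 1000 1.0 rcb

timestep 0.1
dump 1 all custom 1000 shrink_${jid}.lammpstrj id type x y z
fix 1 all npt temp 1 1 $(100.0*dt) tchain 1 iso 1 1 500
run 1000000

The one-time balance command rebalances immediately; the fix balance then rebalances every 1000 steps as the box shrinks.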

Greetings,
Dr. Axel Kohlmeyer

I did run some test calculations and it has increased the computational efficiency drastically.

with the following commands you can switch to a tiled domain decomposition and use recursive bisectioning for load balancing.
that should speed up your calculation substantially and also, as a side effect, avoid the segfault, since every CPU will now always have atoms on it.

comm_style tiled
balance 1.0 rcb
fix lb all balance 1000 1.0 rcb

This was a great help and has resolved the entire issue. I cannot thank you enough!

So what you see is less a problem of the dump than of the box visualization in pbctools. You could add the flag -center com or -center bb to the pbc box command, and the location of the box origin should then be updated.

And this worked like a charm. Thanks a lot!

Appreciations and Regards,
Jignesh D.