Lammps error: mpirun noticed that process rank 33 with PID 257480 on node c469 exited on signal 11

Dear all,

I have tried to run a simulation in LAMMPS using ReaxFF on an HPC cluster and on my own computer. The simulation has few particles, about 160. The commands used to run it are as follows:

–o–o–o–o–o–o–o–o–o–o–o–o–o–o–o–o–o–o–o–o–o–o–o–o–o–o–o–o–o–o–

# REAX potential for Nitroamines system

dimension 3
boundary p p p
units real

atom_style full
read_data Dimer1.dat

pair_style reax/c NULL
pair_coeff * * forcefield.EL C F H Li N O S
neighbor 2. bin
neigh_modify every 10 delay 0 check no
fix 2 all qeq/reax 1 0.0 10.0 1e-6 reax/c

velocity all create 5.0 4928459 rot yes dist gaussian

dielectric 100

fix 1 all npt temp 5.0 330.0 500.0 iso 1.0 1.0 10000.0
timestep 0.1
thermo_style custom step temp evdwl ecoul pe press vol lx ly lz density
thermo 100
dump 1 all custom 100 md1.lammpstrj id type x y z ix iy iz q
run 10000000

write_data Dimer2.dat
–o–o–o–o–o–o–o–o–o–o–o–o–o–o–o–o–o–o–o–o–o–o–o–o–o–o–o–o–o–o–

It started fine and everything seemed normal, but after a few femtoseconds it stopped and the following error message was printed on both computers (HPC and my own computer):

–o–o–o–o–o–o–o–o–o–o–o–o–o–o–o–o–o–o–o–o–o–o–o–o–o–o–o–o–o–o–
[c469:257480:0:257480] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x6a73e6b0)
==== backtrace (tid: 257480) ====
0 0x0000000001395176 Validate_Lists() ???:0
1 0x00000000013956bd Init_Forces_noQEq() ???:0
2 0x0000000001395e7c Compute_Forces() ???:0
3 0x000000000138cd01 LAMMPS_NS::PairReaxC::compute() ???:0
4 0x0000000000d1cc5e LAMMPS_NS::Verlet::run() ???:0
5 0x0000000000cdc51e LAMMPS_NS::Run::command() ???:0
6 0x0000000000b9442b LAMMPS_NS::Input::command_creator<LAMMPS_NS::Run>() ???:0
7 0x0000000000b92905 LAMMPS_NS::Input::execute_command() ???:0
8 0x0000000000b92e66 LAMMPS_NS::Input::file() ???:0
9 0x0000000000400e58 main() ???:0
10 0x0000000000022555 __libc_start_main() ???:0
11 0x0000000000400eb8 _start() ???:0

[c469:257480] *** Process received signal ***
[c469:257480] Signal: Segmentation fault (11)
[c469:257480] Signal code: (-6)
[c469:257480] Failing at address: 0x673d0003edc8
[c469:257480] [ 0] /lib64/libpthread.so.0(+0xf630)[0x2b8a4adf7630]
[c469:257480] [ 1] /sw/eb/sw/LAMMPS/3Mar2020-foss-2019b-Python-3.7.4-kokkos/lib64/liblammps.so.0(_Z14Validate_ListsP12_reax_systemP7storagePP10_reax_listiiii+0x1b6)[0x2b8a46b6b176]
[c469:257480] [ 2] /sw/eb/sw/LAMMPS/3Mar2020-foss-2019b-Python-3.7.4-kokkos/lib64/liblammps.so.0(_Z17Init_Forces_noQEqP12_reax_systemP14control_paramsP15simulation_dataP7storagePP10_reax_listP15output_controls+0x3cd)[0x2b8a46b6b6bd]
[c469:257480] [ 3] /sw/eb/sw/LAMMPS/3Mar2020-foss-2019b-Python-3.7.4-kokkos/lib64/liblammps.so.0(_Z14Compute_ForcesP12_reax_systemP14control_paramsP15simulation_dataP7storagePP10_reax_listP15output_controlsP13mpi_datatypes+0x2c)[0x2b8a46b6be7c]
[c469:257480] [ 4] /sw/eb/sw/LAMMPS/3Mar2020-foss-2019b-Python-3.7.4-kokkos/lib64/liblammps.so.0(_ZN9LAMMPS_NS9PairReaxC7computeEii+0x1b1)[0x2b8a46b62d01]
[c469:257480] [ 5] /sw/eb/sw/LAMMPS/3Mar2020-foss-2019b-Python-3.7.4-kokkos/lib64/liblammps.so.0(_ZN9LAMMPS_NS6Verlet3runEi+0x21e)[0x2b8a464f2c5e]
[c469:257480] [ 6] /sw/eb/sw/LAMMPS/3Mar2020-foss-2019b-Python-3.7.4-kokkos/lib64/liblammps.so.0(_ZN9LAMMPS_NS3Run7commandEiPPc+0x32e)[0x2b8a464b251e]
[c469:257480] [ 7] /sw/eb/sw/LAMMPS/3Mar2020-foss-2019b-Python-3.7.4-kokkos/lib64/liblammps.so.0(_ZN9LAMMPS_NS5Input15command_creatorINS_3RunEEEvPNS_6LAMMPSEiPPc+0x2b)[0x2b8a4636a42b]
[c469:257480] [ 8] /sw/eb/sw/LAMMPS/3Mar2020-foss-2019b-Python-3.7.4-kokkos/lib64/liblammps.so.0(_ZN9LAMMPS_NS5Input15execute_commandEv+0x875)[0x2b8a46368905]
[c469:257480] [ 9] /sw/eb/sw/LAMMPS/3Mar2020-foss-2019b-Python-3.7.4-kokkos/lib64/liblammps.so.0(_ZN9LAMMPS_NS5Input4fileEv+0x156)[0x2b8a46368e66]
[c469:257480] [10] lmp(main+0x48)[0x400e58]
[c469:257480] [11] /lib64/libc.so.6(__libc_start_main+0xf5)[0x2b8a4bb31555]
[c469:257480] [12] lmp[0x400eb8]
[c469:257480] *** End of error message ***

Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.


mpirun noticed that process rank 33 with PID 257480 on node c469 exited on signal 11 (Segmentation fault).

–o–o–o–o–o–o–o–o–o–o–o–o–o–o–o–o–o–o–o–o–o–o–o–o–o–o–o–o–o–o–

Does anyone know how I can solve it?

Best,

Anderson

This kind of error for ReaxFF has been reported and discussed many times. I am certain a suggestion suitable for your case is among them. Please research the archives before posting such a question.

Dear Dr. Axel,

I have been searching for answers to this issue, but I only found this: LAMMPS parallel run using openmpi on macos - #3 by 93jongwun. Unfortunately, after your recommendations the user did not reply again, so I do not know whether the problem was resolved.

I have also tried to follow your suggestions from that link, but without success. Do you know how I can resolve it?

Best,

Anderson

In general, there are no “do this, not that” kind of answers. You first need to understand what is happening. That is done with visualization and you have to store the trajectory frequently enough to see something.
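As a rough sketch of what "frequently enough" could look like for the input above: write a snapshot and a thermo line every 10 steps for a short debug run, so the last frames before the crash can be inspected. The dump ID `dbg`, the file name `md_debug.lammpstrj`, the interval of 10 steps, and the run length are assumptions for illustration, not values from this thread; these lines would stand in for the `thermo`, `dump`, and `run` lines of the original input.

```LAMMPS
# Debug run: store coordinates and charges often enough to see what happens
# in the steps right before the segmentation fault.
thermo 10
dump dbg all custom 10 md_debug.lammpstrj id type x y z q
run 5000
```

The resulting trajectory can then be loaded into a viewer to check whether atoms overlap, fly apart, or whether the box changes rapidly.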

Is this correct??? Does the source where you got the parameter file say that you should use this setting?

Try running with fix nvt first for a while.
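A minimal sketch of such a two-stage setup, assuming the rest of the input stays as posted: the `fix npt` line is replaced by a constant-volume stage at the starting temperature, and the barostat and temperature ramp are only switched on afterwards. The length of the first stage (20000 steps) is an assumption; the damping values are simply copied from the original input and may need adjusting.

```LAMMPS
# Stage 1: constant volume at the initial temperature (no barostat yet)
fix 1 all nvt temp 5.0 5.0 500.0
run 20000
unfix 1

# Stage 2: original NPT settings, started from the relaxed configuration
fix 1 all npt temp 5.0 330.0 500.0 iso 1.0 1.0 10000.0
run 10000000
```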

They should work, you probably didn’t use the suitable intervals.

As was explained, it is likely due to many changes happening to your system, and that is usually an indication of a bad choice of initial geometry or a bad set of parameters.

Sometimes, you can get away with using the KOKKOS package in serial mode. It uses a slightly different memory management that has been shown to be a bit more robust.

I recommend updating to the latest LAMMPS and using the KOKKOS package with the Serial backend for CPUs. Not only is it more robust with respect to memory, but in some cases it is also faster than regular ReaxFF.
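For illustration only, this is roughly how the ReaxFF part of the posted input might look when run through KOKKOS. It assumes a LAMMPS executable built with the KOKKOS package and its Serial backend; the executable name `lmp` and the input file name `in.dimer` are placeholders. In recent LAMMPS versions the styles are named `reaxff` and `qeq/reaxff` rather than `reax/c` and `qeq/reax`, which is what the sketch uses.

```LAMMPS
# Launch (KOKKOS has to be enabled on the command line):
#   lmp -k on -sf kk -in in.dimer
# "-k on" turns on KOKKOS; "-sf kk" appends the /kk suffix to every style
# that has a KOKKOS variant, so the input itself needs no /kk names.

pair_style reaxff NULL
pair_coeff * * forcefield.EL C F H Li N O S
fix 2 all qeq/reaxff 1 0.0 10.0 1e-6 reaxff
```

The remaining commands of the input can stay unchanged; styles without a KOKKOS variant fall back to their regular versions.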