Why is my program segfaulting after running the simulation?

I am running a free energy calculation on the radius of gyration (Rg) of a polymer in water in LAMMPS using the COLVARS package. It is an NPT simulation with Intel acceleration, and an adaptive biasing force (ABF) acts on Rg.

I am seeing an error that seems to occur AFTER the simulation has finished running, and I don’t understand why. I have attached my simulation output. This is the final output message:

Ave neighs/atom = 373.35589
Ave special neighs/atom = 2.1235554
Neighbor list builds = 509 
Dangerous builds = 0 
colvars: Resetting the Collective Variables module.
Total wall time: 0:00:49
srun: error: stellar-i10n4: tasks 1-10,12,15,17-20,25,27,29,31-32,34-35,37-40,42-43,46-58,60-65,67-71,73-85,88-89,92-95: Segmentation fault (core dumped)
srun: Terminating StepId=968088.0
slurmstepd: error: *** STEP 968088.0 ON stellar-i10n4 CANCELLED AT 2023-09-19T17:37:58 *** 
srun: error: stellar-i10n4: tasks 0,11,13-14,16,21-24,26,28,30,33,36,41,44-45,59,66,72,86-87,90-91: Terminated
srun: Force Terminated StepId=968088.0

You can see this in the file npt.out.

As you can see, LAMMPS has also reported the total wall time, so I assume the simulation ran to completion and then crashed right afterwards. What could be causing this?

I am running the following command on my cluster:

srun --ntasks=96 --nodes=1 --cpus-per-task=1 --exclusive lmp_colvar -sf intel -in npt.in > npt.out 2>&1

Here sys.npt.data is my data file, sys.pnipam.water.settings is my settings file, colvars.inp is my Colvars input file, and npt.in is my LAMMPS input file. I have attached all my input files to this message.

I would appreciate any advice you have for me.

npt.in (3.8 KB)
sys.npt.data (7.1 MB)
sys.pnipam.water.data (4.4 MB)
sys.pnipam.water.settings (6.4 KB)
colvars.inp (444 Bytes)

Hi, since you just opened an issue on Colvars’ GitHub repository (link), which is a good idea, we’ll follow up there.


You should check the following things:

  • does the segfault happen with the latest LAMMPS version (2 Aug 2023 currently)?
  • does the segfault happen without using the “intel” suffix/acceleration?
  • does the segfault happen with a smaller number of processors?
  • where exactly does the segfault happen? See section 11.4, “Debugging crashes”, in the LAMMPS documentation.
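
As a rough sketch, the last three checks in the list above could be turned into command lines like the ones below. The binary name lmp_colvar, the input file npt.in, and the srun options are taken from the original post; the reduced task count of 24 is only a hypothetical example, and the gdb line assumes that gdb is installed and that a single-process run reproduces the crash. The script only prints the commands (a dry run), so they can be inspected before anything is submitted.

```shell
#!/bin/sh
# Dry-run sketch: builds and prints one command per check; nothing is executed.
# lmp_colvar and npt.in come from the original post. The task count of 24 is a
# hypothetical placeholder, not a recommended value.

LMP=lmp_colvar
INPUT=npt.in

# Build an srun command line for a given task count plus optional LAMMPS flags.
srun_cmd() {
    ntasks=$1
    shift
    cmd="srun --ntasks=$ntasks --nodes=1 --cpus-per-task=1 --exclusive $LMP"
    if [ $# -gt 0 ]; then
        cmd="$cmd $*"
    fi
    echo "$cmd -in $INPUT"
}

# Check 2: same task count, but without the intel suffix.
no_intel=$(srun_cmd 96)

# Check 3: fewer MPI tasks, intel suffix kept.
fewer_tasks=$(srun_cmd 24 -sf intel)

# Check 4: single process under gdb; prints a backtrace if the run crashes.
with_gdb="gdb -batch -ex run -ex bt --args $LMP -in $INPUT"

printf '%s\n' "$no_intel" "$fewer_tasks" "$with_gdb"
```

A backtrace from gdb is usually only informative if the LAMMPS executable was compiled with debug information, so a separate debug build may be needed for the last command.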

  1. I tested the 2Aug2023 version and it worked!
  2. The segfault happened both with and without -sf intel, using 21Sep2021.
  3. Yes, I changed pages and the number of processors. The run segfaulted in every case with 21Sep2021.
  4. I have not checked that yet.

In any event, thank you for your analysis, @akohlmey! The 2Aug2023 version works as far as I can tell!