Segmentation Fault Error when Running LAMMPS in Parallel (mpirun -np 6)

Dear LAMMPS Community,

I hope this message finds you well. I am currently encountering an issue with LAMMPS when attempting to run simulations in parallel on 6 processors (mpirun -np 6). The run aborts with a segmentation fault (signal 11), but interestingly, the program runs without error when I use 4 processors (mpirun -np 4).

I have performed some initial troubleshooting, such as checking for memory issues, ensuring proper compilation, and reviewing my input scripts and data files, but I have not been able to identify the root cause of this issue.

I am currently using the 23Jun2022 version.

I have also attempted to debug the issue using gdb, but I haven’t been able to pinpoint the exact location of the error.

If anyone in the LAMMPS community has encountered a similar problem or has insights into debugging Segmentation Fault errors in a parallel LAMMPS simulation, I would greatly appreciate your guidance and expertise.

Additionally, if you require any additional information or logs to help diagnose the issue, please let me know, and I will provide them promptly.

Thank you in advance for your assistance, and I look forward to your valuable input.

What is your structure, and how large is it? What is the pair style? The difference between 4 and 6 MPI processes could be due to the different domain decomposition…


I am using lj units. My simulation box is 41.85 on each side, and I am using the morse/smooth/linear pair style with a cutoff of 1.4.

Hi @Sourav14,

It is nearly impossible to help you without more details.

  • Could you provide a minimal working example of your system? Giving the dimensions and pair style is clearly not enough; it would be useful to know the other parameters (timestep, masses, number of species, etc.)
  • Could you pinpoint which command makes the system crash?

The number of processors mainly changes how the box is divided, and thus the dimensions of the subdomains and the relationships between neighbour lists, ghost atoms, etc. The most likely reason your system ends up crashing would be a bad set of parameters leading to wrong dynamics, but it is impossible to tell without more context.
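
For example, one way to check whether a particular decomposition is involved is to pin the processor grid explicitly. This is only a sketch; the grids below are guesses, since the grid LAMMPS actually chooses depends on the box shape and processor count:

# must appear before the simulation box is defined (i.e. before create_box / read_data)
processors 1 2 3      # one candidate grid for a 6-rank run
# processors 2 2 1    # a grid a 4-rank run might use instead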

Hi, thank you for your response.
No, I am still unable to pinpoint the cause of the fault. I am using a timestep of 0.002.
There are a total of 8 species in my simulation, each type with a different mass.

Another thing I noticed is that it runs without any error when I use 8 MPI processes.
I am curious to know whether LAMMPS itself has any specific requirement or recommendation regarding the number of processors being a power of two.

Here are some specific points I’d like to understand:

  • Does LAMMPS inherently require or benefit from using a power-of-two number of processors?
  • Are there any specific considerations or advantages to using powers of two when setting up parallel simulations in LAMMPS?
  • Does the choice of the number of processors affect the performance or scalability of LAMMPS simulations in any way?

No.

No.

No.

Yes.

@Sourav14 You obviously don’t have much experience in tracking down the origin of segmentation faults. There are people here who do. However, in order to help you, we need to know a few things, like what LAMMPS version you are using, what platform you are running on, and how you compiled or otherwise obtained your LAMMPS executable.

But most important is that you produce a minimal working example that reproduces the crash quickly, without using many CPU resources, and that does not contain any commands that are not required to trigger the crash. With such an input deck (no binary restarts, please!), we will be able to use our own debugging tools and strategies and will usually be able to pinpoint the cause of the crash rather quickly (there is a lot that can be done with some experience that would take you a looooong time to figure out on your own).
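
As one possible sketch of how such a reproducer could be captured without binary restarts (the file name and run length are placeholders, and it is assumed the crash only occurs some way into the run): write out a plain-text data file shortly before the failing point and build a stripped-down input around it:

write_data snapshot.data          # plain-text data file, not a binary restart
# then, in the stripped-down input (units, atom_style, etc. still have to precede read_data):
# read_data snapshot.data
# run 1000                        # just long enough to still trigger the crash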


Yes, thank you for the information. I am using the LAMMPS 23 June 2022 version and built it using CMake.

This is too little, too late.

If you don’t want any help, why do you post here?

I can only come to this conclusion since you so consistently avoid providing the crucial information required to reproduce your issue and potentially address the problem.
So far, you have mostly wasted your time and that of the people trying to help you. If you don’t change your ways, there is no point in continuing the discussion and thus I will just close the topic.

Apologies for the delayed responses; due to some unavoidable circumstances I was not able to respond on time. I deeply value our communication and your understanding. Please know that this delay was not a reflection of my respect for you or the importance of our interaction. I understand that my tardiness may have caused inconvenience or concern, and for that I am truly sorry.

Since I am a beginner in this field, I was not able to fully understand what crucial information you require. So far, I am using the LAMMPS 23 June 2022 version on Ubuntu 22.04. I compiled LAMMPS using the CMake build system, configuring the build with the options we need (including the BROWNIAN and DIPOLE packages). After configuring, I initiated the build, which generated the LAMMPS executable.

Moving forward, I will make every effort to prevent such delays in the future. If you have any concerns or would like to discuss this further, please reach out to me.

Thank you for your understanding and patience. I appreciate your continued support.

I won’t speak for others here, but to me, the main inconvenience is to yourself, since your problem is going unsolved until you provide us with sufficient information, which you haven’t.

You need to think like a scientist and work out exactly which commands in your script cause your system to crash and which don't. You can do this because you have (1) your script, (2) your initial configuration, and (3) enough computer resources to replicate the crash and to identify that it happens on six processors and not on eight.

We do not have any of that. It is your responsibility to help us solve your problem.
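
One way to work out which commands matter (a sketch; the specific values below are placeholders) is to bisect the input: disable or shorten one stage at a time and re-run the failing six-processor case after each change, for example:

# comment out one suspect stage at a time, e.g. the energy minimization:
# minimize 1.0e-6 1.0e-8 1000 10000
# and shorten the production run so a crash (or its absence) shows up quickly:
run 10000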


Sure, and thank you for your response. Here is a portion of my input script. I am unable to pinpoint the exact command that causes the segmentation fault with mpirun -np 6 but not with mpirun -np 8.

Configuration

dimension 3
units lj
atom_style hybrid dipole sphere ellipsoid
atom_modify map array
boundary p p p

variable N equal 1000

region box block 0 41.85 0 41.85 0 41.85
create_box 8 box

pair_style morse/smooth/linear 1.4
create_atoms 1 random $(v_N) $(v_Seed+1) NULL
create_atoms 2 random $(v_N) $(v_Seed+2) NULL
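# note: random placement like this can create strongly overlapping atoms;
# the minimization further down is presumably what relaxes those overlaps
# before the Langevin dynamics starts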

velocity all create 1.0 5628459 rot yes dist gaussian

mass 1 0.356818
mass 2 0.40772
mass 3 0.463247
mass 4 0.523599
(and so on up to 8)

variable run_temp equal 1
variable T equal ${run_temp}
variable Dt equal 0.05

pair_modify shift yes
set group all shape 1 1 1
set group all quat/random 50
comm_modify vel yes
pair_coeff 1 1 ${morse_well_depth} 33 0.78 1.234

pair_coeff 1 2 ${morse_well_depth} 33 0.8 1.28

pair_coeff 1 3 ${morse_well_depth} 33 0.82 1.326

(and so on…)

pair_modify shift yes

min_style cg
minimize 1e-10 1e-10 1000000 1000000
timestep 0.002

group Act type 8
group gel type 1 2 3 4 5 6 7

.
dump dumpatom all custom 100000 ./dump.* id type x y z fx fy fz
dump_modify dumpatom sort id first yes

variable varstep equal step
variable vartemp equal temp
variable varpress equal press
variable varetotal equal etotal

fix thermo_print all print 10000 "${varstep} ${vartemp} ${varpress} ${varetotal}" file ./equil.thermo screen no

thermo 10000
thermo_style custom step temp press pe

fix lan all langevin 1.0 1.0 ${Dt} ${Seed} zero yes angmom 3.333
fix nves all nve/sphere

run 100000000

Also, could you suggest some way to debug this, or the likely causes of this kind of error, so that I can work on that?
Thank you.

There are two problems:

  1. Those parts are not properly quoted (just look at it), so the forum software misformats parts of it, which makes it difficult to read and understand what the input does.
  2. It is incomplete, so anyone wanting to debug it has to invest significant extra effort to figure out how to complete it and turn it into a working input deck.

Both of these problems are easily avoidable (by you!), and how to avoid them is explained in the forum guidelines post and in some of the previous discussions here. There are several things that are not helpful in this context, and you have done all of them.

I will now close this topic. You should take the opportunity to re-read what was suggested, and also read the forum guidelines (again), and then think for a couple of days about what it is that you have to do so that people can actually help you. So far you have failed at that in an annoyingly bad way. Then, after you have figured out how to do things better, create a new post that is aligned with the guidelines and suggestions you were given, and we can take a new look at things.
