Error while using non-periodic Boundaries

Hello Everyone,

I have recently started using LAMMPS. I am trying to equilibrate a non-periodic Au-Si system, with a spherical Au dot above a cuboid Si substrate. I am trying to equilibrate the system using the 'fix nve' command along with the 'fix temp/rescale' command applied to the entire system. (FYI, I am running the job in parallel on a 2 by 2 by 4 processor grid.)

The input script is as follows:

# -------------------------- equilibrate -----------------------

compute new_nve all temp
velocity all create 10 5834324 temp new_nve
fix 1 all nve
fix 2 all temp/rescale 1 10.0 10.0 0.5 1.0
timestep 0.001
run 100000

The simulation runs for around 860 time steps and then terminates with the following error:

820 10.013778 -118535.92 37.700183 -118498.22 867122.56 -781.49289
830 9.8336448 -118535.23 37.022011 -118498.21 867122.56 -697.55572
840 9.8691017 -118535.37 37.1555 -118498.21 867122.56 -614.27763
850 9.770578 -118535 36.784575 -118498.22 867122.56 -536.37449
860 9.7189102 -118534.8 36.590054 -118498.21 867122.56 -467.6334

rank 4 in job 1 radon-d009.rcac.purdue.edu_58952 caused collective abort of all ranks
exit status of rank 4: killed by signal 11

I am not sure if the error is occurring due to a problem in building the LAMMPS executable or some other issue.

i) I would really appreciate it if anyone could shed some light on this sort of error and on what I need to do so that it does not occur again.

ii) I am simulating the system as a micro-canonical system using the NVE ensemble. For this, the volume and total energy of the system should remain constant during the simulation. However, this doesn't happen in my analysis. The initial volume at step 0 was 858575.76, and the final volume by the time the simulation terminates is 867122.56, while the total energy changes from -118306.58 eV to -118498.21 eV by the end of the simulation. So somehow the ensemble is not being maintained. Could this be the reason the simulation crashes?

I tried to include as much information as possible in posing my query here. I apologize if I missed any crucial information; if you need any further clarification on the above problem, I can provide additional details as well.

Thank you in advance for your time and consideration in answering my query.

Sincerely,
Saikumar.

> Hello Everyone,
>
> I have recently started using LAMMPS. I am trying to equilibrate a non-periodic
> Au-Si system, with a spherical Au dot above a cuboid Si substrate. I am trying
> to equilibrate the system using the 'fix nve' command along with the
> 'fix temp/rescale' command applied to the entire system. (FYI, I am running the
> job in parallel on a 2 by 2 by 4 processor grid.)
>
> The input script is as follows:
>
> # -------------------------- equilibrate -----------------------
> compute new_nve all temp
> velocity all create 10 5834324 temp new_nve
> fix 1 all nve
> fix 2 all temp/rescale 1 10.0 10.0 0.5 1.0
> timestep 0.001
> run 100000
>
> The simulation runs for around 860 time steps and then terminates with the
> following error:

> 820 10.013778 -118535.92 37.700183 -118498.22 867122.56 -781.49289
> 830 9.8336448 -118535.23 37.022011 -118498.21 867122.56 -697.55572
> 840 9.8691017 -118535.37 37.1555 -118498.21 867122.56 -614.27763
> 850 9.770578 -118535 36.784575 -118498.22 867122.56 -536.37449
> 860 9.7189102 -118534.8 36.590054 -118498.21 867122.56 -467.6334

> rank 4 in job 1 radon-d009.rcac.purdue.edu_58952 caused collective abort
> of all ranks
> exit status of rank 4: killed by signal 11
>
> I am not sure if the error is occurring due to a problem in building the
> LAMMPS executable or some other issue.
>
> i) I would really appreciate it if anyone could shed some light on this sort
> of error and on what I need to do so that it does not occur again.

this is a non-descript error that the MPICH library throws
when one of its (remote) processes dies. unfortunately this
behavior of MPICH "loses" the error message, and this is
one of the reasons why i usually advise people to use
OpenMPI instead of MPICH. it is near impossible to tell
what error caused this with your setup. you can try running
for 850 steps, write a restart, and then continue with
a single process to see if you get an error. or write
out the trajectory more often and visualize it. perhaps something
unusual happens and that causes the problem.
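
a minimal sketch of that debugging workflow, assuming the rest of the
original input (units, potential, geometry) is kept unchanged; the dump
interval and the file names below are only illustrative placeholders:

# first run: dump the trajectory frequently and stop just before the crash
dump          dbg all atom 10 debug.lammpstrj    # coordinates every 10 steps
run           850
write_restart before_crash.restart

# follow-up run, launched with a single MPI process:
# read_restart  before_crash.restart
# fix 1 all nve
# fix 2 all temp/rescale 1 10.0 10.0 0.5 1.0
# run 100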

> ii) I am simulating the system as a micro-canonical system using the NVE

no, you are not.

> ensemble. For this, the volume and total energy of the system should remain
> constant during the simulation. However, this doesn't happen in my analysis.

no, it shouldn't.

> The initial volume at step 0 was 858575.76, and the final volume by the time
> the simulation terminates is 867122.56, while the total energy changes from
> -118306.58 eV to -118498.21 eV by the end of the simulation. So somehow the
> ensemble is not being maintained. Could this be the reason the simulation
> crashes?

no. this is a straight PEBCAC issue.

axel.

Hello Alex,

Thanks for your valuable suggestions.

I have found that when I use a single processor the simulation runs for more time steps than when I run the job in parallel. For example, when I run the job on 16 processors the simulation crashes at the 500th time step, and when I run the same input file on 32 processors it crashes at the 200th time step. This is strange. I am unable to understand why the simulation should depend on the number of processors being used.

All I can think of is that there could be a problem in the way LAMMPS was built on the cluster’s server.

Your opinion on the same is appreciated.

Thank you for your valuable time.

Thanks,
Sai.

> Hello Alex,
>
> Thanks for your valuable suggestions.
>
> I have found that when I use a single processor the simulation runs for more
> time steps than when I run the job in parallel. For example, when I run the
> job on 16 processors the simulation crashes at the 500th time step, and when
> I run the same input file on 32 processors it crashes at the 200th time step.
> This is strange. I am unable to understand why the simulation should depend
> on the number of processors being used.

no, this is not strange. you likely have some bad dynamics
happening or your neighbor list rebuild settings are inadequate
for how fast atoms are moving. this would get worse the more
processors you use.
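
for reference, a sketch of more conservative neighbor-list settings that are
often tried in a case like this; the skin distance below is only an
illustrative value, not something taken from your input:

neighbor      2.0 bin                    # skin distance (distance units) added to the cutoff
neigh_modify  every 1 delay 0 check yes  # allow a list rebuild on any step where atoms have moved far enough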

> All I can think of is that there could be a problem in the way LAMMPS was
> built on the cluster's server.

more likely, your input has a problem. you should get a
different error message when running in serial, probably
about "lost atoms". the issue with running in parallel using
MPICH is, like i originally said, that its error message is
hiding the real error message originating from LAMMPS.

axel.

Another thing to monitor, even when running on a single processor, is the
thermo output (temp, pressure, etc.) on a frequent timescale. If you do this
you will often see spikes, indicating bad dynamics. As Axel said, in parallel,
with smaller domains per processor, the same bad dynamics can trigger a
quicker error/exit.
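
A minimal sketch of what that frequent thermo output could look like; the
output interval and the particular set of columns are just illustrative
choices, not the only reasonable ones:

thermo       10                                       # print thermo info every 10 steps
thermo_style custom step temp pe ke etotal vol press  # watch for sudden spikes in temp/press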

Steve