Dear lammps-users
I have a strange problem: my LAMMPS runs get stuck, almost randomly.
The setup:
LAMMPS compiled with the GPU package (make yes-gpu) on (1) OpenMPI / GTX 480 and (2) Intel MPI / Fermi
Running short (50,000 to 100,000 steps) NVT simulations of 4140 atoms
with harmonic bonds, harmonic angles, and OPLS dihedrals.
Non-bonded interactions are
pair_style lj/cut/coul/cut/gpu 9.0
Neighbour lists are unmodified; the "control" file is at the end of the mail.
"inserting" a particle
fix dep1 part deposit 1 3 1000000 12345 region reg1 near 1.25 &
    attempt 5000 vx 0.001 0.001 vy 0.001 0.001 vz 0.001 0.001 units box
and adding a force to it
fix force1 part addforce 0.0 ${fy} 0.0
fy is a variable assigned at run time from the command line,
with a float value between 0 and 20.
I am also using
dump edata part custom 1 part.edata.y.${fy} id c_uke c_upe &
    c_dispart[4] vx vy vz
The simulation gets "stuck".
Last observed with
fy = 16.709
ERROR on proc 0: Bond atoms 2567 2680 missing on proc 0 at step 178
In other runs, with
fy = 14.031
several instances of
WARNING: Dihedral problem: 1 25552 4132 3983 3984 3986
and a
ERROR on proc 1: Bond atoms 3984 3983 missing on proc 1 at step 25553
and several of
Cuda driver error 4 in call at file 'geryon/nvd_timer.h' in line 44.
then stuck.
Another similar case:
fy = 14.612
At other times, under identical conditions, hundreds of such runs
proceed without any issue.
I don't know what you mean by "stuck". You listed
various error messages that were printed out. When LAMMPS hits
an error and prints such a message, it exits.
With a warning, it keeps going, since it can recover.
As to why that would happen in some runs and not others,
that's an issue for your simulation. If you are doing
suspect things with your insertions and relaxation in a
randomized manner, then sometimes you may get lucky
and sometimes you won't.
Steve
> Dear lammps-users
> I have a strange problem: my LAMMPS runs get stuck, almost randomly.
> [...]
> In other runs, with
> fy = 14.031
> several instances of
> WARNING: Dihedral problem: 1 25552 4132 3983 3984 3986
> and a
> ERROR on proc 1: Bond atoms 3984 3983 missing on proc 1 at step 25553
> and several of
> Cuda driver error 4 in call at file 'geryon/nvd_timer.h' in line 44.
> then stuck.
If this happens more or less randomly, my first suspicion would
be that the GPU is not working correctly. That may be due to overheating
or a faulty GPU or memory. To make sure that this is not an issue,
I would recommend running the CUDA GPU memtest for a while.
[...]
> I _expect_ the bond atoms to go missing and the dihedral problem to
> crop up, and I _want_ LAMMPS to exit when this happens; it just means
> the particle from "fix dep1" has too much velocity from "fix
> force1", or has hit some of the other atoms violently. But
> sometimes LAMMPS exits, and sometimes it does not.
If you force floating-point-based software like LAMMPS into
doing something that may create overflows, then you are
on your own. It is not feasible to have checks for NaN or
other invalid floating-point operations everywhere; those
would slow down the code massively. Some CPU/compiler
combinations can generate "hardware traps" for that: the
DEC Alpha processor was able to do that, and the GCC
compilers have an -ftrapv flag that is supposed to do that.
Of course, you'll have to add a signal handler for that.
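(For reference: -ftrapv actually traps signed *integer* overflow. To get
a trap on floating-point overflow on Linux/glibc you would enable FP
exceptions with the feenableexcept() extension and catch SIGFPE. A
minimal standalone sketch, not LAMMPS code, compile with g++:)

  // standalone sketch: promote FP exceptions to SIGFPE and catch it
  #include <fenv.h>    // feenableexcept() is a glibc extension
  #include <signal.h>
  #include <stdio.h>
  #include <stdlib.h>

  static void fpe_handler(int) {
    fprintf(stderr, "caught SIGFPE: FP overflow/invalid operation\n");
    _Exit(1);  // returning from a SIGFPE handler is undefined behavior
  }

  int main() {
    signal(SIGFPE, fpe_handler);
    // raise SIGFPE on FP overflow, divide-by-zero, or invalid operations
    feenableexcept(FE_OVERFLOW | FE_DIVBYZERO | FE_INVALID);
    volatile double huge = 1e308;
    volatile double boom = huge * huge;  // overflows -> SIGFPE -> handler
    printf("never reached: %g\n", boom);
    return 0;
  }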
> Simulations without the additional particle run fine, with nve/nvt/npt.
> Simulations _with_ the additional particle but without the addforce
> run fine (tested ~5,000 individual runs), with nve/nvt.
> ***
> Question 1: Is there a way to explicitly ask LAMMPS to quit at the
> first sign of trouble, on _any_ warning or error?
Not really; see my explanations above.
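(If you really want every warning to be fatal, one brute-force hack would
be to patch Error::warning() in src/error.cpp so it escalates into
Error::one(). Untested sketch only; the exact signature, and which
warnings actually go through this routine, vary between LAMMPS versions:)

  // hypothetical patch to src/error.cpp: make every warning fatal
  void Error::warning(const char *file, int line, const char *str, int logflag)
  {
    if (screen) fprintf(screen, "WARNING: %s (%s:%d)\n", str, file, line);
    if (logflag && logfile)
      fprintf(logfile, "WARNING: %s (%s:%d)\n", str, file, line);
    one(file, line, str);   // added: abort this proc at the first warning
  }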
> Question 2: Is there a way to stop execution if particle deposition is
> unsuccessful?
You have the source code; you can try to insert the corresponding code yourself.
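(For the deposition case specifically, a natural place would be
FixDeposit::pre_exchange() in src/fix_deposit.cpp, where the stock code
only warns when all insertion attempts fail. Untested sketch; the exact
wording and location vary between versions, search for "unsuccessful":)

  // sketch for src/fix_deposit.cpp, FixDeposit::pre_exchange():
  // replace the warning issued when no insertion succeeded with a
  // fatal error so the run stops instead of continuing.
  if (!success)
    error->all(FLERR, "Particle deposition was unsuccessful");
  // stock code (approximately):
  //   if (!success && comm->me == 0)
  //     error->warning(FLERR, "Particle deposition was unsuccessful");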
axel.