[EXTERNAL] lammps program ended with no error printed

Please submit questions to the mailing list. To answer your question: you will find a recent discussion of a problem with the same symptoms, under a more informative subject line: Run stops without error - Reax/c

Aidan

Dear Aidan,

Thank you so much for your kind advice.

There is more information below, and the input file is exactly the same.
Input file:

before reporting any problems with the LAMMPS code, you should first try running the same input with the very latest patch of LAMMPS.
your LAMMPS version is two years old, and a lot of improvements have been made since. thus, to avoid trying to correct problems that have already been corrected, check your input with the latest patch.

also, there should be more output than the log files, e.g. regular stdout and stderr output. if you are running under a batch system, they may be in separate files. the stderr output is especially crucial, as it will almost certainly contain output from the MPI library. no MPI parallel job just stops the way you are reporting without such output.

assuming that you are running under batch, how much time was requested for this job? a quick back-of-the-envelope check indicates that your calculation was cut off after about 12 hours, quite a typical wall-clock limit for batch queues. are you sure that your job wasn't simply terminated because you ran out of time?

axel.
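Axel's point about separate output streams can be sketched with a short demo (a generic bash illustration, not part of the original job script): stdout and stderr are independent, so a plain `>` redirection captures only one of them, and error messages can end up in a different batch-system file.

```shell
#!/bin/bash
# Minimal sketch: stdout and stderr are separate streams, so ">"
# alone captures only stdout; error messages (e.g. from the MPI
# library) go to stderr and may land in a different batch file.
demo() {
  echo "normal output"        # written to stdout
  echo "error output" >&2     # written to stderr
}
demo > out.log 2> err.log     # capture each stream in its own file
cat out.log                   # -> normal output
cat err.log                   # -> error output
```

Under PBS/Torque the two streams typically end up in `<jobname>.o<jobid>` and `<jobname>.e<jobid>` in the submission directory, which is why the error file deserves a look first.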

Please read the previous thread that I pointed out to you.

Dear Axel,

I took your advice: (1) updated to LAMMPS (7 Dec 2015) and ran the same input with the latest patch; (2) updated the MPI library;

(3) made sure no wall-clock limit was being hit. However, the problem has emerged again.

I need some new advice. Thank you very much.

The input script is shown below:

nobody can give advice without information, and you don't provide what is *crucial* for determining the cause of the stop of your run.
i am *very* confident that there *is* additional information. there *has* to be output printed to the console that is either printed to the screen and somehow discarded by you (e.g. by redirecting it to /dev/null), or captured by the batch system and written to files that you don't pay attention to or have disabled.

an application like LAMMPS doesn't just stop for no reason without any
indication of a problem. full stop.

provide the missing information, and people might help you.

axel.

Is this the calling script?

#!/bin/bash
#PBS -N lammps
#PBS -l nodes=1:ppn=16
#PBS -q new

project_name=in.TiO2
cd $PBS_O_WORKDIR
export LD_LIBRARY_PATH=/public/program/jpeg-8-itel2013/lib:/public/program/mpi/mpich2-1.5-intel2013/lib:/public/program/gcc-4.5.1/lib64/:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/public/program/intel/composer_xe_2013_sp1.2.144/compiler/lib/intel64:$LD_LIBRARY_PATH
ulimit -s unlimited
NSLOTS=`cat $PBS_NODEFILE | wc -l`
LMP_PATH=/public/src/lammps-7Dec15/src
OPENMPI_PATH=/public/program/mpi/mpich2-1.5-intel2013/bin

OUTDIR=/tmp/$USER/$PBS_JOBID.$PBS_JOBNAME

# Generate PBS Job Information

echo "--------------------- $PBS_JOBID INFORMATION ----------------------" > jobinfo.$PBS_JOBID
echo "" >> jobinfo.$PBS_JOBID

echo "ORIGINAL FILES locate :
$PBS_O_WORKDIR" >> jobinfo.$PBS_JOBID
echo "" >> jobinfo.$PBS_JOBID

echo "TEMPORARY FILES locate :
$OUTDIR" >> jobinfo.$PBS_JOBID
echo "" >> jobinfo.$PBS_JOBID

echo "PBS JOBNAME is :
$PBS_JOBNAME" >> jobinfo.$PBS_JOBID
echo "" >> jobinfo.$PBS_JOBID

echo "PBS JOB ID is :
$PBS_JOBID" >> jobinfo.$PBS_JOBID
echo "" >> jobinfo.$PBS_JOBID

echo "NUMBER of EXECUTIVE NODES is :
${NSLOTS}" >> jobinfo.$PBS_JOBID
echo "" >> jobinfo.$PBS_JOBID

mkdir -p $OUTDIR
cp -rf ${PBS_O_WORKDIR}/* ${OUTDIR}/
cd $OUTDIR

time ${OPENMPI_PATH}/mpirun -np $NSLOTS ${LMP_PATH}/lmp_linux -in ${project_name} > ${project_name}.log

cp -rf ${OUTDIR}/* ${PBS_O_WORKDIR}/
rm -rf $OUTDIR

Is this the calling script?

that is your input *to* the batch system; what i am asking for is the corresponding per-submission output *from* the batch system.

axel.

I just found this log:

LAMMPS (7 Dec 2015)
Reading data file …
orthogonal box = (-0.317941 -0.323637 -1.00378) to (54.8101 54.3212 50.2103)
4 by 2 by 2 MPI processor grid
reading atoms …
11700 atoms
WARNING: Resetting reneighboring criteria during minimization (…/min.cpp:168)
Neighbor list info …
2 neighbor list requests
update every 1 steps, delay 0 steps, check yes
max neighbors/atom: 2000, page size: 100000
master list distance cutoff = 12
ghost atom cutoff = 12
binsize = 6, bins = 10 10 9
Setting up cg style minimization …
Unit style: real
Memory usage per processor = 136.866 Mbytes
Step Temp E_pair E_mol TotEng Press
0 0 -1064470.9 0 -1064470.9 -149113.04
1794 0 -1202287.6 0 -1202287.6 -1350.4279
Loop time of 1734.86 on 16 procs for 1794 steps with 11700 atoms

99.7% CPU use with 16 MPI tasks x no OpenMP threads

Minimization stats:
Stopping criterion = linesearch alpha is zero
Energy initial, next-to-last, final =
-1064470.86812 -1202287.59557 -1202287.59557
Force two-norm initial, final = 11039.9 69.7653
Force max component initial, final = 178.765 28.2834
Final line search alpha, max atom move = 1.64638e-12 4.65651e-11
Iterations, force evaluations = 1794 8806

MPI task timing breakdown:
Section | min time | avg time | max time |%varavg| %total

If that's the log file that LAMMPS produced, then it may be truncated (missing the last part w/ the error message) b/c the batch job died before the file was flushed. Batch systems should also produce a file that has everything that would have gone to the screen if you had run interactively. That is where the error will likely be (at the end). It would also be in the log file if you'd run interactively.

Steve
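Steve's truncation scenario can be reproduced with a small sketch (a hypothetical demo, assuming python3 is available on the system): output still sitting in a stdio buffer is lost when the process is killed hard, which is exactly how a log file ends up cut off before the error message.

```shell
#!/bin/bash
# Sketch: when stdout is redirected to a file it is fully buffered,
# so a line written just before a hard kill never reaches the file.
python3 -c '
import sys, os, signal
sys.stdout.write("last line before the crash\n")  # sits in the buffer
os.kill(os.getpid(), signal.SIGKILL)              # killed before flush
' > truncated.log
[ -s truncated.log ] && echo "log has content" || echo "log is empty"
```

This is why the batch system's captured screen/stderr files, which the scheduler closes out itself, are more likely than the LAMMPS log to contain the final error message.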

It is not possible for us to help you with this problem if you are unable to find the error message generated by the system. For example, under extreme conditions the memory requirements of pair style reax/c can change very rapidly, and this can result in a single process aborting in a very abrupt manner, e.g.:

if( total_hbonds >= hbonds->num_intrs ) {
  fprintf( stderr,
           "p%d: not enough space for hbonds! total=%d allocated=%d\n",
           system->my_rank, total_hbonds, hbonds->num_intrs );
  MPI_Abort( comm, INSUFFICIENT_MEMORY );
}

As I said about a month ago, if your computer system is unable to preserve this output to stderr, or you don't know how to find it, then we cannot diagnose your problem.

Aidan

Looking at the information again, I think that if you add an ampersand (&) you will get stderr in your output file, as follows:

time ${OPENMPI_PATH}/mpirun -np $NSLOTS ${LMP_PATH}/lmp_linux -in ${project_name} >& ${project_name}.log

Even if you do not do that, the standard error output is probably in another file, typically labelled with the job number.

Aidan
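Aidan's suggested change can be illustrated in isolation (a generic bash sketch, not the actual mpirun command): `>&` sends both stdout and stderr to the same file, whereas `>` alone lets stderr go elsewhere, so an abort message would otherwise never appear in the log.

```shell
#!/bin/bash
# Sketch: ">&" (equivalent to "> file 2>&1" in POSIX sh) merges
# stderr into the same log file as stdout, so an MPI abort message
# is preserved alongside the normal output.
emit() {
  echo "to stdout"
  echo "to stderr" >&2
}
emit > only_stdout.log 2>/dev/null   # ">" alone: stderr is discarded
emit >& both.log                     # ">&": both streams captured
grep -c "to" both.log                # -> 2
```

The `>&` form is a bash/csh shorthand; in a portable `#!/bin/sh` script the `> file 2>&1` spelling is the safer choice.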