[lammps-users] problems with releasing processors after deleting the job

Hi,

I am running a parallel job with LAMMPS on a cluster. The job was submitted using a PBS script. When I deleted/canceled the jobs, the processors occupied by the job were sometimes not released, even though the system showed that the jobs had been deleted. These processors seem to stay occupied indefinitely. Is this a problem with my LAMMPS input commands? Many thanks.

Regards,

no, this is a problem with your cluster setup or your MPI launcher/library.

if you run in parallel, PBS only "knows" about the mpirun command,
while the individual lammps processes are launched "manually" (via ssh/rsh),
outside of its control. particularly with MPICH this can become a problem
and you end up with this kind of "runaway" executable.

the solution to this is to have executables launched not through ssh/rsh
but through PBS (the "tm" method). with OpenMPI this is a compile time
option and with MPICH (at least the older versions) you would have to use
the "mpiexec" launcher from OSC instead of the default mpirun script.
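
as a rough illustration of how one might check whether an OpenMPI
installation was built with "tm" support, here is a minimal python sketch.
it assumes that the "ompi_info" tool is in the PATH and that it lists
compiled-in components on lines of the form "MCA <framework>: <component> ...".
that line format is an assumption and may differ between versions; the
function names are made up for this example.

```python
import re
import subprocess


def tm_components(ompi_info_text):
    """Return the MCA framework names that list a 'tm' component.

    Assumes lines shaped like 'MCA <framework>: <component> (...)'.
    """
    pattern = re.compile(r"^\s*MCA\s+(\w+):\s+tm\b")
    found = []
    for line in ompi_info_text.splitlines():
        match = pattern.match(line)
        if match:
            found.append(match.group(1))
    return found


def openmpi_has_tm():
    """Run ompi_info and report whether any 'tm' component shows up."""
    result = subprocess.run(
        ["ompi_info"], capture_output=True, text=True, check=True
    )
    return bool(tm_components(result.stdout))
```

if openmpi_has_tm() returns False, the installation was most likely built
without PBS support and will fall back to launching via ssh/rsh.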

in any case, you should talk to your system administrator and have a
proper "epilogue" script installed that will kill off all runaway tasks.
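
to give an idea of what such a cleanup could do, here is a minimal python
sketch, not a production epilogue. assumptions: the LAMMPS executable name
starts with "lmp", the job owner's user name is known to the script (how it
is passed to the epilogue depends on the PBS variant), and the user has no
other legitimate job on the same node. all function names are hypothetical.

```python
import os
import signal
import subprocess


def parse_ps(ps_output):
    """Turn 'pid user command' lines into (pid, user, command) tuples."""
    procs = []
    for line in ps_output.splitlines():
        fields = line.split(None, 2)
        if len(fields) == 3 and fields[0].isdigit():
            procs.append((int(fields[0]), fields[1], fields[2]))
    return procs


def select_runaways(procs, job_user, command_names=("lmp",), keep_pids=()):
    """Pick PIDs owned by job_user whose command starts with a given name.

    The default name "lmp" is an assumption about the executable name.
    """
    prefixes = tuple(command_names)
    return [
        pid
        for pid, user, comm in procs
        if user == job_user and pid not in keep_pids and comm.startswith(prefixes)
    ]


def kill_runaways(job_user, dry_run=True):
    """List (and, unless dry_run, kill) leftover processes of job_user."""
    # Note: ps may abbreviate or replace long user names; verify on your system.
    result = subprocess.run(
        ["ps", "-e", "-o", "pid=", "-o", "user=", "-o", "comm="],
        capture_output=True, text=True, check=True,
    )
    pids = select_runaways(
        parse_ps(result.stdout), job_user, keep_pids=(os.getpid(),)
    )
    for pid in pids:
        if not dry_run:
            try:
                os.kill(pid, signal.SIGKILL)
            except ProcessLookupError:
                pass  # process exited between listing and killing
    return pids
```

calling kill_runaways("someuser") only reports the matching PIDs; passing
dry_run=False actually kills them, so test the selection first. on nodes
shared between several jobs of the same user this would kill too much.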

cheers,
   axel.