openmpi

I’m experiencing LAMMPS hanging at the beginning of a script that reads a restart file. The output says:

Reading restart file …
restart file = 5 Sep 2014, LAMMPS = 5 Sep 2014
orthogonal box = (-1.41015 -1.39804 -1.40953) to (201.41 201.398 201.41)
4 by 2 by 3 MPI processor grid

And it just hangs.

Is there some sort of environment variable that I need to specify here?

Ben

No.

to provide some more constructive help:

it is difficult to say what is going on without having any means to
reproduce it.
there are a number of things you'd have to check:
- can you run other LAMMPS inputs on the very same nodes? if not,
perhaps there is something broken with the network on one of them
(assuming you are running across multiple nodes, that is).

- does the problem persist with a different number of processors?

- can you log into a compute node where LAMMPS is stalling and then
attach a debugger to one or more of the running processes to find out
where exactly LAMMPS is stalling? look up the LAMMPS process id
(let's say it is 12345), then run "gdb -p 12345" and after gdb has
launched hit CTRL-C. that will drop you back to the GDB prompt, and
then you can type "where" and get a stack trace (and post it here).
you should collect stack traces from several processes and compare
them. they should all be stuck in the same code, but one may be off.

- if that doesn't help, can you try to build an input that quickly
generates a similar restart, and then a second input that stalls just
like this one when reading that restart?
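
The debugger step above can be scripted so you do not have to attach to every process by hand. This is only a rough sketch: the `dump_stacks` function and the `GDB` and `OUT_DIR` variables are made up for illustration, and it assumes gdb is installed on the compute node and that you are allowed to attach to your own processes.

```shell
#!/bin/sh
# Sketch: save one gdb stack trace per LAMMPS process on this node.
# GDB and OUT_DIR are illustrative variables, not LAMMPS settings.
GDB="${GDB:-gdb}"
OUT_DIR="${OUT_DIR:-.}"

# Usage: dump_stacks PID [PID ...]
dump_stacks() {
    for pid in "$@"; do
        # -batch runs the given command and then exits, which detaches
        # from the process; "thread apply all bt" prints every thread's stack.
        "$GDB" -p "$pid" -batch -ex "thread apply all bt" \
            > "$OUT_DIR/trace.$pid.txt" 2>&1
    done
}
```

For example, `dump_stacks 12345 12346` writes trace.12345.txt and trace.12346.txt. If your executable name contains "lmp" (an assumption, check yours), `dump_stacks $(pgrep lmp)` should pick up all local LAMMPS processes. The traces can then be compared with diff to spot the process that is stuck somewhere else.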

axel.

I have had a similar experience when some restart files became corrupted while I was moving them between a hard drive and my local machine. I figured it out after realizing that I could run a new input from a data file and that the restart files it wrote worked as intended. After that it was just a matter of determining what had happened to the old restart files.
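
If both the original and the copied restart file are still around, one quick way to check for this kind of corruption is to compare checksums. This is just a sketch using the POSIX `cksum` tool; the function names `file_sum` and `same_copy` are made up for illustration.

```shell
#!/bin/sh
# Print "CRC SIZE" for a file using POSIX cksum.
file_sum() {
    cksum < "$1"
}

# Succeed only if both copies have the same checksum and size.
same_copy() {
    [ "$(file_sum "$1")" = "$(file_sum "$2")" ]
}
```

Something like `same_copy /path/on/drive/file.restart ./file.restart && echo intact` (paths are placeholders) would then tell you whether the copy matches. When the two copies live on different machines, run `cksum` on each side and compare the printed numbers by eye.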

Michael