LAMMPS getting "stuck" - GPU + MPI + addforce

Please keep the list in the loop.

When LAMMPS prints an "ERROR" message,
it then calls MPI_Abort(). MPI is supposed
to kill the entire job if any processor calls MPI_Abort().

If that is not happening, then something is faulty
with MPI on your system or, as Axel says, if you
are also using GPUs, perhaps they are not communicating
properly with the OS.
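
One way to check whether the job really terminates after an error is to launch it under a timeout and see whether the launcher exits by itself. The following is only a rough sketch: the helper name run_with_watchdog and the timeout values are made up for illustration, and the child command used here is a stand-in, not a real LAMMPS launch line.

```python
import subprocess
import sys


def run_with_watchdog(cmd, timeout_s):
    """Run cmd and report whether it exited by itself or had to be killed.

    Returns ("exited", returncode) or ("hung", None).
    On timeout, subprocess.run kills only the direct child (for an MPI job,
    the launcher); ranks left behind may still need manual cleanup.
    """
    try:
        result = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return ("hung", None)
    return ("exited", result.returncode)


# Stand-in for a job that fails cleanly: the child exits at once with code 1.
# For a real test, replace the list with your own MPI launch command.
print(run_with_watchdog([sys.executable, "-c", "import sys; sys.exit(1)"], timeout_s=30))
```

If a run that is known to hit an ERROR comes back as "hung" instead of "exited" with a non-zero code, that points at the abort not being propagated by the MPI runtime rather than at LAMMPS.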

There is nothing else LAMMPS can do, that I know of.

Steve

In all probability, the problem is the MPI environment. Single process + GPU
works fine.
So far I have been using Open MPI. I will try MPICH ..., as well as non-GPU + MPI.

Thanks,
Manish
- - - - - - - - - - - - - - - - - - - - - - - - - - -

> In all probability, the problem is the MPI environment. Single process + GPU
> works fine.

You are discounting the fact that running in parallel can change the
load on the GPU, particularly when the GPU is oversubscribed.
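
To make the oversubscription point concrete, here is a small sketch that counts how many MPI ranks end up sharing each GPU. It assumes a simple round-robin mapping (rank modulo number of GPUs), which is a hypothetical scheme for illustration only; the actual assignment depends on how the GPU package and the launcher are configured. The function name ranks_per_gpu is likewise made up.

```python
from collections import Counter


def ranks_per_gpu(n_ranks, n_gpus):
    """Count ranks per GPU, assuming rank r uses GPU r % n_gpus (hypothetical)."""
    if n_ranks < 0 or n_gpus < 1:
        raise ValueError("need n_ranks >= 0 and n_gpus >= 1")
    counts = Counter(rank % n_gpus for rank in range(n_ranks))
    return [counts[gpu] for gpu in range(n_gpus)]


print(ranks_per_gpu(1, 1))  # [1]     single process + GPU
print(ranks_per_gpu(8, 2))  # [4, 4]  four ranks share each GPU
print(ranks_per_gpu(5, 2))  # [3, 2]  uneven load across the GPUs
```

Any count above 1 means several processes compete for the same device, so a single process + GPU run succeeding does not rule out a GPU-side problem in the parallel run.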

> So far I have been using Open MPI. I will try MPICH ..., as well as non-GPU + MPI.

I would be very surprised if switching the MPI library made a
difference, since its time consumption should be very low, and so
should its impact on LAMMPS performance.

Axel