Running LAMMPS library in multi-threaded code

Hi,

I have a C++ finite element code that runs only on a multi-processor computer (using multi-threading, but not MPI). From this code, I am trying to perform a large number of fairly simple computations with LAMMPS, which is linked as a serial library (although this is not strictly relevant, the computations simply involve calculating the energy and forces of small atom clusters).

All the LAMMPS computations are independent of each other, and I am trying to run them in parallel, not with MPI but by creating multiple threads and LAMMPS objects. I am using Intel Threading Building Blocks, but I expect the results would be the same with pthreads.

Currently, each thread creates a LAMMPS object, sends the atom positions through the library.h interface, runs LAMMPS, and extracts the energy and forces, again through the library.h interface. After that, some cleanup is performed and control returns to the main program.

The problem I have found is that the threads seem to clash, even though each of them uses a different LAMMPS object. I have traced the problem to the moment when two (or more) threads simultaneously call the library with the command

lammps_command(lmpobj, "run 0");

each of them with its own (different) LAMMPS object lmpobj.

When I run the code with only one thread, everything works just fine, and I am wondering whether there is a fundamental reason why I cannot run LAMMPS in parallel as explained, or whether I need to do additional things to use LAMMPS in the fashion described above.

By the way, each LAMMPS computation is so simple that it makes no sense to use OpenMP on it. Dramatic performance gains will only come from running each individual computation in a different thread.

Thanks for the help,

Ignacio


LAMMPS was not written to be thread-safe. While most of the computation should be safe if each thread manages a separate instance of the LAMMPS object, other parts that depend on external functions are not thread-safe, and adapting them would require significant effort. One example: LAMMPS makes extensive use of strtok(), which is not re-entrant.

axel.

If you set up your driver application as different processes (instead of threads), then each process could create a different instance of LAMMPS and run them independently.

Steve

Axel and Steve, thanks for your help.

I have done as you suggested and everything’s working now.

I don’t get the speed-ups I was hoping for, but I need to put more time into optimizing the inter-process communication and program flow. In any case, I now have LAMMPS running tens of thousands of small computations in every run of my code.

Regards

Ignacio