Fixing LAMMPS' thread-safety issues

It seems to me that LAMMPS is tantalizingly close to being safe to run from multiple threads at once. I have scoured the entire codebase for all of the common pitfalls: global variables, singletons, temporary files with fixed filenames, and so on. I can tell that the maintainers have worked hard to prevent these kinds of problems from slipping in.

When I last checked, everything seemed to be perfect except for one thing:

LAMMPS uses non-reentrant C standard library functions.
Well… mostly just strtok. LAMMPS uses strtok. A lot.
I think it’s heartbreaking to see so much effort wasted over a little bit of string parsing!

I have some questions:

  1. Why is strtok so heavily used in the LAMMPS codebase? Unless it is simply a historical accident, I have a hard time seeing how such an ancient C standard library feature became so prevalent in a C++ program.

(Ironically, POSIX has offered a reentrant variant, strtok_r, for decades, and C11 added strtok_s; but LAMMPS can’t rely on either because, surprising as it may be, C++ is not C! Neither function was ever adopted into standard C++, where you are expected to be using std::string, streams, and iterators instead.)

  2. Will pull requests helping to correct this problem be accepted? Having thread safety is important to me, and I’d like to help.

The fix seems straightforward in general, if a bit tedious:

  • (the easy part) Introduce a small tokenizer class to replace strtok. Because it is a class, it can own the parsing state that strtok otherwise keeps in static storage, which is exactly what makes strtok thread-unsafe.
  • (the tedious part) Fix every method that uses strtok, in every package, across the entire codebase.

…I mean, it’s feasible, right?

Michael,

Below you will find some personal comments from me. I am certain Steve has some opinions and insight to offer as well.

Thank you for the detailed response, Axel.

Of course. I am only looking to ensure that multiple instances of LAMMPS can be used simultaneously; i.e. that I can call lammps_open multiple times and run commands on each one without fear of them interfering with each other.

Ick, so there are singletons after all. It seems, then, that the problem is larger than I thought.

I am not familiar with the issue concerning MPI communicators, so I’ll be looking into it more. I find it a shame, as I feel that most applications that benefit from multiple threads do not actually benefit from MPI-style multiprocessing.

I am surprised by your comment about the STL! Without a doubt it is unsafe, in most cases, to read and write a single shared STL container from multiple threads; but to the best of my understanding, using separate containers per thread should be fine. To that end, I wouldn’t call the containers “thread-unsafe”; rather, they simply aren’t concurrent data structures. (This isn’t a bad thing; concurrency comes at a cost!)

Ah, thank goodness you feel this way. I only suggested the bandaid solution because I wasn’t sure what I was walking into after seeing the codebase, and was afraid that a larger-scale redesign might be a hard sell!

…and also because wide-scale refactorings are scary in a codebase without unit tests. (But the silver lining is that the current code is impossible to unit test anyway, and factoring out a parser would give us something that can actually be tested.)

So let me give you one more observation: if you want to use MPI parallelism inside the LAMMPS instances, then you must not run them concurrently from threads; the exception is running them in separate processes with split communicators, and at that point managing things from threads is just an additional and unnecessary complication. Check out the partition flag and the REPLICA package.

Anything else is not what LAMMPS is designed for (you can still do it, though). Changes to accommodate such use are welcome, but they are not a high priority and will be looked at critically if they require deep changes. These are the historical choices and preferences.

In short, it may be easier to look into using MPI and not have to worry.

Axel

My Q is: what is the use case for making all of LAMMPS (e.g. the strtok issue) thread-safe?

We’re already MPI-parallel and OpenMP thread-parallel on the chunks of code (e.g. pair styles) that are the most costly.

Steve

Hi Steve,

My use case is that I want to write an interface for using LAMMPS in Rust code.

Rust is a language with performance characteristics matching those of C++, but with a much stronger focus on safe abstractions. A core principle of the language is that it must be impossible to invoke undefined behavior from within a program that does not use the ‘unsafe’ keyword. What this means for me as a library author is that I must be able to guarantee that no possible usage of my library in safe code can cause LAMMPS to segfault. (Note that multi-threading is possible in safe Rust.)

This means I must either give up threads (by wrapping everything in a mutex) or give up safety. Neither of these alternatives sounds very attractive to me. To be honest, I feel that solving trivially parallel tasks with threads is much more effective, reliable, and composable than the bottom-up approaches to parallelism provided by MPI and OpenMP, where I have to cross my fingers and hope that the implementation distributes the workload evenly enough. And I can’t take advantage of MPI/OpenMP from the comfort of my development PC: thanks to frequent synchronization barriers, OpenMP code can become orders of magnitude slower than serial code if one of the threads has to compete against a compiler or a web browser.

Of course. I am only looking to ensure that multiple instances of LAMMPS can be used simultaneously; i.e. that I can call lammps_open multiple times and run commands on each one without fear of them interfering with each other.

If you mean spawn a bunch of threads and have each thread call lammps_open() (library interface), then you probably can’t do that. I haven’t thought about it. But I’m not clear why you would want to do that.

You can call lammps_open() multiple times from separate “processes” with no conflicts. And if you want any speed-up, you would want to do that on distinct physical cores (or hyperthreads), which I think is no different than running multiple threads where there is one thread per core.

So if you run your driver program under MPI on 16 tasks (cores of a node), and split the MPI_Comm into single-core communicators, then each MPI task can call lammps_open() independently and you are running 16 copies of LAMMPS. That only requires adding like 5 lines of code to a serial driver app. MPI will also let you oversubscribe your cores, e.g. running with 64 MPI tasks on a 16-core node, so you could call lammps_open() 64 times on that node, which I think would be conceptually no different from running multiple threads per core.
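Steve’s split-communicator recipe could look roughly like this. This is an unverified sketch, not tested code: it assumes the classic four-argument lammps_open(argc, argv, comm, &ptr) signature from LAMMPS’ C library interface, and all error handling is omitted.

```cpp
// driver.cpp -- one independent LAMMPS instance per MPI task (sketch).
#include <mpi.h>
#include "library.h"   // LAMMPS C library interface

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // Split the world into one single-rank communicator per task,
    // so each task drives its own independent LAMMPS instance.
    MPI_Comm mycomm;
    MPI_Comm_split(MPI_COMM_WORLD, rank, 0, &mycomm);

    char *args[] = {(char *)"lmp", (char *)"-log", (char *)"none"};
    void *lmp = nullptr;
    lammps_open(3, args, mycomm, &lmp);

    // ... issue commands on this copy, e.g. lammps_file(lmp, "in.script") ...

    lammps_close(lmp);
    MPI_Comm_free(&mycomm);
    MPI_Finalize();
    return 0;
}
```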

It might even be possible to:

  • build LAMMPS w/out real MPI, using its STUBS lib
  • run a serial driver app under MPI (mpirun -np 16 serial_app …)
  • have the app call lammps_open_no_mpi()

Now you have 16 independent processes, each running LAMMPS independently. I’m not sure how you would have the 16 copies of your serial app do something different, but there is probably a way.

Steve