It seems to me that LAMMPS is very close to being safe to run from multiple threads at once. I have scoured the entire codebase for the common pitfalls (you know: global variables, singletons, temporary files with fixed filenames…), and I can tell that the maintainers have worked hard to keep these kinds of problems from slipping in.
When I last checked, everything seemed to be perfect except for one thing:
LAMMPS uses non-reentrant C standard library functions.
Well… mostly just strtok. LAMMPS uses strtok. A lot.
I think it’s heartbreaking to see so much effort wasted over a little bit of string parsing!
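For anyone who hasn’t been bitten by this: strtok remembers its position in hidden static state, so any two tokenizing loops that interleave will trample each other. You don’t even need threads to see it. Here is a minimal sketch; the function name and the input format (records split on ';', fields split on ' ') are made up for illustration and are not taken from LAMMPS:

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical example: split records on ';' and fields on ' ' with nested
// strtok loops. The inner loop overwrites strtok's single hidden position,
// so the outer loop never sees the remaining records.
std::vector<std::string> nested_strtok_fields(std::string text) {
    std::vector<std::string> fields;
    std::string copy;  // declared out here so its buffer outlives the inner loop
    for (char *rec = std::strtok(&text[0], ";"); rec != nullptr;
         rec = std::strtok(nullptr, ";")) {
        copy = rec;
        for (char *f = std::strtok(&copy[0], " "); f != nullptr;
             f = std::strtok(nullptr, " ")) {
            fields.push_back(f);
        }
        // strtok now believes it has reached the end of `copy`, so the outer
        // strtok(nullptr, ";") returns nullptr and the loop stops early.
    }
    return fields;
}
```

Calling nested_strtok_fields("a b;c d") gives back only "a" and "b"; the second record is silently dropped. With two threads the same hidden state is shared and modified without any synchronization, which is a data race on top of the wrong results.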
I have some questions:
- Why is strtok so heavily used in the LAMMPS codebase? Unless it is simply a historical accident, I have a hard time seeing how such an ancient C standard library feature became so prevalent in a C++ program.
(Ironically, reentrant variants have existed for a long time: POSIX has “strtok_r”, and C11 added “strtok_s” as part of its optional Annex K. But LAMMPS can’t rely on either one portably, because neither is part of standard C++, and Microsoft’s “strtok_s” has a different signature from the C11 one. In C++ you are expected to be using std::string, streams and iterators instead.)
- Will pull requests helping to correct this problem be accepted? Having thread safety is important to me, and I’d like to help.
The fix seems straightforward in general, if a bit tedious:
- (the easy part) Introduce a small tokenizer class to replace strtok. Because it is a class, each instance carries its own copy of the parsing position, which is the data strtok keeps in hidden static storage and the reason strtok is thread-unsafe.
- (the tedious part) Fix every method that uses strtok, in every package, across the entire codebase.
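To make the easy part concrete, here is a rough sketch of the kind of class I have in mind. Everything in it (the name Tokenizer, the method names, the default separator set, the split_fields helper) is my own assumption for illustration, not existing LAMMPS code, and a real version would probably want more features:

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical strtok replacement. The text, the separators and the current
// position all live in the object, so every caller (and every thread) that
// creates its own Tokenizer has its own independent state.
class Tokenizer {
public:
    explicit Tokenizer(std::string text, std::string separators = " \t\r\n\f")
        : text_(std::move(text)), seps_(std::move(separators)), pos_(0) {}

    // Skips leading separators; true if another token remains.
    bool has_next() {
        pos_ = text_.find_first_not_of(seps_, pos_);
        return pos_ != std::string::npos;
    }

    // Returns the next token, or an empty string once the input is exhausted.
    std::string next() {
        if (!has_next()) return std::string();
        std::size_t end = text_.find_first_of(seps_, pos_);
        if (end == std::string::npos) end = text_.size();
        std::string token = text_.substr(pos_, end - pos_);
        pos_ = end;
        return token;
    }

private:
    std::string text_;
    std::string seps_;
    std::size_t pos_;
};

// Nested tokenizing (records on ';', fields on ' ') works because the two
// Tokenizer objects do not share any state.
std::vector<std::string> split_fields(const std::string &text) {
    std::vector<std::string> fields;
    Tokenizer records(text, ";");
    while (records.has_next()) {
        Tokenizer words(records.next(), " ");
        while (words.has_next()) fields.push_back(words.next());
    }
    return fields;
}
```

With this, split_fields("a b;c d") yields "a", "b", "c", "d". Unlike strtok it copies the input instead of modifying it in place, which costs an allocation per token; I think that is an acceptable trade for input-script parsing, but it is a design choice that could go the other way.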
…I mean, it’s feasible, right?