extension to parallel writing of dump files

Hello all,

Attached is a first version of some code that enables a subset of MPI ranks to be writers of dump files (instead of the current all-or-one capability). This is meant to help address I/O scaling issues for big systems running on large processor counts. With this code, an arbitrary number of MPI ranks can be designated as writers of a dump file, and each writer collects data from a subset of the MPI ranks. This is implemented as an additional keyword 'nfiles' for "dump_modify". For example, the following commands would coordinate the writing of 8 smaller dump files, which could then be recombined, if needed, as part of a post-processing/analysis step:

dump 1 all custom 10000 lmp.lammpstrj.%.bin id type x y z fx fy fz
dump_modify 1 nfiles 8

If running on 128 MPI ranks, then the commands above would result in:

  the data on ranks 0 to 15 being collected and written to disk by rank 0 in the file lmp.lammpstrj.0.bin
  the data on ranks 16 to 31 being collected and written to disk by rank 16 in the file lmp.lammpstrj.16.bin
  and so on, up through rank 112 writing the data from ranks 112 to 127 in the file lmp.lammpstrj.112.bin

In an initial benchmark, using the system from the LAMMPS Chain benchmark replicated 9x9x9 for a total of ~23.3 million particles, the 'Outpt' time reported in the log was ~21% of the total time (30K steps, dump every 10K steps, for a total of 3 writes) on 16384 MPI ranks with 4 threads/rank. Using 'nfiles 32', the Outpt time was reduced to ~2% of the total time. The exact numbers will of course depend on many details, but if you notice that Outpt is a large percentage of the total runtime in your simulations, then you may find this code useful.

The implementation could probably be a little cleaner, but I think this would be a nice feature to add.


dump.cpp (24.2 KB)

dump.h (5.73 KB)