Quitting all partitions

LAMMPS version 19 Nov 2024

Dear all,
I am running a multi-partition simulation. Whenever a criterion is satisfied in one of the partitions, I would like to save the data and shut down all the other partitions. I tried the partition command, which is supposed to act on the specified partitions, but the simulation keeps running afterwards. This is my command:
run 1000000 every 10 "if '$(v_disp) >= 2' then ' write_data event.data ' 'partition yes * quit'"

A similar issue was discussed in the thread “quit (conditional exit) in partition mode” (post #2 by sjplimp). I would like to do the exact opposite, i.e., quit all partitions when the condition is met in one of them.

Update: I did find a (quite ugly) workaround, which is to set a variable that halts my simulation with “error hard”; this causes all partitions to crash. I would prefer a cleaner solution that does not artificially create an error.

That would require some C++ programming. Communicating with MPI between multiple partitions is not simple: they would have to “listen” to each other, but by construction they are independent.

Without looking into the details, I doubt that the quit command can be set up for that, since what it currently does is not very different from triggering an error. It would be possible to tell it to call MPI_Abort() on the “universe” communicator instead of the “world” communicator, but that would have exactly the same result as your “trigger an error” hack.
Instead, I would favor a modification of fix halt. That fix is invoked regularly during a run, and for a global exit it could poll for a message on the inter-world communicator that would then tell every “world” to stop its run after the next iteration.

[Update]

@tomasfbouvier I went ahead and added such a feature to fix halt.

It works for me with the following simple test input. Partition 2 has a shorter CPU time limit (10 s instead of 100 s), so fix halt triggers there first and sends messages to the other partitions telling them to stop as well:

# we have a different CPU time limit on partition 2 to trigger fix halt
variable maxtime universe 100 10 100 100 100 100 100 100 100 100
variable curtime equal cpu

units           lj
atom_style      atomic

lattice         fcc 0.8442
region          box block 0 20 0 20 0 20
create_box      1 box
create_atoms    1 box
mass            1 1.0

velocity        all create 1.44 87287 loop geom

pair_style      lj/cut 2.5
pair_coeff      1 1 1.0 1.0 2.5

neighbor        0.3 bin
neigh_modify    delay 0 every 20 check no

fix             1 all nve

fix             2 all halt 100 v_curtime > ${maxtime} error soft message yes universe yes

thermo          100
run             10000
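
Applied to the original question, the same keyword could replace the "run ... every" construct. The following is only a sketch, not a tested input: it assumes a LAMMPS build that already contains the new universe keyword, that v_disp is the equal-style variable from the first post, and that it can still be evaluated after the run has ended. The fix ID "stop" is an arbitrary name. It uses "error continue" rather than "error soft" so that each partition leaves the run and carries on with the next command in its input, which leaves room for writing the data file.

```
# sketch only: v_disp is assumed to be defined earlier in the input
# check the displacement every 10 steps; on a hit, tell all partitions to stop
fix             stop all halt 10 v_disp >= 2 error continue message yes universe yes

run             1000000

# every partition gets here once the run has been halted;
# only a partition whose own criterion is met writes the data file
if "$(v_disp) >= 2" then "write_data event.data"
```

If several partitions could satisfy the criterion on the same step, each of them would write event.data, so a per-partition file name may be needed in that case.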

Hi Axel,

Thank you so much for your answer and for implementing the solution. I will try it for my use case. Otherwise, I guess I will proceed with the “raising an error” solution.

Hi again,

I’ve tried it. It works great! Thank you very much!
