Launching multiple walkers in LAMMPS

Dear LAMMPS developers and users,
I have some questions about how to launch a multiple-walkers simulation with the COLVARS module on an HPC cluster.

My dynamics input (nvt.in) contains the initial structure and the other fundamental settings. It reads the metadynamics (MTD) settings from configfile.inp, where I activated both the well-tempered (WT) and the multiple-walkers keywords, as reported below.

metadynamics {
  name meta_c
  colvars c_p2 c_pa1 c_pb1 omega_2 omega_3 omega_4
  hillWeight 0.2
  hillWidth 2.0
  newHillFrequency 500
  wellTempered on
  biasTemperature 4200
  multipleReplicas on
  replicasRegistry /lustre/project/…/folder_of_multwalk_WTMTD/ # I will write the absolute path to the simulation folder
  replicaUpdateFrequency 1000
  writePartialFreeEnergyFile on
  writeFreeEnergyFile yes
  keepFreeEnergyFiles on
  writeHillsTrajectory yes
}

Is it possible to activate both the WT and the multiple-walkers keywords at the same time? In other words, can COLVARS run WT-MTD with multiple walkers?

Secondly, in the section “Multiple-walker metadynamics” of the COLVARS manual I do not see any keyword to define the number of walkers. How can I set it?

I would like to launch a single job on the HPC cluster, with all the walkers bundled together. What should I write in the Slurm script? I usually launch this command:

srun /lustre/project/m2_komet331hpc/emmrossi/lammpsForEmma/build/lmp -in nvt.in -l npt_out.log

I would be very grateful if you could help me and answer my questions. Thank you very much in advance for your support.

Best regards,
Emma Rossi

@giacomo.fiorin can hopefully address your COLVARS specific questions.

As for the LAMMPS part, for multiple walkers you usually launch multiple independent simulations that just regularly share and apply the accumulated collective variable data (through a file). In LAMMPS you can “split” a single parallel calculation into multiple “partitions” which can be used for multi-replica calculations/commands like parallel tempering in the REPLICA package, but you can also use this to run independent calculations.

For more information, please see: 4.2. Command-line options — LAMMPS documentation
and: 8.1.3. Run multiple simulations from one input script — LAMMPS documentation
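As a rough illustration of the partition approach, the launch line for a Slurm job could be assembled as in the dry-run sketch below. The walker count, the ranks per walker, and the bare `lmp` executable name are assumptions chosen for the example; the script only prints the command instead of running it.

```shell
#!/bin/sh
# Dry-run sketch: build the launch line for several walkers in one Slurm job.
# In a real batch script the job would also request TOTAL tasks,
# e.g. with a line such as:  #SBATCH --ntasks=16

WALKERS=4            # number of partitions, one per walker (assumed value)
RANKS_PER_WALKER=4   # MPI ranks given to each partition (assumed value)
LMP=lmp              # hypothetical executable name; use your own full path

# The total number of MPI ranks must equal partitions x ranks per partition.
TOTAL=$((WALKERS * RANKS_PER_WALKER))

CMD="srun -n ${TOTAL} ${LMP} -partition ${WALKERS}x${RANKS_PER_WALKER} -in nvt.in -log nvt_out.log"

# Print the command only; remove the echo to actually launch it.
echo "$CMD"
```

All partitions read the same input script, so any file names written by the simulation (trajectories, restarts, Colvars output) would need to differ per partition, for example through a `world`-style variable as described in the second documentation page above.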

Dear Axel,
thank you for your suggestions. I tried launching the command

mpirun -np 16 lmp -partition 4x4 -in nvt.in -l nvt_out.log

to run 4 walkers, each using 4 processors. However, I got the following error:

LAMMPS (4 Jan 2019)
Running on 4 partitions of processors
terminate called after throwing an instance of ‘std::bad_alloc’
what(): std::bad_alloc
terminate called after throwing an instance of ‘std::bad_alloc’
what(): std::bad_alloc
[z0082:62721] *** Process received signal ***
[z0082:62721] Signal: Aborted (6)
[z0082:62721] Signal code: (-6)
[z0085:62587] *** Process received signal ***
[z0085:62587] Signal: Aborted (6)
[z0085:62587] Signal code: (-6)
[z0082:62721] [ 0] /lib64/libpthread.so.0(+0x12c20)[0x7f95ca774c20]
[z0085:62587] [ 0] [z0082:62721] [ 1] /lib64/libc.so.6(gsignal+0x10f)[0x7f95ca3d437f]
[z0082:62721] [ 2] /lib64/libpthread.so.0(+0x12c20)[0x7f88d832ec20]
[z0085:62587] [ 1] /lib64/libc.so.6(abort+0x127)[0x7f95ca3bedb5]
[z0082:62721] [ 3] /lib64/libc.so.6(gsignal+0x10f)[0x7f88d7f8e37f]
[z0085:62587] [ 2] /cluster/easybuild/broadwell/software/compiler/GCCcore/6.3.0/lib64/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x15d)[0x7f95cbfbc50d]
[z0082:62721] [ 4] /lib64/libc.so.6(abort+0x127)[0x7f88d7f78db5]
[z0085:62587] [ 3] /cluster/easybuild/broadwell/software/compiler/GCCcore/6.3.0/lib64/libstdc++.so.6(+0x984d6)[0x7f95cbfba4d6]
[z0082:62721] [ 5] /cluster/easybuild/broadwell/software/compiler/GCCcore/6.3.0/lib64/libstdc++.so.6(+0x98521)[0x7f95cbfba521]
/cluster/easybuild/broadwell/software/compiler/GCCcore/6.3.0/lib64/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x15d)[0x7f88d9b7650d]
[z0085:62587] [ 4] [z0082:62721] [ 6] /cluster/easybuild/broadwell/software/compiler/GCCcore/6.3.0/lib64/libstdc++.so.6(+0x98738)[0x7f95cbfba738]
[z0082:62721] [ 7] /cluster/easybuild/broadwell/software/compiler/GCCcore/6.3.0/lib64/libstdc++.so.6(+0x984d6)[0x7f88d9b744d6]
[z0085:62587] [ 5] /cluster/easybuild/broadwell/software/compiler/GCCcore/6.3.0/lib64/libstdc++.so.6(+0x98521)[0x7f88d9b74521]
[z0085:62587] [ 6] /cluster/easybuild/broadwell/software/compiler/GCCcore/6.3.0/lib64/libstdc++.so.6(+0x98c4c)[0x7f95cbfbac4c]
[z0082:62721] [ 8] /cluster/easybuild/broadwell/software/compiler/GCCcore/6.3.0/lib64/libstdc++.so.6(+0x98738)[0x7f88d9b74738]
[z0085:62587] [ 7] /cluster/easybuild/broadwell/software/compiler/GCCcore/6.3.0/lib64/libstdc++.so.6(+0x98c4c)[0x7f88d9b74c4c]
[z0085:62587] [ 8] terminate called after throwing an instance of ‘std::bad_alloc’
what(): std::bad_alloc
[z0076:64327] *** Process received signal ***

etc…

What does it mean? I also tried with a lower and a higher number of partitions and processors, but I got the same error. Could you help me, please?

Thank you again.
Best regards,
Emma Rossi

It means that there is a bug somewhere. Unfortunately, your executable was not compiled with debug info included (i.e. with the -g flag added), so it is not possible to get a meaningful stack trace and from that figure out where the bug is triggered.

In addition, this is a pretty old version of LAMMPS, so the first thing to try would be to upgrade to the latest stable or patch release version.

Hi @Emma_Rossi, the well-tempered correction is applied individually to each replica/walker, i.e. the height of the hills is scaled down based on the amount of bias already applied by that replica. Those hills are then shared (after being projected onto a grid) with the other walkers in the same way as the unscaled hills are. So yes, you can use well-tempered MTD and multiple walkers together.
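For reference, the scaling mentioned above can be written in the usual well-tempered form. This is a sketch of the standard expression rather than a quote from the code: here W_0 stands for hillWeight, \Delta T for biasTemperature, and V_i for the bias already applied by walker i at its current position \xi_i(t); the symbol names are chosen for this note only.

```latex
W_i(t) = W_0 \, \exp\!\left( - \frac{V_i\big(\xi_i(t)\big)}{k_B \, \Delta T} \right)
```

With the settings posted above (hillWeight 0.2, biasTemperature 4200), each new hill of a walker starts at 0.2 and shrinks as that walker's accumulated bias at its current position grows.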

However, my personal preference would be not to do that, and only use the multiple-walkers feature to maximize sampling. The main issue with MW is the limited diffusion of the individual walkers potentially hiding the presence of hysteresis (i.e. bad CVs). WT-MTD gradually eliminates fluctuations in the bias at long simulation times, but these fluctuations are also what helps the walkers move. I think that’s a choice to be made depending on the type of energy landscape (diffusion-limited or barrier-limited).

As for replicaID, it is automatically set from MPI. The doc mentions a generic “parallel communicator” because NAMD is a little exotic and uses something else, but I’ll try to mention MPI explicitly in the doc wherever pertinent.

As for the errors you’re getting, I second what Axel suggested: please update to the latest stable version of LAMMPS before looking any deeper.

One important note: the implementation of multiple-walkers MTD you are trying to use is file-based and asynchronous. So you could also submit a bundle of jobs that run concurrently but do not share the same MPI communicator. In that case, however, you’d have to specify replicaID manually for each walker.
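A minimal sketch of that independent-jobs setup is shown below: it writes one Colvars configuration per walker, identical except for replicaID. The number of walkers, the file names, the `walkerN` labels, and the registry file location are all hypothetical, and the colvar definitions that must precede the metadynamics block are left out for brevity.

```shell
#!/bin/sh
# Sketch: generate one Colvars config per walker, differing only in replicaID.

NWALKERS=4                                  # assumed number of walkers
WORKDIR=$(mktemp -d)                        # scratch directory for this demo
REGISTRY="$WORKDIR/replicas.registry.txt"   # hypothetical shared registry file

i=0
while [ "$i" -lt "$NWALKERS" ]; do
    # Each walker gets its own file; the colvar { ... } blocks are omitted here.
    cat > "$WORKDIR/walker$i.colvars.inp" <<EOF
metadynamics {
  name meta_c
  colvars c_p2 c_pa1 c_pb1 omega_2 omega_3 omega_4
  hillWeight 0.2
  hillWidth 2.0
  newHillFrequency 500
  multipleReplicas on
  replicaID walker$i
  replicasRegistry $REGISTRY
  replicaUpdateFrequency 1000
}
EOF
    i=$((i + 1))
done

# List the generated files.
ls "$WORKDIR"
```

Each walker would then be submitted as its own job, with its LAMMPS input pointing at the matching configuration file and writing to its own output names, while all of them share the same registry location on a file system visible to every job.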

Giacomo

Dear Giacomo,
thank you very much for your clarifications. I have launched the walkers independently and they seem to be ok.

Best regards,
Emma Rossi