Utilising a single GPU to run parametric-study simulations simultaneously

Hi,

I’m currently using the GPU package to successfully accelerate a single simulation of water molecules (TIP4P/2005).
Now I want to run multiple simulations with different sets of parameters.

I’ve read through the 8.1.3. Run multiple simulations from one input script section - but it seems that it just runs multiple simulations separately. So I assume that the -partition + GPU package approach just spawns multiple processes that incidentally share a single GPU and communicate with the GPU independently.

As the simulations differ from each other only in parameters (say temperature, initial positions, pair_coeff, etc.) and the overall computation logic stays the same, I guess those simulations could all be computed at once, simultaneously, provided that they fit in the GPU memory.

So my question is: is there any way to use a single GPU to run such simulations simultaneously, as a single computation?

I would appreciate it if someone could answer this question.

Thank you in advance,

That is correct.

Not really. Each simulation will have separate data and describe a separate state, and since most of the time is spent looping over the data of neighboring particles or compiling the lists of neighbors, there is not much you can gain from running multiple systems in the same simulation. You would need to have the different systems ignore each other, use a different integrator for each system, and keep separate or split neighbor lists. That is exactly what happens with the -partition approach.

Attaching multiple processes to a GPU is not as bad as you may think. The GPU can overlap different kinds of kernel operations (e.g. transferring data and computing interactions), so attaching multiple processes to the same GPU can improve the GPU utilization. Specifically, since with the GPU package only part of the calculation is run on the GPU, you can also hide some of the latencies. It can even be a good idea when running the same simulation with multiple MPI ranks (typically 2-6) attached to the same GPU.
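For instance, a launch along these lines (just a sketch driven from Python’s subprocess; the input file name in.water is a placeholder, and an MPI + GPU package build of LAMMPS is assumed) attaches four MPI ranks of one simulation to the same GPU:

    import subprocess

    # One simulation, four MPI ranks, all attached to the same (first) GPU:
    # -sf gpu applies the GPU suffix, -pk gpu 1 requests one GPU per node.
    subprocess.run(
        ["mpirun", "-np", "4", "lmp",
         "-sf", "gpu", "-pk", "gpu", "1",
         "-in", "in.water"],
        check=True,
    )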


Hi @akohlmey, Thank you for your reply!

Not really. Each simulation will have separate data […] That is exactly what happens with the -partition approach.

Attaching multiple processes to a GPU is not as bad as you may think.

Thank you for pointing it out. I will try the -partition approach first.
By the way, I’m implementing the simulation code using the PyLammps interface. So I can just pass the -partition options as part of the cmdargs argument of PyLammps(), or just create multiple Python processes running separate PyLammps instances, right?
Also, due to environment restrictions, we are using OpenMP for parallelisation (via the OMP_NUM_THREADS env var). I couldn’t figure out whether -partition works with OpenMP that way. Is it possible to combine -partition with an OpenMP-based approach?

Thank you in advance,

That is a bad idea. The PyLammps module should not be used for production calculations. It is very fragile and slows down calculations by extracting and caching all kinds of data during a run for later use in convenience functions.

This will not work for multiple reasons.

No. It should be evident from the documentation that using the -partition flag for LAMMPS requires multiple MPI ranks. It will error out if the number of processes summed over the partitions does not match the total number of MPI processes.
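To illustrate (a sketch; the input file name is made up), the partition sizes have to add up to the number of MPI ranks you launch:

    import subprocess

    # -partition 4x1 = four partitions of one rank each = 4 ranks total,
    # which matches mpirun -np 4; with -np 3 the same command errors out.
    subprocess.run(
        ["mpirun", "-np", "4", "lmp",
         "-partition", "4x1",
         "-in", "in.sweep"],
        check=True,
    )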


Thank you for pointing it out. We use PyLammps because:

  1. We have to determine whether a GPU is available from env vars and call suffix / package accordingly.
  2. We want to store parameters in a separate, human-readable and machine-parsable config file (in YAML format for the time being) and track it as a dependency for experiment management tools.

It seems that the LAMMPS scripting language has somewhat limited abilities for such goals, so we chose Python as the language. Will the performance change if we adopt the low-level lammps.lammps python module or switch to generating LAMMPS scripts dynamically and running them with lmp? Or should we use the C/C++ API?

I found a paper on a parametric study with LAMMPS using a GPU from separate processes:

Parallel execution of a parameter sweep for molecular dynamics simulations in a hybrid GPU/CPU environment

So I guess that if we give up using PyLammps and switch to the low-level lammps.lammps module or plain LAMMPS scripts (generated somehow), then it should work. Right?

Oh, then I misread the documentation. Sorry for that.

LAMMPS has “getenv” style variables that can be used to query environment variables.
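A minimal sketch of how that could look (the variable name USE_GPU is made up and is assumed to be set to 0 or 1 in the environment; the package/suffix commands require a GPU package build):

    from lammps import lammps

    lmp = lammps()
    lmp.commands_string("""
    # getenv yields an empty string if USE_GPU is unset, hence the 0/1 assumption
    variable usegpu getenv USE_GPU
    if "${usegpu} == 1" then "package gpu 1" "suffix gpu"
    """)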

With a little effort, it is possible to have LAMMPS output data in structured formats. E.g. there are thermo and dump styles that can output data in YAML format, and existing commands like “print” or “fix print” can also be used, as shown in this howto: 8.3.9. Output structured data from LAMMPS — LAMMPS documentation
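Here is a rough sketch along the lines of that howto (the tiny LJ system, file names, and output fields are all arbitrary placeholders):

    from lammps import lammps

    lmp = lammps()
    lmp.commands_string("""
    # a tiny throwaway LJ system so the output commands have data to write
    units lj
    region box block 0 2 0 2 0 2
    create_box 1 box
    create_atoms 1 box
    mass 1 1.0
    pair_style lj/cut 2.5
    pair_coeff 1 1 1.0 1.0

    # thermo output as YAML blocks and a YAML-format dump file
    thermo_style yaml
    thermo 50
    dump d1 all yaml 50 dump.yaml id type x y z

    # free-form structured output via fix print, as in the howto
    fix json all print 50 '{"step": $(step), "temp": $(temp)}' file run.json screen no

    run 100
    """)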

See above, but your assessment of the design of the LAMMPS script language is correct. It is not a replacement for a proper scripting language. That is why the embedding of the LAMMPS library interface into Python (and other languages via SWIG) exists.

There is no drastic performance difference between these choices. The main difference between PyLammps and the plain “lammps” python module is the syntax: PyLammps tries to behave more like other (abstract) Python modules, while the “lammps” python module closely resembles the C library interface.
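For illustration, the same kind of call in both flavors (a sketch; the chosen commands are just harmless examples):

    from lammps import lammps, PyLammps

    # the plain "lammps" module mirrors the C API: commands are strings
    lmp = lammps()
    lmp.command("units lj")

    # PyLammps wraps an existing instance and spells commands as methods
    L = PyLammps(ptr=lmp)
    L.timestep(0.005)   # issues the command "timestep 0.005"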

Any option, including PyLammps, should work. The differences are questions of stability, consistency, flexibility, and ease of implementation. The plain “lammps” python module and LAMMPS input scripts (if you can make them work) are the most straightforward, best supported, and most tested options, and they don’t require recompilation for any change.

With the GPU package you have to have separate processes to access the GPU, since - unlike the rest of LAMMPS - the GPU package uses global data to manage the GPUs. This will cause problems if you create multiple LAMMPS instances within the same address space and try to use them concurrently.

So, if you do a fork before creating the LAMMPS instance, you should be fine. The same goes for running the LAMMPS executable from a script instead of using the LAMMPS python module.
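A sketch of that pattern with Python’s multiprocessing (the input script in.water, the variable name T, and the temperature values are all made up; each worker imports and creates LAMMPS only after the fork):

    import multiprocessing as mp

    def run_case(temperature):
        # import and create LAMMPS inside the child, i.e. after the fork,
        # so every process gets its own copy of the GPU package's globals
        from lammps import lammps
        lmp = lammps()
        lmp.command(f"variable T equal {temperature}")
        lmp.file("in.water")   # hypothetical input that reads ${T}
        lmp.close()

    if __name__ == "__main__":
        with mp.get_context("fork").Pool(processes=2) as pool:
            pool.map(run_case, [280.0, 300.0, 320.0, 340.0])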


Thank you for pointing it out. I had overlooked its existence.

Sorry for my unclear description. What I need is the other way around: reading parameters stored in YAML (or TOML, JSON, whatever), not writing to YAML.

Thanks. This confirms that our approach (at least the rough strategy) is going in the right direction.

Good to hear that. I will definitely consider using the low-level lammps python module after prototyping is done.

Great! That is what I’ve been planning to do. More precisely, the plan is to spawn distinct Python processes, each communicating with a different LAMMPS instance, via a bulk-job runner.

Thank you again for your kind and helpful information!

That could be done with python-style variables (which require the PYTHON package). The PYTHON package is the opposite of the python module: it allows calling Python from LAMMPS rather than LAMMPS from Python. The only downside is that a python-style variable can only return one value, so you would need a bunch of them.
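A rough sketch of one such variable (the helper file get_temp.py, the YAML file and key, and the function name are all hypothetical):

    from lammps import lammps

    # get_temp.py (hypothetical) would contain something like:
    #   def get_temp():
    #       import yaml
    #       with open("params.yaml") as f:
    #           return float(yaml.safe_load(f)["temperature"])

    lmp = lammps()
    lmp.command("variable temp python get_temp")
    lmp.command("python get_temp return v_temp format f file get_temp.py")
    lmp.command('print "temperature from YAML: ${temp}"')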

Using the lammps python module is likely the simpler approach. I only mention this for the sake of completeness.
