LAMMPS considerably slower than GROMACS

Dear all,

I am running some simulations and want to compare results from GROMACS and LAMMPS, as I want to use LAMMPS for my further work. I get matching results, but LAMMPS is terribly slow compared to GROMACS.

GROMACS (version 2020.7-plumed-2.9.0)

               Core t (s)   Wall t (s)        (%)
       Time:    70525.462     5877.123     1200.0
                         1h37:57
                 (ns/day)    (hour/ns)
Performance:      441.032        0.054
Finished mdrun on rank 0 Sat Sep 21 13:02:54 2024

LAMMPS (LAMMPS (2 Aug 2023 - Update 3) + plumed 2.9)

Total wall time: 9:00:15

MPI task timing breakdown:
Section |  min time  |  avg time  |  max time  |%varavg| %total
---------------------------------------------------------------
Pair    | 16195      | 17360      | 18403      | 507.4 | 53.59
Bond    | 3.2363     | 3.964      | 4.8633     |  22.4 |  0.01
Kspace  | 5326.3     | 6404.9     | 7540.2     | 846.5 | 19.77
Neigh   | 1661.6     | 1672.9     | 1690.8     |  23.5 |  5.16
Comm    | 3123.4     | 3236.2     | 3365.4     | 121.2 |  9.99
Output  | 46.003     | 62.51      | 241.97     | 684.4 |  0.19
Modify  | 3295.2     | 3496.1     | 3573.8     | 126.0 | 10.79
Other   |            | 156.1      |            |       |  0.48

I am using the same CPUs for both simulations. I use PLUMED with both GROMACS and LAMMPS; the LAMMPS build has a newer version of PLUMED.

I am attaching the input files that I have used.

Let me know if I am missing anything.

Thanks and regards.

speed_compare.zip (53.4 KB)

Why do you want to use LAMMPS, though?

That’s a genuine question.

If your answer is “because LAMMPS has a particular command that I need that GROMACS doesn’t”, then that’s your answer. LAMMPS is an extremely flexible simulation system, which also scales well across thousands of nodes, but its flexibility means that parts of the code that could be optimised for a little more speed are not.

If you are not sure why, then GROMACS is entirely suitable for you. GROMACS comes with much more aggressive default choices which have minimal impact on typical observables in typical biomolecular simulations. GROMACS literally has a neighbour list setting for how much energy non-conservation you are willing to tolerate, which can lead to significant artefacts under some conditions: https://pubs.acs.org/doi/10.1021/acs.jctc.3c00777

GROMACS also makes other choices to optimise speed that you can replicate in LAMMPS. For example, it tries to tune the short/long-range tradeoff in electrostatics to increase speed – you can replicate that in LAMMPS with fix tune/kspace. GROMACS is also compiled with mixed precision by default – you can recompile LAMMPS to use single-precision FFTs and get that speedup, or use the INTEL package which also uses mixed precision by default (or use pair styles from the OPT package that have been optimised for speed).
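For concreteness, a minimal LAMMPS input fragment along these lines might look like the following. This is a sketch only: the pair style, cutoff, and tolerance are placeholders for your actual system, and the INTEL package must have been included at build time.

```
# Sketch only: pair/kspace settings are placeholders for your actual system.

# Use INTEL-package variants of styles where available; these default to
# mixed precision, similar to a mixed-precision GROMACS build.
package        intel 0 mode mixed
suffix         intel

pair_style     lj/cut/coul/long 12.0
kspace_style   pppm 1.0e-4

# Periodically re-balance the real-space/k-space split for speed,
# analogous to GROMACS's PME tuning (here: retune every 100 steps).
fix            tune all tune/kspace 100
```

Single-precision FFTs, by contrast, are a build-time option (e.g. configuring with -D FFT_SINGLE), not something you can switch on from the input script.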

It is also impossible for us to know from your very limited information whether you are using LAMMPS optimally. For example, about 15% of your LAMMPS run time goes to neighbour-list builds and communication – that’s not typical for a 12-core run, so it could be, for example, that you are submitting your LAMMPS job on a cluster that does not efficiently allocate blocks of cores. But those matters are for you to explore with your local machine admin.
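If it helps, one quick thing to check is how the scheduler actually places your MPI ranks. With Open MPI, for instance, you can request explicit core binding and a report of the placement (the binary name `lmp` and the input file name here are assumptions):

```
# Ask Open MPI to pin each rank to a core and print where each rank landed.
# Ranks scattered across sockets or nodes would explain high Comm/Neigh fractions.
mpirun -np 12 --bind-to core --map-by core --report-bindings \
    lmp -in in.system -log log.lammps
```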


Yes. It depends a bit on how you compile either code (e.g. whether you compile Gromacs in single-precision mode, or whether you compile LAMMPS with the Intel compilers and use the INTEL package). But the standard compute kernels in Gromacs are much faster than those in LAMMPS.
Gromacs also cuts a lot of corners and optimizes more aggressively than LAMMPS does.

Crucial information is missing from what you quote for making any specific assessment of where you may be using LAMMPS inefficiently, and I don’t have the time to dig through your files and collect it all myself.

There is little that can be done to change that without giving up what makes LAMMPS flexible.
If Gromacs can run your problems, then just use it and be happy and enjoy the simulation speed.
That is why we have different MD codes that focus on different applications and purposes.


To illustrate, here is a summary auto-generated from the sources and the documentation:

Parsed style names w/o suffixes from C++ tree in ../src:
   Angle styles:      27    Atom styles:       30
   Body styles:        3    Bond styles:       26
   Command styles:    52    Compute styles:   174
   Dihedral styles:   18    Dump styles:       34
   Fix styles:       273    Improper styles:   14
   Integrate styles:   4    Kspace styles:     21
   Minimize styles:    9    Pair styles:      294
   Reader styles:      4    Region styles:      9
----------------------------------------------------
Total number of styles (including suffixes): 1858
Total number of style index entries: 1319
Found 92 packages

Hi @srtee, thank you so much for such a detailed answer. Why LAMMPS over GROMACS is indeed a very genuine question, and the answer is exactly the one you anticipated: in the end, I want to use the mW model of water, which uses a three-body potential. I was doing this comparison as a benchmark to check that the method I am using and the units are correct.
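For reference, the mW model runs in LAMMPS through the Stillinger-Weber pair style. A minimal setup sketch might look like this – the `mW.sw` parameter file (with the Molinero-Moore mW parameters) and the data file are assumptions, and `units real` assumes the usual kcal/mol parameterisation:

```
# Sketch, not a complete input: data file and run settings are placeholders.
units           real
atom_style      atomic          # mW water is a single uncharged particle
read_data       data.mW

pair_style      sw              # three-body Stillinger-Weber form
pair_coeff      * * mW.sw mW    # map all atom types to the "mW" element
```

Note that mW carries no charges, so there is no Kspace section at all – one reason the model is so much faster than atomistic water.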

I will dig deeper into the specifics as mentioned by you and Axel.

Hi Axel,

Thank you so much for the reply. I will dig deeper into this.