Huge Etotal difference between TIP4p implicit VS explicit methods from the manual script

Have you read the manual: Yes. I used the scripts from Lammps manual
MD simulation experience: Yes, several years, with Gromacs, Amber and OpenMM
Lammps experience: No. Started recently
Lammps version: 20230208
Lammps execution command: mpirun -np xxx lmp -in in.tip4p
Can you provide your input scripts and results: Yes. Please check this: Input scripts and results

Dear All:

I have been recently introduced to Lammps because of its good modularity and robustness. I have been conducting the simulation of water with different water models/force fields for my project.

I want to first get my hands on Lammps by conducting a TIP4p simulation with the scripts already provided by the Lammps manual:

https://docs.lammps.org/Howto_tip4p.html

There I found two different flavors to conduct it, the first one is “implicit” (and will be quoted as implicit from now on), and the second one is “explicit” (and will be quoted as explicit from now on).

I (almost) exactly copied all the scripts from the manual, with only a few modifications that I anticipate should be very safe to do. They are listed below:

  1. The box size and the number of molecules. The manual script uses a smaller simulation size with 33 waters and a box with (10A, 10A, 10A) size to maintain a ~1g/mL density. I modified it to be 216 waters and the box with (18.68A, 18.68A, 18.68A) to maintain the same density.
  2. The t_damp for the fix nvt, from 100 fs used by the manual, to 500 fs.
  3. The logfile writing frequency of SHAKE. Manual does it every 10000 steps, I just let it not print to the file at all.
  4. The total steps. I just let it run longer to be 2ns so I can make the first 1ns to be equilibration run (which is ample because liquid water can be equilibrated with ~100ps) and discard later.
  5. The thermo_style to output time as well.

I have also attached my input scripts and the molecular geometry files. I also preserved the original lines from the scripts in comments for easy check.

As such, I conducted the simulation. Meanwhile, I have conducted another simulation on Gromacs (and later in OpenMM) with the same setup as much as possible in order to have some reference to compare with: same waters, same force field, same t_damp, same box, same energy output frequency, etc. All the common cares for an NVT MD were done as well, e.g. equilibration.

After I got the results from Lammps for both methods, I found some “issues”, and I further compared them with the Gromacs/OpenMM results together with the literature result from TIP4p, and found more “issues”. I would like to summarize them as below:

  1. In Lammps, the Etotal from implicit and explicit differs hugely: the implicit has -2000 scale, while the explicit has -80000 scale.
  2. Between Lammps and Gromacs/OpenMM, after proper unit conversions, the implicit agrees with Gromacs/OpenMM in the scale at least, but the explicit disagrees hugely with Gromacs/OpenMM.
  3. The isochoric heat capacity, C_V, calculated from the Etotal by the fluctuation formalism, \frac{\langle E^2\rangle-\langle E\rangle^2}{k_BT^2} are both hugely different from the Gromacs/OpenMM result:
    Implicit: 180683.5\ \mathrm{J\ K^{-1} mol^{-1}}
    Explicit: 170938.7\ \mathrm{J\ K^{-1} mol^{-1}}
    Gromacs: 86.5\ \mathrm{J\ K^{-1} mol^{-1}}
    OpenMM: 84.2\ \mathrm{J\ K^{-1} mol^{-1}}
    Literature: 83.7\ \mathrm{J\ K^{-1} mol^{-1}}
    (Literature is Phys. Chem. Chem. Phys. , 2011,13 , 19663-19688. It is C_P instead but one will expect C_V is only slightly smaller than it)

So my questions are below:

  1. Is there anything I did fundamentally wrong on the input script to cause such a discrepancy between the Lammps implicit and explicit? Or between the Lammps and Gromacs/OpenMM?
  2. The two values from Lammps actually agrees with each other. Then what do you think causes the discrepancy between Lammps and Gromacs/OpenMM? Are they expected?
  3. I understand that C_V is an extensive property, so one needs to divide the number of molecules in order to get the numbers shown above (and they are actually “molar” isochoric heat capacity). I did divide 216 in order to get all the results above. However, some threads suggest that Lammps does NOT really “see” molecules, but atoms instead. I am thus wondering if both of those weird numbers from Lammps are caused by not finding the correct number to divide by (or say normalization)?
  4. If it is only caused by using the wrong number 216 to do the normalization, then we should expect the MD and the trajectory are correct from Lammps. Then what should be the correct number to divide for the Lammps result in order to calculate the extensive thermodynamic properties? How can I identify such a number in Lammps?

Any suggestions or comments are highly appreciated. Thank you so much in advance and I’m looking forward to hearing back from you.

Sincerely,
Hanbo

Hi Hanbo,

Thank you for your question and for the detailed background information you provided. I could not open the Google Drive link you attached due to user permissions. I recommend that you put the input text files into a post on this thread. If you “fence off” the script using backticks

```
like this
```

the forum software will automatically format it as code (using monospaced font and scrollbars, for example).

In the meantime, you should check:

Are the energy fluctuations really that much larger in the LAMMPS runs than the GROMACS or OpenMM runs? A 2000x larger heat capacity corresponds to a 45x larger standard deviation of the energy.

Are the thermostat algorithms either identical or equivalent between the LAMMPS and other runs? Are the same time constants used?

Similarly, what about the:

  • short-range cut-offs?
  • tail-corrections?
  • 1-2 and 1-3 exclusions?
  • long-range electrostatic accuracy?

Dear Srtee:

Thanks a lot for the quick response. And sorry for not taking care of the file access. I believe the sharing is now working properly.

I can still copy my input scripts and paste as below. I will respond to your questions in the next post.

For implicit:

units real
atom_style full

# region box block -5 5 -5 5 -5 5
# I modified the box size. The original one from Lammps manual is perserved above
region box block 0 18.6824 0 18.6824 0 18.6824

create_box 2 box bond/types 1 angle/types 1 &
            extra/bond/per/atom 2 extra/angle/per/atom 1 extra/special/per/atom 2

mass 1 15.9994
mass 2 1.008

pair_style lj/cut/tip4p/cut 1 2 1 1 0.15 8.0
pair_coeff 1 1 0.1550 3.1536
pair_coeff 2 2 0.0    1.0

bond_style zero
bond_coeff 1 0.9574

angle_style zero
angle_coeff 1 104.52

molecule water tip3p.mol  # this uses the TIP3P geometry

# create_atoms 0 random 33 34564 NULL mol water 25367 overlap 1.33
# I modified the number of created water molecules from 33 to 216. The original line from manual is perserved as above
create_atoms 0 random 216 34564 NULL mol water 25367 overlap 1.33

# must change charges for TIP4P
set type 1 charge -1.040
set type 2 charge  0.520

# fix rigid all shake 0.001 10 10000 b 1 a 1
# I modified the reporting frequency of SHAKE to 0, to maintain a clean format. Original line from manual is perserved as above
fix rigid all shake 0.001 10 0 b 1 a 1
minimize 0.0 0.0 1000 10000

reset_timestep 0
timestep 1.0
velocity all create 300.0 5463576

# fix integrate all nvt temp 300 300 100.0
# I modified the t_damp from 100 to 500 to be consistent with Gromacs. The original line from manual is perserved as above
fix integrate all nvt temp 300 300 500.0

# I just add 1 more entry to the report, the time, based on the original script from the manual
thermo_style custom step time temp press etotal pe

thermo 1000

# I modified the number of steps to be for 2ns. Original line from manual does 1 ns
run 2000000
write_data tip4p-implicit.data nocoeff

For the explicit:

units real
atom_style charge
atom_modify map array

# region box block -5 5 -5 5 -5 5
# I modified the box size to accomodate 216 molecules with the same, ~1g/mL density. The original line from manual is perserved as above
region box block 0 18.6824 0 18.6824 0 18.6824
create_box 3 box

mass 1 15.9994
mass 2 1.008
mass 3 1.0e-100

pair_style lj/cut/coul/cut 8.0
pair_coeff 1 1 0.1550 3.1536
pair_coeff 2 2 0.0    1.0
pair_coeff 3 3 0.0    1.0

fix mol all property/atom mol
molecule water tip4p.mol

# create_atoms 0 random 33 34564 NULL mol water 25367 overlap 1.33
# I modified the number of water molecules generated from 33 to 216, to be consistent with the other two simulations
# Original line from the manual is perserved as above
create_atoms 0 random 216 34564 NULL mol water 25367 overlap 1.33

timestep 0.5

# fix integrate all rigid/nvt/small molecule temp 300.0 300.0 100.0
# I modified the t_damp to be 500fs, to be consistent with the other two simulations. The original line from manual is perserved as above
fix integrate all rigid/nvt/small molecule temp 300.0 300.0 500.0
velocity all create 300.0 5463576

# I just added 1 more output entry, the time, to the original script from the manual
thermo_style custom step time temp press etotal density pe ke

# I modified the reporting frequency to make it report every 1ps, to be consistent with Gromacs and Implicit method
thermo 2000

# I modified the total steps to make it 2ns long
run 4000000
write_data tip4p-explicit.data nocoeff

As mentioned in the main post, almost everything was copied from the scripts provided by the Lammps manual. I only modified a few of them which I anticipated to be very safe to do so. I also preserved the original ones for you to check.
The tip3p.mol and tip4p.mol in these input scripts are also completely copied from the manual.

Are the energy fluctuations really that much larger in the LAMMPS runs than the GMX/OpenMM runs?

Yes. I used both my own Python code and/or the GMX built-in energy functionality to verify that, Without dividing the nmol, the GMX/OpenMM has a RMSD value for Etotal about 120 kJ/mol, while LAMMPS has that about 5400 kJ/mol.

Are the thermostat algorithms either identical or equivalent between the LAMMPS and other runs? Are the same time constants used?

Yes, I keep that in mind since only by ensuring such can I call them a fair “comparison”. I controlled all the thermostat algorithms as identical/equivalent as possible based on my knowledge. E.g.:

  1. The fix nvt command in LAMMPS, as far as my understanding goes, does the Nose-Hoover thermostat by default. In Gromacs/OpenMM, I set the same thermostat as tcoupl=nose-hoover, or use nosehooverintegrator in OpenMM.
  2. The fix shake command in implicit, is by default applied to Gromacs, and also turned on in OpenMM by constraints=allbonds
  3. The t_damp=500 in LAMMPS, as far as my understanding goes, corresponds to the coupling time, and is thus set consistent in Gromacs by tau_t=0.5 (in ps). It is also converted properly in OpenMM.

short-range cut-offs?

Similarly, I tried my best based on my knowledge to ensure they are the same: in LAMMPS, e.g. in the above scripts, pair_style lj/cut/tip4p/cut 1 2 1 1 0.15 8.0 or pair_style lj/cut/coul/cut 8.0 indicate the cut-off distance is 8A for both. I thus set the consistent value in Gromacs as rcoulomb=0.8, rvdw=0.8 (in nm) etc. Same for OpenMM.

tail-corrections?

As far as my knowledge goes, I don’t see this option explicitly covered in Gromacs/OpenMM MD input parameter file.

1-2 and 1-3 exclusions?

If you are talking about the proper definition of the force field, I anticipate the answer would be:
Yes for Gromacs/OpenMM, because I have already conducted many simulations on TIP4p and the results do match the literature. I have also checked their force field files and cannot find any abnormality;
“Yes(?)” for LAMMPS, unless the scripts I copied from the manual itself incorrectly define them. As far as my understanding goes, I don’t see any obvious issues from them either.

long-range electrostatic accuracy?

Yes for Gromacs/OpenMM, because of the same reason mentioned above.
“Yes(?)” for LAMMPS, unless the scripts from the manual itself have issues, as mentioned above.

Now that I have looked at the output logs, it sounds like you are right. Having said that, I think this makes sense for a simulation with TIP4P using only short-range electrostatics, namely the lj/cut/tip4p/cut style, since the Coulombic interaction is significant even at 8 angstroms and thus an abrupt cutoff would lead to significant energy fluctuations. If you want to use long-range electrostatics (which would be highly recommended), you would be using a combination of

pair_style lj/cut/tip4p/long ...
kspace_style pppm/tip4p ...

With the total energy fluctuating so little in your GROMACS and OpenMM runs, it seems to me that you must be enabling either long-range electrostatics or, in GROMACS, some kind of reaction-field compensation. You may like to share your GROMACS inputs in a post here as well – I have some years’ experience with GROMACS and will be able to read your .mdp file.

The default GROMACS Nose-Hoover thermostat appears to use a 10(!!)-thermostat chain, whereas the LAMMPS default is 3. But I doubt that is the significant source of your divergent results.

In GROMACS this would be the DispCorr keyword in the .mdp file. It should not affect dynamics with an NVT integrator, but it will affect the reported total energy (I can’t remember if it fluctuates – probably not).

In GROMACS, you should find a block like this in your molecular topology (.itp or .top):

[ moleculetype ]
; molname   nrexcl
SOL         2

This indicates that 1-2 and 1-3 bonded neighbours do not contribute LJ or Coulombic forces. Since LAMMPS’s special_bonds default is also 0 for both 1-2 and 1-3 bonded neighbours, if your GROMACS molecular topology has the above block, then the same settings would apply.

Right now my guess is that you have long-range electrostatics in your GROMACS and OpenMM simulations but not in the LAMMPS simulations – in GROMACS this would be indicated by an .mdp line of

coulombtype PME

There is nothing fundamentally wrong. But the implicit example has bonds and that causes the intramolecular non-bonded interactions to be excluded (see the special_bonds command — LAMMPS documentation), while in the explicit example they are included. The intramolecular forces are removed due to using fix rigid, but if you care about comparable energies, you can easily remove those by adding a neigh_modify command — LAMMPS documentation :

neigh_modify exclude molecule/intra all

Please note that the scripts you were copying are not examples for best practices of production simulations, but rather a demonstration for quickly setting up a water box with minimal equilibration for the different water models used. The fact that those (deliberately) use cutoff coulomb with a short cutoff (8.0 Angstrom for speed) should be an indication that those inputs need to be improved for production level simulations.

Thank you again Srtee for the very detailed answer. Combining your response with Akohlmey’s, I feel like the solution is clearer to me now but I would still like to share the .mdp file to nail down every possible issue completely.

Below is the .mdp file I used for my production run:

; RUN CONTROL PARAMETERS
integrator               = md
; Start time and timestep in ps
tinit                    = 0
dt                       = 0.001
nsteps                   = 1000000
; For exact run continuation or redoing part of a run
init_step                = 0
; mode for center of mass motion removal
comm-mode                = Linear
; number of steps for center of mass motion removal
nstcomm                  = 1
; group(s) for center of mass motion removal
comm-grps                = 

; OUTPUT CONTROL OPTIONS
; Output frequency for coords (x), velocities (v) and forces (f)
nstxout                  = 0
nstvout                  = 0
nstfout                  = 0
nstxout-compressed     = 0
compressed-x-precision           = 1000
; Output frequency for energies to log file and energy file
nstlog                   = 0
nstcalcenergy		= 1000
nstenergy                = 1000
; Output frequency and precision for xtc file
nstxtcout                = 

; NEIGHBORSEARCHING PARAMETERS
cutoff-scheme = Verlet
; nblist update frequency
nstlist                  = 10
; ns algorithm (simple or grid)
ns_type                  = grid
; Periodic boundary conditions: xyz (default), no (vacuum)
; or full (infinite systems only)
pbc                      = xyz
; nblist cut-off        
rlist                    = 0.8

; OPTIONS FOR ELECTROSTATICS AND VDW
; Method for doing electrostatics
coulombtype              = PME
rcoulomb                 = 0.8
; Method for doing Van der Waals
vdw-type                 = Cut-Off
; cut-off lengths       
rvdw                     = 0.8
; FFT grid size, when a value is 0 fourierspacing will be used
;fourier_nx               = 32
;fourier_ny               = 32
;fourier_nz               = 32
fourier_spacing           = 0.12
; EWALD/PME/PPPM parameters
pme_order                = 4
ewald_rtol               = 1e-04
DispCorr = EnerPres

; Temperature Coupling
tcoupl                   = nose-hoover
tc_grps                  = System
tau_t                    = 0.5 
ref_t                    = 300

; Pressure Coupling      
pcoupl                   = no
pcoupltype               = isotropic
tau_p                    = 0.5
ref_p                    = 1.0
compressibility          = 4.5e-5

; GENERATE VELOCITIES FOR STARTUP RUN
; note I already equilibrated my simulation prior to this .mdp file.
gen_vel                  = no 
gen_temp                 = 300
gen_seed                 = 31415

And below is the .itp file for TIP4p. It is included and pre-defined in the GMX package within multiple .ff folders. It’s completely copy-and-paste.

;
; Note the strange order of atoms to make it faster in gromacs.
;
[ moleculetype ]
; molname	nrexcl
SOL		2

[ atoms ]
; id	at type	res nr 	residu name	at name	cg nr	charge
1       OWT4            1       SOL      OW     1       0.0    15.9994
2       H               1       SOL     HW1     1       0.52    1.008
3       H               1       SOL     HW2     1       0.52    1.008
4       IW              1       SOL      MW     1      -1.04    0.0

#ifndef FLEXIBLE
[ settles ]
; OW    funct   doh        dhh
1       1       0.09572    0.15139
#else
[ bonds ]
; i	j	funct	length	force.c.
1	2	1	0.09572	502416.0 0.09572	502416.0 
1	3	1	0.09572	502416.0 0.09572	502416.0 
	
[ angles ]
; i	j	k	funct	angle	force.c.
2	1	3	1	104.52	628.02	104.52	628.02	
#endif

[ exclusions ]
1	2	3	4
2	1	3	4
3	1	2	4
4	1	2	3

; The position of the virtual site is computed as follows:
;
;		O
;  	      
;	    	D
;	  
;	H		H
;
; const = distance (OD) / [ cos (angle(DOH)) 	* distance (OH) ]
;	  0.015 nm	/ [ cos (52.26 deg)	* 0.09572 nm	]

; Vsite pos x4 = x1 + a*(x2-x1) + b*(x3-x1)

[ virtual_sites3 ]
; Vsite from			funct	a		b
4	1	2	3	1	0.128012065	0.128012065

As a side note, thanks a lot for providing a lot of deep and fundamental insights, along with the details. My previous MD experience, although for multiple years, is more like a “general knowledge” style and the input scripts are mostly importing those that are “proven to be good”, or, from publications doing similar projects, and thus I do not have a very complete and deep understanding of every entry on the input file.

Thank you Akohlmey for the instruction! Initially, I did suspect that the implicit method excludes the intramolecular energy, while the explicit method includes it. Now I will try your command but thanks to yours & Srtee’s answer, it seems that besides merely excluding the intramolecular energy, I still need extra command or alter some parameter to deal with the undesired fluctuation, or say, to conduct a true “production run”. I will return to the post after I implemented your suggestions.

Hi Hanbo,

The GROMACS script indeed uses long-range electrostatics (or Coulombic interactions). If you add the same to your LAMMPS run you should get much more comparable results. I think the appropriate lines are

pair_style lj/cut/tip4p/long 1 2 1 1 0.15 8
kspace_style pppm/tip4p 1e-4

(the long-range accuracy is set to 1e-4 similar to the GROMACS script).

Thank you again and after implementing your suggestion, the implicit method works and I did get the comparable Etotal, with a reasonable fluctuation such that the C_V I got from it is reasonable, too: 84.7 with the same unit shown above.

Due to my practical need, I tried a similar revision on the explicit method:

From the original:
pair_style lj/cut/coul/cut 8.0

To the:
pair_style lj/cut/coul/long 8.0
kspace_style pppm 1.0e-4

The C_V is reasonable, too. I believe it is safe to conclude the issue from the scientific aspect at least. Thanks a lot!

1 Like

Excellent – and your choice of styles for the explicit method is also how I would have done it (and is consistent with you getting a good grasp of how things are done in LAMMPS). All the best!