Create bulk material from a single molecule data file

wswgG · March 25, 2025, 10:10pm

Hello, LAMMPS community!

I have a data file [ LAMMPS format ] with a single molecule. Is there a way in LAMMPS to build up a bulk material out of it by given density? I tried the following :

variable  MOLECULE string '<name>'
variable  NMOL equal 125
read_data  ${MOLECULE}.lmp
replicate  5 5 5
write_data  ${MOLECULE}.${NMOL}.initial.lmp

but it’s a silly way, since the resulting simulation box is freaking huge and it takes a long time for NpT integrator to shrink it to the desired size. Moreover, I want to have a desired density from the beginning, i.e. I want to put as many molecules as necessary in a simulation box of a given size.

Can I achieve my goal using standard LAMMPS command or do I have to use an external tool?

Many thanks!

akohlmey · March 25, 2025, 10:16pm

It is up to you, how large the box is. Just make it smaller.

Then don’t use fix npt but fix nvt and fix deform.

That is not likely to work. Dense systems will have very high potential energy. It is better to start with a lower density, use strong thermostatting with a dissipative thermostat and then adjust to the desired density with fix deform. Give the system enough time to equilibrate some at the desired volume/density at somewhat elevated temperature. Then cool down, hold, and switch to fix npt.

An alternative to using a data file would be to create a molecule file instead and then using create_atoms with the molecule. The principal problem that it is hard to create a dense system from scratch, especially a liquid or soft polymer, is independent of that.

wswgG · April 11, 2025, 3:30pm

Dear Axel,
thanks for your reply! Greetings from Germany, Lower Saxony

I apologe in advance for my potentially silly discussion. I’ve been using LAMMPS for CG simulations of KG polymer models, and only recently started working with all-atom models with OPLS force-field. So, currently, I’m “rediscovering” LAMMPS so to say

AFAIU, replicate command doesn’t allow for choosing the simulation box size, does it? Thus, I believe, fix deform is the way to go, following your further statement :

Regarding create_atoms and molecule commands that you recommended :

I tried this way, but haven’t found it easy. I happened to find these two discussions :

how to prepare molecule file for lammps - LAMMPS / LAMMPS Mailing List Mirror - Materials Science Community Discourse

Molecule auto special bond generation overflow - LAMMPS / LAMMPS Beginners - Materials Science Community Discourse

and also carefully looked through this tutorial :

https://lammpstutorials.github.io/sphinx/build/html/tutorials/level2/polymer-in-water.html

and understood that molecule requires as an input not a LAMMPS data file, but a molecule template file. While preparing the latter one using the existing LAMMPS data file [ generated by LigParGen ] , I faced the problem : pair interactions are required for, e.g. write_data, delete_atoms, etc., but there is no section Pair Coeffs in LAMMPS molecule template file, and thus pair coefficients need to be defined in a LAMMPS input script. For my particular problem [ 57 different atom types ] , this means at least another 57 lines in the LAMMPS input script, and personally I don’t consider this convenient, or separate LAMMPS file force-field parameters and include command.

Alternatively, I’m looking into Moltemplate [ don’t find it easy to start with too though ] and Packmol.

Generally speaking, it’s hard to believe that my task that doesn’t sound complicated at first sight, i.e. generate a bulk material model out of the initial SMILES sequence or coordinates file that is already parametrised using MoSDeF or LigParGen, causes so much trouble.

Kind regards,
Igor.

akohlmey · April 11, 2025, 4:58pm

Of course not; it replicates. So what matters for the final box dimension is the box size for the original data file.

Many people (including the moltemplate author, BTW) consider include files the better solution to embedding the parameters into the data file with coordinate and topology data. It is really easy to transfer them, since the format is effectively the same as for the respective *_coeff commands and thus easily converted with a decent text editor. If you want to be even lazier, you can load the data file and then use write_coeffs to create the file. Molecule files, especially for your use case, offer even more flexibility and you can go one step further in using the labelmap command and type labels, i.e. symbolic constants for all types instead of numbers. That makes particularly molecule files far more portable, but also include files with *_coeff commands. Please note that for general inputs, you can have more types than you actually use, only you need to provide coeffs for all types. There are some examples for typelabels in this tutorial and and an in-depth discussion in this paper (preprint).

It is a well known fact in the “business” (and for many years) that constructing initial geometries of condensed molecular systems and properly equilibrating them is a big challenge. With atoms, you just add atoms on a lattice and delete the overlap; with molecules you have larger entities and often not spherical, so a dense packaging outside of a regular crystal (and you usually don’t want that, since it is extremely hard to break this up and make the system “forget” about the initial crystal structure. This is true even for small molecules like water that form a complex hydrogen bond network). People have developed different strategies over the years and not all of them are equally well suited. In current times, where lots of people with very limited MD experience manage to get papers published with less than ideal (and sometimes plain wrong) descriptions of workflows it is even more difficult to get started.

This is also a problem, since those tools are not necessarily producing a well matched parameterization and sometimes rather poor output. Giving too much trust to automated
tools is a general problem in what is often done in MD simulations by people (usually due
to lack of experience and oversight / tutoring). Nothing parallels a proper manual atom typing by an experienced practitioner, or at least a database approach with properly typed templates, like it is done with proteins and DNA/RNA. But even then you have non-standard residues or incomplete initial data that require manual attention. I have some examples posted on my homepage demonstrating how VMD scripting can be used to apply such manual choices informed by studying and applying the instructions of the original force field publications.

Germain · April 12, 2025, 9:47am

Hi @wswgG,

For what it is worth, a trick I like is the following. Using the fact that total mass is constant one can use: \rho_1V_1=\rho_2V_2 and that for a cubic box V=l^3

variable dens equal dens
variable vol equal vol
variable newdens equal 0.87 # or whatever
variable newlen equal "(v_dens*v_vol/v_newdens)^(1./3.)"
change_box all x final 0. ${newlen} y final 0. ${newlen} z final 0. ${newlen} remap units box

This is a very aggressive way of changing a box size to a new density and as Axel mentioned it can make it difficult to get a correct configuration for polymers or viscous molecules in a condensed phase. But you can get closer to a solid state density from a very dilute configuration and equilibrate your system from there. I would still recommend to minimize the energy before running the simulation since you will have homothetical strain of the molecules which can lead to huge internal energies.

Note that this does not solve any of the issues mentioned by @akohlmey concerning force field parameters.

wswgG · April 23, 2025, 9:12am

Hello, @Germain , and thanks for your reply!

Wow, this recipe looks aggressive and radical indeed! Many thanks for it!