Dear LAMMPS users,
I hope you are all doing well and staying safe. I have just one question: which software is suitable for creating a data file for 2.3 million polymer chains? I have checked the data file tools that LAMMPS itself lists, but I am confused about which one to choose to create the data file for such a large simulation.
1. Moltemplate
2. Avogadro
It might be very trivial for you guys but one answer could help me do my research faster. Thank you!
The two programs do different things.
Avogadro is used for generating coordinate files. It is not a topology builder. (And if my understanding is correct, it was designed for sketching / hand-drawing relatively small systems with less than 100 molecules. I could be wrong about this.)
Moltemplate, Topotools, and EMC create LAMMPS data files (containing geometry and topology) as well as a LAMMPS input script (containing force field parameters).
Other similar programs can be found here:
http://montecarlo.sourceforge.net/emc/Welcome.html
https://nanohub.org/resources/struc2lammpsdf
https://sourceforge.net/projects/moleculardynami/
https://sgsaenger.github.io/vipster/
(Vipster is a drawing program which can build data files, but it is in the alpha stage of development)
Example (for short polymers):
If you want to create geometry files containing millions of small, short molecules mixed together, then you could use a program like Avogadro to draw one of the molecules and save its coordinates in a file, PACKMOL to pack 2.3 million of them together, and a tool like moltemplate, topotools, or EMC to build the LAMMPS data file and input script (using the coordinates created by PACKMOL).
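As a rough illustration of the PACKMOL step, here is a minimal Python sketch that writes a PACKMOL input file for packing many copies of one molecule into a cubic box. The file names (monomer.xyz, packed.xyz), the box length, and the tolerance are placeholders I made up for the example; you would have to choose a box size that matches your target density.

```python
def packmol_input(molecule_xyz, n_copies, box_length,
                  output_xyz="packed.xyz", tolerance=2.0):
    """Return the text of a PACKMOL input file for a cubic box.

    All arguments are illustrative placeholders, not values from this thread
    (except the number of copies used in the example below).
    """
    lines = [
        f"tolerance {tolerance}",   # minimum distance between atoms of different molecules
        "filetype xyz",
        f"output {output_xyz}",
        "",
        f"structure {molecule_xyz}",
        f"  number {n_copies}",
        f"  inside box 0. 0. 0. {box_length} {box_length} {box_length}",
        "end structure",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical example: 2.3 million copies in a 1000 x 1000 x 1000 box.
packmol_text = packmol_input("monomer.xyz", 2_300_000, 1000.0)
print(packmol_text)
```

If you save the printed text to a file (say pack.inp), PACKMOL is normally run as "packmol < pack.inp". The resulting coordinate file is what you would then hand to moltemplate, topotools, or EMC.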
Example (for long polymers):
If your polymers are long (compared to the size of the periodic boundary box), then you will need to generate a space-filling curve (or, in your case, many space-filling curves) that does not extend outside the simulation box. Then wrap the polymers along the lengths of these curves. There is a moltemplate example of a long polymer of DNA trapped in a small box here. (Picture here.) (README file here.) If you have N long polymers, one way to generate their coordinates is to create a long space-filling curve, and then cut it in (N-1) places. This will not necessarily be a realistic conformation, so you must still run a simulation to relax the conformation. (The picture in the example has not been correctly relaxed.)
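To make the "cut one curve in (N-1) places" idea concrete, here is a minimal Python sketch. It uses a simple serpentine (back-and-forth) path through a cubic lattice as the space-filling curve; that choice, the function names, and the lattice spacing are my own assumptions for illustration, and this is not what the moltemplate example does internally. The coordinates it produces are only a starting point and still need to be relaxed.

```python
import math


def serpentine_curve(nx, ny, nz, spacing=1.0):
    """Visit every site of an nx*ny*nz lattice so that consecutive
    points are always nearest neighbours (one 'spacing' apart)."""
    points = []
    row = 0  # global row counter, so the x direction flips on every row
    for iz in range(nz):
        # reverse the y sweep on every other layer to stay connected
        y_order = range(ny) if iz % 2 == 0 else range(ny - 1, -1, -1)
        for iy in y_order:
            x_order = range(nx) if row % 2 == 0 else range(nx - 1, -1, -1)
            for ix in x_order:
                points.append((ix * spacing, iy * spacing, iz * spacing))
            row += 1
    return points


def cut_into_chains(points, n_chains):
    """Cut the curve in (n_chains - 1) places into equal-length chains.
    Leftover points at the end of the curve are discarded."""
    chain_len = len(points) // n_chains
    if chain_len == 0:
        raise ValueError("curve is too short for this many chains")
    return [points[i * chain_len:(i + 1) * chain_len] for i in range(n_chains)]


# Small hypothetical example: 24 lattice sites cut into 3 chains of 8 beads.
curve = serpentine_curve(4, 3, 2)
chains = cut_into_chains(curve, 3)
step_lengths = [math.dist(a, b) for a, b in zip(curve, curve[1:])]
print(len(curve), len(chains), [len(c) for c in chains])
```

Each chain is a list of (x, y, z) positions along which the monomers of one polymer could be placed.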
Building these kinds of systems is tricky because the conformation of the polymer(s) at equilibrium will depend on the density, width, and persistence length of the polymer(s). At high packing densities, expect an (anisotropic, liquid) crystalline conformation. (example1, example2) However, it can be tricky to run the simulation long enough to reach this end state. For long polymers (e.g. example2), you will have to use pair soft, pair lj/cut/soft, or pair table (or pair lj/charmm/coul/charmm/inter) to enable the chains to pass through each other if you want to reach these crystalline equilibrium conformations. (If there is another way, I’m curious to know.)
— large systems —
I am not enough of an expert with EMC or topotools to comment on how easy it is to build large systems with those tools. Moltemplate has not been optimized for these kinds of huge systems. But you can usually find tricks to avoid the need to build the entire system with it:
—Memory requirements for Moltemplate:—
Moltemplate requires between 3 and 12 GB of RAM to build a system containing 10^6 atoms. If you have 10^6 polymers and each polymer has M atoms, then multiply this by M.
If it helps, you can rent “highmem” computers from Amazon AWS or Google Compute Engine for a few pennies an hour. (Moltemplate is not yet multithreaded, so there’s no reason to rent a computer with many CPU cores.) But this still might not be enough.
(Boring details: Memory usage is roughly proportional to the number of times you use the “new” command to create a copy of a molecule. If you can use the “new” command once per polymer {i.e. if you can represent each polymer as a single large molecule with a long list of atoms}, this will use less memory than if you have to use the “new” command to create every monomer in every polymer.)
—Time requirements for Moltemplate:—
On my laptop, building a system containing 10^6 atoms takes between 4 and 40 minutes, depending upon the number of “new” commands I am using, whether or not I am using a force-field to generate the angles, dihedrals, and improper interactions, and whether or not I am running moltemplate with the “-nocheck” argument. (Using “-nocheck” reduces the running time by about half, but you should not use it until you are certain that there are no syntax errors in your file.)
If you are considering using moltemplate, I recommend that you first create a small system and get it working with LAMMPS. Once you have got that working, it is a relatively simple matter to make the system bigger (although it might take moltemplate hours or days to run).
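To get a feel for what these numbers mean for your system, here is a back-of-the-envelope Python sketch. It simply scales the figures quoted above (3 to 12 GB and 4 to 40 minutes per 10^6 atoms) linearly with the total atom count. Linear scaling of the running time is my assumption, and the chain length of 10 atoms is a made-up value, so treat the result as an order-of-magnitude guess only.

```python
def moltemplate_estimates(n_chains, atoms_per_chain):
    """Scale the per-million-atom figures linearly with system size.

    Returns (low, high) ranges for RAM in GB and running time in minutes.
    """
    millions_of_atoms = n_chains * atoms_per_chain / 1e6
    return {
        "ram_gb": (3 * millions_of_atoms, 12 * millions_of_atoms),
        "minutes": (4 * millions_of_atoms, 40 * millions_of_atoms),
    }


# Hypothetical example: 2.3 million chains with 10 atoms per chain.
estimate = moltemplate_estimates(2_300_000, 10)
print(f"RAM:  {estimate['ram_gb'][0]:.0f} to {estimate['ram_gb'][1]:.0f} GB")
print(f"Time: {estimate['minutes'][0]:.0f} to {estimate['minutes'][1]:.0f} minutes")
```

Even for chains this short, the upper end of the memory range is more than most machines have, which is why building a small piece and replicating it is usually the more practical route.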
— Using LAMMPS’ “replicate” command —
Alternatively, you could use a tool like moltemplate, topotools, or EMC to create a data file describing a small portion of your system and then use LAMMPS’ “replicate” command to make many copies of it. This might be easier (and use less memory) than waiting for days for moltemplate to run out of memory (and then crash your AWS instance, or your cluster).
Both topotools and Moltemplate also have ways to read an existing LAMMPS data file and replicate it. (With moltemplate, you could use “ltemplify.py”. In Topotools, you would use “topo readlammpsdata” and “topo replicate”.) But if you are running out of memory, then I recommend using LAMMPS’ own replicate command instead.
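Here is a minimal Python sketch of how you might pick the replication factors. Given the number of chains in the small data file and the number you want in total, it finds a roughly cubic nx, ny, nz and prints the corresponding LAMMPS commands. The file names and the 100 chains per small cell are made-up placeholders, and the printed lines are only a fragment: the setup commands from your working small-system input script (units, atom_style, and so on) still have to come before read_data.

```python
def replicate_factors(chains_per_cell, target_chains):
    """Return (nx, ny, nz), roughly cubic, such that
    nx * ny * nz * chains_per_cell >= target_chains."""
    copies_needed = -(-target_chains // chains_per_cell)  # integer ceiling
    n = max(1, int(copies_needed ** (1.0 / 3.0)))          # start slightly too small
    factors = [n, n, n]
    axis = 0
    # grow one axis at a time until there are enough copies
    while factors[0] * factors[1] * factors[2] < copies_needed:
        factors[axis % 3] += 1
        axis += 1
    return tuple(factors)


def lammps_replicate_lines(small_data, big_data, factors):
    """LAMMPS input fragment that replicates a small data file."""
    nx, ny, nz = factors
    return [
        f"read_data {small_data}",
        f"replicate {nx} {ny} {nz}",
        f"write_data {big_data}",
    ]


# Hypothetical example: the small data file holds 100 chains.
factors = replicate_factors(100, 2_300_000)
total_chains = 100 * factors[0] * factors[1] * factors[2]
script_lines = lammps_replicate_lines("small_cell.data", "big_system.data", factors)
print("\n".join(script_lines))
print("chains in replicated system:", total_chains)
```

Because the factors are whole numbers, the replicated system will usually contain slightly more chains than the target. If you need exactly 2.3 million, adjust the number of chains in the small cell so that it divides the target evenly.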
— Force fields —
Ultimately, the tool you use may be determined by the kind of force-field that is appropriate for your polymer.
Moltemplate supports the ATB as well as OPLSAA, AMBER/GAFF/GAFF2, and COMPASS (published parameters only). EMC also supports PCFF (published parameters) and a couple of others.
I hope this gets you started.
Andrew