Question regarding the number of molecules

I have a simple question regarding the number of molecules in a simulation since there is a discrepancy between the number of molecules reported by the “mol” attribute of the custom dump command (by counting the number of arrays) and the total number of atoms that form the molecule divided by the number of atoms in that molecule.
I was learning the basics of LAMMPS by following the nanosheared electrolyte tutorial here and my goal is to create a velocity profile for water molecules by using Matlab, hence why I came across this question. In this simulation, the only molecule is a TIP3P water molecule with “lj/cut/tip4p/long” pair style and “kspace_style pppm/tip4p” (molecule template - molecules are created by this command: “create_atoms 0 region rliquid mol h2omol 482793”). There are also 30 ions and wall atoms created but aren’t assigned to any molecule. The first step of the tutorial is system generation, the second is energy minimization, and the third is system equilibration. For the third step of the simulation, I decided to add “dump a all custom 1000 dumpall.lammpstrj id type mol c_peperatom x y z vx vy vz” (c_peperatom = compute peperatom all pe/atom), and then process this dump file with a Matlab script.
In Matlab, the script reads the dump file, and a mol vector is created for each timestep to store the “mol” data from the dump file at each timestep. Later on, there is “Nmolec=max(mol)” which I believe that it should return the number of water molecules (for the last timestep). In addition, the script also reads the “type” data and assigns it appropriately, and later on, there is a loop defined for all the atoms at a timestep that checks if the atom is oxygen or hydrogen, and if yes, it would increment the counter by one (basically it counts the number of O and H atoms at each timestep). I also believe that dividing this number by 3 should give the number of water molecules (at last timestep in this example). I was expecting that these numbers would be equal to each other, but there is a discrepancy between them (the first one is “Nmolec=647” and the second one is “waternumber=399”).
I would appreciate it if anyone would explain why my assumption is incorrect and why these two numbers are different. I have uploaded the folder of the simulations here, and this folder is for the third step (contains the LAMMPS input file, dump file, simplified Matlab script and saved workspace). LAMMPS version is 28 Mar 2023.

You are basically asking to help you debug a matlab script entirely imagining how it computes the Nmolec and waternumber variables. I believe Prof. Charles Francis Xavier is the only one who can help!

Could you post the header of the dump you pass to MATLAB?

That’s not correct. LAMMPS will assign a molecule index for each ion, because the same atom_style is shared by all the atoms.

As a side note: if you want to make life easier for people who may have a marginal interest in helping out, you should formulate your problem in a concise, logical way, and share the relevant information. Also, please make the effort to read the forum guidelines.

2 Likes

Thanks for the quick reply pointing out my mistakes. I thought overexplaining would be better than understating. I will try to do better.

I don’t think the script needs debugging as it is functioning as it should. I was explaining how it works, but now I understand that it is kind of irrelevant to my question. Maybe if I rephrase, I could explain better:


I have a simulation box consisting of water molecules, Na+ and Cl- ions, and wall atoms. The custom dump command attributes are

id type mol c_peperatom x y z vx vy vz

and I’m specifically looking at the “type” and “mol” values. At any timestep outputted in this dump file:

  • since the “type” of Oxygen and Hydrogen is known, the total number of O and H atoms are counted. Dividing this number by 3 should be equal to the number of water molecules.

  • since there is only one molecule defined (which is water), the “mol” section would be equal to “molecule ID” whenever the type of atom is O or H. For ions and wall atoms, it would be equal to 0 (see picture below). If I assign this data to an array and then get the maximum element of this array, I believe this should also be equal to the number of molecules (because the maximum element would be equal to the highest ID of this molecule).

So the first question is whether any of these two methods to count the number of molecules is incorrect. If not, then the next question would be what is the reason for this difference. Something that I completely forgot about is that some of the water molecules are randomly deleted in the system generation step by this command

delete_atoms random fraction 0.15 yes H2O NULL 482793 mol yes

and then there isn’t any reset_atoms command later, therefore maybe this is the reason?(then the difference between the numbers is about 0.38, much higher than 0.15)


If I understand this correctly, you are correct, what I meant was that in my case they are assigned to molecule ID = 0 and have no effect on the calculations that I’m trying to do.

Would this pastebin be sufficient?

The tutorial you are referring to shows how to create velocity profiles (and density profiles) using facilities directly implemented into LAMMPS. So why are trying to do something that is effectively redundant?

Details matter when doing science and you are talking about TIP3P water while the commands you are quoting refer to TIP4P water. TIP3P != TIP4P. If you make typos like this here, it is quite conceivable that you are making the same kind of typos and thus mistakes in your MATLAB script.

These kinds of statements are very dangerous. As a long-time follower of forums and mailing lists like this, I can assure you that people making this kind of claim of their scripts and programs have been proven wrong a surprising number of times. In fact, being overly confident in one’s own programming skills is an indication of people that have not yet reached the status of an experienced programmer. I recall being like that during my time as a grad student and having been shown time and again that the mistake was on my side despite all my confidence in my skills and having being extremely careful and thorough in testing.

Why not just count the number of water oxygens?

Not in general. molecule IDs are just some numbers attached to atoms that can have any value. There is nothing in LAMMPS that guarantees that those are consecutive or that those represent individual molecules. This is something, that you control by the way how you set up your system, so it is dependent on that part of your simulation. You will have to check for consecutive molecule IDs on your own before using this property in this way to count molecules. LAMMPS only does this kind of check occasionally for atom IDs, since some features in LAMMPS depend on consecutive atom IDs. There is nothing equivalent for molecule IDs.

1 Like

The last step utilizes “compute chunk/atom bin/1d” and “fix ave/chunk” to extract the density/mass data. I don’t completely understand how these two work yet and tried to do it in another way that I’m more familiar with. Technically, I believe drawing a simple density profile should be possible from the dump data.

I understand your point, and I read a lot about it here. I believe the author tried to do the first way to implement TIP4P, and I’m exactly following the tutorial as well. I’m naive but I think there are some issues in this tutorial that are overlooked or left without explanation. For example, despite using the TIP4P/2005 model where the LJ sigma of HH = 1.0, the author has it equaled to 0:

pair_coeff 2 2 0.0 0.0 # water

there are also these warnings that I have yet to put time into reading about and try to fix later.

WARNING: Communication cutoff 0 is shorter than a bond length based estimate of 3.4358. This may lead to errors. (src/comm.cpp:723)
WARNING: Increasing communication cutoff to 15.1118 for TIP4P pair style (src/KSPACE/pair_lj_cut_tip4p_long.cpp:484)

Apologies as I can’t tell yet if issues like these are caused because the author wanted the simulation to be this way or its because of my failure to copy-paste inputs correctly.

I understand this point too. The complete script is much more complex and tries to do certain calculations with various variables extracted from the dump file. I’ve been dealing with it for over a month and there is still a long way to iron all the bugs out. By that phrase, I meant that I didn’t want to take anybody’s time to find/fix any problems in that script, and instead, I’m trying to ask if my assumption/method to count the molecules is correct or not.

No specific reason, this is also possible and would yield the same result.

I didn’t know that! I tried to circumvent it by running the system generation step on 1 processor / 1 thread, but now I understand I was completely wrong. This means that the second method is completely wrong for my case. Thank you very much for your time and attention.

Where have you ever seen a LJ sigma of HH equal to 1 for TIP4P/2005 model?

1 Like

https://docs.lammps.org/Howto_tip4p.html

1 Like

Please have a look at the functional form of the LJ potential! The choice of sigma is arbitrary when epsilon is zero and the interaction turned off. However, for technical reasons it is preferable to use a non-zero sigma in LAMMPS to avoid a (pointless) division by zero for certain settings.

The first warning is irrelevant because of the second one and that one is common for TIP4P systems, if the TIP4P pair style has the longest cutoff. The reason for the first must lie in the order of commands in the input. It is irrelevant when the final state of LAMMPS is reasonable when the calculations start with either a “minimize” of a “run” command.

Then you should not distract people by asking off-topic questions, but rather ask these specific questions right away. As a general rule, if you are asking about file formats, LAMMPS’ behavior, LAMMPS command syntax you are on-topic, when you ask about how you process that information, then not anymore.

2 Likes

Good to know. I will update that part of the tutorial.