I’m trying to track down a bug in my code that parses a LAMMPS dump file, and I think the issue might be with the atom IDs. I’ll explain what I’m doing, and if someone could tell me whether I’m making an obvious error, that would be great.
My code parses the dump file, and it is extremely important that the atom associated with ID N is always the same atom in the simulation. Currently, if I run the LAMMPS script in serial versus with MPI, my code produces different results, and the only explanation I can think of is that the atom IDs are switching around mid-simulation. I’m fairly certain LAMMPS does not do that, but I might be wrong. Is there some other reason the atom ID associated with an atom at the start of the simulation would not be the same at the end when running with MPI?
(I have left atom_modify sort at its default, but I also sort my dump by id to compensate for any on-processor reordering.)
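For reference, sorting a dump by ID looks something like this (the dump name, group, interval, and columns here are illustrative, not taken from my actual script):

```
# write a custom dump every 100 steps; columns are illustrative
dump            1 all custom 100 dump.atom id type x y z
# sort each frame by atom ID so the line order is deterministic
dump_modify     1 sort id
```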
The only way to change atom IDs is through the “reset_atoms” command. An atom ID is a property stored with the atom itself and thus migrates with it between subdomains. So the only way to get a difference between serial and parallel execution would be during atom creation with “create_atoms”. That can be avoided by creating the initial system once, writing it out, and then using read_data to load it back for every run.
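A minimal sketch of that workflow, assuming an LJ system and an illustrative filename (these are two separate input scripts, not one file):

```
# --- setup script: build the system once and save it ---
units           lj
lattice         fcc 0.8442
region          box block 0 10 0 10 0 10
create_box      1 box
create_atoms    1 box
# stores coordinates together with the assigned atom IDs
write_data      equilibrium.data

# --- every production script: start from the saved file ---
# read_data restores the IDs stored in the file, so the
# ID -> atom mapping is identical in every run
units           lj
read_data       equilibrium.data
```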
So it is far more likely that there is a problem with your code, e.g. that it assumes the atoms in each dump frame always appear in the same order.
Thanks for confirming that IDs are in fact constant.
I have the dump sorted by ID, and the initial configuration is always the same, as generated by the lattice command, which should be deterministic. The main assumption my code makes is that the same atom gets the same atom ID if I run another simulation. I think this is fair since the output of the lattice command is always the same.
The issue is likely with my code, though; I just need to understand what LAMMPS is doing as well.
Thanks!
If your LAMMPS script is small enough, you can post it here and we could look specifically at the atom ID issue. (Atom IDs should carry over between domains, but this could always be a (very unlikely!) bug, or you might have some misunderstandings about the dump formatting.)
I found that if I call write_data once and then use read_data to initialize all of my simulations from the exact same file, the issue with atom IDs is resolved. Not sure where they were getting mixed up in my previous approach, though.
In the script above (which presumably does not assign atom IDs in a repeatable way) I write out equilibrium.data to save my initial structure, which looked to be identical from run to run. Then I run my simulation and dump dump.atom, which seems to have the same atom IDs as the equilibrium.data from the same LAMMPS run, but not in subsequent runs. Basically, I was assuming that the atom IDs I read from this file would always be assigned to the same atoms in independent runs, and that does not appear to be the case.
Note this isn’t a case of atom IDs changing during the simulation (which doesn’t happen unless reset_atoms is used), but rather of atom IDs being assigned differently at creation, presumably when running on different numbers of processors.
Try reset_atoms id sort yes after your create_atoms command. As the docs say:
If the sort keyword is used with a setting of yes, then the assignment of new atom IDs will be the same no matter how many processors LAMMPS is running on. This is done by first doing a spatial sort of all the atoms into bins and sorting them within each bin. Because the set of bins is independent of the number of processors, this enables a consistent assignment of new IDs to each atom.
This can be useful to do after using the “create_atoms” command and/or “replicate” command. In general those commands do not guarantee assignment of the same atom ID to the same physical atom when LAMMPS is run on different numbers of processors. Enforcing consistent IDs can be useful for debugging or comparing output from two different runs.
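A sketch of where the command would go, assuming the same kind of lattice-based setup described above (the lattice and region parameters are illustrative):

```
lattice         fcc 0.8442
region          box block 0 10 0 10 0 10
create_box      1 box
create_atoms    1 box
# reassign IDs via a spatial bin sort, so the ID -> atom mapping
# is the same no matter how many processors LAMMPS runs on
reset_atoms     id sort yes
```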