[lammps-users] Problem reading very large (billions of atoms) data files

Hi.
One of our users is having a problem reading in an ASCII file
containing ~7 billion atoms using the read_data command. The code dies
with the message:

"Invalid atom ID in Atoms section of data file".

The problem seems to be that the atom tag grows larger than a 4-byte
integer can hold, so the stored tag value wraps to a large negative
number, triggering the error. Setting all of the tags to zero also
fails, since the error check tests for <= 0. There doesn't appear to
be a way of switching off tags, and if I hack the code to allow that,
it then complains during the velocity read because it has to build an
atom map.

This limits us to ~2.1 billion atoms for the read_data command.
There's no such limit when using read_restart or create_atoms, but
unfortunately we can't use that here. To quote the user as to why he's
trying this:

(Attachments: test3.dat (8.69 KB), rtst (866 Bytes), ta_potential (645 KB))

Here's what should be possible, though I don't often
test systems with > 2 billion atoms, so
it's possible something is broken.

You ran an initial simulation with 7 billion atoms.
To do this you must have had no atom IDs. E.g. the
create_atoms or replicate commands will disable
IDs if you exceed 2 billion. When this happens
you should check that all IDs are 0. Create_atoms does
this, but it looks like replicate fails to reset
the existing tags to 0 when you create
a system > 2B. This may be your problem. See
the lines of code to add below.
You can check this by dumping IDs from
the initial simulation.

The initial simulation can write a restart file.

Restart2data should convert this to a data
file, with all atom IDs = 0.

Read_data can read a data file with more than 2B atoms
if (and only if) all the IDs are 0.

So you should be good to continue the run.
Of course you could also have re-read the restart
file, but as you say, you need a data file to
change the boundary conditions. We could think about
ways to overcome this issue.

Moving to 64-bit atom IDs is a longer-term solution,
which would also enable molecular systems with
more than 2B atoms. But that has additional overhead
for the majority of systems with < 2B atoms, so we haven't
bitten that bullet yet.

Steve

Hi Steve.
Does the 2 billion limit for switching off tags apply to the global
number of atoms or the local number? We're running on several thousand
PEs, so we only have a few million atoms per PE.

In either case, we can get a data file with 0 for all the atom tags,
but it then fails to read in the atoms due to the zero tags (line 553
of atom_vec_atomic.cpp, called from ReadData::atoms), or, if I remove
this test, it fails to read in the velocities since it tries to create
a map without tags ("ERROR: Cannot create an atom map unless atoms
have IDs", line 414 of atom.cpp, called from ReadData::velocities).

Is there any way of getting it to read in the velocities without the
need to create a map?

Thanks.
Duncan

The 2B limit is on the number of global atoms. Basically
you cannot use atom IDs at all with more than 2^31 - 1 ≈
2.1B atoms, since larger values can't be stored in 32-bit signed ints.

So you can read more atoms than 2B from a data file
in the Atoms section if all IDs = 0, as I said. But you
cannot define velocities for them in the data file, as
you indicate, since that requires them to have IDs.
Would be the same if you had 100 atoms in the data
file with all ID=0, then tried to define velocities.

So if you really need the velocities in the data file,
you are stuck. But you probably don't. Just re-initialize
the velocities after you read the data file. That would
be fine for most simulations.

Steve