Identification of a mistake in a LAMMPS data file

cantor · December 3, 2022, 4:38pm

It often happens, to me at least, that the program stops immediately because something wrong is detected in a line of the data file.
I am wondering if there is a command / instruction to report the line number in the data file where the error occurs. This would be really helpful in debugging.

I apologize for asking a question that has probably been answered several times already.

Paolo

akohlmey · December 3, 2022, 5:18pm

This is not as easy as you might think. At least not from within LAMMPS.

There are two reasons for that:

1. The data file format is not self-descriptive (it depends on settings that are in the LAMMPS input and not stored in the data file) and does not have a structure that makes it easy to identify errors. Often an error surfaces while reading a different line than the one that caused it.
2. For efficiency and code design reasons, the reading is done in blocks of lines and is spread throughout the code, so there is no easy way to identify which line in the original file triggers an error.

To help with 1., we have extended the format so that “hints” can be added as comments in the data file. Since those are optional, the parser cannot depend on them; it can only print a warning if it detects an inconsistency.

cantor · December 4, 2022, 7:12am

Axel, thanks a lot for your quick answer. Could you comment more on point 1? I do not understand what the “hints” could be and in which form they have to be introduced in order to be helpful. An example would be of the greatest help.

Paolo

akohlmey · December 4, 2022, 9:20am

There is no indication of where a section ends and no indication of how many lines may follow (both depend implicitly on some number of types or items, but the code doing the initial reading is not aware of any of that). Some formats depend on the unit style, which is not required to be present in the data file at all. And so on.

Create a data file with “write_data” and look for comments.
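
As a rough illustration of what those comments look like (a sketch, not LAMMPS code): “write_data” appends the style name as a comment to some section headers, for example `Atoms # atomic`. The short Python script below scans a data file for section headers and reports any trailing hint together with its line number. The sample file contents, the function name `find_section_hints`, and the small set of section names are assumptions made up for this example.

```python
# Hypothetical helper: list section headers and their optional "# hint" comments.
# SECTION_NAMES is a hand-picked subset for illustration, not the full list.
SECTION_NAMES = {
    "Masses", "Atoms", "Velocities", "Bonds", "Angles",
    "Pair Coeffs", "Bond Coeffs", "Angle Coeffs",
}

def find_section_hints(text):
    """Return (line_number, section, hint_or_None) for each recognised header."""
    found = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        # Everything after the first '#' is treated as a comment.
        body, _, comment = line.partition("#")
        name = body.strip()
        if name in SECTION_NAMES:
            found.append((lineno, name, comment.strip() or None))
    return found

# Made-up sample in the general layout of a data file for atom style "atomic".
SAMPLE = """\
LAMMPS data file (hypothetical sample)

2 atoms
1 atom types

0.0 10.0 xlo xhi
0.0 10.0 ylo yhi
0.0 10.0 zlo zhi

Masses

1 39.948

Atoms # atomic

1 1 1.0 1.0 1.0
2 1 2.0 2.0 2.0
"""

for lineno, name, hint in find_section_hints(SAMPLE):
    print(f"line {lineno}: section {name!r}, hint: {hint}")
```

In this sample only the Atoms header carries a hint; a header without a comment is reported with no hint.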

akohlmey · December 5, 2022, 1:23am

One more thought.

It is much easier to validate the data file format (assuming you know the atom style) from a standalone program. While the LAMMPS code must function efficiently for large systems and in parallel with many MPI ranks, this is not strictly required for a standalone program, which may be unusable (i.e. too slow) for the rare case of a very large data file, but would be useful for the most common subset of data files that people use. Very large systems are often “created” by replication and merging of different subsystems.

For example, there were some cases in the past where it was difficult to find an error in a data file from within LAMMPS, but much easier using the parser in the VMD TopoTools plugin. Sometimes having a different, independent implementation is sufficient to get problems detected.
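
A minimal sketch of such a standalone check, in Python and assuming the atom style is known in advance: it reads the whole file serially, looks only at the Atoms section, and reports the file line number of every line with an unexpected column count or a non-numeric value. The function name `check_atoms_section`, the sample data, and the restriction to four atom styles are choices made for this example; it is not how LAMMPS or TopoTools parse data files.

```python
# Columns per Atoms line, not counting the three optional image flags.
ATOM_COLUMNS = {"atomic": 5, "charge": 6, "molecular": 6, "full": 7}

def check_atoms_section(text, atom_style):
    """Return a list of (line_number, message) for suspicious Atoms lines."""
    expected = ATOM_COLUMNS[atom_style]
    problems = []
    in_atoms = False
    seen_data = False
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not in_atoms:
            in_atoms = (line == "Atoms")
            continue
        if not line:
            if seen_data:
                break  # blank line after the data lines ends the section
            continue   # blank line between the header and the first data line
        seen_data = True
        fields = line.split()
        if len(fields) not in (expected, expected + 3):
            problems.append((lineno, f"expected {expected} or {expected + 3} "
                                     f"columns, found {len(fields)}"))
            continue
        try:
            int(fields[0])                           # atom ID
            for value in fields[expected - 3:expected]:
                float(value)                         # x, y, z coordinates
        except ValueError:
            problems.append((lineno, "non-numeric atom ID or coordinate"))
    return problems

# Made-up sample with two deliberate mistakes in the Atoms section.
BAD_SAMPLE = """\
LAMMPS data file (hypothetical sample)

3 atoms
1 atom types

0.0 10.0 xlo xhi
0.0 10.0 ylo yhi
0.0 10.0 zlo zhi

Atoms # atomic

1 1 1.0 1.0 1.0
2 1 2.0 2.0
3 1 3.0 3.0 3.O
"""

for lineno, message in check_atoms_section(BAD_SAMPLE, "atomic"):
    print(f"line {lineno}: {message}")
```

Because the check runs serially over the file and knows the atom style up front, tracking line numbers is trivial here, which is exactly what is hard to do inside a parallel, block-wise reader.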