Syntax check of input script before simulation run

Oystein · August 31, 2021, 11:58am

Hello

Many times I have started a simulation in LAMMPS only to have it crash before the end because of a syntax error or something illogical in my input script. I think others have probably experienced this too while working with LAMMPS.

Good practice is obviously to always test new scripts, or changes to scripts, on very short runs to discover such mistakes before the real simulation. Unfortunately, one can sometimes forget to do such a test, or assume that the script is OK. It might also be difficult to do this for certain types of simulations (?). Such errors can lead to wasted CPU hours if the error is far into the input script.

I was wondering if it is possible to build into LAMMPS an optional quick syntax/logic check of the entire script before LAMMPS actually executes it. If there is a syntax error, the program could exit with an error message. This would make debugging a lot quicker and might save quite a lot of CPU hours (at least worldwide, if there are others out there with experiences similar to mine :P). Above all, it might make LAMMPS a bit more user-friendly for beginners.

Looking forward to any reply.

Kind regards,
Øystein

akohlmey · August 31, 2021, 1:17pm

This is not really possible in a general way, since syntax errors may result from processing incorrect or unexpected output from some computations.

Adding an explicit syntax check would require replicating a whole lot of code (and code that is not always very readable), which can lead to inconsistencies and other maintenance nightmares.

The best you can do with the current LAMMPS binary is to add a command like

timer timeout 0:00:00 every 1

to your input and thus avoid executing all “run”, “minimize”, and similar commands.
All other commands that appear in between will still be executed.
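As a minimal sketch of how this could be applied without touching the original input, the Python helper below writes a copy of a script with the timer command prepended. The function names and the `.check` suffix are hypothetical choices for illustration, and the sketch assumes the script does not issue its own timer command further down.

```python
from pathlib import Path

# The command suggested above. According to the post, it avoids executing
# "run", "minimize", and similar commands, while other commands still execute.
TIMEOUT_LINE = "timer timeout 0:00:00 every 1"


def make_check_copy(script_text):
    """Return the script text with the timeout command as its first line."""
    return TIMEOUT_LINE + "\n" + script_text


def write_check_copy(src, suffix=".check"):
    """Write '<src><suffix>' next to the original (hypothetical naming scheme)
    and return the path of the new file."""
    src_path = Path(src)
    dst_path = src_path.with_name(src_path.name + suffix)
    dst_path.write_text(make_check_copy(src_path.read_text()))
    return dst_path
```

The copy can then be passed to LAMMPS in place of the real input, leaving the production script unchanged.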

I suggest you experiment with this and give us some feedback; perhaps we could then add a command line flag -check that automates it.

Oystein · August 31, 2021, 1:47pm

Thanks a lot! Will do some testing of this.

akohlmey · August 31, 2021, 2:40pm

Now that I have a proper keyboard under my fingers, here are a few more thoughts and explanations on this topic.

It is not that the usefulness of a “syntax” checker has not occurred to people before. The hurdles to get there are quite significant.

- The LAMMPS input file language is not static like for some other codes, where the file is parsed as a whole at the beginning. It is more like a bash script that is evaluated line by line.
- On top of that, you have loops and include files, where you rewind the file (or open a different file), start reading until you reach a marker, and then continue from there. Should a syntax error be flagged on those skipped lines? It won’t cause an error when running LAMMPS for real.
- The next, and biggest, hurdle is that parsing and syntax checking are done in each source file individually. So to have a chance at a meaningful and maintainable syntax checker, one would have to change this to a generic argument parser class that reads arguments and allowed argument types from a descriptive string/table/struct. Those descriptions could then be “harvested” and processed for a syntax checker. However, that would either incur some difficulties or require backward-incompatible changes to the grammar, since currently the syntax sometimes changes based on keywords read first, and there are positional arguments as well as required or optional keyword/value(s) tuples.
- An alternative approach would be to write a “linter” tool that won’t recognize the entire syntax but could handle some common mistakes, e.g. check group/compute/fix/dump IDs and general common syntax requirements as well as quoting rules. Such a tool could also be taught about alternatives, or print warnings when something rather inefficient is used (e.g. I often see people using many instances of compute reduce where a single one with multiple arguments would suffice, with much less negative impact on parallel performance, since each compute reduce requires a collective communication; the same goes for global accesses of per-atom data and much more).
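To make the “linter” idea concrete, here is a rough Python sketch that only checks whether compute, fix, and group IDs are defined before they are used. It is not an existing LAMMPS tool; the function name and message format are made up for illustration. It assumes the usual `compute ID group-ID style ...`, `fix ID group-ID style ...`, and `group ID style ...` forms and `c_ID` / `f_ID` references, and it deliberately ignores variables, line continuations, quoting, loops, and jumps.

```python
import re

# Matches references such as c_myTemp or f_ave[2] in command arguments.
REF_PATTERN = re.compile(r"\b([cf])_([A-Za-z0-9_]+)")
KIND = {"c": "compute", "f": "fix"}


def lint_ids(script_text):
    """Return messages for compute/fix/group IDs used before definition."""
    defined = {"compute": set(), "fix": set(), "group": {"all"}}
    problems = []
    for lineno, raw in enumerate(script_text.splitlines(), start=1):
        # Naive comment stripping: does not honor quoted '#' characters.
        words = raw.split("#", 1)[0].split()
        if not words:
            continue
        command, args = words[0], words[1:]
        is_definition = command in ("compute", "fix") and len(args) >= 2
        if is_definition and args[1] not in defined["group"]:
            problems.append(f"line {lineno}: unknown group '{args[1]}'")
        # For a definition, skip its own ID and group-ID when scanning.
        scanned = args[2:] if is_definition else args
        for letter, ref_id in REF_PATTERN.findall(" ".join(scanned)):
            if ref_id not in defined[KIND[letter]]:
                problems.append(
                    f"line {lineno}: undefined {KIND[letter]} '{ref_id}'"
                )
        # Register new IDs only after the line's references were checked.
        if is_definition:
            defined[command].add(args[0])
        elif command == "group" and args:
            defined["group"].add(args[0])
    return problems
```

Because it reads the file top to bottom exactly once, a sketch like this runs straight into the hurdles listed above: it cannot know which lines a loop or jump would actually skip, so it can only warn, not decide.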

At any rate, anything beyond the simple hack I suggested would cause a lot of work and would take away time from working on features that enhance LAMMPS’ capabilities to do exciting new research.
I tend to look at this from the “survival of the fittest” perspective: yes, losing time due to typos and avoidable errors is painful, but a) pain induces more effective learning, b) smart people will eventually figure out ways to protect themselves from making such simple and avoidable errors, and, most importantly, c) I will take an untimely syntax error any time over a subtle error in the implementation or the physics of the model I am using. Those are far more deceptive and “dangerous” and, in my opinion, are the cause of even more wasted resources in the end (not to mention bogus publications that can lead others to make the same mistakes). When maintaining software this weighs heavily on one’s mind, since the simplest of typos can lead to bugs that go undetected for decades.

Oystein · September 5, 2021, 10:16am

Thanks for the informative reply!

Makes sense that more people thought about this already.
And I do understand your reasoning regarding how the resources towards developing LAMMPS should be used.

And yes, there is no doubt that developing and checking the validity of the choices regarding the physics of the model are a lot more important than untimely syntax errors, particularly because such types of errors might not come with an error statement or warning.

I did test the timer timeout 0:00:00 every 1 command. It seems to work effectively in the situations where I have tested it. Of course, it will not capture all potential errors if there are if statements or jump statements in the script, as you mentioned. But for most scripts it will probably work fine. At least I have started to include it at the beginning when I want to check for potential syntax errors in my scripts. Thanks again!

akohlmey · September 5, 2021, 11:31am

The next LAMMPS patch version will have a -skiprun (or -sr) flag to do that automatically without having to modify your input. See section 4.2, Command-line options, in the LAMMPS documentation.
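For anyone who wants to script such a dry pass, a small Python wrapper might look like the sketch below. The `-skiprun` flag is the one announced above and `-in` is the standard way to name the input file; the executable name `lmp` and both function names are assumptions, so adjust them to your installation. It also assumes a LAMMPS version recent enough to include the flag, and that LAMMPS exits with a non-zero status when it hits an input error.

```python
import subprocess


def skiprun_command(input_file, executable="lmp"):
    """Build the argument list for a dry pass over input_file.

    'lmp' is an assumed executable name; pass your own if it differs.
    """
    return [executable, "-skiprun", "-in", input_file]


def check_script(input_file, executable="lmp"):
    """Run the dry pass and report whether the process exited with status 0."""
    result = subprocess.run(skiprun_command(input_file, executable))
    return result.returncode == 0
```

Calling `check_script("in.myscript")` before submitting the real job would then catch the errors that the timer hack catches, without editing the input file.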

Oystein · September 6, 2021, 6:42am

Great!