@srdjan125
My apologies for the delay in getting back to you. I have actually been focused on fixing some issues with the Gromacs parser that came to light thanks to your post and data. So, thank you for helping us improve MD support in NOMAD.
Let me first address the specific issues you had with the data that you shared, and then I can try to answer some of your more general questions.
There were two relatively straightforward issues with parsing your data:
- Our previous parsing of Energy quantities from the log file was insufficient for more general cases. This was the main error you were getting in your first set of uploads. It is now fixed.
- Some of your .trr files do not have positions stored. This caused problems because the parser previously did not look for an .xtc file if a .trr file existed. I have added a check to make sure that we can extract the positions from the .trr file and, if not, to fall back to the .xtc. This avoids the issue for your set of data.
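For reference, the new fallback behaves roughly like the sketch below. This is not the actual parser code; `has_positions` is a hypothetical stand-in for the real trajectory reader:

```python
def choose_trajectory(trr_file, xtc_file, has_positions):
    """Prefer the .trr file, but fall back to the .xtc when the .trr
    stores no positions (e.g. a forces/velocities-only .trr).

    `has_positions` is a hypothetical predicate standing in for the
    real trajectory reader used by the parser.
    """
    if trr_file is not None and has_positions(trr_file):
        return trr_file
    return xtc_file  # may be None if neither file is usable
```

So a .trr without positions no longer shadows a perfectly good .xtc sitting in the same upload.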
With these fixes, I can successfully upload all of your data to NOMAD. The fixes will soon be merged into the development branch; it may then take a little time for them to be pushed to the beta deployment of NOMAD. I will check on this and try to get back to you with some sort of estimate.
Now, let me try to address your other questions:
“srdjan125: Once the issue is fixed it will not be problem to use parser for GROMACS umbrella sampling simulations with coarse-grained force field? (the one I shared with you was all-atom run).”
Yes, once the fixes go through there should be no problem uploading your data. Note that while umbrella sampling simulations will upload fine, we do not yet have explicit support for enhanced sampling methods, i.e., we do not store these parameters / outputs in a normalized format. The Gromacs parser will by default catch the constraint energies and some other quantities, though, so they will in any case be available in the archive. Depending on your intended usage of NOMAD, this may be enough. This is something I am happy to discuss with you further via a Zoom call if you are interested.
Similar answer for the CG simulations. You should have no problem uploading the data. However, we have not yet developed all the appropriate metadata for CG simulations. This is something that is pretty high on our priority list.
“srdjan125: a) Some of my runs were restarted (for example I run 400 ns simulation and after some time I decide to continue run from 400 ns to 800 ns) which means that log file will be interrupted due to appending data of new run to same file. Is this a problem for GROMACS parser or it can recognize such scenarios?”
This is something that I have not yet looked at in depth, so we will have to treat it on a case-by-case basis to make sure that the parser can handle these situations. I know that if you run a simulation and then prune the trajectory file (i.e., subsample using trjconv), the parser has no problem storing the data accurately, even though the number of steps in the log file and in the trajectory file differ. Based on this, I would guess that your data would still be parsed correctly, but again we would need to test it. It would be great if, after the above fixes are made, you could try it out and let me know whether it works. If you have problems, I can work on fixing the parser for you (as we did here).
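In case it helps you check your own files in the meantime: each (re)start of mdrun writes a fresh "Log file opened" banner at the top of its segment, so counting those banners gives a rough count of how many runs were appended into one log. A quick sketch (please verify the exact banner wording against your own log files):

```python
def count_run_segments(log_text):
    """Rough count of appended run segments in a Gromacs log,
    assuming each (re)start writes a 'Log file opened' banner."""
    return log_text.count("Log file opened")
```

If this returns more than 1 for a log that gives you trouble, that would be a useful detail to include when you report it.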
“srdjan125: b) I noticed that upload and processing of data takes long time, and then I need to wait is it success or failure. Can I somehow tell to NOMAD to process every 20th snapshot? Or I need to do post-processing and take every 20th snapshot by myself before uploading into NOMAD?”
A couple of things to note here:
The MD parsers automatically calculate a few observables (mostly for equilibration-detection purposes), like molecular radial distribution functions and mean squared displacements. Most of the processing time for your data was actually spent calculating these observables for the > 50k water molecules in your system. This is obviously not very useful, so I set a limit on the number of molecules for which these calculations will run. Both of the first simulations that you sent now parse in less than 10 minutes.
More generally, the parser will already automatically prune your trajectory data for storage in the archive if the cumulative number of atoms (i.e., n_atoms * n_frames) is greater than some threshold (I think set to 2.5M at the moment). This is simply for efficiency of features in the GUI, but the raw data is stored unpruned.
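To give a feel for when this pruning kicks in, the threshold check amounts to something like the following. This is only a sketch using the ~2.5M figure mentioned above; the actual stride logic in the parser may differ in detail:

```python
import math

def archive_stride(n_atoms, n_frames, max_total=2_500_000):
    """Smallest frame stride so that the stored frames keep the
    cumulative atom count (n_atoms * stored_frames) approximately
    under max_total. Returns 1 when no pruning is needed."""
    total = n_atoms * n_frames
    if total <= max_total:
        return 1  # below threshold: store every frame
    return math.ceil(total / max_total)
```

So, for example, a system with 50k atoms and 1000 frames would be stored with roughly every 20th frame in the archive, while the raw files keep everything.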
If you would like to further prune your data, as I mentioned you can prune your trajectory file before uploading (e.g., with trjconv's -skip option to keep every 20th frame) without confusing the parser. However, there is at the moment no way to provide custom pruning instructions to NOMAD. We are working on more customizable approaches for uploading MD data, which are much more flexible in this respect. Preliminary support and documentation should be available by the end of the year.
“srdjan125: c) which files are the most slow for processing and which one I must include in upload folder and which ones I can skip (ie .edr .log .trr .xtc etc).”
I would generally advise including all the raw data files from your simulation in the upload (except when your trajectory is too large to store in full, in which case you can prune it first). I hope that with these fixes you will already find the processing times reasonable. If not, please let me know and we can discuss further.
Again, I will get back to you when I have a better idea of when the fixes will be available in the beta.
Best,
Joe