Topic of the Month: YAML support in LAMMPS

Dear LAMMPS users and developers,

It is high time to start another open-ended discussion. This time I would like to discuss some recently added features in LAMMPS: support for creating output in YAML format for easy reading and post-processing with Python or other scripting languages. This originally started from discussions on adapting the example files in the LAMMPS distribution for regression testing.

One of the challenges is to reliably extract only the thermodynamic data for creating a reference, and to store and analyze it efficiently. There is a logfile analyzer tool written in Python that can do the extraction part, but it is rather fragile and not fully compatible with the needs of testing. Based on what was written in the first part of "8.3.8. Output structured data from LAMMPS" in the LAMMPS documentation, the idea arose to add a new thermodynamic output style that produces the YAML format automatically. The structure of YAML files also makes it easier to extract only the contents in YAML syntax and skip over everything that is not thermodynamic output.

Another contributing factor was the request for an option to rename the column headers, especially for computes and fixes, so that they are more descriptive of the data they contain. Combined with some necessary refactoring to modernize the code and reduce complexity, those changes were all implemented recently.

There are two ways to enable YAML style thermo output: a) use thermo_style yaml, which gives a fixed set of properties similar to the default output, or b) use thermo_style custom followed by thermo_modify line yaml.

Extracting and plotting the data is now very simple in Python when using the pyyaml, pandas, and matplotlib modules:

import re, yaml
import pandas as pd
import matplotlib.pyplot as plt

# use the fast C-based loader if PyYAML was built with libyaml
try:
    from yaml import CSafeLoader as Loader
except ImportError:
    from yaml import SafeLoader as Loader

# extract YAML format part from log file
docs = ""
with open("log.lammps") as f:
    for line in f:
        m = re.search(r"^(keywords:.*$|data:$|---$|\.\.\.$|  - \[.*\]$)", line)
        if m: docs += m.group(0) + '\n'
thermo = list(yaml.load_all(docs, Loader=Loader))

# convert list of lists to a pandas data frame and plot the first run
df = pd.DataFrame(data=thermo[0]['data'], columns=thermo[0]['keywords'])
ax = df.plot(x='Step', y=['E_bond', 'E_angle', 'E_dihed', 'E_impro'], ylabel='Energy in kcal/mol')
plt.show()
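
A log file from an input with several run commands typically contains one YAML block per run, so thermo[0] in the script above is only the first of them. As a minimal sketch of the filtering step, assuming a made-up log excerpt (the SAMPLE_LOG text, its numbers, and the split_thermo_documents helper are hypothetical and not part of LAMMPS), the same line pattern can be used with the Python standard library alone to separate the blocks:

```python
import re

# Same line pattern as in the script above: matches only lines of YAML thermo blocks
YAML_LINE = re.compile(r"^(keywords:.*$|data:$|---$|\.\.\.$|  - \[.*\]$)")

# Hypothetical log excerpt; banner text and numbers are made up for illustration
SAMPLE_LOG = """\
LAMMPS (hypothetical version banner)
Per MPI rank memory allocation (min/avg/max) = 1 | 1 | 1 Mbytes
---
keywords: ['Step', 'Temp', 'PotEng', ]
data:
  - [0, 300.0, -10.5, ]
  - [10, 295.2, -10.1, ]
...
Loop time of 0.01 on 1 procs for 10 steps with 100 atoms
---
keywords: ['Step', 'Temp', 'PotEng', ]
data:
  - [10, 295.2, -10.1, ]
  - [20, 290.7, -9.8, ]
...
Total wall time: 0:00:00
"""

def split_thermo_documents(text):
    """Return one string of YAML lines per '---' ... '...' block."""
    blocks, current = [], None
    for line in text.splitlines():
        if not YAML_LINE.match(line):
            continue                      # skip everything that is not thermo YAML
        if line == "---":
            current = [line]              # start of a new YAML document
        elif current is not None:
            current.append(line)
            if line == "...":             # end of the current YAML document
                blocks.append("\n".join(current) + "\n")
                current = None
    return blocks

blocks = split_thermo_documents(SAMPLE_LOG)
print(len(blocks), "thermo blocks found")
print(blocks[0])
```

Each returned string is a complete YAML document that could then be handed to the YAML loader, one run at a time.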


In combination with the thermo_modify colname option to rename columns, creating a high-quality plot of the thermodynamic data should be very easy, and certainly easier than with the older tools.

In addition, we now also have a dump style yaml whose output can be imported in a similar fashion, and work has started on supporting YAML format output in fix ave/time and other averaging fixes.

What is most compelling to me about these features is that they are built entirely on well-supported, widely used software (pyyaml, pandas, numpy, matplotlib), and since pandas uses numpy storage underneath, it is also very efficient and fast for processing large amounts of data.

What do people think about this?
Do you see any other applications that can be built on top of this, or parts of LAMMPS that could benefit from interfacing with YAML format data?
Are there alternatives worth looking into?


To demonstrate the “portability” of using YAML as an intermediate format, here is an example of extracting YAML data from the log in Perl:

use strict;
use warnings;
use YAML::XS;

open(LOG, "log.lammps") or die("could not open log.lammps: $!");
my $file = "";
while(my $line = <LOG>) {
    # keep only the lines that belong to the YAML thermo output
    if ($line =~ /^(keywords:.*$|data:$|---$|\.\.\.$|  - \[.*\]$)/) {
        $file .= $line;
    }
}
close(LOG);

# convert YAML
my $thermo = Load $file;

# convert hash references members to real arrays to simplify the following code
my @keywords = @{$thermo->{'keywords'}};
my @data = @{$thermo->{'data'}};

# print first two columns
print("$keywords[0] $keywords[1]\n");
foreach (@data) {
    print("${$_}[0]  ${$_}[1]\n");
}

P.S.: does anybody know something equivalent to matplotlib in perl?

Hi Axel,

this is fantastic news! Data management is a big issue and the YAML support is going to facilitate data flow in complex simulations. I guess this will also facilitate the use of semantic technology for MD simulations carried out with LAMMPS.

For a quick check, I usually pipe commands directly to gnuplot, e.g.:

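# $tmp_file is assumed to hold the path of the data file to plot
open GP, '| /usr/bin/gnuplot' or die "could not start gnuplot: $!";
syswrite(GP, "p '$tmp_file' w l\npause -1\n");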