How to make my output files smaller AFTER generating them

Justina · January 24, 2025, 7:14pm

Does anyone have ideas on how to make trajectory files smaller if you weren't bright enough to choose a binary output format in the first place?

My hard drive is now completely full with only a small selection of these trajectories. For analysis purposes, I needed to output a large number of timesteps for each trajectory, so we are now talking terabytes. Luckily, the administrator of our university's supercomputer is not too worried about it, but it makes it hard to visually inspect all of these trajectories: 1. they don't fit on my PC, and 2. transferring them to my PC takes a long time, as I mostly work from home. Of course, I do the analyses on the supercomputer. But I have learned a lot from LOOKING at the trajectories, and I would like to inspect every single one of them. The problem is that they in no way fit on my computer.

In all honesty, I was afraid to use binary output files when setting up my simulations, as I didn't know how to handle them or whether they would lose too much information.

akohlmey · January 24, 2025, 7:59pm

You can compress files with gzip, bzip2, xz, etc.
The compressed files can be quite a bit smaller, often only about 1/3 to 1/4 of the original size.
You can also write compressed dumps directly from LAMMPS, if it was compiled with compression support.

Please check out:
https://docs.lammps.org/latest/Build_settings.html#read-or-write-compressed-files
https://docs.lammps.org/latest/Build_extras.html#compress
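For dumps that already exist, the same compression can be applied after the fact. A minimal sketch using only Python's standard library (`gzip`, `shutil`); the helper names and the `*.lammpstrj` file pattern are hypothetical, so adjust them to your own naming scheme. Running `gzip` or `bzip2` on the command line does the same job.

```python
import gzip
import shutil
from pathlib import Path


def gzip_dump(path: Path, remove_original: bool = False) -> Path:
    """Compress one existing text dump to <name>.gz and return the new path."""
    path = Path(path)
    gz_path = path.with_name(path.name + ".gz")
    # Stream in chunks so multi-GB dumps never have to fit in memory
    with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    # Only delete the original when explicitly asked to
    if remove_original:
        path.unlink()
    return gz_path


def compress_all(directory: Path, pattern: str = "*.lammpstrj") -> list[Path]:
    """Compress every file in `directory` matching the (hypothetical) `pattern`."""
    return [gzip_dump(p) for p in sorted(Path(directory).glob(pattern))]
```

Whether LAMMPS (e.g. `read_dump`) can read the resulting `.gz` files directly depends on how it was built; see the build documentation linked above.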

Justina · March 12, 2025, 3:28pm

For those who find this question in the future and also want to convert their files after generating them, I will add another possibility.

As an alternative to zipping, you can use MDAnalysis to convert your LAMMPS dump to other file types, like XTC or NetCDF. Converting a LAMMPS dump to XTC took it from 8 GB to 0.4 GB for me. Note, however, that XTC loses a tremendous amount of information: it contains no charges or velocities, for example, and the xyz data is stored at reduced precision, e.g. 1.2893474564 might become 1.289 (although I don't know the exact cutoff).
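To get a feel for how much precision XTC keeps, here is a minimal sketch under stated assumptions: it models XTC's lossy coordinate storage as rounding each coordinate to a fixed grid (precision 1000, i.e. 0.001 nm, the usual default), and assumes the input is in Å and converted to nm on writing, as MDAnalysis does by default. `xtc_roundtrip` is a hypothetical helper and a simplified model, not the actual XTC compression codec.

```python
def xtc_roundtrip(x_angstrom: float, precision: int = 1000) -> float:
    """Simplified model of XTC coordinate quantization (an assumption, not the real codec).

    Assumes the writer converts Å -> nm and stores round(x_nm * precision)
    as an integer; precision=1000 means a 0.001 nm grid.
    """
    x_nm = x_angstrom / 10.0          # assumes Å input (e.g. LAMMPS 'real' or 'metal' units)
    stored = round(x_nm * precision)  # the integer that would end up in the file
    return stored / precision * 10.0  # back to Å, as a reader would return it


original = 1.2893474564
restored = xtc_roundtrip(original)
print(f"{original} Å -> {restored:.4f} Å (error {abs(restored - original):.4f} Å)")
```

In this model the grid spacing is 0.01 Å, so a coordinate in Å keeps about two decimals and is never off by more than 0.005 Å: invisible in a viewer, but possibly relevant for some analyses. As far as I know, MDAnalysis's XTC writer also accepts a `precision` argument if you want a finer grid at the cost of larger files.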

After loading your universe, you can use the Writer function of MDAnalysis (see below). Note that the MDAnalysis writer automatically picks the output format based on the file extension you specify. Hope this helps somebody.

Write the trajectory to XTC format:

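    import MDAnalysis as mda

    # u: the Universe loaded from your LAMMPS dump; output_XTC_path: where the .xtc file goes
    with mda.Writer(str(output_XTC_path), n_atoms=u.atoms.n_atoms) as W:
        for ts in u.trajectory:  # iterate over every frame
            W.write(u.atoms)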

akohlmey · March 12, 2025, 3:35pm

FYI, LAMMPS has dump styles for both xtc and netcdf, so those formats can be produced directly as well.
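If you drive LAMMPS from Python, the dump commands for those styles can be generated as plain strings. A minimal sketch; the helper `lammps_dump_command`, the dump ID `traj`, and the file names are hypothetical, and the syntax follows the general `dump ID group-ID style N file args` form. The xtc style writes positions only, while netcdf takes a list of per-atom attributes like `dump custom`, so velocities can be kept. Both styles are optional (EXTRA-DUMP and NETCDF packages) and must be compiled in.

```python
def lammps_dump_command(style: str, every: int, filename: str,
                        dump_id: str = "traj", group: str = "all",
                        attributes: tuple[str, ...] = ()) -> str:
    """Build a LAMMPS 'dump' line: dump ID group-ID style N file [attributes]."""
    parts = ["dump", dump_id, group, style, str(every), filename, *attributes]
    return " ".join(parts)


# XTC: positions only, lossy but very compact
xtc_cmd = lammps_dump_command("xtc", 1000, "traj.xtc")
# NetCDF: per-atom attributes are listed as for dump style custom,
# so velocities can be kept alongside the positions
nc_cmd = lammps_dump_command("netcdf", 1000, "traj.nc",
                             attributes=("type", "x", "y", "z", "vx", "vy", "vz"))
print(xtc_cmd)
print(nc_cmd)
```

The resulting line can be pasted into an input script or passed to `lmp.command(...)` from the LAMMPS Python module.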

Justina · March 12, 2025, 3:43pm

Yes, of course! That is also the much smarter choice if you know beforehand that your simulation will produce a lot of data and you don't need the exact xyz coordinates, etc.

This is for people like me who didn't do that initially, for whatever reason. It's way cheaper to convert the files than to rerun the simulations. Also, now I can store a copy on my PC solely for visualization purposes, while I keep the dump files for (some) analysis purposes.

Justina · March 12, 2025, 3:44pm

I guess you can rerun as well and then dump in XTC or netCDF. Didn’t think of that. Thanks.