Breaking large dump files into smaller ones - Rationale behind using dump_modify pbc yes/no

Dear All,
I have some large dump files, O(70GB) each. To reduce the file size, I want to split the dump output of a run over several files, so that each dump file is broken into 5 smaller, roughly 15GB files. This way, I can post-process the outputs faster and more easily using Dask, Pandas, or MDAnalysis in Python.
Since I only study equilibrium properties and do not probe dynamic properties such as time correlations, is this a good strategy?
One other point: I ran some simulations and noticed that dump_modify pbc yes results in faster simulations for the dump atom style. Is this a general trend? Moreover, are there any specific conditions for using pbc yes?
Thanks for responding to these elementary questions.
Best regards,
Amir

Dear All,
I have some large dump files, O(70GB) each. To reduce the file size, I want to split the dump output of a run over several files, so that each dump file is broken into 5 smaller, roughly 15GB files. This way, I can post-process the outputs faster and more easily using Dask, Pandas, or MDAnalysis in Python.
Since I only study equilibrium properties and do not probe dynamic properties such as time correlations, is this a good strategy?

yes, this is considered best practice. it is easy to do with LAMMPS. instead of doing just one run statement, you do a loop, and inside the loop you do a "run XXX pre no post no", then "undump", then "dump" with a filename containing the loop index variable, and do as many iterations of this loop as needed. if you have some process that needs to make a continuous change over the multiple chunks, you need to look at the "start" and "stop" options of the "run" command.
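for illustration, a minimal sketch of such a loop could look like the following (the group name, dump interval, chunk count, and file names are placeholders, not taken from your input):

variable j loop 5
label dump_loop
dump D_all all atom 1000 dump.part${j}.lammpstrj
run 1000000 pre no post no   # LAMMPS may need a full setup (i.e. without "pre no") right after a dump has been (re)defined
undump D_all
next j
jump SELF dump_loop          # use the input file name instead of SELF if the script is read from stdin

if some fix has to ramp a quantity continuously across all chunks, the run line would also carry the "start" and "stop" keywords, e.g. "run 1000000 start 0 stop 5000000 pre no post no".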

One other point: I ran some simulations and noticed that dump_modify pbc yes results in faster simulations for the dump atom style. Is this a general trend? Moreover, are there any specific conditions for using pbc yes?

no. on the contrary. using pbc yes should cause a slowdown, since it adds a post-processing step to the dump writing. it is more likely that the performance difference is caused by something else (e.g. different i/o load of the file server, "parasitic" processes left over from previous calculations, different load of a shared-use compute node). you should look at the "performance" output that LAMMPS prints after a normal "run" command has completed. it can provide useful hints about what happened.

axel.

oh, and i forgot. another option to reduce storage load and improve i/o performance when writing/reading large files is to compress/decompress them on the fly, i.e. add the .gz extension to the dump file name or use dump style atom/gz from the COMPRESS package. this should compress those files by at least about 20% (that is the minimum for encoding random text files); a reduction by 50-60% is more likely. this is useful even if you break the files down into chunks. it reduces the overall storage need and adds an implicit consistency check (i.e. you will notice if the files get "damaged" or corrupted).
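for example, either of these (dump ID, group, and file name are just placeholders) writes gzip-compressed output on the fly:

# plain dump atom style; the .gz suffix triggers on-the-fly gzip compression (requires LAMMPS built with gzip support)
dump D_all all atom 1000 dump.all.lammpstrj.gz
# or, equivalently, the atom/gz style from the COMPRESS package
# dump D_all all atom/gz 1000 dump.all.lammpstrj.gz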

axel.

Dear Axel,
I changed the input script in the following way:

#--------- Dynamic phase

timestep 0.01

thermo 10000

run 1000000

write_restart ./restart_before.${i}

#Wiping out commands

unfix F_nve

unfix F_recenter

#---------- Data gathering phase

#Reset some settings

fix F_nve all nve

fix F_recenter bug recenter NULL NULL 0.0 shift all units box

timestep 0.002

variable j loop 10 #Having 10 roughly 7GB-size dump files

label dump_loop

dump D_bug bug atom 1000 bug.i${i}.j${j}.lmptrj

dump_modify D_bug scale no pbc yes

dump D_crowd crowd atom 1000 crowd.i${i}.j${j}.lmptrj

dump_modify D_crowd scale no pbc yes

restart 1000000 ./restarts/restart_during.${i}

run 5000000 post no pre no

undump D_bug

undump D_crowd

next j

jump ${fname} dump_loop # ${fname} is the input script itself, to prevent any issues related to stdin and SELF.

#After this loop, the system has evolved 5.1*10^7 time-steps in total

write_restart ./restart_after.${i}

And used the following command to run the simulation:

lmp_serial -v in_filename data.chain.80 -v i 1 -v r 3 -v lz 52 -v sig2 0.3 -v n_crowd 1000 -v epsilon1 5.0 -v fname in.loop -in in.loop

But I received the following error (please see the attached log file from running on my laptop):

Segmentation fault: 11

Or

./run_pc_cylinder.sh: line 15: 924 Bus error: 10 lmp_serial -v in_filename data.chain.80 -v i 1 -v r 3 -v lz 52 -v sig2 0.3 -v n_crowd 1000 -v epsilon1 5.0 -v fname in.loop -in in.loop

on my PC, or

./serial_run.sh: line 15: 15244 Segmentation fault ${lmp_exec} < ${lmp_input} > ${lmp_output}

on the cluster.

If I drop the pre no from the run command, everything goes well.
According to my test, using the compressed version of the dump atom style, the file shrinks to 40% of its original size, but the run time increases by 20%. Regarding your response, how does compression add an implicit consistency check against file corruption? Could you please explain more?
I also have a general question on how to gain more insight from the run statistics provided in log.lammps, e.g. for a proper choice of the neigh_modify command. Does that come from experience, or are there some resources?
Finally, thanks for all your help and support.
Best regards,
Amir

log.lammps (9.85 KB)