Read_dump without atom ID

_Haidong_Fan_SCU · April 2, 2020, 8:37am

Dear Lammps,
I would like to use dump and read_dump to restart a simulation of a billion atoms. In the dump file, the atom ID takes up a lot of scratch space. So I wrote the dump file without IDs, but read_dump says “Read_dump field not found in dump file”. On sourceforge, you said the atom ID should be dumped. I am wondering if there is another way to use read_dump without IDs.
I think the atom ID is useless in read_dump. I manually changed the atom ID of several atoms to 1 in a good dump file, and read_dump still read all the atoms correctly. So can I set all the atom IDs to 1 in the dump to save a lot of scratch space? You know, for large simulations, the dump size is huge.
Best,
Haidong

akohlmey · April 2, 2020, 1:04pm

I would like to use dump and read_dump to restart a simulation of a billion atoms. In the dump file, the atom ID takes up a lot of scratch space. So I wrote the dump file without IDs, but read_dump says “Read_dump field not found in dump file”. On sourceforge, you said the atom ID should be dumped. I am wondering if there is another way to use read_dump without IDs.

why not use restart files to restart? in fact, a (text mode, uncompressed) dump file takes up much more space than a (binary) restart file. if you also consider that a clean restart needs not only positions but also velocities, the space occupied by atom IDs is small.

for your reference: a full precision text mode representation of a double precision floating point number needs between 20 and 24 bytes, while a restart file needs 8 bytes. plus, reading and writing text mode files costs time and CPU power to convert the data, which is not needed for binary files.
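
The byte counts above can be checked with a short Python sketch. It assumes a full precision text field is written with 17 significant digits (the `.16e` format) and that a binary file stores the raw 8-byte IEEE double; the helper names `text_width` and `binary_width` are made up for this illustration and are not part of LAMMPS.

```python
import struct

def text_width(value: float) -> int:
    """Characters needed for one value at 17 significant digits (no separator)."""
    return len(f"{value:.16e}")

def binary_width(value: float) -> int:
    """Bytes needed to store the same value as a raw little-endian double."""
    return len(struct.pack("<d", value))

coordinate = -105.8                 # sample coordinate from this thread
n_values = 3 * 1_000_000_000        # x, y, z for a billion atoms

print("bytes per value (text, binary):",
      text_width(coordinate), binary_width(coordinate))
print(f"text coordinates:   {text_width(coordinate) * n_values / 1e9:.0f} GB")
print(f"binary coordinates: {binary_width(coordinate) * n_values / 1e9:.0f} GB")
```

The text figure grows further once field separators and line breaks are counted.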

I think the atom ID is useless in read_dump.

no, it is not. if atoms cannot be followed by their atom ID, then every time you restart, all atoms would “jump” because their identities get switched around, making any time dependent analysis of the trajectory files difficult, if not impossible.
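
A minimal Python sketch of why the IDs matter for analysis. It assumes each frame is a list of `(id, (x, y, z))` rows whose order can differ between frames, as typically happens with unsorted dump output from parallel runs; the data and function names are invented for this illustration.

```python
def displacements_by_id(frame_a, frame_b):
    """Match atoms through their ID and return {id: displacement vector}."""
    positions_b = {atom_id: pos for atom_id, pos in frame_b}
    return {
        atom_id: tuple(b - a for a, b in zip(pos, positions_b[atom_id]))
        for atom_id, pos in frame_a
    }

def displacements_by_row(frame_a, frame_b):
    """Match atoms by row order only, as one would have to without IDs."""
    return [
        tuple(b - a for a, b in zip(pos_a, pos_b))
        for (_, pos_a), (_, pos_b) in zip(frame_a, frame_b)
    ]

# Hypothetical frames: every atom moved by 0.1, but the rows were reordered.
before = [(1, (0.0, 0.0, 0.0)), (2, (10.0, 0.0, 0.0)), (3, (0.0, 10.0, 0.0))]
after = [(3, (0.0, 10.1, 0.0)), (1, (0.1, 0.0, 0.0)), (2, (10.0, 0.0, 0.1))]

by_id = displacements_by_id(before, after)
by_row = displacements_by_row(before, after)
print("atom 1 matched by ID: ", by_id[1])
print("row 0 matched by row: ", by_row[0])
```

Matched by ID, atom 1 shows its small real displacement; matched by row, the first row pairs atom 1 with atom 3 and reports a spurious jump of about 10 length units.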

I manually changed the atom ID of several atoms to 1 in a good dump file, and read_dump still read all the atoms correctly. So can I set all the atom IDs to 1 in the dump to save a lot of scratch space? You know, for large simulations, the dump size is huge.

you are trying to save at the wrong place. better to save through using a binary/compressed file format for saving dumps/trajectories rather than losing information on restart.
LAMMPS has several options for that. please see the documentation.

axel.

_Haidong_Fan_SCU · April 2, 2020, 2:06pm

Thanks, Axel.
Yes, a restart file is a good method. However, I also want to run DXA analysis and monitor the simulations through the dump file, and a restart is only needed occasionally. So dump seems best to me. Also, I dump with minimal text data, as shown below. You can see the ID uses a lot of space. I also tested a restart file, which is more than 50% larger than the dump file below, since the restart file has velocities and other information that are not important to me. Anyway, if the ID cannot be ignored or simplified, I will have to include it.

ITEM: ATOMS id type x y z
43847583 3 -105.8 -108.7 -88.5

akohlmey · April 2, 2020, 2:35pm

you are not making much sense. you use restart files for restarting and dump files for analysis. since you write the restart files infrequently their size is irrelevant compared to the dump files.

if you truncate the precision in the dump file as much as you do and also omit the velocities, you are not really restarting a simulation but starting a new one: you are seriously truncating the position data (and thus moving atoms, the more so the farther they are from the origin) and you have to completely re-initialize the velocities. this means that you have to re-equilibrate that kind of restarted system before continuing the simulation.
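
The effect of truncation can be sketched in a few lines of Python. The helper `roundtrip_error` and the sample coordinates are hypothetical; the two format strings are only examples of a fixed-decimal format (`.1f`, matching the one-decimal sample line in this thread) and a significant-digit format (`.4g`).

```python
def roundtrip_error(value: float, fmt: str) -> float:
    """Distance an atom coordinate moves when written as text and read back."""
    return abs(float(format(value, fmt)) - value)

# Hypothetical coordinates at increasing distance from the origin.
for x in (8.83421, 105.83421, 1058.83421):
    print(f"x = {x:>11}: "
          f"error with .1f = {roundtrip_error(x, '.1f'):.5f}, "
          f"error with .4g = {roundtrip_error(x, '.4g'):.5f}")
```

Under these assumptions, a fixed-decimal format moves each coordinate by up to half of the last printed digit wherever the atom sits, while a significant-digit format loses more absolute precision the larger the coordinate gets.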

apart from that, as i already mentioned, using text mode i/o is extremely inefficient, especially for large files.
if you output your data in binary (for the atom and custom dump styles, use a file name that ends in .bin), you will require a third of the space at the same precision. if you are willing to lose some precision, you can use either dcd or xtc style dump files, which store data in single precision floating point or in fixed point differential storage, respectively. …and you will still have more usable information than with your extreme truncation of precision in your text mode dump output.
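
A rough per-atom size comparison in Python, as a sketch only. The text line is the sample from this thread; the 12-byte record assumes three coordinates stored as 32-bit floats, and the 40-byte record is a hypothetical layout with five fields stored as doubles. Real file formats add headers and may compress, so these are estimates rather than exact file sizes.

```python
import struct

text_line = "43847583 3 -105.8 -108.7 -88.5\n"      # sample line from the thread
text_bytes = len(text_line.encode("ascii"))

single_precision_bytes = struct.calcsize("<3f")     # x, y, z as 32-bit floats
double_record_bytes = struct.calcsize("<5d")        # assumption: 5 fields as doubles

n_atoms = 1_000_000_000
for label, size in (("truncated text line", text_bytes),
                    ("single precision xyz", single_precision_bytes),
                    ("five doubles", double_record_bytes)):
    print(f"{label:>22}: {size:3d} bytes/atom, {size * n_atoms / 1e9:.0f} GB/frame")
```

Even the heavily truncated text line costs more bytes per atom than three single precision coordinates, which carry roughly seven significant digits each. Note that a coordinate-only record holds no IDs or types, so the atom order has to be known from elsewhere.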

axel.

_Haidong_Fan_SCU · April 2, 2020, 2:42pm

Thanks for your comments, Axel. I will think about this and find a balance between precision and efficiency.

akohlmey · April 2, 2020, 2:55pm

Thanks for your comments, Axel. I will think about this and find a balance between precision and efficiency.

there is no thinking required as far as restarting is concerned. what you are doing right now is bad and will invalidate the value of your work.

also for the choice of how you dump your trajectory data, the conclusions are very clear. all of the alternatives to store data more efficiently are preferable over what you are currently doing.

bottom line: if you don’t have enough storage to handle very large simulations, you either need to get more storage or not run such large simulations. trying to save space by sacrificing the value and correctness of your work is a very, very bad idea.

axel.

_Haidong_Fan_SCU · April 2, 2020, 3:04pm

Thank you for your suggestions.

_Haidong_Fan_SCU · April 3, 2020, 3:09am

Thanks, Axel. It seems the best choice is to write dump files for postprocessing with low precision coordinates, such as %.1f, which is OK for DXA, and to keep a few restart files in case a restart is needed.

Haidong Fan Ph.D
Professor, Associate Chair
Department of Mechanics, Sichuan University
Emails: [email protected], hfan85@…9062…du.cn
Webpage: http://acem.scu.edu.cn/teachers.php?cid=118&id=27
Google Scholar: https://scholar.google.com/citations?user=M9bg29kAAAAJ&hl=en