problem with "dump_modify append" when using MPI

Hello, LAMMPS users,

I want dump snapshots to be appended to the end of an existing dump file, so I am using the command:

dump_modify 1 append yes
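The dump itself is a custom-style dump along these lines (the file name and output interval here are only illustrative, not the exact values from my script):

dump 1 all custom 1000 dump.chain.lammpstrj id mol type xu yu zu vx vy vz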

But there is a problem with it: when I try to run the simulation on a parallel machine with MPI, the dump output file is a mess and looks like this:

ITEM: TIMESTEP
0
ITEM: NUMBER OF ATOMS
28314
ITEM: BOX BOUNDS pp pp pp
-14.5 14.5
-14.5 14.5
-14.5 14.5
ITEM: ATOMS id mol type xu yu zu vx vy vz
1 1 1 -13.9423 -13.9423 -9.2 1.93677 -0.508665 0.243565
2 1 1 -13.9423 -13.9423 -9.8 1.50259 -0.535623 0.243565
3 1 1 -13.9423 -13.9423 -10.4 1.06841 -0.562582 0.243565

......

606 68 1 2.78846 -11.7115 -10.4 -0.591271 -0.80807 0.603365
60ITEM: TIMESTEP
0
ITEM: NUMBER OF ATOMS
28314
ITEM: BOX BOUNDS pp pp pp
-14.5 14.5
-14.5 14.5
-14.5 14.5
ITEM: ATOMS id mol type xu yu zu vx vy vz
1 1 1 -13.9423 -13.9423 -9.2 1.93677 -0.508665 0.243565
2 1 1 -13.9423 -13.9423 -9.8 1.50259 -0.535623 0.243565
3 1 1 -13.9423 -13.9423 -10.4 1.06841 -0.562582 0.243565

......

I guess all processors participate in the dump in a random order, and that makes the mess.

After I delete the "dump_modify append" command from my script, everything is fine again.

Thank you very much,

Best wishes,

Wei Chen

Hello, LAMMPS users,

I want dump snapshots to be appended to the end of an existing dump file, so I am using the command:

dump_modify 1 append yes

This is almost always a bad idea. It is better to have each run write to a different file, in order to avoid problems with file corruption on networked file systems. Dump files can always be concatenated at a later stage.
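A minimal sketch of that per-run workflow (the file names, dump interval, and run lengths here are only illustrative):

# first run writes its own dump file
dump 1 all atom 100 dump.run1.lammpstrj
run 10000
undump 1

# second run writes a separate file; the two files
# can be concatenated afterwards if needed
dump 2 all atom 100 dump.run2.lammpstrj
run 10000
undump 2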

But there is a problem with it: when I try to run the simulation on a parallel machine with MPI, the dump output file is a mess and looks like this:

Please check the LAMMPS output carefully. This looks almost like you are trying to run a serial executable in parallel. Also, you have to make sure that any previous run was not terminated prematurely and left you with an incomplete file.

[quoted dump file excerpt snipped; same as above]

I guess all processors participate in the dump in a random order, and that makes the mess.

Your guess is wrong. In this style of dump, output is done by only one processor.

After I delete the "dump_modify append" command from my script, everything is fine again.

Can you demonstrate this by modifying one of the LAMMPS examples, e.g. the melt example?

thanks,
    axel.

Hi, Axel,

Thanks for your reply. I tried the melt example with and without "dump_modify append".

The command is:

dump_modify id append yes
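The dump line is the one already present (commented out) in examples/melt/in.melt, enabled and combined with the append option, roughly:

dump id all atom 50 dump.melt
dump_modify id append yes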

I am sure all previous runs finished cleanly. The dump file with "append" is still a mess:

ITEM: TIMESTEP
0
ITEM: NUMBER OF ATOMS
4000
ITEM: BOX BOUNDS pp pp pp
0 16.796
0 16.796
0 16.796
ITEM: ATOMS id type xs ys zs
1 1 0 0 0
2 1 0.05 0.05 0

......

1501 1 0.5 0ITEM: TIMESTEP
0
ITEM: NUMBER OF ATOMS
4000
ITEM: BOX BOUNDS pp pp pp
0 16.796
0 16.796
0 16.796
ITEM: ATOMS id type xs ys zs
1 1 0 0 0
2 1 0.05 0.05 0

......

Best wishes,

Wei

The dump file with "append" is still a mess.

What does this mean?

I just ran bench/in.lj and added these lines:

dump 1 all atom 10 tmp.dump
dump_modify 1 append yes

I get the same dump file whether I comment out
the 2nd line or not. It works in serial or parallel.

Steve

And would you please post the log file that you create when running in parallel? I maintain that what you are showing is most likely to happen with an incorrectly compiled executable. I could reproduce this on my local machine when running a serial executable with 8 copies.

thanks,
     axel.

I sent you a set of files in a private e-mail. No need to spam the list with this.

axel.

Hi, Axel,

Thank you for your reply.

You are right. I checked the log file of the run I submitted in parallel with 4 processors, but it says:

1 by 1 by 1 MPI processor grid

......

Loop time of 1.08555 on 1 procs for 250 steps with 4000 atoms

So I was running a serial executable by mistake.
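For comparison, a correctly launched 4-process run should report something like the following in the log (the exact processor grid and timing depend on the system):

2 by 2 by 1 MPI processor grid

......

Loop time of ... on 4 procs for 250 steps with 4000 atoms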

Thanks again,

Best wishes,

Wei

Hi, Axel,

It is clear now: there are two different MPI installations on the workstation. When LAMMPS was built, it found the files of MPI A, but when I submit the job, the system uses MPI B to run LAMMPS, so the executable is treated as a serial program by MPI B. Each of the 4 copies then runs as an independent 1-processor job and appends to the same dump file, which produces the interleaved output shown above.

Best wishes,

Wei