memory usage per processor

Dear all,

I ran into a strange problem today while trying to run a given input
script with the 29 Mar 2011 version of LAMMPS on a Fedora machine
(Linux 2.6.35.11-83.fc14.x86_64), using different numbers of processors:

* in serial, everything works fine:
Setting up run ...
Memory usage per processor = 667.974 Mbytes
Step CPU Tliq Tpar TotEng
       0 0 300 300.32252 -189119.31

* in parallel, but on a single processor, the reported memory usage
per processor is strange:
Memory usage per processor = -8.79609e+12 Mbytes

The run itself still works fine, though, as top shows:
  PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
31244 ljoly 20 0 797m 691m 4932 R 99.2 2.9 0:44.50 lmp_30Mar11_mpi

* in parallel on 2p x 2q x 1 processors, it seems to work fine, but if
I use an even number of processors in the x or y direction, then it
either hangs after "Setting up run ..." or I get:
Setting up run ...
[lpmcnpc260.univ-lyon1.fr:31299] *** An error occurred in MPI_Waitany
[lpmcnpc260.univ-lyon1.fr:31299] *** on communicator MPI COMMUNICATOR
9 DUP FROM 0
[lpmcnpc260.univ-lyon1.fr:31299] *** MPI_ERR_TRUNCATE: message truncated
[lpmcnpc260.univ-lyon1.fr:31299] *** MPI_ERRORS_ARE_FATAL (your MPI
job will now abort)

Could you give me some advice on how I should proceed to identify the
origin of the problem? I can send the input files in a separate mail.

Best,
Laurent

Please post an input script - I'll see if I can verify
the one-proc parallel strangeness.

Steve

Found the bug with the memory usage count.
It was an uninitialized variable in fix ave/spatial.
See the 1Apr11 patch.
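
For what it's worth, the garbage number is the classic signature of an
uninitialized accumulator: whatever bytes happen to sit in that variable
get summed into the per-processor total. A minimal sketch with made-up
names (not the actual fix ave/spatial code):

#include <cstdio>

// Illustration only: a fix-like object whose byte count is never set.
struct FixSketch {
  double bytes;                         // left indeterminate by the constructor
  double memory_usage() const { return bytes; }
};

int main() {
  FixSketch *fix = new FixSketch;             // bytes is never assigned
  double total = 700.0 * 1024.0 * 1024.0;     // bytes tallied by the rest of the code
  total += fix->memory_usage();               // garbage leaks into the reported total
  // The printed value is unpredictable and can easily look like -8.79609e+12 Mbytes.
  std::printf("Memory usage per processor = %g Mbytes\n",
              total / 1024.0 / 1024.0);
  delete fix;
  return 0;
}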

I see no other problems with your script running on 1 or more procs.

Steve

Thanks a lot!

With the 2 Apr 2011 version I get consistent output in serial and in
"parallel" on 1 processor.

However, my script still behaves strangely when I run it on some
processor grids.

For instance, using "processors 3 4 1" on 12 processors, LAMMPS hangs
after writing "Setting up run ...". And if I comment out the fix
ave/spatial line, I instead get an error: "*** An error occurred in
MPI_Waitany".

I find only one use of MPI_Waitany in the LAMMPS source code:

remap.cpp: MPI_Waitany(plan->nrecv,plan->request,&irecv,&status);
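
If I understand it correctly, MPI_ERR_TRUNCATE means that an incoming
message was longer than the receive buffer posted for it, which would
suggest mismatched communication counts on some processor grids. A
minimal sketch of that failure mode, run on 2 ranks (hypothetical, not
the remap.cpp logic):

#include <mpi.h>
#include <vector>

int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);
  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  if (rank == 0) {
    std::vector<double> buf(10);                 // room for only 10 doubles
    MPI_Request request;
    MPI_Status status;
    int index;
    MPI_Irecv(buf.data(), 10, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, &request);
    // The truncation is detected when the request completes, i.e. here,
    // which is why the traceback points at MPI_Waitany.
    MPI_Waitany(1, &request, &index, &status);
  } else if (rank == 1) {
    std::vector<double> data(20, 1.0);           // sender ships 20 doubles
    MPI_Send(data.data(), 20, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
  }

  MPI_Finalize();
  return 0;
}

With the default MPI_ERRORS_ARE_FATAL handler this aborts with an
MPI_ERR_TRUNCATE message much like the one above.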

Do you have any hint as to what the problem could be?

Thanks,
Laurent

please provide an input that reproduces this behavior.
seeing for yourself is better than guessing.

axel.