[lammps-users] Strange problem: decreasing efficiency when thermodynamic output is frequent

Dear all,


I am running MD simulations of polymers and I need to save thermodynamic data very frequently for post-processing (ideally, every time step). I'm observing a strange problem with my runs: the efficiency decreases with time. I understand that saving a lot of data to disk very frequently slows down the performance, but I can't see why the performance should keep decreasing as the run proceeds. After just a few million steps, the performance, measured in steps per CPU second, drops so low that I have to stop my simulations.


I'm running lammps-4Jul10. I've tried running my simulations with a serial build of LAMMPS and also with MPI (I've tried Open MPI, LAM/MPI and MPICH2), compiled with g++ 4.4 on Linux boxes (I've tried two machines with different hardware and Linux versions). The same thing happens every time; the more frequently I save data, the faster the efficiency goes down. I'm confident it's not a hardware or OS problem: I can write a simple numerical code that generates gigabytes of data very quickly, and its efficiency does not go down as it runs.


Has anybody observed this behaviour before?


I can provide data and plots of this efficiency decrease if needed.


Here’s the script I’m using to run my simulations. The evolution of the efficiency can be observed in the file EFFICIENCY.out.


Any help will be much appreciated.


Regards,


Jorge


###########################################



units real
dimension 3
newton on off
processors * * *
boundary p p p


atom_style full
pair_style lj/cut 9.0
bond_style harmonic
angle_style harmonic
dihedral_style multi/harmonic
improper_style none


read_restart restart_C24_00_450K_1atm_ljcut.dat
reset_timestep 0


pair_modify tail yes mix arithmetic
special_bonds lj/coul 0.0 0.0 0.0


timestep 5.0
run_style respa 2 5
fix 1 all npt temp 450.0 450.0 100 iso 1.0 1.0 100
thermo 1
variable a equal spcpu
variable b equal step
fix 2 all print 10000 "$b $a" file EFFICIENCY.out screen no
run 2000000










Jorge:

You didn’t provide EFFICIENCY.out with this e-mail message.

Also, have you examined the block of information that's printed at the end of a LAMMPS run, which gives the time breakdown of the different parts of the computation, including the time spent on data output? You should check whether output is really the dominant cost of the simulation, or whether something else is causing the problem.
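For instance, breaking the single long run into several shorter runs gives you a timing summary after each chunk, so you can check whether any one category of the breakdown grows as the run proceeds. A rough sketch, with the rest of your input unchanged:

# four shorter runs instead of one long one;
# each prints its own timing breakdown
run 500000
run 500000
run 500000
run 500000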

However, some slowdown when data is printed out more frequently is to be expected, as I believe LAMMPS calculates such data only on the steps where it needs to be reported (e.g., when a compute or thermo output requires it). So, in addition to the extra computation, you also have the added cost of waiting for the file system to respond. (This would potentially be an issue with parallel runs, and a serial run can't break the problem up to save time.)

–AEI

dear jorge,

there are a number of possible explanations, but before going
into any details, can you please explain a little bit more about
what you are going to need the high resolution data for?

the time spent on writing and particularly reading text mode files
is enormous and non-parallel, so i would imagine that realizing your
analysis as a compute run within LAMMPS would make much more
sense and very likely be much more efficient.

also, if you collect data at every step it will be highly correlated
and thus of low statistical relevance.
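
for example (an untested sketch; adjust the sampling and the quantities
to whatever your analysis actually needs), something like fix ave/time
can accumulate and average data inside lammps and write it out only
occasionally:

# sample every step, average over 1000 samples, write one line every 1000 steps
fix 3 all ave/time 1 1000 1000 c_thermo_temp c_thermo_press file thermo_ave.out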

as for the decrease in efficiency: are you sure that this is not
a "feature" of your input?
unless you are at the limit of scaling, the performance of a
classical MD code is mostly determined by the performance
of the non-bonded interactions, which in turn depends on the
average number of neighbors. if your system started to
"clump together", performance would go down; similarly, on
expansion or evaporation, performance would go up.
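
you can keep an eye on this directly in the thermo output, e.g.
something like the following (illustrative only; add whatever
other quantities you care about):

# watch the box volume and the performance over the course of the npt run
thermo_style custom step temp press vol spcpu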

i just started a run with the same efficiency print fix based on
the melt example input and my performance is more-or-less
constant, if not increasing a little bit over time.

the fact that you see the efficiency decrease faster with
more frequent output may be simply caused by more output
slowing down lammps overall.

however, without having a (complete!) input file set in hand
that exactly reproduces your issue, it is only speculation.

have you tried profiling lammps to see where all the time is spent?

cheers,
    axel.


Dear all,


I'm still having efficiency problems with LAMMPS when I compute and output data very frequently. I think I have found a simple case that reproduces this unexpected behaviour. Here is the script (based on the melt example in the LAMMPS distribution):



##################################################

# 3d Lennard-Jones melt
units lj
atom_style atomic


lattice fcc 0.8442
region box block 0 10 0 10 0 10
create_box 1 box
create_atoms 1 box
mass 1 1.0


velocity all create 3.0 87287


pair_style lj/cut 2.5
pair_coeff 1 1 1.0 1.0 2.5


neighbor 0.3 bin
neigh_modify every 20 delay 0 check no


fix 1 all nve


variable a equal spcpu
variable b equal step
fix 2 all print 10000 "$b $a" file EFFICIENCY.out screen no


thermo 1
compute pp all pressure thermo_temp pair
compute pk all pressure thermo_temp ke
thermo_style custom step vol c_pp c_pk



run 100000
#########################################################


As can be seen, the efficiency, measured as the number of steps per CPU second (spcpu), decreases as the number of steps grows (the data are saved by the script in the file EFFICIENCY.out). I'm running the script on a Linux cluster, compiled in serial and also with different versions of MPI, and I always observe the same trend in the efficiency.


I think it is related to the use of two "compute … pressure" commands. If I use only one, the efficiency stays constant throughout the simulation. If I keep both commands but save data less frequently, say every 10 time steps, the efficiency again stays constant. Unfortunately, I would like to analyze two different contributions to the stress tensor very frequently.


Has anybody observed anything like this? I would appreciate any help with this, because I’m quite lost.


Thanks,


Jorge


dear jorge,

just a quick remark that i have been able to reproduce your problem
with the files you sent me, and that i currently have no idea where to
look for the issue. i am away at a conference this week, so my
time for hacking will be limited.

cheers,
    axel.

I'm not seeing the problem on my box. I ran your script
with run 100000 replaced by ten run 10000 commands,
so you can see the LAMMPS timing summary printed out 10 times:

1 proc
Loop time of 51.0323 on 1 procs for 10000 steps with 4000 atoms
Loop time of 51.3352 on 1 procs for 10000 steps with 4000 atoms
Loop time of 51.4814 on 1 procs for 10000 steps with 4000 atoms
Loop time of 51.8335 on 1 procs for 10000 steps with 4000 atoms
Loop time of 51.9647 on 1 procs for 10000 steps with 4000 atoms
Loop time of 52.0721 on 1 procs for 10000 steps with 4000 atoms
Loop time of 52.4668 on 1 procs for 10000 steps with 4000 atoms
Loop time of 52.5027 on 1 procs for 10000 steps with 4000 atoms
Loop time of 52.6146 on 1 procs for 10000 steps with 4000 atoms
Loop time of 52.8703 on 1 procs for 10000 steps with 4000 atoms

4 procs
Loop time of 21.6205 on 4 procs for 10000 steps with 4000 atoms
Loop time of 21.5502 on 4 procs for 10000 steps with 4000 atoms
Loop time of 23.6937 on 4 procs for 10000 steps with 4000 atoms
Loop time of 23.2619 on 4 procs for 10000 steps with 4000 atoms
Loop time of 23.7347 on 4 procs for 10000 steps with 4000 atoms
Loop time of 24.5361 on 4 procs for 10000 steps with 4000 atoms
Loop time of 24.5727 on 4 procs for 10000 steps with 4000 atoms
Loop time of 23.6153 on 4 procs for 10000 steps with 4000 atoms
Loop time of 23.8619 on 4 procs for 10000 steps with 4000 atoms
Loop time of 23.6136 on 4 procs for 10000 steps with 4000 atoms

The slight rise in time is probably just due to the system disordering
and thus memory accesses becoming more random.

The data in EFFICIENCY.out is somewhat pointless, I think. The spcpu value is only measured between successive invocations of thermo output. Since you are outputting thermo info every timestep, you are computing the steps/second from the time of a single step, which is not going to be very accurate. The time for 10000 steps is what is listed above, and it appears to be essentially constant.
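
If you want a meaningful number out of spcpu, print thermo output less often, so that the rate is averaged over many steps, e.g. something like:

# spcpu now reports the rate averaged over the preceding 10000 steps
thermo 10000
thermo_style custom step spcpu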

Steve

Dear Steve,

Thanks for the response. I thought spcpu would be calculated with respect to the frequency specified in the "fix … print" command; I was confused about this. I understand that if spcpu refers to just one step, its value might not be representative of the efficiency of the simulation.

However, I can reproduce the same behaviour with the following script, in which I output pressure data using "fix … print" every step and output spcpu every 10000 steps using "thermo_style custom", so that I'm sure spcpu refers to the last 10000 steps:

ok - I see the problem now - there is a memory management bug that is growing
an array when the output frequency is every step - I’ll post a patch soon that should
fix it …

Steve

Thank you very much again!

Regards,

Jorge