Negative values in the memory requirement estimate (Per MPI rank memory allocation)

Hi!

I’ve run a simulation on 8 MPI ranks of a supercomputer in order to check the required memory using LAMMPS’s own memory estimate, but I obtain negative values for the run with the fix I’m using. It only occurred in this simulation, and re-running it gives me the same illogical result. Could you explain its origin?

Per MPI rank memory allocation (min/avg/max) = -69.32 | -61.67 | -43.51 Mbytes

Also, as you can see in the output file below, the “ending” of the simulation with the performance information is printed twice. Why is that?

Thank you very much! :)

Here’s my input file:

suffix omp
package omp 1

variable lattconst equal 5
variable repetition equal 250

# ---------- Initialize Simulation ---------------------

clear
units metal # to get units in Angstroms and eV, among others
dimension 3
boundary p p p # periodic boundaries in all three dimensions
atom_style atomic
atom_modify map array

# ---------- Create Atoms ---------------------

lattice fcc ${lattconst}
region box block 0 1 0 1 0 1 units lattice
create_box 1 box

lattice fcc ${lattconst} orient x 1 0 0 orient y 0 1 0 orient z 0 0 1
create_atoms 1 box
replicate ${repetition} ${repetition} ${repetition}

# ---------- Define Interatomic Potential ---------------------

pair_style eam/alloy # selects which type of interatomic potential is used
pair_coeff * * Al99.eam.alloy Al # specifies where the coefficients of the potential are located; the file extension can point to the one to use
neighbor 2.0 bin
neigh_modify delay 0 every 1 check yes

# ---------- Run Minimization ---------------------

timestep 0.001
#fix 1 all nvt temp 1000.0 1000.0 0.1
fix 2 all box/relax iso 0.0

thermo 100
thermo_style custom step temp enthalpy press cella cellb cellc
thermo_modify flush yes
thermo_modify norm yes

min_style cg
minimize 1e-25 1e-25 5000 10000

fix 1 all npt temp 1000.0 1000.0 0.1 iso 1000.0 1000.0 0.1

variable iterations equal 10000
#dump mydump all xyz 1000 C:\Bin\LAMMPS\Exo7\dump5.xyz
run ${iterations}

# define variables

variable natoms equal "count(all)"
variable latticeconstant equal "cella/250"

# print the defined variables

print "Number of atoms = ${natoms};"
print "Lattice constant = ${latticeconstant}"

print "All done!"

And here’s the output file, from which I’ve cut some of the thermodynamic output for concision:

LAMMPS (22 Aug 2018)
using 1 OpenMP thread(s) per MPI task
using multi-threaded neighbor list subroutines
using 1 OpenMP thread(s) per MPI task
using multi-threaded neighbor list subroutines
Lattice spacing in x,y,z = 5 5 5
Created orthogonal box = (0 0 0) to (5 5 5)
2 by 2 by 2 MPI processor grid
Lattice spacing in x,y,z = 5 5 5
Created 4 atoms
Time spent = 0.00053367 secs
Replicating atoms …
orthogonal box = (0 0 0) to (1250 1250 1250)
2 by 2 by 2 MPI processor grid
62500000 atoms
Time spent = 1.43158 secs
Last active /omp style is pair_style eam/alloy/omp
Neighbor list info …
update every 1 steps, delay 0 steps, check yes
max neighbors/atom: 2000, page size: 100000
master list distance cutoff = 8.28721
ghost atom cutoff = 8.28721
binsize = 4.1436, bins = 302 302 302
1 neighbor lists, perpetual/occasional/extra = 1 0 0
(1) pair eam/alloy/omp, perpetual
attributes: half, newton on, omp
pair build: half/bin/atomonly/newton/omp
stencil: half/bin/3d/newton
bin: standard
Setting up cg style minimization …
Unit style : metal
Current step : 0
WARNING: Energy due to 1 extra global DOFs will be included in minimizer energies
Per MPI rank memory allocation (min/avg/max) = 3655 | 3655 | 3655 Mbytes
Step Temp Enthalpy Press Cella Cellb Cellc
0 0 -4.8735152 -128413.18 1250 1250 1250
1901 0 -3.36 6.3185939e-07 1012.5012 1012.5012 1012.5012
Loop time of 27223 on 8 procs for 1901 steps with 62500000 atoms

99.0% CPU use with 8 MPI tasks x 1 OpenMP threads

Minimization stats:
Stopping criterion = energy tolerance
Energy initial, next-to-last, final =
-2.36885242604 -3.35999998196 -3.35999998196
Force two-norm initial, final = 4.69624e+08 0.00151612
Force max component initial, final = 4.69624e+08 0.00151612
Final line search alpha, max atom move = 0.00013913 2.10937e-07
Iterations, force evaluations = 1901 1903

MPI task timing breakdown:
Section | min time | avg time | max time |%varavg| %total

Hi!

I’ve run a simulation on 8 MPI ranks of a supercomputer in order to check the required memory using LAMMPS’s own memory estimate, but I obtain negative values for the run with the fix I’m using. It only occurred in this simulation, and re-running it gives me the same illogical result. Could you explain its origin?

it could be a bug in LAMMPS where something accesses uninitialized memory, or it could be a 32-bit signed integer overflow issue. you have 62.5 million atoms, each with an average of about 70 neighbors, so the neighbor lists alone will consume approx. 62.5e6 x 70 x 4 / 8 bytes, i.e. more than 2GB of RAM per MPI rank, which will overflow a 32-bit signed integer, i.e. the default "int" type even on many 64-bit machines. the version of LAMMPS you are using has some known bugs, so i am curious whether this would also happen with the latest patch release, 15 May 2019.
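For illustration, here is a minimal standalone C sketch of that overflow (it is not LAMMPS source code; the ~70 neighbors per atom and the 4-byte neighbor index are assumptions taken from the estimate above):

/* Illustrative only -- not LAMMPS source code. Shows how the byte count
   estimated above (62.5e6 atoms * ~70 neighbors * 4 bytes / 8 ranks)
   no longer fits in a 32-bit signed int and comes out negative. */
#include <stdio.h>

int main(void)
{
    long long natoms    = 62500000;   /* atoms in the replicated box      */
    long long neighbors = 70;         /* assumed average neighbors/atom   */
    long long bytes_per = 4;          /* assumed size of a neighbor index */
    long long nranks    = 8;          /* MPI ranks in the run             */

    /* correct 64-bit accounting: about 2.2e9 bytes (~2086 MB) per rank */
    long long bytes64 = natoms * neighbors * bytes_per / nranks;

    /* the same number forced into a plain 32-bit "int": it exceeds
       INT_MAX (2147483647) and wraps to a negative value on typical
       two's-complement machines, so any MB figure derived from it is
       negative as well */
    int bytes32 = (int) bytes64;

    printf("64-bit count: %lld bytes = %.1f MB per rank\n",
           bytes64, bytes64 / 1048576.0);
    printf("32-bit count: %d bytes = %.1f MB per rank\n",
           bytes32, bytes32 / 1048576.0);
    return 0;
}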

Per MPI rank memory allocation (min/avg/max) = -69.32 | -61.67 | -43.51 Mbytes

Also, as you can see in the output file below, the “ending” of the simulation with the performance information is printed twice. Why is that?

it is not the same summary. your input contains one minimization and one MD run, and you get a summary after each of them; see the output leading up to each. for the MD run you can turn off the post-run info with the "post no" option, e.g. run ${iterations} post no.

axel.

Thanks for the details on the memory issue, it’s really appreciated!

What I meant about the performance summary being printed twice is this:

11300 1.2892813e-10 -3.3498893 976.0105 1012.084 1012.084 1012.084
11400 1.9633607e-10 -3.3494821 1015.3376 1012.0672 1012.0672 1012.0672
11500 2.8486634e-10 -3.3495045 1013.1765 1012.0681 1012.0681 1012.0681
11600 3.5477027e-10 -3.34981 983.66409 1012.0807 1012.0807 1012.0807
11700 4.474444e-10 -3.3496352 1000.5514 1012.0735 1012.0735 1012.0735
11800 5.9849632e-10 -3.349377 1025.4916 1012.0628 1012.0628 1012.0628
11900 8.6284436e-10 -3.3497704 987.49271 1012.0791 1012.0791 1012.0791
11901 8.6395359e-10 -3.3497464 989.80922 1012.0781 1012.0781 1012.0781
Loop time of 105468 on 8 procs for 10000 steps with 62500000 atoms

Performance: 0.008 ns/day, 2929.673 hours/ns, 0.095 timesteps/s
99.3% CPU use with 8 MPI tasks x 1 OpenMP threads

MPI task timing breakdown:
Section | min time | avg time | max time |%varavg| %total

[…]

Total wall time: 36:52:30

It doesn’t look to me like one of these is for the minimisation; rather, the MD performance summary seems to be printed twice, since the values and formatting are identical and the last few thermodynamic lines appear twice as well. Am I missing something?

i don’t know. it could be a bug in LAMMPS (which may already have been corrected), a miscompiled executable, or memory corruption.
but before looking into it, i would need confirmation that this also happens 1) with the latest LAMMPS patch version (15 May 2019), 2) with one of the benchmark examples (e.g. in.lj), reliably, and 3) with no more than 8 MPI ranks (i.e. like in your example run).

axel.

Alright, thanks! I’ll look into it to see what could be causing this. In any case, it isn’t much of an issue.

Thanks again!
Antoine.