HPC and ReaxFF

error.txt (3.81 KB)

The error message indicates that one or more processors were trying to allocate memory with no atoms, which could indicate a bad structure (atoms with wrong fractional coordinates so they are all packed within a small region) or a bad spatial decomposition.

Check the energy/force/pressure from runs that succeeded, and perhaps reduce the number of MPI tasks and try again.
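
For example, a hedged sketch of what retrying with fewer MPI tasks could
look like from an MPICH2 command prompt (the executable name lmp_mpi.exe
and the input file name in.reaxff below are placeholders, not taken from
the actual run):

  REM hypothetical retry with 16 MPI tasks instead of 32 or 64;
  REM lmp_mpi.exe and in.reaxff are placeholder names
  mpiexec -n 16 lmp_mpi.exe -in in.reaxff -log log.16tasks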

Ray

The error message indicates that one or more processors were trying to
allocate memory with no atoms, which could indicate a bad structure (atoms
with wrong fractional coordinates so they are all packed within a small
region) or a bad spatial decomposition.

ray,

i doubt that that is a correct assessment of the situation. considering the
number of atoms in the system (1.2 million),
this is more likely due to a 32-bit integer overflow somewhere.

i think that trying to run such a large simulation on a (small) windows
cluster is "extremely courageous", to quote one of my favorite (british)
TV shows.

axel.


Dear Axel,
thanks for your reply.
The LAMMPS version I used is "LAMMPS (22 Jan 2016-ICMS)".
All of the nodes run Windows Server 2008 (64-bit), and each node has 28 GB
of memory.
I used the LAMMPS installer package for the 64-bit version of Windows, and
I use MPICH2 to run in parallel.
To measure the memory usage I watched the Windows Task Manager, and it
showed the memory usage increasing very sharply when I used 32 cores in
the command. But when I used 64 cores in the command (16 cores on each
node), the simulation ran but stopped after a few steps. In this state the
log file shows "Memory usage per processor = 1324.09 Mbytes".

The simulation box has 1.2M atoms, and the box size is 100 x 300 x 500.

have you tried running significantly smaller systems with this setup, like
the examples bundled with LAMMPS? 1.2M atoms is *gigantic* for reaxff.
with the windows MPICH2 package, you can only run in parallel over TCP/IP,
and i doubt that you will get anything close to decent performance from
multi-node calculations. my expectation for such a large system - if you
can run it at all - is that you will need a supercomputer to get anywhere
near useful trajectory data.
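
a minimal sketch of such a sanity check, assuming the examples folder that
ships with the Windows package is available (the folder and input file
names below are placeholders; any small bundled input will do):

  REM hypothetical small test run with a few MPI tasks;
  REM examples\small_case and in.small_case are placeholder names
  cd examples\small_case
  mpiexec -n 4 lmp_mpi.exe -in in.small_case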

can you please tell us a bit about your level of experience with MD in
general and reaxff in particular?

axel.

Dear Ray,
I am not sure I understand your description.
The structure is a crystal in a cubic box and it is not complex.
In reality each node has 8 cores, but when I used all 32 cores the simulation stopped right at the initialization of the run.
But when I assumed each node has 16 cores and put 64 in the command, the simulation ran for a few steps.
When I checked the cfg file after 300 steps (when it stopped), there was no visible problem in the simulation box.

Do you mean I should reduce the number of CPU cores I use, for example 24 cores on 3 nodes? Or on 4 nodes?

Or do I need to increase the number of nodes, for example 6 nodes and 48 cores?
Please advise me.
Hadi

Sorry, I did not notice that you had 1.2M atoms… You should increase the number of nodes and MPI tasks (also try using more nodes with a reduced number of MPI tasks per node), and you should also try reducing your system size.
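
A hedged sketch of spreading a fixed number of MPI tasks over more nodes
with the MPICH2 launcher (node1..node4 are placeholder host names, and the
exact host-list option depends on the MPICH2 version and process manager;
-hosts is one common form on Windows, -machinefile another):

  REM hypothetical: 32 MPI tasks run as 8 per node over 4 nodes
  mpiexec -hosts 4 node1 8 node2 8 node3 8 node4 8 lmp_mpi.exe -in in.reaxff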

Ray

Dear Axel,
I ran a simulation of 800,000 atoms with this setup without any problem, but the simulation time was long.
My assumption is that, regardless of the run time, if the total memory is enough I could simulate this number of atoms.
I think I should increase the number of nodes, as Ray said.
I want to know whether I need to use the processors or partition command to prevent this kind of error, or to optimize the simulation time?
Mohammad
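
For reference, a hedged sketch of the two commands mentioned above: the
processors command in the input script controls how the box is split into
a grid of MPI subdomains (which can matter for an elongated 100 x 300 x 500
box), while the -partition command-line switch splits a run into multiple
independent partitions for multi-replica methods and does not by itself
reduce the memory needed per task. The grid below is only an illustration:

  # illustrative only: a 2x4x8 grid for 64 MPI tasks keeps the subdomains
  # of a 100 x 300 x 500 box roughly cubic; "processors * * *" (the
  # default) lets LAMMPS choose the grid itself
  processors 2 4 8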

Dear Axel,
I ran a simulation of 800,000 atoms with this setup without any problem,
but the simulation time was long.

this is a huge system for using reaxFF. please have a look at:
http://lammps.sandia.gov/bench.html#potentials

reaxFF calculations are *at least* two orders of magnitude slower than
conventional force field calculations. a system this size is big even for
conventional force fields.

My assumption is that, regardless of the run time, if the total memory is
enough I could simulate this number of atoms.

no. i don't think the reax/c code has been written with such gigantic
systems in mind. it definitely cannot handle as many atoms per MPI rank as
you have in your system. ...and even if you get it to work, it is going to
be very sloooooooow.

as i stated before, for any reasonable simulation speed, you will need
millions of service units on a very large supercomputer with a fast,
low-latency interconnect. i seriously doubt that any number of windows
machines will suffice, especially when connected over high-latency TCP/IP
communication.

I think I should increase the number of nodes, as Ray said.

no! what you should *first* do (and obviously have not done) is to
*think about* whether using reaxff for such a large system is such a good
idea.

good research with MD simulations does not depend on forcing arbitrary
simulations into the available hardware, but on making smart choices and
simulating exactly what you need to prove your point. very very large
simulations are often the result of too little planning and thinking. thus:

why did you pick reaxff? why does the system need to be this large and what
do you want to prove by such a large simulation that cannot be proven with
a smarter simulation protocol and/or much smaller systems?

you also didn't answer my question about your previous experience with MD
and reaxFF. how about it? for how long? how many studies/papers?

axel.

I want to know whether I need to use the processors or partition command to prevent this kind of error, or to optimize the simulation time?

1.2 million atoms is on the high end for ReaxFF, but it should work fine on a sufficiently large computer. We routinely run simulations with 10 million atoms on several thousand cores. The behavior under discussion here is most likely caused by insufficient memory on the four-node system (the reported 1324.09 Mbytes per MPI task times 16 tasks per node already comes to roughly 21 GB of the 28 GB available on each node). I suggest you stick to smaller problem sizes until you can get access to a bigger computer with more memory.
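
If the box is built from a lattice with create_atoms (the actual input was
not posted, so the lines below are only an assumed setup with placeholder
lattice type, constant, and extents), shrinking the region extents is a
simple way to test smaller problem sizes; halving each extent cuts the
atom count by roughly a factor of eight:

  # hypothetical build section; lattice and region values are placeholders
  lattice       fcc 3.52
  region        box block 0 20 0 60 0 100
  create_box    1 box
  create_atoms  1 box
  # e.g. "region box block 0 10 0 30 0 50" would give about 1/8 the atoms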

Aidan