ReaxFF Parallelization Slowdown

Hi all,
I was wondering if anyone here has experience optimizing ReaxFF (either reax or reax/c) in parallel. When I run the input files in the examples folder of LAMMPS, the runtimes end up significantly longer than I expect: for example, a 384-atom run costs me about 1 second per timestep. I realize ReaxFF has a complicated functional form, which makes it much more expensive than simpler force fields.

Also, I have tried scaling up the number of processors used in each calculation and, without fail, the computation gets slower. With 16 CPUs it takes several minutes just for each run to begin. Does anyone know why? Is there a way I can configure LAMMPS specifically for my system so that this does not occur? Is this a problem with the MPI implementation I am using, or could it be an issue with the system architecture?

Best,
Josh Deetz
PhD Candidate
Chemical Engineering
University of California, Davis
408-242-5523

Hi Josh,

I suppose you were running the /examples/reax tatb example? The steps
per CPU second (spcpu) output I obtained on my Linux box with reax
and reax/c are 2.78 and 3.96, respectively, which is significantly
faster than 1 second/step. Maybe something is wrong with your MPI
library or executable?

The parallel speedups I obtained with 4 cores are 2.29x and 1.82x,
respectively. It is normal that parallel efficiency is not perfect;
spatially decomposing a small system over a larger number of MPI
processes will hurt efficiency even more.
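For anyone comparing their own runs, a quick sketch of how to turn those numbers into parallel efficiency (just speedup divided by core count; the 2.29x and 1.82x figures are the 4-core speedups quoted above):

```python
# Convert wall-clock timings into speedup and parallel efficiency.

def speedup(t_serial, t_parallel):
    """Speedup of a parallel run relative to the serial run."""
    return t_serial / t_parallel

def efficiency(s, ncores):
    """Fraction of ideal (linear) speedup actually achieved."""
    return s / ncores

# Speedups quoted above for 4 cores (reax and reax/c):
for s in (2.29, 1.82):
    print(f"speedup {s:.2f}x on 4 cores -> efficiency {efficiency(s, 4):.1%}")
```

Anything well below 100% efficiency is expected for a small system, since each MPI rank owns few atoms and surface-to-volume communication dominates.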

Cheers,
Ray

You can also look at the Benchmark page on the LAMMPS WWW site.
Timings for many systems and many potentials are listed. ReaxFF is
slow, but not as slow as what you are seeing. If you are running
with pair_style reax (not reax/c), you had to build the library with
Fortran. Using a good Fortran compiler (e.g. Intel's) will be faster
than something generic like gfortran.

Steve

I ran the benchmark tests on a smaller set of atoms (1856 instead of 32480), and I am getting even slower loop times there. Essentially, the published reax/c benchmark spends 2.9% of its time in communication, whereas my runs spend 22%. I think this is part of the reason I am seeing slower times as I increase the number of processors. It may have to do with my choice of compiler for LAMMPS, or with one of the other reax/c-specific settings; in the worst case, it may have something to do with the architecture of my system. I will poke around and let everyone know if I have a breakthrough.
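A rough way to see why a 22% communication fraction matters: if you treat the comm fraction as the non-parallelizable part in an Amdahl-style model (a simplification, since comm cost actually grows with process count rather than staying fixed), it puts a ceiling on achievable speedup. The 2.9% and 22% figures below are the ones from the post above:

```python
# Amdahl-style speedup ceiling when a fraction f of runtime
# is serialized (here: approximated by the communication fraction).

def amdahl(f, n):
    """Speedup on n processes if fraction f does not parallelize."""
    return 1.0 / (f + (1.0 - f) / n)

for f in (0.029, 0.22):
    print(f"comm fraction {f:.1%}: "
          f"16-proc speedup <= {amdahl(f, 16):.1f}x, "
          f"asymptotic limit {1 / f:.1f}x")
```

With f = 0.22 the model caps 16-process speedup below 4x, versus roughly 11x at f = 0.029, which is consistent with adding processors not helping much in the runs described here.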