I’ve run the following 3 tests as you suggested (the input files and configuration file are attached, and a typical one is copied at the end). Every input file contains only one run command and no minimization.
First, the run with verlet/split still didn’t work, and I got the same error:
ERROR on proc 0: TIP4P hydrogen is missing (…/pppm_tip4p.cpp:488)
Since there is only one run now, I think this is probably a bug in LAMMPS.
Then I switched to the OpenMP scheme. As a benchmark, I first ran a job with 2 MPI tasks and no OpenMP threads; the runtime statistics are below:
Pair time (%) = 4.68271 (37.4681)
Bond time (%) = 0.761198 (6.09063)
Kspce time (%) = 6.60392 (52.8404)
Neigh time (%) = 0.196383 (1.57134)
Comm time (%) = 0.0914288 (0.731556)
Outpt time (%) = 0.022729 (0.181864)
Other time (%) = 0.139489 (1.11611)
FFT time (% of Kspce) = 3.5833 (54.2602)
FFT Gflps 3d (1d only) = 2.52843 10.6664
Then I switched to the OMP scheme by adding the line
package omp 4 force/neigh
to the input file, and submitted the job using
mpirun -x OMP_NUM_THREADS=4 -np 2 /apps/lammps/24May13/lmp_openmpi -sf omp -screen scr.log -in in-omp
and got the following runtime statistics:
Pair time (%) = 7.32019 (9.97139)
Bond time (%) = 1.26643 (1.7251)
Kspce time (%) = 60.8111 (82.8356)
Neigh time (%) = 0.398665 (0.543052)
Comm time (%) = 2.77501 (3.78005)
Outpt time (%) = 0.0216095 (0.029436)
Other time (%) = 0.818844 (1.11541)
FFT time (% of Kspce) = 85.8029 (141.097)
FFT Gflps 3d (1d only) = 0.105592 8.92023
These results puzzle me:
- I know that OpenMP overhead might make the Pair time (and Bond, Neigh, Comm, etc.) longer than in an MPI-only job, but I didn’t expect the Kspace calculation to take almost 10 times longer (60.8 s vs. 6.6 s). In other words, using 8 cores (2 MPI tasks with 4 OpenMP threads each) was much slower than using 2 MPI tasks only: summing the categories above, the whole run took about 6 times longer (roughly 73 s vs. 12.5 s).
- Is there any way to find out from the log file how many cores are effectively being used?
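To put numbers on the comparison, here is a minimal Python sketch that computes the per-category slowdown from the two timing breakdowns quoted above. The dictionary and function names are my own, chosen for illustration; the values are copied by hand from the log output, and the total is just the sum of the listed categories.

```python
# Timing breakdowns copied from the two runs above:
# (seconds, percent of loop time) for each category.
mpi_only = {
    "Pair": (4.68271, 37.4681),
    "Bond": (0.761198, 6.09063),
    "Kspce": (6.60392, 52.8404),
    "Neigh": (0.196383, 1.57134),
    "Comm": (0.0914288, 0.731556),
    "Outpt": (0.022729, 0.181864),
    "Other": (0.139489, 1.11611),
}
mpi_omp = {
    "Pair": (7.32019, 9.97139),
    "Bond": (1.26643, 1.7251),
    "Kspce": (60.8111, 82.8356),
    "Neigh": (0.398665, 0.543052),
    "Comm": (2.77501, 3.78005),
    "Outpt": (0.0216095, 0.029436),
    "Other": (0.818844, 1.11541),
}


def total_seconds(breakdown):
    """Sum the per-category wall times (assumed to add up to the loop time)."""
    return sum(seconds for seconds, _percent in breakdown.values())


def slowdown(baseline, candidate):
    """Per-category ratio candidate/baseline; values above 1 mean slower."""
    return {name: candidate[name][0] / baseline[name][0] for name in baseline}


ratios = slowdown(mpi_only, mpi_omp)
total_ratio = total_seconds(mpi_omp) / total_seconds(mpi_only)

for name, ratio in ratios.items():
    print(f"{name:5s} {ratio:6.2f}x")
print(f"Total {total_ratio:6.2f}x")
```

With these numbers, Kspce comes out about 9.2 times slower and Comm about 30 times slower, while the summed total is about 5.9 times slower.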
split-and-omp.tar.gz (717 KB)