I think I may have found a bug in the timer routine. I did some bug
hunting and it seems that some communication time may be mislabeled as
kspace time.
I have a system where I noticed the reported kspace time:
1. Doesn't scale well with number of processors
2. Depends on the cutoff radius (even though *everything* else is
held constant)
I ran an LJ system with kspace_style Ewald/n on 1 processor and got
the following:
Loop time of 31.3689 on 1 procs for 100 steps with 10000 atoms
Pair time (\) = 20\.8518 \(66\.4728\)
Kspce time \() = 8.28053 (26.3973)
Neigh time (\) = 2\.19111 \(6\.98499\)
Comm time \() = 0.0381899 (0.121745)
Outpt time (\) = 0\.00127912 \(0\.00407766\)
Other time \() = 0.00599742 (0.019119)
I then ran the same system with 2 processors and the pair time
decreased but the kspace time stayed almost the same (which doesn't
make any sense):
Loop time of 20.758 on 2 procs for 100 steps with 10000 atoms
Pair time (\) = 10\.6957 \(51\.5259\)
Kspce time \() = 8.1981 (39.4937)
Neigh time (\) = 1\.1067 \(5\.33144\)
Comm time \() = 0.752884 (3.62695)
Outpt time (\) = 0\.000813723 \(0\.00392004\)
Other time \() = 0.00376761 (0.0181501)
I then commented out the following line in ewald_n.cpp:
//MPI_Allreduce(cek_local, cek_global, 2*n, MPI_DOUBLE, MPI_SUM, world);
and I got
Loop time of 20.8964 on 2 procs for 100 steps with 10000 atoms
Pair time (\) = 10\.5278 \(50\.3811\)
Kspce time \() = 4.1784 (19.9958)
Neigh time (\) = 1\.10506 \(5\.28826\)
Comm time \() = 5.08053 (24.3129)
Outpt time (\) = 0\.000882268 \(0\.0042221\)
Other time \() = 0.0037092 (0.0177504)
This makes more sense. This run took approximately the same time as
the one above but the communication time is higher and kspace time is
lower.
I then ran the same system with 2 processors and "kspace_modify
compute no" and got:
Loop time of 15.4689 on 2 procs for 100 steps with 10000 atoms
Pair time (\) = 10\.5201 \(68\.0083\)
Kspce time \() = 0 (0)
Neigh time (\) = 1\.10849 \(7\.16592\)
Comm time \() = 3.83572 (24.7963)
Outpt time (\) = 0\.000850201 \(0\.0054962\)
Other time \() = 0.00370133 (0.0239276)
The communication time is also higher than in the second run. A
similar trend is seen for 8 processors.
My test system is attached and I can provide more evidence if needed.
Stan
in.lj_cold (1.08 KB)
in.lj_hot (389 Bytes)