When I run ten 4-core LAMMPS calculations simultaneously on a 56-core node, using a Stillinger-Weber potential, they finish in about 1/7th to 1/8th of the wall time it takes to run the same ten 4-core jobs one after another. This performance makes sense to me.
However, when I do the same with a GAP potential in QUIP, running the jobs in parallel takes about as long as running them one after another. Specifically, two GAP calculations, each on 8 cores, submitted simultaneously on a 56-core node, finish in 22 minutes, while run successively they finish in 26 minutes. So running two jobs at once barely improves throughput, and running three or more doesn't improve it at all. What might be the bottleneck?
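To put numbers on the difference, here is a quick sketch of the arithmetic. It assumes the 26 minutes is the total for both GAP jobs run back to back, and the function and variable names are just mine for illustration.

```python
# Throughput gain from running jobs concurrently instead of back to back.
# Assumption: each "sequential" figure is the total wall time for all jobs.

def concurrent_speedup(sequential_total, concurrent_wall):
    """How many times faster the batch finishes when run concurrently."""
    return sequential_total / concurrent_wall

def efficiency(speedup, n_jobs):
    """Fraction of the ideal n_jobs-fold gain actually achieved."""
    return speedup / n_jobs

# GAP in QUIP: 2 jobs x 8 cores, 26 min back to back vs 22 min concurrently.
gap_speedup = concurrent_speedup(26.0, 22.0)
gap_eff = efficiency(gap_speedup, 2)

# LAMMPS Stillinger-Weber: 10 jobs x 4 cores, concurrent run takes
# about 1/7 to 1/8 of the sequential time, i.e. a 7x to 8x speedup.
sw_eff_low = efficiency(7.0, 10)
sw_eff_high = efficiency(8.0, 10)

print(f"GAP: speedup {gap_speedup:.2f}x, efficiency {gap_eff:.0%}")
print(f"SW:  efficiency {sw_eff_low:.0%} to {sw_eff_high:.0%}")
```

So the GAP pair gets roughly 1.18x out of an ideal 2x (about 59%), while the Stillinger-Weber batch gets 70% to 80% of the ideal 10x, even though both cases leave cores free on the node.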
I was wondering whether the bottleneck is repeated loading of the potential, since the potential file is hundreds of MB. If so, is there a way to load the potential only once per calculation? Or is it something else entirely?
The system is a disordered solid with 500 atoms. Thanks!