Bottleneck for GAP potential calculations run on the same node? (quip)

When I run ten 4-core LAMMPS calculations simultaneously on a 56-core node, using a Stillinger-Weber potential, they finish in about 1/7th to 1/8th of the wall time needed to run the same ten 4-core jobs one after another. This performance makes sense to me.

However, when I do the same using a GAP potential through QUIP, running the jobs concurrently takes about as long as running them back to back. That is, two GAP potential calculations, each run on 8 cores and submitted at the same time on a 56-core node, finish in 22 minutes, while if I run them successively they finish in 26 minutes. So running two jobs at once hardly improves throughput, and running 3 or more doesn’t improve it at all. What might be the bottleneck?
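To put a number on that, here is a minimal sketch of the arithmetic using only the two wall times quoted above. The variable names are just for illustration; an ideal result for two independent jobs would be a 2x gain.

```python
# Wall times quoted in the post, in minutes.
sequential_minutes = 26.0   # two 8-core GAP jobs run one after the other
concurrent_minutes = 22.0   # the same two jobs submitted at the same time
n_jobs = 2

# Throughput gain from running concurrently; the ideal value equals n_jobs.
speedup = sequential_minutes / concurrent_minutes
efficiency = speedup / n_jobs

print(f"speedup: {speedup:.2f}x of an ideal {n_jobs}x")
print(f"concurrency efficiency: {efficiency:.0%}")
```

So the two concurrent GAP jobs gain only about 1.18x, whereas the ten Stillinger-Weber jobs gain roughly 7x to 8x out of an ideal 10x.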

I was wondering if it was successive loads of the potential, since it’s hundreds of MBs. If this is the case, is there a way to only load the potential once for each calculation? Or is it something else entirely?

The system is a disordered solid with 500 atoms. Thanks!

There is a lot of required information missing here:

  • what is your LAMMPS version?
  • what platform are you running on?
  • what is your exact command line?
  • if you compiled LAMMPS with MPI support, what is your MPI library?
  • did you do a scaling test (i.e. run a single input, and nothing else, with 1, 2, 4, 8, 16 CPUs and check the parallel efficiency for your input)?
  • what is your “performance” output? is there any indication of a load imbalance?
  • do you have equivalent numbers for the same system for SW?
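
For the scaling test, a minimal sketch of how speedup and parallel efficiency could be tabulated from the measured wall times is shown below. The function name `scaling_table` is hypothetical, and the numbers in `example_times` are placeholders, not measurements from any real run; substitute the wall times from your own runs.

```python
def scaling_table(wall_times):
    """Return (cores, speedup, efficiency) rows relative to the smallest core count.

    wall_times maps a core count to the wall time of one run on that many cores.
    """
    base_cores = min(wall_times)
    base_time = wall_times[base_cores]
    rows = []
    for cores in sorted(wall_times):
        speedup = base_time / wall_times[cores]
        # Efficiency is 1.0 when doubling the cores halves the wall time.
        efficiency = speedup * base_cores / cores
        rows.append((cores, speedup, efficiency))
    return rows


# PLACEHOLDER wall times in seconds (assumed values, for illustration only).
example_times = {1: 800.0, 2: 420.0, 4: 230.0, 8: 140.0, 16: 110.0}

for cores, speedup, efficiency in scaling_table(example_times):
    print(f"{cores:2d} cores  speedup {speedup:5.2f}  efficiency {efficiency:6.1%}")
```

The wall time for each run can be taken from the "Loop time of ... on ... procs" line in the LAMMPS log. Doing this once for GAP and once for SW on the same system would also answer the last question in the list.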