[lammps-users] Performance about lammps on GPU

Dear all,

  First, thanks for the great work from Michael Brown. Running LAMMPS on the GPU is really fast and saves me a lot of time. I built a desktop with a GTX graphics card to run LAMMPS. My system uses an LJ pairwise potential and PPPM to solve the Coulomb interactions, and the time for 1000 time steps is about 10 s.
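  For reference, the relevant part of my input looks roughly like the sketch below; the cutoff, accuracy, and package settings are placeholders rather than my exact values, and the GPU package syntax can differ between LAMMPS versions:

    # sketch of an LJ + PPPM run with the GPU package (values are placeholders)
    units           real
    atom_style      charge
    read_data       data.system               # hypothetical data file

    package         gpu force/neigh 0 0 1.0   # GPU package setup (syntax is version-dependent)
    pair_style      lj/cut/coul/long/gpu 10.0 # LJ + real-space Coulomb computed on the GPU
    kspace_style    pppm 1.0e-4               # long-range Coulomb solved by PPPM on the CPU

    timestep        1.0
    run             1000                      # the 1000-step benchmark mentioned above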
  I just bought another identical NVIDIA GTX card and tested LAMMPS performance again. I know that parallel GPU performance in LAMMPS will not scale as well as parallel CPU performance, so I am not surprised to see no improvement there. However, I am surprised by the performance when running two different jobs at the same time, one on each GPU. When I run job 1 on card 0, it takes 20 s to finish 1000 time steps, and job 2 running on card 1 also takes 20 s to finish 1000 time steps. But it only needs 10 s to finish 1000 time steps if there is only one job running.
  I am sure my motherboard supports two PCI-E 2.0 slots, both running at x16 speed.
  Could you give me some hints on how to solve this issue?

Best wishes,
Yangpeng

> Dear all,
>
>    First, thanks for the great work from Michael Brown. Running LAMMPS on the GPU is really fast and saves me a lot of time. I built a desktop with a GTX graphics card to run LAMMPS. My system uses an LJ pairwise potential and PPPM to solve the Coulomb interactions, and the time for 1000 time steps is about 10 s.
>    I just bought another identical NVIDIA GTX card and tested LAMMPS performance again. I know that parallel GPU performance in LAMMPS will not scale as well as parallel CPU performance, so I am not surprised to see no improvement there. However, I am surprised by the performance when running two different jobs at the same time, one on each GPU. When I run job 1 on card 0, it takes 20 s to finish 1000 time steps, and job 2 running on card 1 also takes 20 s to finish 1000 time steps. But it only needs 10 s to finish 1000 time steps if there is only one job running.

whether you see parallel speedup or not depends a _lot_ on how large your problem is and how much the total time is dominated by the parts of the calculation that are not run on the GPU. one experiment that you can try is to increase the coulomb cutoff (only that one, not the lj one), as that should increase the amount of work on the GPU and reduce the amount of work for PPPM.
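
concretely, lj/cut/coul/long takes an optional second cutoff that applies only to the coulomb part, so you can lengthen it independently of the lj cutoff. a minimal sketch (the numbers are just an illustration, not tuned values):

    # before: a single cutoff applies to both lj and coulomb
    pair_style   lj/cut/coul/long/gpu 10.0

    # after: keep the lj cutoff at 10.0, lengthen only the coulomb cutoff
    # (shifts more real-space work onto the gpu and leaves less for pppm)
    pair_style   lj/cut/coul/long/gpu 10.0 14.0
    kspace_style pppm 1.0e-4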

>    I am sure my motherboard supports two PCI-E 2.0 slots, both running at x16 speed.
>    Could you give me some hints on how to solve this issue?

please use the nvidia-smi utility to monitor GPU activity. it could be that both jobs run on the same gpu. i've seen this very recently happen to me with a serial binary, but have not yet been able to narrow it down to a simple and specific input to demonstrate or track down the problem.
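
something along these lines should show quickly what is going on (the executable and input file names are placeholders, and CUDA_VISIBLE_DEVICES assumes a reasonably recent cuda runtime):

    # watch per-gpu utilization while both jobs are running;
    # both cards should show load if the jobs really landed on different gpus
    nvidia-smi -l 1

    # check the pci-e link width each card actually negotiated (may need root)
    lspci -vv | grep -i 'lnksta:'

    # pin each job to a specific card via the cuda runtime
    CUDA_VISIBLE_DEVICES=0 ./lmp_gpu < in.job1 > log.job1 &
    CUDA_VISIBLE_DEVICES=1 ./lmp_gpu < in.job2 > log.job2 &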

cheers,
    axel.