LAMMPS on an AMD machine (LAMMPS 29 Sep 2021)

Dear Users and Developers,

I recently got access to a machine with two “AMD EPYC 7763 64-Core Processor” sockets, which gives 128 cores in a single node. I built LAMMPS with the OpenMPI compilers and found that it does not reach 100% CPU usage when I run test scripts for LAMMPS; instead it peaks at about 85% CPU usage. Am I missing something, or is this expected?

P.S.: If anything above is unclear, please let me know.

Best Regards
Puneet

How do you determine the CPU usage?
What kind of input were you running?

Dear Axel,

I was using the following test script:

units lj
atom_style charge
dimension 3
boundary p p p

atom_modify map yes

lattice sc 1.0
region cube block 0 10 0 10 0 15
create_box 1 cube
create_atoms 1 box

mass 1 1.0

variable size atomfile epsilon.txt
set atom * charge v_size

velocity all create 0.105 ${SEED}
velocity all zero linear
velocity all zero angular

timestep 0.001

pair_style lj/cut 2.025

pair_coeff 1 1 1 1

neighbor 0.3 bin

neigh_modify every 1 delay 0 check yes

fix 1 all nvt temp 0.105 0.105 0.1

run 5000

unfix 1

I determined the CPU usage from the log file, which reports:
82.2% CPU use with 6 MPI tasks x 1 OpenMP threads.

Thanks and Regards
Puneet

It would be helpful to see the entire timing output.

My follow up question is: are you the only user of that machine? Could it be that it doesn’t have as many free CPU cores as you are using? Have you checked the total machine utilization?

Dear Axel,

Thanks for the reply.

By “entire timing output”, do you mean the log file that LAMMPS creates?

Yes, I make sure that the machine has that many CPUs available for the computation.

Best Regards
Puneet

How do you do that?

Dear Axel,

I use the Slurm Workload Manager, which has this command:
squeue -o"%.7i %.9P %.8j %.8u %.2t %.10M %.6D %C"

That tells me how many CPUs a process is using.

Best Regards
Puneet

Sorry, but that does not tell you anything about what any other processes are using.
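One rough way to see what everything on the node is doing, not just your own job, is to compare the system load average with the number of logical CPUs. The following is only a minimal sketch: the helper name `idle_cpu_estimate` is made up for illustration, it assumes a Unix-like system where `os.getloadavg()` is available, and the load average is only an approximate indicator (on Linux it also counts processes waiting on I/O).

```python
import os


def idle_cpu_estimate(load_1min, logical_cpus):
    """Rough estimate of idle logical CPUs (hypothetical helper).

    load_1min    : 1-minute load average of the whole machine
    logical_cpus : number of logical CPUs reported by the OS
    """
    # Never report a negative number when the machine is oversubscribed.
    return max(0.0, logical_cpus - load_1min)


if __name__ == "__main__":
    # os.getloadavg() returns the 1, 5 and 15 minute load averages (Unix only).
    load_1min, _, _ = os.getloadavg()
    ncpu = os.cpu_count() or 1
    print(f"logical CPUs: {ncpu}, 1-min load: {load_1min:.1f}, "
          f"approx. idle: {idle_cpu_estimate(load_1min, ncpu):.1f}")
```

If the load is already close to the CPU count before your job starts, other processes are competing with your MPI tasks, whatever the batch system reports for your own allocation.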

There are multiple possible reasons why you won’t see 100% CPU utilization, but CPU utilization is not everything anyway. What matters is the actual performance and the (strong) parallel scaling. You could have full CPU utilization but very inefficient performance due to busy-waiting in the MPI library and load imbalances.

Thus any real performance information can only be gained from strong scaling benchmarks, i.e. how much “Loop time” the input uses when running with 1, 2, 4, 8, 16, 32, 64, 128 MPI processes. Theoretically, the loop time should be cut in half every time you double the number of processors, but in practice the reduction will be smaller. You can then also compare this parallel scaling to that on a desktop (of course only up to the number of CPU cores available on it).
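The strong scaling comparison described above can be sketched as follows. This is a minimal illustration, not an official tool: the function names are made up, the regular expression assumes the usual "Loop time of ... on N procs ..." line from the LAMMPS log, and the loop times in the example are placeholder numbers, not measurements from this machine.

```python
import re

# Assumed log line format: "Loop time of <seconds> on <N> procs for ..."
LOOP_RE = re.compile(r"Loop time of\s+([0-9.eE+-]+)\s+on\s+(\d+)\s+procs")


def parse_loop_time(log_text):
    """Return (loop_time_seconds, nprocs) from the last 'Loop time' line."""
    matches = LOOP_RE.findall(log_text)
    if not matches:
        raise ValueError("no 'Loop time' line found")
    seconds, nprocs = matches[-1]
    return float(seconds), int(nprocs)


def strong_scaling(loop_times):
    """Compute speedup and parallel efficiency.

    loop_times : dict mapping number of MPI processes -> loop time in seconds
    Returns a list of (nprocs, speedup, efficiency), relative to the
    smallest process count in the dict.
    """
    base_n = min(loop_times)
    base_t = loop_times[base_n]
    rows = []
    for n in sorted(loop_times):
        speedup = base_t / loop_times[n]
        # Ideal speedup is n / base_n; efficiency is the fraction achieved.
        efficiency = speedup * base_n / n
        rows.append((n, speedup, efficiency))
    return rows


if __name__ == "__main__":
    # Placeholder timings for illustration only (NOT real benchmark data).
    example = {1: 100.0, 2: 52.0, 4: 28.0}
    for n, s, e in strong_scaling(example):
        print(f"{n:4d} procs  speedup {s:6.2f}  efficiency {e:6.1%}")
```

An efficiency that stays close to 100% means the loop time is nearly halving with each doubling; a steady drop as the process count grows is the practical shortfall mentioned above. Note that the test input here has only 1500 atoms, so each of 128 processes would own very few atoms.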

It is also important to pay attention to whether simultaneous multi-threading (aka hyperthreading) is enabled or not. If it is, only half of the CPUs reported by the operating system are physical CPU cores, and your performance can then be limited on a busy machine.
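To tell logical CPUs from physical cores, one option is to read the relevant fields from the output of `lscpu`. The sketch below assumes English-language `lscpu` output with the fields "Thread(s) per core", "Core(s) per socket" and "Socket(s)"; the function names and the sample text are illustrative only.

```python
def parse_lscpu(text):
    """Turn 'Key:   value' lines from lscpu output into a dict."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def core_counts(text):
    """Return (physical_cores, logical_cpus) from lscpu output text."""
    f = parse_lscpu(text)
    threads_per_core = int(f["Thread(s) per core"])
    cores_per_socket = int(f["Core(s) per socket"])
    sockets = int(f["Socket(s)"])
    physical = cores_per_socket * sockets
    return physical, physical * threads_per_core


if __name__ == "__main__":
    # Illustrative sample text for a two-socket, 64-core-per-socket node
    # with SMT enabled; on a real machine, pass the actual lscpu output.
    sample = (
        "Thread(s) per core:  2\n"
        "Core(s) per socket:  64\n"
        "Socket(s):           2\n"
    )
    physical, logical = core_counts(sample)
    print(f"physical cores: {physical}, logical CPUs: {logical}")
```

When more than one thread per core is reported, running one MPI task per physical core (rather than per logical CPU) is the configuration to use as the baseline in scaling tests.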

Dear Axel,

Thanks for your reply. I will look into it.

Multithreading is enabled, which makes it 256 CPUs on the machine, but I only use 128 cores.

Best Regards
Puneet