I recently got access to a machine with two “AMD EPYC 7763 64-Core Processor” sockets, which makes it a single node with 128 cores. I built LAMMPS with the OpenMPI compilers and found that it does not reach 100% CPU usage when I run the LAMMPS test scripts; instead it peaks at about 85%. Am I missing something, or is this expected?
P.S.: If anything above is unclear, please let me know.
It would be helpful to see the entire timing output.
My follow-up question is: are you the only user of that machine? Could it be that it doesn’t have as many free CPU cores as you are using? Have you checked the total machine utilization?
Sorry, but that does not tell you anything about what any other processes are using.
There are multiple possible reasons why you won’t see 100% CPU utilization, but CPU utilization is not everything anyway. What matters is the actual performance and the (strong) parallel scaling. You could have full CPU utilization but still very inefficient performance, due to busy-waiting in the MPI library and load imbalances.
Thus any real performance information can only be gained from strong scaling benchmarks, i.e. how much “Loop time” the same input uses when running with 1, 2, 4, 8, 16, 32, 64, 128 MPI processes. Theoretically, the loop time should be cut in half every time you double the number of processors, but in practice the reduction will be smaller. You can then also compare this parallel scaling to that on a desktop (of course only up to the number of CPU cores available on it).
It is also important to pay attention to whether simultaneous multi-threading (aka hyperthreading) is enabled or not. If it is, only half of the CPU cores reported by the operating system are physical CPU cores, and your performance can then be limited on a busy machine.
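To make the comparison concrete, here is a minimal Python sketch of how the “Loop time” values from such a series of runs could be turned into speedup and parallel efficiency. The function name scaling_table and the timings in the example are made up for illustration; you would replace them with the numbers from your own log files.

```python
def scaling_table(loop_times):
    """Compute strong-scaling metrics from measured loop times.

    loop_times: dict mapping number of MPI processes -> loop time in seconds.
    Returns a list of (nprocs, speedup, efficiency) tuples, relative to the
    run with the fewest processes.
    """
    base_n = min(loop_times)
    base_t = loop_times[base_n]
    rows = []
    for n in sorted(loop_times):
        speedup = base_t / loop_times[n]   # how much faster than the baseline run
        ideal = n / base_n                 # speedup under perfect scaling
        rows.append((n, speedup, speedup / ideal))
    return rows


if __name__ == "__main__":
    # Hypothetical loop times in seconds, NOT measured on any real machine.
    example = {1: 100.0, 2: 52.0, 4: 27.0, 8: 14.5}
    for n, speedup, eff in scaling_table(example):
        print(f"{n:4d} procs: speedup {speedup:6.2f}, efficiency {eff:6.1%}")
```

An efficiency that stays close to 100% as the process count grows indicates good strong scaling; a steady drop suggests that communication or load imbalance starts to dominate for that input.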
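One way to check this on Linux is to look at the “Thread(s) per core” line in the output of lscpu. The following Python sketch parses that output and estimates the number of physical cores; it assumes lscpu is installed and prints English field names, and the helper names threads_per_core and physical_cores are only illustrative.

```python
import os
import shutil
import subprocess


def threads_per_core(lscpu_text):
    """Return the 'Thread(s) per core' value from lscpu output, or None if absent."""
    for line in lscpu_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "Thread(s) per core":
            return int(value.strip())
    return None


def physical_cores(logical_cpus, tpc):
    """Estimate physical cores from the logical CPU count and threads per core."""
    return logical_cpus // tpc


if __name__ == "__main__":
    # Only attempt the check when lscpu is actually available (assumption: Linux).
    if shutil.which("lscpu"):
        text = subprocess.run(["lscpu"], capture_output=True, text=True).stdout
        tpc = threads_per_core(text)
        logical = os.cpu_count()
        if tpc and logical:
            print(f"logical CPUs: {logical}, threads per core: {tpc}, "
                  f"physical cores: {physical_cores(logical, tpc)}")
```

If the reported threads per core is 2, running more MPI processes than the physical core count means two processes share one core, which usually does not help a compute-bound run.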