Running multiple jobs on a single node


I am running multiple jobs on a single node of a cluster.
A single node has 64 cores and 128 threads. I am simulating 8 systems, each containing a single polymer chain of a different length. I ran performance tests for all the systems and got more than 5000 ns/day for each of them, using a different number of threads per system.

System 1 is assigned 8 threads, system 2 takes 15 threads, system 3 takes 18 threads, and so on.
In total, 90 threads are used for all the systems on the single node.
I installed LAMMPS using the OpenMPI module.

But 22 hours later the jobs are still running, even though I only want to simulate 5 microseconds with a dump every 1 nanosecond. After 22 hours I have only 20 snapshots, and some of the runs appear to be stuck.

I don’t know why this is happening. Can someone explain to me what is going on here?

Not really without knowing more details about how exactly you are running jobs.
At the moment, all that can be done are guesses and conjecture.

You are talking about “threads” but also mention “OpenMPI”, which uses processes and is different from “OpenMP”, which uses threads. Each is used with different command-line flags.

If your node has 64 cores and 128 threads, then you have to realize that it uses hyperthreading, which means that the 64 cores are shared by the 128 hardware threads (two threads per core). This also means that, when looking for performance, you do not want to use more than 64 MPI processes in total on the node. The gain from using more than that will be very small, as this doubles the demand on memory bandwidth and cuts the efficiency of the CPU caches in half.
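As a rough sanity check on the numbers given in the question, here is a minimal Python sketch. The function name `oversubscription` is made up for illustration, and it only does the arithmetic; it does not measure anything on the actual node.

```python
def oversubscription(requested: int, physical_cores: int) -> float:
    """Return requested workers per physical core (> 1.0 means cores are shared)."""
    if physical_cores <= 0:
        raise ValueError("physical_cores must be positive")
    return requested / physical_cores


# Numbers taken from the question: 90 processes/threads in total, 64 physical cores.
ratio = oversubscription(90, 64)
print(f"{ratio:.2f} workers per physical core")  # prints 1.41
```

Anything above 1.0 means that at least some jobs have to share physical cores through hyperthreading, so per-job benchmarks taken on an otherwise idle node will not carry over.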

A big problem for your calculation, and a possible reason for the kind of slowdown you describe, could be processor affinity, e.g. multiple independently launched OpenMPI jobs each assigning their tasks to the same CPU cores while other cores sit idle.
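To illustrate what non-overlapping placement would look like, here is a minimal Python sketch that hands each job its own contiguous range of core IDs. The helper `disjoint_core_ranges` and the per-job core counts are hypothetical (they are not the poster's actual settings). The sketch also assumes that core IDs 0 to 63 correspond to distinct physical cores, which depends on how the machine numbers its hyperthreads and should be checked with a tool such as `lscpu`.

```python
from typing import List, Tuple


def disjoint_core_ranges(cores_per_job: List[int],
                         physical_cores: int) -> List[Tuple[int, int]]:
    """Assign each job an inclusive (first, last) core range with no overlap."""
    ranges = []
    start = 0
    for n in cores_per_job:
        if n <= 0:
            raise ValueError("each job needs at least one core")
        end = start + n - 1
        if end >= physical_cores:
            raise ValueError("jobs request more cores than are available")
        ranges.append((start, end))
        start = end + 1
    return ranges


# Hypothetical example: 8 jobs with 8 cores each on a 64-core node.
jobs = [8] * 8
for i, (lo, hi) in enumerate(disjoint_core_ranges(jobs, 64), start=1):
    print(f"job {i}: cores {lo}-{hi}")
```

Ranges like these could then be handed to whatever pinning mechanism is available locally, for example `taskset -c` or the binding options of the MPI launcher; which one applies depends on the cluster setup and batch system.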

At any rate, these are not LAMMPS issues but hardware / setup / MPI usage issues and thus off-topic for this forum. If you don’t know how to resolve this, you need to find somebody local with the corresponding expertise that can teach you the necessary skills.