Dear users,
I am facing a peculiar problem while running an NEB calculation on a cluster that has several nodes with 24 cores each. Not every combination of the number of replicas (images) and the number of cores per replica is accepted. On probing further, I found that, for reasons unknown to me, the NEB run starts only if the number of cores per image is greater than the total number of nodes allocated to the job.
For instance, if I use a single node with 24 cores and partition them for 12 images as 12 by 2, the job runs fine. However, if I try 24 images with one core per image, i.e., 24 by 1, the job terminates. The error file only reports that the processes have aborted, and nothing is written to the screen output files or the log file. Similarly, if I allot 2 nodes with 48 cores in total, I can run with the partition 16 by 3, but not with 24 by 2 or 48 by 1. So it seems that the number of cores per image must exceed the number of nodes requested in the job submission script.
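To make the pattern concrete, here is a small Python sketch that checks the five cases listed above against the rule I am guessing at. The rule itself (cores per image > number of nodes) is only my hypothesis from these observations, not documented LAMMPS behavior, and the function and variable names are made up for illustration.

```python
# Each entry: (nodes, images, cores_per_image, ran_ok), taken from the cases above.
# "images x cores_per_image" is the M-by-N split, as in LAMMPS's -partition MxN switch.
observations = [
    (1, 12, 2, True),   # 12 by 2 on one node: runs
    (1, 24, 1, False),  # 24 by 1 on one node: aborts
    (2, 16, 3, True),   # 16 by 3 on two nodes: runs
    (2, 24, 2, False),  # 24 by 2 on two nodes: aborts
    (2, 48, 1, False),  # 48 by 1 on two nodes: aborts
]

CORES_PER_NODE = 24  # as on my cluster


def rule_predicts_ok(nodes, cores_per_image):
    """Hypothetical rule (an assumption, not confirmed): cores per image must exceed node count."""
    return cores_per_image > nodes


def check(observations):
    """Return one boolean per case: does the guessed rule match what was observed?"""
    results = []
    for nodes, images, cores_per_image, ran_ok in observations:
        # Sanity check: every partition uses all the allocated cores.
        assert images * cores_per_image == nodes * CORES_PER_NODE
        results.append(rule_predicts_ok(nodes, cores_per_image) == ran_ok)
    return results


if __name__ == "__main__":
    for obs, match in zip(observations, check(observations)):
        nodes, images, cores_per_image, ran_ok = obs
        status = "ran" if ran_ok else "aborted"
        print(f"{nodes} node(s), {images} by {cores_per_image}: {status}, "
              f"rule consistent: {match}")
```

All five cases are consistent with the guessed rule, but five data points cannot distinguish it from other explanations, such as a restriction related to how processes are placed across nodes.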
This is the first time I have encountered this issue. Earlier, I was able to run with whatever partition I wanted, but we recently switched to a newer LAMMPS version compiled with Intel oneAPI. Any help in this regard will be appreciated.
Best wishes,
Amlan Dutta