Hi Steve and everyone.
I’m trying to run a system with a very large number of bonds (essentially every pairwise interaction also has a corresponding bond). There are about 60 million bonds initially. Judging by scaling up smaller runs that were successful, I should have enough memory on my worker nodes to handle this. But at this system size, the process on the zeroth node seems to start thrashing (memory use goes to 100% and CPU use drops to 10%) before I get any message about memory per processor or any other useful startup info.
Is the process of initially communicating bond information to the worker nodes different from the one for communicating atom information? Does it present a bottleneck to scaling? I’ll take a look at the code, but I thought someone might know offhand.
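
For a sense of scale, here is a rough back-of-envelope sketch of how much memory the raw bond list would take if a single process had to hold all of it at once. The record layout (three 4-byte integers per bond: type plus two atom IDs) and the function name are assumptions for illustration only, not how the code actually stores bonds.

```python
# Back-of-envelope estimate of the raw bond-list size if one process
# held every bond at once.
# Assumption: each bond is one record of three integers
# (bond type, atom i, atom j).
# Assumption: 4-byte integers; 8-byte IDs would double the result.

def bond_list_bytes(n_bonds, ints_per_bond=3, bytes_per_int=4):
    """Return the bytes needed to hold n_bonds flat integer bond records."""
    return n_bonds * ints_per_bond * bytes_per_int

n_bonds = 60_000_000  # bond count quoted above
raw_bytes = bond_list_bytes(n_bonds)
print(f"{raw_bytes / 1e6:.0f} MB for {n_bonds:,} bonds")
```

Under these assumptions the bare list is well under a gigabyte, so any thrashing would have to come from extra per-bond overhead or temporary copies during setup rather than from the bond records themselves.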