Remote rlaunch multi noticeably slowed down


I’ve been using FireWorks for quite some time now to perform high-throughput ab-initio calculations on an HPC cluster.
To use the resources most efficiently, my my_qadapter.yaml file makes use of the rlaunch multi functionality of FireWorks.

While there are no issues or errors with the functionality itself, I noticed that with this setup the ab-initio calculations (e.g. a geometry optimization) are often slower than the equivalent calculations performed on a separate node without FireWorks (i.e. via a batch script).
It should be noted that the Fireworks, i.e. the geometry optimizations, do not depend on each other in any way.

For example, when I use 6 nodes (72 cores each) with rlaunch multi 6 (i.e. every geometry optimization gets 72 cores), a particular calculation can take significantly longer than the same geometry optimization performed in a separate single-node batch job.

My cluster uses SLURM.

At first I believed this might be due to excessive inter-node communication, as the 72 cores per geometry optimization in the FireWorks run are not necessarily located on the same node.
Hence, I tried to correct that by adding -N 1 --exclusive to the srun command that FireWorks executes.
This, however, had no noticeable effect.
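
One way to test the placement hypothesis would be to log, from inside each job step, which host the process runs on and which CPU cores it is allowed to use. The following is only a minimal sketch, assuming a Linux node (os.sched_getaffinity is Linux-only) and the standard SLURM environment variables SLURM_JOB_ID, SLURM_STEP_ID and SLURM_PROCID; the function name placement_report is hypothetical and not part of FireWorks or SLURM.

```python
import json
import os
import socket


def placement_report():
    """Collect the host name and the CPU ids this process may run on.

    Hypothetical helper for diagnosis only; not part of FireWorks or SLURM.
    """
    # os.sched_getaffinity exists only on Linux; report None elsewhere.
    if hasattr(os, "sched_getaffinity"):
        cpus = sorted(os.sched_getaffinity(0))
    else:
        cpus = None

    return {
        "host": socket.gethostname(),
        # These variables are set by SLURM inside a job step; they are
        # simply None when the script runs outside of SLURM.
        "slurm_job_id": os.environ.get("SLURM_JOB_ID"),
        "slurm_step_id": os.environ.get("SLURM_STEP_ID"),
        "slurm_procid": os.environ.get("SLURM_PROCID"),
        "cpus": cpus,
    }


if __name__ == "__main__":
    # One JSON line per process, so the output of several steps can be compared.
    print(json.dumps(placement_report()))
```

If such a script were started with the same srun options as the ab-initio code, once under rlaunch multi and once in the single-node batch job, the two outputs could be compared. Overlapping CPU sets on the same host for steps that run concurrently would point towards core sharing or binding rather than inter-node communication, although this is only one possible explanation.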

Hence, my question:
Is it normal that a Firework is slowed down by the use of rlaunch multi?
If so, are there any known remedies to correct that, either for SLURM or in general?