Long time for rocket to finish with many dependencies in workflow

I have a workflow in which a few fireworks each depend on many other fireworks (the number varies, but it can be 1000 or more for our largest workflows). Rockets take a long time to finish after the task has completed. With 500 dependencies, the gap is about 40 seconds:

2018-07-27 18:21:16,209 INFO Task completed

2018-07-27 18:21:56,454 INFO Rocket finished

This becomes a problem when multiple fireworks finish at around the same time, because it leads to lock contention. I suspect the slowdown comes from refresh() inside the Links class in core/firework.py. I may have a fix, along with a test case that shows an improvement in the time it takes the rocket to finish. Would you like me to submit a PR, or do you have other suggestions for avoiding an issue like this?
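
A simplified model of the suspected slowdown may help. The sketch below assumes the links are stored as a parent-to-children dict and that a firework's parents are found by inverting that whole dict on each access. The function names (invert_links, all_parents_completed_slow, all_parents_completed_fast) are hypothetical and are not the FireWorks API. Re-deriving the parent map inside the loop costs roughly P × E work for a firework with P parents in a workflow with E links, whereas building it once costs about E.

```python
import time
from collections import defaultdict
from typing import Dict, List


def invert_links(links: Dict[int, List[int]]) -> Dict[int, List[int]]:
    """Build a child -> parents map by scanning every link once (about E work)."""
    parents: Dict[int, List[int]] = defaultdict(list)
    for parent, children in links.items():
        for child in children:
            parents[child].append(parent)
    return dict(parents)


def all_parents_completed_slow(links: Dict[int, List[int]],
                               states: Dict[int, str], fw_id: int) -> bool:
    """Hypothetical slow pattern: rebuilds the whole parent map on every access."""
    n_parents = len(invert_links(links).get(fw_id, []))
    for i in range(n_parents):
        parent = invert_links(links)[fw_id][i]  # full rebuild each iteration
        if states[parent] != "COMPLETED":
            return False
    return True


def all_parents_completed_fast(parent_map: Dict[int, List[int]],
                               states: Dict[int, str], fw_id: int) -> bool:
    """Same check against a parent map that was built once and reused."""
    return all(states[p] == "COMPLETED" for p in parent_map.get(fw_id, []))


# Toy fan-in workflow: 500 parents all feeding a single child firework n.
n = 500
links = {i: [n] for i in range(n)}
links[n] = []
states = {i: "COMPLETED" for i in range(n + 1)}

t0 = time.perf_counter()
slow = all_parents_completed_slow(links, states, n)
t1 = time.perf_counter()
parent_map = invert_links(links)  # built once, then reused
fast = all_parents_completed_fast(parent_map, states, n)
t2 = time.perf_counter()
print(f"slow={slow} in {t1 - t0:.4f}s, fast={fast} in {t2 - t1:.4f}s")
```

Computing the inverse once (or caching it until the links change) is one common way to remove the repeated work in this toy model; whether it matches the actual fix is something the PR and its test case would show.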

Thanks

Travis

Hi Travis,

I don’t have much experience with that many dependencies, since the workflows I run tend to be simpler. I would expect locking to be an issue only when many jobs (Fireworks) in the same workflow finish at roughly the same time; if that is the case, though, I can certainly see how a 40-second delay would occur.
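
The reasoning above, that slow finishes only hurt when several Fireworks in one workflow complete together, can be illustrated with a small simulation. This is a sketch under stated assumptions: a threading.Lock stands in for the per-workflow lock, and a short sleep stands in for the roughly 40 s spent refreshing while that lock is held. HOLD_SECONDS, N_FINISHING and finish_firework are hypothetical names, not FireWorks code. Because the lock serializes the finishes, the last one waits for all the others, so total time grows with the number of simultaneous completions.

```python
import threading
import time

HOLD_SECONDS = 0.1  # hypothetical stand-in for the ~40 s refresh while locked
N_FINISHING = 3     # fireworks in the same workflow finishing at once

wf_lock = threading.Lock()      # stand-in for a per-workflow lock
waits = []                      # how long each finisher waited for the lock
waits_guard = threading.Lock()  # protects the shared waits list


def finish_firework() -> None:
    """Acquire the workflow lock, then hold it for the simulated refresh."""
    t0 = time.monotonic()
    with wf_lock:
        waited = time.monotonic() - t0
        time.sleep(HOLD_SECONDS)  # simulated refresh while holding the lock
    with waits_guard:
        waits.append(waited)


start = time.monotonic()
threads = [threading.Thread(target=finish_firework) for _ in range(N_FINISHING)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.monotonic() - start

print(f"total {elapsed:.2f}s, longest wait {max(waits):.2f}s")
```

With a 40 s hold instead of 0.1 s, the same serialization would make the last of three simultaneous finishes wait roughly 80 s before it could even start its own refresh.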

It would be great if you have a solution to this. A PR would be the best way to get it integrated into the main branch, and it would also give us a place to discuss the specifics of your solution.

Best,

Anubhav

(In reply to Travis H, Tuesday, July 31, 2018 at 9:25:48 AM UTC-7.)