Matminer Miedema not scalable for large datasets

I am trying to create thermochemical features for a large number of quaternary alloys (~8 million). The Stoichiometry, YangSolidSolution, and ElementProperty featurizers all run in a reasonable amount of time (~10 mins, ~20 mins, and ~2 hours, respectively, for 8.1 million rows). However, the Miedema featurizer (all struct_types and all ss_types) takes over 24 hours to run. Now, I understand that for quaternaries a number (6) of calculations will have to be made to cover each of the binary combos. Still, >24 hours seems like an excessive amount of time for the Miedema calculations.
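For reference, here is a minimal sketch of where the 6 comes from and how it scales with the dataset, assuming the calculation breaks down into one evaluation per unordered element pair. The element symbols are just an illustrative example, and the sketch does not count any extra repeats for the different struct_types and ss_types.

```python
from itertools import combinations
from math import comb

# Hypothetical quaternary alloy; the element symbols are illustrative only.
elements = ["Fe", "Ni", "Co", "Cr"]

# Every unordered pair of elements is one binary combination.
binary_pairs = list(combinations(elements, 2))
print(binary_pairs)

# Number of binary combinations per quaternary: 4 choose 2.
n_pairs = comb(len(elements), 2)

# Dataset size quoted above (8.1 million rows).
n_rows = 8_100_000

# Total pairwise evaluations if every row is handled independently.
total_pair_evaluations = n_pairs * n_rows
print(n_pairs, total_pair_evaluations)  # 6 48600000
```

So even before any per-structure-type repeats, that is roughly 48.6 million pairwise evaluations if each row is processed on its own.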

So, I’m wondering: why is it so slow? Is there something fundamental that makes it take this long? Or could it be that the use of for loops is causing it to take significantly longer than “necessary”?

Any insight y’all could provide would be much appreciated. I’d also like to hear if anyone has tips for speeding up feature generation, specifically for the Miedema calculations. Thanks in advance!
