Any interest in incorporating a chemical composition distance metric (ElM2D) into matminer?

Recently, I’ve had some success using the Element Mover’s Distance (ElMD – I’m not the owner) in a materials discovery project (mat_discover – I am the owner), but I can’t help feeling that it gets less use than its usefulness warrants. In particular, I’ve really enjoyed the ability of ElM2D to create chemically homogeneous clusters from chemical formulae alone. I’m wondering if ElMD is something that would be of interest to incorporate into matminer, or if you have thoughts on another place where people would be more likely to become aware of it. Here’s the paper (I am not an author):

Hargreaves, C. J.; Dyer, M. S.; Gaultois, M. W.; Kurlin, V. A.; Rosseinsky, M. J. The Earth Mover’s Distance as a Metric for the Space of Inorganic Compositions. Chem. Mater. 2020, 32 (24), 10610–10620. https://doi.org/10.1021/acs.chemmater.0c03381.

ElM2D is the “2D” distance matrix version of the code.
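To make the "distance matrix" idea concrete, here is a rough, pure-Python sketch of a 1D earth mover's distance between compositions and the pairwise matrix built from it. This is not the ElMD/ElM2D code: the names `TOY_SCALE`, `emd_1d`, and `distance_matrix` are hypothetical, and the element positions are made up for illustration (ElMD itself uses the modified Pettifor scale).

```python
# HYPOTHETICAL element positions, for illustration only.
# These are NOT the modified Pettifor values that ElMD uses.
TOY_SCALE = {"Na": 0.0, "K": 1.0, "Cl": 5.0, "Br": 6.0}


def normalize(comp):
    """Convert element amounts to fractions that sum to 1."""
    total = sum(comp.values())
    return {el: amt / total for el, amt in comp.items()}


def emd_1d(comp_a, comp_b, scale=TOY_SCALE):
    """1D earth mover's distance between two compositions.

    On a line, the minimum transport cost equals the running surplus of
    mass carried across each gap, weighted by the width of that gap.
    """
    a, b = normalize(comp_a), normalize(comp_b)
    elements = sorted(set(a) | set(b), key=lambda el: scale[el])
    surplus = 0.0
    dist = 0.0
    for left, right in zip(elements, elements[1:]):
        surplus += a.get(left, 0.0) - b.get(left, 0.0)
        dist += abs(surplus) * (scale[right] - scale[left])
    return dist


def distance_matrix(comps, scale=TOY_SCALE):
    """Symmetric pairwise distance matrix (the '2D' object)."""
    n = len(comps)
    dm = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dm[i][j] = dm[j][i] = emd_1d(comps[i], comps[j], scale)
    return dm


nacl = {"Na": 1, "Cl": 1}
kcl = {"K": 1, "Cl": 1}
kbr = {"K": 1, "Br": 1}
dm = distance_matrix([nacl, kcl, kbr])
```

With these toy positions, NaCl and KCl come out closer to each other than NaCl and KBr, which is the kind of chemical similarity that the clustering relies on.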

Hey @sgbaird ! We welcome PRs on the matminer repo. I think ElM2D would be something interesting to see on there. It would probably be best if there could be a minimal featurizer contained entirely within matminer (not needing access to external libraries) but we can include external libraries if needed. Let me know if you need help!

Hi Alex, thanks for the quick reply! Numba is probably the only dependency that might be best to keep as an external library. Any others can probably be removed or incorporated for a matminer version. Technically the Numba dependency can be removed as well, but the code would be much slower (perhaps 50-200x?). If Numba is a bit too hefty as an external dep, I’ll think about other options.

I took another look at the matminer contributing docs. Does ElM2D being under a GPL license pose a problem? If so, a re-license might work pending the owner’s approval.

Let me tag @computron here but I don’t think a GPL license should be a big deal? Also feel free to just open the PR and we can adjust as needed re: licensing things.

Re: Numba: I think having numba as a dependency wouldn’t be a big problem. In fact, this is something I’ve wanted to have in matminer for a while anyway but haven’t gotten around to. It’d be good to have a first example of how to do it in a featurizer/class, so ElM2D might be a good place for us to prototype that together.

But in general I don’t foresee any major problems whether we include numba as a core dependency or as an optional dependency. I believe we can work around it either way. I’d recommend opening a PR and we can iterate on it together.
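For reference, one common way to treat numba as an optional dependency is a guarded import with a no-op fallback decorator. The sketch below is only an illustration of that pattern, not how matminer or ElM2D actually handle it; `cumulative_cost` is a hypothetical kernel (a unit-spacing 1D transport cost), and only `numba.njit` is a real numba name.

```python
try:
    # Use the real JIT compiler when numba is installed.
    from numba import njit
    HAS_NUMBA = True
except ImportError:
    HAS_NUMBA = False

    def njit(func):
        """Fallback: return the function unchanged (pure Python, slower)."""
        return func


@njit
def cumulative_cost(a, b):
    """Hypothetical kernel: transport cost between two equal-length
    histograms whose bins are one unit apart."""
    surplus = 0.0
    cost = 0.0
    for i in range(len(a)):
        surplus += a[i] - b[i]
        cost += abs(surplus)
    return cost


# Homogeneous tuples of floats work with or without numba;
# real code would more likely pass NumPy arrays.
result = cumulative_cost((0.5, 0.0, 0.5), (0.0, 0.5, 0.5))
```

The same source runs either way, so whether numba ends up as a core or an optional dependency mostly affects speed rather than correctness.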

Awesome, just wanted to double-check some things before getting too ahead of myself. I’ll plan on opening a PR. Thanks!

Opened a draft PR, and after a few hours of refactoring I’m realizing it might make more sense to keep ElM2D as an external, optional dependency in the short term. This is based on the amount of work involved and the time I can justify spending on it (I realize I originally suggested this :sweat_smile:). No hard feelings if the external dep option is a no-go or the PR gets closed/left in draft state.
