A faster version of, or an alternative to, pymatgen's StructureMatcher

I’ve been using StructureMatcher, which is great but also quite slow. Computing the metrics for a set of benchmark tasks in matbench-genmetrics would take ~40 hours, which makes it difficult to optimize model hyperparameters based on these performance metrics. See the “cheaper matching” issue (#9) in the sparks-baird/matbench-genmetrics GitHub repository for related discussion.

For the metrics I’m computing (coverage, novelty, uniqueness), I’ll need to do ~6e7 pairwise comparisons.

Right now, this is the main obstacle preventing me from optimizing the hyperparameters of the xtal2png representation in conjunction with a diffusion generative model.

Thanks @sgbaird, will comment in the issue – would love to see movement on this; I’m sure we can do better.

Thanks @mkhorton. I also figured it might be useful to summarize some of the literature I’m aware of on crystal structure distance metrics and matching:

(1) Hicks, D.; Toher, C.; Ford, D. C.; Rose, F.; Santo, C. D.; Levy, O.; Mehl, M. J.; Curtarolo, S. AFLOW-XtalFinder: A Reliable Choice to Identify Crystalline Prototypes. npj Comput Mater 2021, 7 (1), 30.

(2) Pan, H.; Ganose, A. M.; Horton, M.; Aykol, M.; Persson, K. A.; Zimmermann, N. E. R.; Jain, A. Benchmarking Coordination Number Prediction Algorithms on Inorganic Crystal Structures. Inorg. Chem. 2021, 60 (3), 1590–1603. https://doi.org/10.1021/acs.inorgchem.0c02996.

(3) Thomas, J. C.; Natarajan, A. R.; Van der Ven, A. Comparing Crystal Structures with Symmetry and Geometry. npj Comput Mater 2021, 7 (1), 164.

(4) Takemura, S.; Takeda, T.; Nakanishi, T.; Koyama, Y.; Ikeno, H.; Hirosaki, N. Dissimilarity Measure of Local Structure in Inorganic Crystals Using Wasserstein Distance to Search for Novel Phosphors. Science and Technology of Advanced Materials 2021, 22 (1), 185–193. https://doi.org/10.1080/14686996.2021.1899555.

(5) Zhang, R.; Seth, S.; Cumby, J. Grouped Representation of Interatomic Distances as a Similarity Measure for Crystal Structures; preprint; ChemRxiv, 2022.

(6) Therrien, F.; Graf, P.; Stevanović, V. Matching Crystal Structures Atom-to-Atom. J. Chem. Phys. 2020, 152 (7), 074106.

(7) Veremyev, A.; Liyanage, L.; Fornari, M.; Boginski, V.; Curtarolo, S.; Butenko, S.; Buongiorno Nardelli, M. Networks of Materials: Construction and Structural Analysis. AIChE J 2021, 67 (3). https://doi.org/10.1002/aic.17051.

(8) Ganose, A. M.; Jain, A. Robocrystallographer: Automated Crystal Structure Text Descriptions and Analysis. MRS Communications 2019, 9 (3), 874–881. https://doi.org/10.1557/mrc.2019.94.

(9) Jang, J.; Gu, G. H.; Noh, J.; Kim, J.; Jung, Y. Structure-Based Synthesizability Prediction of Crystals Using Partially Supervised Learning. J. Am. Chem. Soc. 2020, 142 (44), 18836–18843. https://doi.org/10.1021/jacs.0c07384.

This is a really nice list. There are a few more too – one approach I’ve been interested in, but have not pursued seriously, is doing an initial matching based on the lattice alone.

Two approaches I’m curious about:

Larsen, Peter M., et al. “Minimum-strain symmetrization of Bravais lattices.” Physical Review Research 2.1 (2020): 013077.

Andrews, Lawrence C., Herbert J. Bernstein, and Nicholas K. Sauter. “A space for lattice representation and clustering.” Acta Crystallographica Section A: Foundations and Advances 75.3 (2019): 593-599.
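As a rough illustration of lattice-only pre-matching, here is a minimal sketch in the spirit of the Selling-scalar (S6) representation from Andrews et al.: Selling-reduce a superbase of each lattice, then compare the sorted scalars. This is not code from either paper, and the function names (`selling_signature`, `lattices_may_match`) and the tolerance are hypothetical. It assumes both lattices are primitive cells and does no volume scaling. Sorting the six scalars throws away which vector pair each one belongs to, so it can report false positives; it is meant only to discard obviously different lattices before a full comparison.

```python
from itertools import combinations


def _dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def selling_signature(lattice, tol=1e-8, max_iter=10_000):
    """Sorted Selling scalars (-b_i . b_j) of a Selling-reduced superbase.

    `lattice` holds the three cell vectors as rows. Different bases of the same
    lattice should give the same signature, up to floating-point noise.
    """
    b = [[float(x) for x in v] for v in lattice]
    # Superbase: append b4 = -(b1 + b2 + b3) so the four vectors sum to zero.
    b.append([-(b[0][k] + b[1][k] + b[2][k]) for k in range(3)])
    pairs = list(combinations(range(4), 2))
    for _ in range(max_iter):
        for i, j in pairs:
            if _dot(b[i], b[j]) > tol:
                # Reduction step: add b_i to the two other vectors, then negate b_i.
                # This lowers sum(|b_n|^2) by 2 * (b_i . b_j), so the loop terminates.
                for k in range(4):
                    if k not in (i, j):
                        b[k] = [x + y for x, y in zip(b[k], b[i])]
                b[i] = [-x for x in b[i]]
                break
        else:
            # No positive scalar left: all six Selling scalars are <= 0.
            return sorted(-_dot(b[i], b[j]) for i, j in pairs)
    raise RuntimeError("Selling reduction did not converge")


def lattices_may_match(lat1, lat2, atol=1e-3):
    """Cheap lattice-only pre-screen: False means the lattices clearly differ."""
    s1, s2 = selling_signature(lat1), selling_signature(lat2)
    return max(abs(x - y) for x, y in zip(s1, s2)) <= atol


cubic = [[2, 0, 0], [0, 2, 0], [0, 0, 2]]
skewed = [[2, 0, 0], [2, 2, 0], [0, 0, 2]]  # same lattice, different basis
tetragonal = [[2, 0, 0], [0, 2, 0], [0, 0, 3]]
print(lattices_may_match(cubic, skewed))      # True
print(lattices_may_match(cubic, tetragonal))  # False
```

The reduction is cheap and each signature is computed once per structure, so the pairwise step reduces to comparing six numbers.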

These are both really cool! I’ve been thinking about ways we might adopt some of these methods for Materials Project for a while, and I’m definitely interested to see how others might use them.

Are you just testing whether a structure is unique? Sometimes something as simple as pairwise distances can work quite well, e.g. computing a sorted list of pairwise distances up to a certain number of nearest neighbours or a cutoff radius, and then computing the Euclidean distance between two such vectors. This signature can be precomputed once per structure, and computing the distance is cheap, whereas StructureMatcher performs a true pairwise comparison.
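The sorted pairwise-distance idea can be sketched in plain Python as below. This is an illustrative sketch, not an existing library API: `distance_fingerprint` and `fingerprint_distance` are hypothetical names, and the cutoff and vector length are arbitrary. It assumes both structures are given as primitive cells (lattice rows plus fractional coordinates), ignores species, and does no volume scaling, unlike StructureMatcher's defaults.

```python
import math
from itertools import product


def _cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]


def _image_range(lattice, cutoff):
    """Periodic images needed along each axis so no neighbour within `cutoff` is missed."""
    volume = abs(sum(x * y for x, y in zip(lattice[0], _cross(lattice[1], lattice[2]))))
    ranges = []
    for i in range(3):
        j, k = (i + 1) % 3, (i + 2) % 3
        spacing = volume / math.hypot(*_cross(lattice[j], lattice[k]))  # interplanar spacing
        ranges.append(math.ceil(cutoff / spacing) + 1)
    return ranges


def distance_fingerprint(lattice, frac_coords, cutoff=6.0, length=100):
    """Sorted site-to-neighbour distances within `cutoff`, padded/truncated to `length`."""
    na, nb, nc = _image_range(lattice, cutoff)
    shifts = list(product(range(-na, na + 1), range(-nb, nb + 1), range(-nc, nc + 1)))
    dists = []
    for fi in frac_coords:
        for fj in frac_coords:
            for s in shifts:
                delta = [fj[m] + s[m] - fi[m] for m in range(3)]
                cart = [sum(delta[m] * lattice[m][x] for m in range(3)) for x in range(3)]
                d = math.hypot(*cart)
                if 1e-8 < d <= cutoff:  # skip each site's zero distance to itself
                    dists.append(d)
    dists.sort()
    # Fixed-length vector: pad with `cutoff` so structures with fewer pairs stay comparable.
    return (dists + [cutoff] * length)[:length]


def fingerprint_distance(fp1, fp2):
    """Euclidean distance between two fingerprints."""
    return math.dist(fp1, fp2)


cubic = [[2, 0, 0], [0, 2, 0], [0, 0, 2]]
skewed = [[2, 0, 0], [2, 2, 0], [0, 0, 2]]  # same lattice, different basis
fp1 = distance_fingerprint(cubic, [[0, 0, 0]], cutoff=3.0, length=30)
fp2 = distance_fingerprint(skewed, [[0.5, 0.5, 0.5]], cutoff=3.0, length=30)
print(fingerprint_distance(fp1, fp2) < 1e-9)  # True
```

Because the fingerprint depends only on interatomic distances, it is unaffected by the choice of basis or origin; the price is that different structures can occasionally share a fingerprint, which is acceptable for a pre-screen.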

There is also this recent paper on using average minimum distances: match87n3
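Average minimum distances (AMD) can be sketched as: for each site, sort the distances to its neighbours, then average the n-th smallest distance over all sites. The version below is a simplified, hypothetical illustration rather than the paper's reference implementation; it assumes a fixed block of `images` periodic shifts per axis is large enough to contain every site's `k` nearest neighbours, and `amd_distance` simply uses the maximum absolute difference as one reasonable choice of metric.

```python
import math
from itertools import product


def average_minimum_distances(lattice, frac_coords, k=10, images=3):
    """AMD-style vector: entry n is the mean over sites of each site's (n+1)-th neighbour distance.

    Assumes `images` shifts per axis are enough to contain each site's k nearest neighbours.
    """
    shifts = list(product(range(-images, images + 1), repeat=3))
    per_site = []
    for fi in frac_coords:
        dists = []
        for fj in frac_coords:
            for s in shifts:
                delta = [fj[m] + s[m] - fi[m] for m in range(3)]
                cart = [sum(delta[m] * lattice[m][x] for m in range(3)) for x in range(3)]
                d = math.hypot(*cart)
                if d > 1e-8:  # skip the site itself
                    dists.append(d)
        per_site.append(sorted(dists)[:k])
    # Average column-wise: entry n averages every site's n-th smallest distance.
    return [sum(col) / len(col) for col in zip(*per_site)]


def amd_distance(amd1, amd2):
    """Maximum absolute difference (L-infinity) between two AMD vectors."""
    return max(abs(x - y) for x, y in zip(amd1, amd2))


primitive = average_minimum_distances([[2, 0, 0], [0, 2, 0], [0, 0, 2]], [[0, 0, 0]], k=18)
supercell = average_minimum_distances(
    [[4, 0, 0], [0, 2, 0], [0, 0, 2]], [[0, 0, 0], [0.5, 0, 0]], k=18
)
print(amd_distance(primitive, supercell) < 1e-9)  # True
```

Averaging over sites is what makes the vector the same for a primitive cell and any supercell of it, so unlike a raw sorted list of all pair distances, no primitive-cell reduction is needed first.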

The metrics are based on the number of matches between sets of structures. That’s a good idea to compare sorted pairwise distances, perhaps with a cutoff radius, as a pre-screening step. Will check out the paper, too. Thanks for the suggestions!
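A pre-screened match count could look roughly like the sketch below. The names `count_matches`, `fingerprint`, `expensive_match` and `threshold` are hypothetical; in practice `expensive_match` might be pymatgen's `StructureMatcher().fit` and `fingerprint` a sorted-distance vector. Fingerprints of the reference set are computed once, and the slow matcher runs only for pairs whose fingerprints are close. The threshold has to be loose enough that true matches are not discarded; nothing here guarantees that, so it would need checking against the full matcher on a sample.

```python
import math


def count_matches(set_a, set_b, fingerprint, expensive_match, threshold):
    """Count structures in `set_a` that match at least one structure in `set_b`.

    `fingerprint` maps a structure to a fixed-length vector; `expensive_match`
    is the slow, authoritative comparison. Returns (matches, expensive_calls).
    """
    fps_b = [fingerprint(b) for b in set_b]  # precomputed once per structure
    matches = expensive_calls = 0
    for a in set_a:
        fp_a = fingerprint(a)
        for b, fp_b in zip(set_b, fps_b):
            # Pre-screen: only call the slow matcher for fingerprint-similar pairs.
            if math.dist(fp_a, fp_b) > threshold:
                continue
            expensive_calls += 1
            if expensive_match(a, b):
                matches += 1
                break  # one match is enough for this structure
    return matches, expensive_calls


# Toy stand-ins: "structures" are numbers and the fingerprint is the number itself.
result = count_matches(
    [0.0, 1.0, 5.0],
    [0.001, 1.2, 10.0],
    fingerprint=lambda x: [x],
    expensive_match=lambda a, b: abs(a - b) < 0.01,
    threshold=0.5,
)
print(result)  # (1, 2)
```

In the toy run only 2 of the 9 possible pairs reach the expensive matcher; returning the call count makes it easy to measure how much a given threshold actually saves.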
