Related Materials Calculation

Hi everyone,

I opened mp-1968 and looked at its related materials. Link #1 shows that mp-1968 and mp-2063 are 98.02% similar.
#1: https://next-gen.materialsproject.org/materials/mp-1968?chemsys=La-O#related_materials

However, the Related Materials method documentation (link #2) describes the similarity output as a distance, not a percentage.
#2: Related Materials | Materials Project Documentation

Where does the percentage in link #1 come from?

Thanks!

Hey @Yaoqi, the percentages are 100 × e^{-||v_i - v_j||}, where v_i and v_j are the fingerprint vectors of the two structures. The exponential ranges from 0 (completely dissimilar) to 1 (identical); see Eq. 11 of the similarity paper used to generate this data.
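
To make the mapping concrete, here is a minimal sketch in plain Python. The function name `similarity_percent` and the toy vectors are made up for illustration and are not part of any Materials Project code; real fingerprints are much longer vectors.

```python
import math

def similarity_percent(v_i, v_j):
    """Return 100 * exp(-||v_i - v_j||) for two equal-length vectors."""
    if len(v_i) != len(v_j):
        raise ValueError("fingerprint vectors must have the same length")
    # Euclidean distance between the two vectors
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(v_i, v_j)))
    return 100.0 * math.exp(-dist)

# Toy vectors (hypothetical values, not real fingerprints)
print(similarity_percent([0.2, 0.5], [0.2, 0.5]))  # identical vectors -> 100.0
print(similarity_percent([0.2, 0.5], [0.2, 0.6]))  # distance 0.1 -> roughly 90.5
```

Because exp(-d) decays quickly, only structures whose fingerprints are very close score near 100%.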

Maybe @Anubhav_Jain can confirm that it’s the exponential distance metric and not the cosine similarity.

Thank you so much, these are really helpful!

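I calculated the similarity between mp-2063 and mp-1968 using the following commands:

import math
import numpy as np
from matminer.featurizers.structure.sites import SiteStatsFingerprint
from matminer.featurizers.site.fingerprint import CrystalNNFingerprint

# mpr is assumed to be an authenticated MPRester client created earlier
ssf = SiteStatsFingerprint(
    CrystalNNFingerprint.from_preset('ops', distance_cutoffs=None, x_diff_weight=0),
    stats=('mean', 'std_dev', 'minimum', 'maximum'),
)

# Fingerprint each structure
structure = mpr.get_structure_by_material_id("mp-2063")
fp_1 = np.array(ssf.featurize(structure))

struc = mpr.get_structure_by_material_id("mp-1968")
fp_2 = np.array(ssf.featurize(struc))

# Euclidean distance between fingerprints, mapped to a similarity via exp(-d)
dist = np.linalg.norm(fp_1 - fp_2)
print(math.exp(-dist))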

The output is 0.987350117939376 (98.74%), which is not quite the same as the similarity shown on the Materials Project page: 98.02%.
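
For scale, both numbers can be converted back to the distance they imply by inverting the exponential, d = -ln(s). A quick sketch (the helper name `implied_distance` is just for illustration):

```python
import math

def implied_distance(similarity):
    """Invert s = exp(-d) to recover the distance d from a similarity in (0, 1]."""
    if not 0.0 < similarity <= 1.0:
        raise ValueError("similarity must be in (0, 1]")
    return -math.log(similarity)

d_mine = implied_distance(0.987350117939376)  # my calculation
d_site = implied_distance(0.9802)             # value shown on the website
print(d_mine, d_site, d_site - d_mine)
```

This gives roughly 0.0127 for my calculation versus 0.0200 for the website, so the two fingerprint distances differ by about 0.007.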

I am wondering if you know where this small difference comes from.

Thank you!

Sorry for the delay. Apart from a small tweak (our docs don’t reflect the manuscript’s choice of kwargs for SiteStatsFingerprint), the differences you’re seeing are probably because the structures that make up a material have changed over time, while the similarity scores have not been updated.

Here’s the corrected code, which I’ll work into our documentation:

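from matminer.featurizers.structure.sites import SiteStatsFingerprint
from matminer.featurizers.site.fingerprint import CrystalNNFingerprint
from pymatgen.analysis.bond_valence import BVAnalyzer
import numpy as np

# Bond-valence analyzer; not used by get_similarity below
bva = BVAnalyzer()

def get_similarity(structure_1, structure_2):
    """Return the similarity of two structures as a percentage (0 to 100)."""
    fingerprinter = SiteStatsFingerprint(
        CrystalNNFingerprint.from_preset(
            "ops",
            distance_cutoffs=None,
            x_diff_weight=None,
        ),
        stats=("mean", "maximum"),
    )

    # One fingerprint vector per structure
    feature_vectors = [
        np.array(fingerprinter.featurize(structure))
        for structure in (structure_1, structure_2)
    ]
    # Euclidean distance between fingerprints, mapped to a percentage via exp(-d)
    dist = np.linalg.norm(feature_vectors[1] - feature_vectors[0])
    return 100 * np.exp(-dist)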