This is in the context of materials discovery for superhard materials, a classic problem in materials informatics. A lot of approaches use elastic moduli as a screening tool, but this doesn't seem very amenable to an adaptive design scheme unless DFT gets incorporated into the workflow (i.e., predict, measure experimental hardness, characterize crystal structure, calculate elastic moduli via DFT, repeat).
The options I’ve found so far:
- VickersHardnessPrediction
    - 529 unique compositions across 1062 entries
    - missing list of DOIs/references (https://github.com/ziyan1996/VickersHardnessPrediction/issues/8)
    - missing LICENSE (https://github.com/ziyan1996/VickersHardnessPrediction/issues/1)
- mpds.io
    - 531 unique compositions across 1310 (free) entries
    - missing applied load information (based on what I saw in the JSON files, https://github.com/mpds-io/mpds-api/issues/37). EDIT: ~450/2400 values in the master database, according to Pierre Villars
Aside: there are 277 shared (unique) compositions between VickersHardnessPrediction and MPDS. To get this, I removed duplicates within each set, converted the formulas to composition-based feature vectors via the CBFV package, scaled the features to (0, 1) via `MinMaxScaler()`, rounded to 2 decimals via `np.round(decimals=2)`, and dropped duplicates via `df.drop_duplicates()`. In other words, there are 783 unique compositions collectively between the two datasets.
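For concreteness, the overlap count could be sketched roughly as below. This is not the exact script I ran: `featurize` is a toy molar-fraction featurizer standing in for CBFV's composition-based feature vectors, the min-max scaling is done by hand to mirror `MinMaxScaler()`, and `overlap_count` is a hypothetical helper name.

```python
import re
import pandas as pd


def featurize(formula):
    """Toy featurizer: element -> molar fraction.
    A stand-in for CBFV's composition-based feature vectors."""
    tokens = re.findall(r"([A-Z][a-z]?)(\d*\.?\d*)", formula)
    counts = {}
    for el, n in tokens:
        counts[el] = counts.get(el, 0.0) + (float(n) if n else 1.0)
    total = sum(counts.values())
    return {el: c / total for el, c in counts.items()}


def overlap_count(set_a, set_b):
    """Count feature vectors that appear in both datasets after
    scaling to (0, 1), rounding to 2 decimals, and deduplication."""
    df = pd.DataFrame({
        "formula": list(set_a) + list(set_b),
        "source": ["A"] * len(set_a) + ["B"] * len(set_b),
    })
    feats = pd.DataFrame([featurize(f) for f in df["formula"]]).fillna(0.0)
    # scale each feature to (0, 1), mirroring MinMaxScaler()
    lo, rng = feats.min(), (feats.max() - feats.min()).replace(0, 1)
    scaled = ((feats - lo) / rng).round(2)  # np.round(decimals=2)
    scaled["source"] = df["source"].values
    uniq = scaled.drop_duplicates()  # dedupe within each source
    feat_cols = [c for c in uniq.columns if c != "source"]
    keys = uniq[feat_cols].apply(tuple, axis=1)
    shared = set(keys[uniq["source"] == "A"]) & set(keys[uniq["source"] == "B"])
    return len(shared)
```

With the real datasets, `overlap_count(vhp_formulas, mpds_formulas)` would correspond to the 277 shared compositions mentioned above.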
Any alternative datasets or other thoughts?