How to extract structure features from cif collection - Question from Jason

"Hi Alex:

… I have a question about matminer recently. In order to extract structure features we need the “structure” column in our datasets right? So what if I only have .cif files available? Do you have some code that can transfer cif into structure? Or we can use cif file directly as “structure” in the dataset? Thank you!

Best,

He (Jason) Sun"

Hey Jason,

Yes, you do need a column of pymatgen Structure objects (not CIF files) in your dataframe before using the structure featurizers, although the column does not need to be named “structure”. Luckily, pymatgen's Structure.from_file method reads a CIF, POSCAR, or other supported structure file and returns a Structure object. Check out the example code below:

```python
import os

import pandas as pd
# Recent pymatgen releases no longer support 'from pymatgen import Structure'
from pymatgen.core import Structure
from matminer.featurizers.structure import GlobalSymmetryFeatures
from matminer.featurizers.composition import ElementProperty

# Get the paths of all of the cif files (skipping anything that is not a .cif)
cif_dir = '/directory/containing/your/cifs'
cif_paths = [os.path.join(cif_dir, f) for f in sorted(os.listdir(cif_dir))
             if f.lower().endswith('.cif')]

# Transform the cif files into Structures and place in a dataframe
structures = [Structure.from_file(cif) for cif in cif_paths]
df = pd.DataFrame({"structure": structures,
                   "composition": [s.composition for s in structures]})

# Apply a structure featurizer
# ignore_errors=True keeps going when a row fails; return_errors=True
# records the error for that row in an extra column
gsf = GlobalSymmetryFeatures()
df = gsf.featurize_dataframe(df, 'structure', ignore_errors=True, return_errors=True)

# Apply a composition featurizer
ep = ElementProperty.from_preset("matminer")
df = ep.featurize_dataframe(df, "composition", ignore_errors=True, return_errors=True)

print(df)
```
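One thing to watch out for: Structure.from_file raises an exception on a malformed CIF, so a single bad file can stop the whole loading step. Here is a minimal sketch of one way around that, assuming you want to keep the file name for each structure and a record of which files failed. The helper name `load_cifs` and its `parse` argument are my own invention for illustration, not part of pymatgen or matminer.

```python
from pathlib import Path


def load_cifs(cif_dir, parse):
    """Parse every .cif file in cif_dir with `parse`, collecting failures.

    `parse` is any callable that takes a file path and returns a parsed
    object, e.g. pymatgen's Structure.from_file. (Hypothetical helper,
    not part of pymatgen or matminer.)
    """
    parsed, failed = {}, {}
    # Sort for a reproducible order; match the extension case-insensitively
    for path in sorted(Path(cif_dir).iterdir()):
        if not path.is_file() or path.suffix.lower() != ".cif":
            continue  # skip subdirectories and non-CIF files
        try:
            parsed[path.name] = parse(str(path))
        except Exception as exc:  # malformed CIFs can raise various errors
            failed[path.name] = repr(exc)
    return parsed, failed
```

With pymatgen installed, you would call it as `parsed, failed = load_cifs(cif_dir, Structure.from_file)`, build the dataframe from `list(parsed.values())` (using `list(parsed)` as the index if you want the file names), and check `failed` to see which CIFs need attention.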

Let me know if this works for you!

Thanks,

Alex