Hey Matminer! I have some CIF files downloaded from a recent experiment. If I want to obtain the "structure" column shown in the image below, how can I get this column of data? Thank you very much for your reply.
I notice that CompositionToStructureFromMP() only works if my compositions are in the MP database.
If you have many of these CIFs and would like to put them in a dataframe as in your picture, you can use the matminer.featurizers.conversions.PymatgenFunctionApplicator featurizer like I show below (see here for the source code: matminer/conversions.py at 7f8520b97175db3c4fc6afe055cee664ebd77238 · hackingmaterials/matminer · GitHub). The only requirement is that you have the CIF filenames either in a Python list or in a dataframe column.
```python
from matminer.featurizers.conversions import PymatgenFunctionApplicator
from pymatgen.core import Structure

fileconverter = PymatgenFunctionApplicator(func=Structure.from_file, target_col_id="structure")

# if you want them as a list
# assuming your filenames are in an iterable called cif_filenames
structures = fileconverter.featurize_many(cif_filenames)

# if you want them as a dataframe
# assuming your cif filenames are in a df called "df" under a column name "cif_filenames"
df_with_structures = fileconverter.featurize_dataframe(df, "cif_filenames")
```
I've tried both approaches above (featurize_many and featurize_dataframe), but neither of them can process all of the data in one batch. The error is as shown.
Here's how I did it: (1) Following the method described above, I used PymatgenFunctionApplicator.featurize to read the local CIF files and put the results into a list, but the "structure" column comes out as NaN. (2) I also tried putting the filenames in a "cif_filenames" column, and the result was the same.
@kaifeng_zhang You seem to already have the data you want in your first cell. After that, it looks like you are trying to call Structure.from_file on objects that are already pymatgen structures. If we go through each cell, it becomes apparent why this is happening:
CELL 1: You apply PymatgenFunctionApplicator to each file in a for loop. The files are converted into pymatgen structures and appended to the list s.
CELL 2: You make a df from the structures. This is actually already the data you want.
CELL 3: You rename the column so that your structures are under the name "cif_filenames". But these are NOT CIF filenames; they are your actual structures. You then rerun the featurizer using the cif_filenames column (which really holds structures) as input, which results in NaN because structures are not CIF files. Because ignore_errors=True, no error messages are shown.
CELL 4: You do the same thing as in CELL 3 but without dataframes. Again, this results in a list of NaNs.
Here's an example of using PymatgenFunctionApplicator to read and convert 3 test CIF files I had in the same way: