Hey Matminer! I have some CIF files downloaded from a recent experiment. How can I obtain the “structure” column shown in the image below? Thank you very much for your reply.
I note that CompositionToStructureFromMP() only works if my compositions are already in the MP database.
If you have many of these CIFs and would like to put them in a dataframe as in your picture, you can use the matminer.featurizers.conversions.PymatgenFunctionApplicator class as I show below (see here for src code: matminer/conversions.py at 7f8520b97175db3c4fc6afe055cee664ebd77238 · hackingmaterials/matminer · GitHub). The only requirement is that you have the CIF filenames in either a Python list or a dataframe column.
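from pymatgen.core import Structure
from matminer.featurizers.conversions import PymatgenFunctionApplicator
# wrap Structure.from_file so it can be applied to many filenames at once
fileconverter = PymatgenFunctionApplicator(func=Structure.from_file, target_col_id="structure")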
# if you want them as a list
# assuming your filenames are in an iterable called cif_filenames
structures = fileconverter.featurize_many(cif_filenames)
# if you want them as a dataframe
# assuming your cif filenames are in a df called "df" under a column name "cif_filenames"
df_with_structures = fileconverter.featurize_dataframe(df, "cif_filenames")
I’ve tried both approaches above, featurize_many and featurize_dataframe, but neither of them can process all the data in a batch. The error is as shown.
Here’s what I did: I used PymatgenFunctionApplicator.featurize to read the local CIF files and put them into a list. Following the method described above, the values in the "structure" column are all ‘nan’. I also tried putting the filenames in a column named “cif_filenames”, and the result was the same.
@kaifeng_zhang You seem to already have the data you want in your first cell. After that, it looks like you are trying to call from_file on the pymatgen structures themselves. Going through each cell makes it apparent why this is happening:
CELL 1: You apply PymatgenFunctionApplicator to each file in a for loop. The files are converted into pymatgen structures and appended to the list s.
CELL 2: You make a df from the structures. This is actually already the data you want.
CELL 3: You rename the column so that your structures are under the name “cif_filenames”. But this column does NOT contain CIF filenames; it contains your actual structures. You then rerun the featurizer using the cif_filenames column (really structures) as input, which results in nan because these are not CIF files. With ignore_errors=True, no error messages are shown.
CELL 4: You do the same thing as cell 3 but without dataframes. Again, this results in a bunch of nans.
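The silent nan values described for cells 3 and 4 can be reproduced with a toy stand-in. The sketch below does not use matminer: `load_from_path` and `apply_ignoring_errors` are hypothetical helpers that only mimic the idea of storing nan for a failed conversion instead of raising, which is what ignore_errors=True is described as doing above.

```python
import math


def load_from_path(path):
    """Hypothetical loader: accepts only filename strings, as a file parser would."""
    if not isinstance(path, str):
        raise TypeError(f"expected a filename, got {type(path).__name__}")
    return {"loaded_from": path}  # placeholder for a parsed structure


def apply_ignoring_errors(func, entries):
    """Apply func to every entry and store nan wherever func raises."""
    results = []
    for entry in entries:
        try:
            results.append(func(entry))
        except Exception:
            results.append(float("nan"))  # the error is swallowed silently
    return results


def is_nan(value):
    """True only for a float nan, so non-float objects are handled safely."""
    return isinstance(value, float) and math.isnan(value)


filenames = ["a.cif", "b.cif"]
# First pass: filenames go in, parsed objects come out.
first_pass = apply_ignoring_errors(load_from_path, filenames)
# Second pass: already-parsed objects go in, so every conversion fails.
second_pass = apply_ignoring_errors(load_from_path, first_pass)

print(first_pass)
print([is_nan(value) for value in second_pass])
```

Checking what type of object a column actually holds before converting it, or rerunning once without ignoring errors, makes this kind of input mix-up much easier to spot.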
Here’s an example of using PymatgenFunctionApplicator to read 3 test CIF files I had and do the same thing:
Hi!
I want to do the inverse! I have a dataframe with a column containing structure features. These materials do not have any CIF files, and I want to read this column directly with pymatgen's IStructure, without creating CIF files, so that I can use it with the ACSF function. The structure format is the same as POSCAR, but the number of lines differs.