Creating an accessible dataframe from MP-API query results

I am trying to convert the resulting query into a pandas dataframe as shown below, but format of the resulting dataframe looks off with the field name for fields I queried being in the tuples on each row instead of a being the title for each column Is there a simple way to get a dataframe like the last picture I have attached programmatically?

thanks in advance,

docs =["B"], exclude_elements=["N","C","Li","Ca","Mg","Na","K","Rb","Cs","Fr","Be","Sr","Ba","Ra","H","O","P","F","Cl"]     
                           ,fields=["formula_pretty", "density", "volume", "symmetry","material_id"])

df = pd.DataFrame(docs)

Result of the displayed dataframe:

I think Dataframe.from_records() should work.

Hey @tschaume ,I tried this and got the same result, do I need us use some of the method parameters or combine this with another method?

Hi @J0chin,

here’s one way to go about it. It probably isn’t the most efficient and @munrojm might have better suggestions.

from flatten_dict import flatten

flattened = [{
    k: v for k, v in flatten(doc.dict(), reducer="dot").items() if k != "fields_not_requested"
} for doc in docs]





I am having the exact same issue. I have tried a variety of pandas methods (str.replace, str.rstrip, str.lstrip lambda: x, …) but they completely take out all data in that column (resulting in NaN), including the numerical values I want. I’m wondering if this isn’t working because the data in the columns aren’t strings?

Additionally, @tschaume 's solution does not work for my dataset

@kikin you’re trying to flatten a Dataframe whereas flatten takes a dictionary. Loop over the results from (docs in my example) and not the rows of a dataframe.

1 Like

Thanks for the useful code snippet!
I usually save the query results to a csv file and in that case I add the following line before exporting to csv. This makes it easier to load the structures later with Structure.from_dict or Structure.from_str.

data_df = DataFrame.from_records(flattened)

data_df['structure'] = data_df['structure'].apply(lambda x:'json'))


1 Like

I have solved this issue by:

fields = [
    "material_id", "band_gap", "volume", 'formula_pretty', 'is_stable',
with MPRester(apiid) as mpr:
    docs ="Si-O",
                                        band_gap=(0, 10),
 flattened = [{
      k: v
      for k, v in flatten(doc.dict(), reducer="dot").items()
      if k != "fields_not_requested"
  } for doc in docs]
 df = pd.DataFrame.from_records(flattened)
 df = df.drop(columns=[col for col in df if col not in fields])

But, IMHO, the problem can be easily solved if upstream considers not to write fileds_not_requested every time.