I am working on downloading the complete dataset for molecules using the Materials Project API and encountered some discrepancies that I hope you can help clarify.
I used the following code (Code 1) to download the dataset:
import pandas as pd
from mp_api.client import MPRester
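with MPRester(api_key=API_key, monty_decode=False, use_document_model=False) as mpr:
    # No filters: request every molecule summary document as plain dicts
    docs = mpr.molecules.summary.search()
# One row per document, one column per field
df = pd.DataFrame(docs)
df.to_csv('molecules_data.csv')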
The downloaded CSV file contains the following fields in the header:
_id, builder_meta, nsites, elements, nelements, composition, composition_reduced, formula_pretty, formula_anonymous, chemsys, volume, density, density_atomic, symmetry, property_name, material_id, deprecated, deprecation_reasons, last_updated, origins, warnings, structure, task_ids, uncorrected_energy_per_atom, energy_per_atom, formation_energy_per_atom, energy_above_hull, is_stable, equilibrium_reaction_energy_per_atom, decomposes_to, xas, grain_boundaries, band_gap, cbm, vbm, efermi, is_gap_direct, is_metal, es_source_calc_id, bandstructure, dos, dos_energy_up, dos_energy_down, is_magnetic, ordering, total_magnetization, total_magnetization_normalized_vol, total_magnetization_normalized_formula_units, num_magnetic_sites, num_unique_magnetic_sites, types_of_magnetic_species, bulk_modulus, shear_modulus, universal_anisotropy, homogeneous_poisson, e_total, e_ionic, e_electronic, n, e_ij_max, weighted_surface_energy_EV_PER_ANG2, weighted_surface_energy, weighted_work_function, surface_anisotropy, shape_factor, has_reconstructed, possible_species, has_props, theoretical, database_IDs
However, when I used the following code (Code 2) to check the available fields:
from mp_api.client import MPRester
with MPRester(api_key=API_key, monty_decode=False, use_document_model=False) as mpr:
    # List the field names the molecules summary endpoint reports
    docs = mpr.molecules.summary.available_fields
print(docs)
The output fields were:
'builder_meta', 'charge', 'spin_multiplicity', 'natoms', 'elements', 'nelements', 'nelectrons', 'composition', 'composition_reduced', 'formula_alphabetical', 'formula_pretty', 'formula_anonymous', 'chemsys', 'symmetry', 'species_hash', 'coord_hash', 'property_name', 'property_id', 'molecule_id', 'deprecated', 'deprecation_reasons', 'level_of_theory', 'solvent', 'lot_solvent', 'last_updated', 'origins', 'warnings', 'molecules', 'molecule_levels_of_theory', 'inchi', 'inchi_key', 'task_ids', 'similar_molecules', 'constituent_molecules', 'unique_calc_types', 'unique_task_types', 'unique_levels_of_theory', 'unique_solvents', 'unique_lot_solvents', 'thermo', 'vibration', 'orbitals', 'partial_charges', 'partial_spins', 'bonding', 'multipole_moments', 'redox', 'metal_binding', 'has_props'
.
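To make the mismatch concrete, here is a minimal sketch that diffs the two field lists. It uses only abbreviated subsets of the two outputs above and makes no API calls; the helper name compare_fields is my own and not part of mp_api.

```python
def compare_fields(csv_header, available_fields):
    """Return (only_in_csv, only_in_api, shared) as sorted lists."""
    csv_set, api_set = set(csv_header), set(available_fields)
    return (
        sorted(csv_set - api_set),   # columns in the CSV but not reported by the API
        sorted(api_set - csv_set),   # fields reported by the API but absent from the CSV
        sorted(csv_set & api_set),   # fields present in both
    )


# Abbreviated subsets of the Code 1 CSV header and the Code 2 output
csv_header = ["_id", "builder_meta", "nsites", "elements",
              "material_id", "band_gap", "has_props"]
available_fields = ["builder_meta", "charge", "natoms", "elements",
                    "molecule_id", "inchi", "has_props"]

only_in_csv, only_in_api, shared = compare_fields(csv_header, available_fields)
print("only in CSV:", only_in_csv)
print("only in available_fields:", only_in_api)
print("shared:", shared)
```

Even on this subset, the CSV carries material_id and band_gap while available_fields lists molecule_id and charge, which is the discrepancy I am asking about.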
Moreover, the Molecules Explorer API example provided on the website (mpr.molecules.summary.search(molecule_ids=["042b6da7a6eb790fd5038f3729ef715c-C5H8O3-m1-2"])) uses the field molecule_ids. However, neither of the above outputs contains the molecule_ids field.
Above all, how can I download the complete dataset for molecules with the API?