What I have been doing
Hi!
I am trying to download part of the DB information through the API with the nomad Python package: namely, a few basic pieces of information for every bulk material present, using the following query:
    query = ArchiveQuery(
        query={
            'domain': 'dft',
            'dft.system': 'bulk',
        },
        required={
            'section_run': {
                'section_system': {
                    'atom_species': '*'
                }
            },
            'section_metadata': {
                'encyclopedia': {
                    'material': {
                        'formula': '*',
                        'bulk': {
                            "bravais_lattice": "*",
                            "crystal_system": "*",
                            "point_group": "*",
                            "space_group_number": '*',
                            "space_group_international_short_symbol": "*",
                            "structure_prototype": "*",
                            "structure_type": "*",
                        },
                        'idealized_structure': {
                            "atom_labels": '*',
                            "atom_positions": '*',
                            "number_of_atoms": '*',
                            "cell_volume": '*',
                            "lattice_parameters": {
                                "a": '*',
                                "b": '*',
                                "c": '*',
                                "alpha": '*',
                                "beta": '*',
                                "gamma": '*',
                            },
                        },
                    },
                    'properties': {
                        'atomic_density': '*',
                        'mass_density': '*',
                        'energies': '*',
                    },
                    'method': {
                        'functional_type': '*',
                        'functional_long_name': '*',
                    },
                },
            },
        },
        per_page=100,
        max=None,
    )
Printing this query reports 935760 queried entries (Q1).
To double-check the number of materials actually fetched, I appended the following simple loop to the previous code:
    l = []
    for result in query:
        formula = result.section_metadata.encyclopedia.material.formula
        l.append(formula)
    print(len(l))
I expected the final printed value to match the number of queried entries (Q2).
This was not the case: the script stopped with the error reported in the title, AttributeError: 'NoneType' object has no attribute 'encyclopedia' (Q3).
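To make the failure mode concrete, here is a minimal sketch of a guarded version of the loop. It does not use the nomad package at all: the `SimpleNamespace` objects are stand-ins for query results, and `get_formula` is a hypothetical helper of my own, not part of any NOMAD API. It only assumes that a missing section shows up as `None`, which is what the error message suggests.

```python
from types import SimpleNamespace


def get_formula(result):
    """Return the formula, or None if any section along the path is missing."""
    node = result
    for name in ("section_metadata", "encyclopedia", "material", "formula"):
        node = getattr(node, name, None)
        if node is None:
            return None
    return node


# Stand-in results (hypothetical, not real NOMAD objects).
complete = SimpleNamespace(
    section_metadata=SimpleNamespace(
        encyclopedia=SimpleNamespace(
            material=SimpleNamespace(formula="NaCl")
        )
    )
)
incomplete = SimpleNamespace(section_metadata=None)  # triggers the original error

results = [complete, incomplete, complete]

formulas = []
skipped = 0
for result in results:
    formula = get_formula(result)
    if formula is None:
        skipped += 1  # count entries without the requested metadata
    else:
        formulas.append(formula)

print(len(formulas), skipped)
```

Counting the skipped entries separately would at least show how many results come back without `section_metadata`, even if it does not explain why.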
Questions
Q1: Why is the number of queried entries so "low"? I guess it is because of the numerous constraints I put in the required attribute of ArchiveQuery, but I could not verify this with the NOMAD GUI search, since I could not find a way to specify such an attribute in the query there.
Q2: Is the number of queried entries equal to the number of material phases present in the DB? I expect the list l to also contain duplicates of the same formula, corresponding to different phases: is this correct?
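Once the list is filled, the duplicates could be checked along these lines. This is only a sketch: the formulas below are placeholder values, not data fetched from NOMAD.

```python
from collections import Counter

# Placeholder formulas standing in for the contents of the list l.
formulas = ["TiO2", "TiO2", "NaCl", "SiO2", "TiO2"]

counts = Counter(formulas)
n_entries = len(formulas)   # total number of fetched entries
n_unique = len(counts)      # number of distinct formulas
repeated = {f: n for f, n in counts.items() if n > 1}

print(n_entries, n_unique, repeated)
```

A formula that appears more than once could correspond to different phases, but also to several calculations of the same phase, which is part of what I am asking.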
Q3: It appears that ArchiveQuery fetches some objects without the attribute 'encyclopedia': why is that? Are there materials whose encyclopedia entry has not been completed yet, or is it something deeper?
Q4: I am planning to save the fetched info locally in two structures:
- a simple pandas dataframe containing all the "one-dimensional" pieces of info, such as formula, point group, total energy, lattice vectors and angles, etc.
- a Python dictionary with keys of the form formula_crystal_system (hopefully unique for every DB entry and easy for the user to identify) and values that are pandas dataframes structured as follows:

    atom  x  y  z
    A     (atom position from the idealized structure, i.e. in multiples of the lattice vectors)
    A     (same)
    B     (same)
    C     (same)

This allows an arbitrary number of atoms for each key, and hence flexibility in storing the data. Does that sound reasonable, or would you suggest anything else? The final application will be training generative ML algorithms such as GANs.
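A minimal sketch of the layout described in Q4, written with plain Python containers so it runs without pandas. The entries are placeholder values, not NOMAD data, and `make_key` is a hypothetical helper. Each key maps to a list of atom tables, so that a repeated formula_crystal_system key is detected rather than silently overwritten.

```python
from collections import defaultdict

# Placeholder entries (illustrative values only, not fetched from NOMAD).
entries = [
    {"formula": "NaCl", "crystal_system": "cubic", "space_group_number": 225,
     "atom_labels": ["Na", "Cl"],
     "atom_positions": [[0.0, 0.0, 0.0], [0.5, 0.5, 0.5]]},
    {"formula": "NaCl", "crystal_system": "cubic", "space_group_number": 221,
     "atom_labels": ["Na", "Cl"],
     "atom_positions": [[0.0, 0.0, 0.0], [0.5, 0.5, 0.5]]},
    {"formula": "TiO2", "crystal_system": "tetragonal", "space_group_number": 136,
     "atom_labels": ["Ti", "O", "O"],
     "atom_positions": [[0.0, 0.0, 0.0], [0.3, 0.3, 0.0], [0.7, 0.7, 0.0]]},
]


def make_key(entry):
    """Build the formula_crystal_system key described in Q4."""
    return f"{entry['formula']}_{entry['crystal_system']}"


flat_rows = []                    # one row of scalar properties per entry
atom_tables = defaultdict(list)   # key -> list of per-atom tables

for entry in entries:
    key = make_key(entry)
    flat_rows.append({
        "key": key,
        "formula": entry["formula"],
        "crystal_system": entry["crystal_system"],
        "space_group_number": entry["space_group_number"],
    })
    rows = [
        {"atom": label, "x": x, "y": y, "z": z}
        for label, (x, y, z) in zip(entry["atom_labels"], entry["atom_positions"])
    ]
    atom_tables[key].append(rows)

# Keys shared by more than one entry, i.e. keys that are not unique.
collisions = [key for key, tables in atom_tables.items() if len(tables) > 1]
print(collisions)
```

Both `flat_rows` and each `rows` list are lists of dictionaries, so `pandas.DataFrame(flat_rows)` or `pandas.DataFrame(rows)` would turn them into the dataframes I have in mind. In this toy example the two NaCl entries share a key, which is exactly the uniqueness problem I am unsure about.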
Thanks a lot,
Antonio