I was trying to find the structure of a compound (Ta2NiSe5) using the API function materials.search(), and queried both ‘structure’ and ‘initial_structures’ as fields.
I saw that the ‘structure’ field returned the structure displayed on the interactive website, while ‘initial_structures’ returned a list of structures which didn’t contain the structure returned in the first case.
Could anyone help me understand the nature of the structures that ‘initial_structures’ returns? (Are they unrelaxed, unstable, etc.?) I really appreciate the help.
The initial_structures field contains the unique starting structures used in all calculations (tasks) that go into building a material on MP.
Looking at a materials document:
from mp_api.client import MPRester

with MPRester() as mpr:
    mat_doc = mpr.materials.search(material_ids=['mp-149'])[0]
print(len(mat_doc.task_ids))
>>> 42 # will vary over time
So we see that 42 individual calculations / “tasks” build up the material mp-149.
If we then query the tasks for that material, and de-duplicate their input structures:
from pymatgen.analysis.structure_matcher import StructureMatcher

with MPRester() as mpr:
    tasks = mpr.materials.tasks.search(task_ids=mat_doc.task_ids, fields=["input"])

sm = StructureMatcher(
    ltol=0.1, stol=0.1, angle_tol=0.1, scale=False, attempt_supercell=False
)
unique_structure_groups = sm.group_structures(
    [task.input.structure for task in tasks if task.input and task.input.structure]
)
initial_structures = [group[0] for group in unique_structure_groups]
print(len(initial_structures))
>>> 6 # will vary depending on numpy + spglib version
print(len(mat_doc.initial_structures))
>>> 5 # will vary by database version
Ideally those two numbers at the end would be the same, but they can differ depending on the versions of numpy, spglib, etc. used when the data was built.
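To see why the count of unique structures is sensitive to thresholds, here is a toy sketch of tolerance-based grouping. It is not how StructureMatcher works internally: each "structure" is reduced to made-up lattice parameters, and the helper names `matches` and `group_by_tolerance` are hypothetical.

```python
# Toy illustration only: NOT the StructureMatcher algorithm.
# Each "structure" is reduced to hypothetical lattice parameters (a, b, c) in angstroms.


def matches(p, q, tol):
    """Return True if every lattice parameter agrees within a fractional tolerance."""
    return all(abs(x - y) <= tol * max(x, y) for x, y in zip(p, q))


def group_by_tolerance(items, tol):
    """Greedily place each item in the first group whose representative it matches."""
    groups = []
    for item in items:
        for group in groups:
            if matches(group[0], item, tol):
                group.append(item)
                break
        else:
            # No existing group matched, so this item starts a new group.
            groups.append([item])
    return groups


# Hypothetical inputs, not real Materials Project data.
lattices = [
    (5.43, 5.43, 5.43),
    (5.47, 5.47, 5.47),
    (5.52, 5.52, 5.52),
    (3.87, 3.87, 3.87),
]

loose = group_by_tolerance(lattices, tol=0.02)
tight = group_by_tolerance(lattices, tol=0.005)
print(len(loose), len(tight))  # prints: 2 4
```

The same four inputs collapse into 2 groups with the loose tolerance and stay as 4 with the tight one, so small numerical differences near a threshold can shift the count.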
Thank you for your response Aaron!
So, as I mentioned in my original post, I was looking at Ta2NiSe5: mp-541070. When I queried the ‘initial_structures’ field, it returned 3 structures. However, the third structure surprisingly had a different lattice symmetry from the first 2 structures - it was slightly monoclinic while the others were orthorhombic.
From what I can tell, Materials Project usually lists different-symmetry lattices of the same compound as separate entries. Am I correct in understanding that, in my case, the third structure was also classified as orthorhombic because of the thresholds used in the StructureMatcher class in the code you presented?
Correct. The symprec (the distance tolerance used when determining symmetry) used to generate Materials Project data is 0.1, which is looser than the pymatgen default of 0.01:
from mp_api.client import MPRester

with MPRester() as mpr:
    init_structs = mpr.get_structure_by_material_id("mp-541070", final=False)
print({s.get_space_group_info(symprec=0.1) for s in init_structs})
>>> {('Cmcm', 63)}
So all of the structures, including the one that is borderline another space group, are classified as Cmcm in MP’s data.
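As a rough illustration of how a looser tolerance can absorb a slight monoclinic distortion, here is a toy check. It is not spglib's algorithm: the function `looks_orthorhombic` is hypothetical, the cell parameters are invented rather than taken from mp-541070, and the "shift" is just a simple estimate of how far a lattice vector tip sits from the ideal 90-degree position.

```python
import math

# Toy illustration only: NOT how spglib or pymatgen determine symmetry.


def looks_orthorhombic(lengths, angles_deg, symprec):
    """Treat a cell as orthorhombic if no angle's deviation from 90 degrees
    displaces a lattice vector tip by more than symprec (in angstroms)."""
    a, b, c = lengths
    alpha, beta, gamma = angles_deg
    # alpha lies between b and c, beta between a and c, gamma between a and b.
    pairs = [(alpha, b, c), (beta, a, c), (gamma, a, b)]
    shifts = [max(u, v) * abs(math.cos(math.radians(angle))) for angle, u, v in pairs]
    return max(shifts) <= symprec


# Hypothetical, slightly monoclinic cell (beta is 0.3 degrees away from 90).
lengths = (3.5, 12.9, 15.7)
angles = (90.0, 90.3, 90.0)

print(looks_orthorhombic(lengths, angles, symprec=0.1))   # prints: True
print(looks_orthorhombic(lengths, angles, symprec=0.01))  # prints: False
```

With these made-up numbers the distortion displaces the longest vector by about 0.08 Å, which falls under a 0.1 tolerance but not under 0.01.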