How to retrieve Structure from composition

I am trying to run CGCNN, so I want to extract the structure and property from the benchmark datasets. I am using the following code snippet:

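from matminer.utils.io import load_dataframe_from_json

data = load_dataframe_from_json(dataset_file_name)  # path to a downloaded dataset file
for index, row in data.iterrows():
    struct = row.structure  # structure stored for this entry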
However, for a few datasets, such as matbench_glass and matbench_expt_gap, I cannot fetch a structure; only the composition is available.
Is there any way to get the structures for these datasets?
Is there any way to get a structure from the composition itself?


In general

Some datasets do not have structures, and they are not strictly applicable to models like CGCNN because the primitive structures (and the resulting bond graphs) are potentially enormous. For example, the entries in the matbench_glass and matbench_steels datasets do not have exactly known structures.
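One way to tell which kind of dataset you have is to check which input column it provides before handing rows to a structure-based model. The sketch below assumes the Matbench convention of an input column named either structure or composition. The helper name input_kind and the example column lists are illustrative and not part of matminer.

```python
def input_kind(columns):
    """Return 'structure', 'composition', or None for a collection of column names."""
    names = set(columns)
    if "structure" in names:
        return "structure"
    if "composition" in names:
        return "composition"
    return None


# Stand-in column lists for illustration; with a real DataFrame, pass data.columns.
print(input_kind(["structure", "e_form"]))   # structure
print(input_kind(["composition", "gfa"]))    # composition
print(input_kind(["formula", "target"]))     # None
```

If the result is anything other than "structure", row.structure will not be available and the dataset cannot be passed directly to CGCNN.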

Other datasets may have entries with a known structure, but either:

(a) The structure is known and corresponds to an entry in a database such as the Materials Project. In this case the structure was not recorded by the authors of the original study (i.e., because they were unsure which structure actually corresponds to the measured property).

(b) The structure was reported in the original study but does not have an entry in any major repository, i.e., no one took the time to make an entry, so looking up these structures is considerably more difficult.

For case (b), or when there is no known structure, you are basically on your own in figuring out exactly which structure corresponds to a given entry.

For case (a), you can try the CompositionToStructureFromMP featurizer from matminer (see this page).
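As a rough sketch of how that featurizer might be applied to a composition-only dataset: the helper name add_structures_from_mp, the intermediate column name composition_obj, and the assumption that the composition column holds formula strings are all mine. The mapi_key argument is as I recall it from matminer's conversions module, so check it against your installed version. Running it needs matminer, network access, and a Materials Project API key.

```python
def add_structures_from_mp(df, api_key, formula_col="composition"):
    """Attach a Materials Project structure to each row of a composition-only DataFrame.

    Hypothetical helper, not part of matminer.
    """
    # Imported here so the sketch can be loaded without matminer installed.
    from matminer.featurizers.conversions import (
        CompositionToStructureFromMP,
        StrToComposition,
    )

    # Assumption: formula_col holds formula strings, so convert them to
    # pymatgen Composition objects first.
    to_comp = StrToComposition(target_col_id="composition_obj")
    df = to_comp.featurize_dataframe(df, formula_col)

    # Query the Materials Project for a structure matching each composition.
    to_struct = CompositionToStructureFromMP(
        target_col_id="structure", mapi_key=api_key
    )
    # ignore_errors=True keeps going when a lookup fails for a row.
    return to_struct.featurize_dataframe(df, "composition_obj", ignore_errors=True)
```

Keep in mind that a structure found this way is only a guess at the phase that was actually measured, which is exactly the ambiguity described in (a).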

Your specific case

For the matbench_expt_gap dataset, @rkingsbury has gone to the trouble of matching compositions to mpids (from which you can get structures) for each entry in the new dataset kingsbury_expt_gap (see all datasets here). Note that you should not use this dataset to make a submission to matbench.