Issues with spacegroup 71 calculations with mp-ids `mp-109XXXX`

A few years ago I noticed an issue with a Wren model I tried on MP when looking at spacegroup 71, but I had no way to attribute the poor predictive performance to actual data points in MP.

Today I was looking at the Alexandria dataset and MP to see to what extent they overlap. When comparing the VASP energy per atom for the structures in Alexandria that have an aflow-style protostructure label found in MP, I noticed a distinct anomalous population. Further investigation showed that these entries were almost entirely in spacegroup 71 and had mp-109XXXX mp-ids. Outside this group there are a few random errors between MP and Alexandria, but this is the dominant systematic issue.
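A minimal sketch of the comparison described above, in plain Python. The record layout, the field names (`protostructure`, `energy_per_atom`, `material_id`, `spacegroup`), the toy IDs and energies, and the 0.1 eV/atom threshold are all illustrative assumptions, not the schema of either dataset:

```python
from collections import Counter

# Toy records: IDs, labels and energies are made up for illustration.
mp_entries = [
    {"material_id": "mp-1090001", "protostructure": "proto-A", "spacegroup": 71, "energy_per_atom": -5.10},
    {"material_id": "mp-1090002", "protostructure": "proto-B", "spacegroup": 71, "energy_per_atom": -4.20},
    {"material_id": "mp-2000", "protostructure": "proto-C", "spacegroup": 225, "energy_per_atom": -3.30},
]
alex_entries = [
    {"protostructure": "proto-A", "energy_per_atom": -8.00},
    {"protostructure": "proto-B", "energy_per_atom": -7.10},
    {"protostructure": "proto-C", "energy_per_atom": -3.31},
    {"protostructure": "proto-D", "energy_per_atom": -1.00},  # no MP match
]

THRESHOLD = 0.1  # eV/atom; arbitrary cutoff for "anomalous"

# Index Alexandria energies by protostructure label (assumes unique labels).
alex_by_label = {e["protostructure"]: e["energy_per_atom"] for e in alex_entries}

outliers = []
for entry in mp_entries:
    alex_energy = alex_by_label.get(entry["protostructure"])
    if alex_energy is None:
        continue  # structure is not present in both datasets
    diff = entry["energy_per_atom"] - alex_energy
    if abs(diff) > THRESHOLD:
        outliers.append({**entry, "diff": diff})

# Tally outliers by (space group, first six characters of the ID).
groups = Counter((o["spacegroup"], o["material_id"][:6]) for o in outliers)
print(groups.most_common())
```

Grouping the outliers by space group and ID prefix is what makes a systematic population stand out from scattered one-off disagreements.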

I think that this mp-109XXXX group should be investigated further and potentially dropped from future db releases if it is found to be inconsistent.

Thanks @CompRhys!

Many computational / methodological differences could lead to this discrepancy, but it’s obviously suspect given how consistently these entries share the same space group and A_2 B C structure prototype.

I’ll try rerunning a few to see if there are marked changes in the total energies. In the meantime, would you mind sending a dict of Alexandria total energies with MPIDs they’re matched to?

Also for clarity, I’m turning up 2,116 possibly problematic structures that match your criteria with this query:

from mp_api.client import MPRester

with MPRester() as mpr:
    # All summary docs in space group 71, fetching only the fields used below
    summary_docs = mpr.materials.summary.search(
        spacegroup_number=71, fields=["material_id", "origins"]
    )

    # For every MPID whose number starts with 109, record the task that supplied its structure
    to_check = []
    for doc in summary_docs:
        if (mpid := str(doc.material_id)).startswith("mp-") and mpid.split("mp-")[-1].startswith("109"):
            for entry in doc.origins:
                if entry.name == "structure":
                    to_check.append(str(entry.task_id))
                    break

    task_docs = mpr.materials.tasks.search(task_ids=to_check)

# Keep only tasks whose output structure is itself in space group 71
task_docs = [
    task for task in task_docs
    if task.output.structure.get_space_group_info()[1] == 71
]

Let me know if those IDs (also attached) match for you.
suspect_task_ids.json (28.9 KB)
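One detail of the prefix test in the query: it accepts any ID whose numeric part starts with 109, whatever its length. If the intent is strictly the seven-digit mp-109XXXX block, a stricter check could look like the sketch below. The helper names are hypothetical, the short and long IDs are made-up strings, and whether any such IDs actually occur in space group 71 has not been checked here:

```python
import re

# Exactly "mp-109" followed by four digits (the mp-109XXXX pattern).
STRICT_109 = re.compile(r"mp-109\d{4}")


def is_prefix_match(mpid: str) -> bool:
    """Loose test, as in the query: the numeric part merely starts with 109."""
    return mpid.startswith("mp-") and mpid.split("mp-")[-1].startswith("109")


def is_strict_match(mpid: str) -> bool:
    """Strict test: a seven-digit ID in the mp-109XXXX block."""
    return STRICT_109.fullmatch(mpid) is not None


for mpid in ["mp-1093584", "mp-1090", "mp-10900000", "mp-1247730"]:
    print(mpid, is_prefix_match(mpid), is_strict_match(mpid))
```

The two tests agree on seven-digit IDs and differ only for shorter or longer ones, so comparing their counts is a quick way to see whether the distinction matters for a given dataset.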

eda_mp_wbm_alex.py (4.9 KB)
alex_mp_final_struct_aflow_prototypes.csv.xz (2.0 MB)

Here is the script I was using for the analysis. To be able to upload here, I have uploaded only the overlapping Alexandria entries that duplicate protostructures in MP; all other files are downloaded from mbd on the fly if needed. The first df_comb, on line 61, has the MP VASP energies and their Alexandria equivalents in the same df.

Thanks for looking at this. I’m quite satisfied to have found this during EDA, as I had not been able to attribute the above error to anything concrete before today.

Additionally, mp-1247730, mp-1248716, mp-1255731, mp-1258701, mp-1259896, mp-1264151, and mp-1262938 look suspect to me because of similar discrepancies.

These IDs are potentially also worth looking at based on similar divergences, but I can’t spot anything systematic among them:

['mp-1006115',
 'mp-1046037',
 'mp-1066',
 'mp-1066771',
 'mp-1071438',
 'mp-1072579',
 'mp-1076286',
 'mp-1076923',
 'mp-1077305',
 'mp-1077403',
 'mp-1077617',
 'mp-1078456',
 'mp-1079041',
 'mp-1079338',
 'mp-1079563',
 'mp-1079782',
 'mp-1080576',
 'mp-1080840',
 'mp-1084754',
 'mp-1094902',
 'mp-1096246',
 'mp-1096312',
 'mp-1096751',
 'mp-1102209',
 'mp-1102353',
 'mp-1102385',
 'mp-1102535',
 'mp-1102751',
 'mp-1102920',
 'mp-1103074',
 'mp-1104405',
 'mp-1105011',
 'mp-1105430',
 'mp-1105578',
 'mp-1105616',
 'mp-1105650',
 'mp-1178650',
 'mp-1178654',
 'mp-1178963',
 'mp-1179239',
 'mp-1179506',
 'mp-1179622',
 'mp-1179649',
 'mp-1179733',
 'mp-1179833',
 'mp-1179990',
 'mp-1181307',
 'mp-1181342',
 'mp-1181556',
 'mp-1181670',
 'mp-1181833',
 'mp-1182363',
 'mp-1182706',
 'mp-1182806',
 'mp-1184489',
 'mp-1184513',
 'mp-1184716',
 'mp-1185472',
 'mp-1185480',
 'mp-1185587',
 'mp-1185609',
 'mp-1185611',
 'mp-1185783',
 'mp-1186796',
 'mp-1186929',
 'mp-1187075',
 'mp-1187833',
 'mp-1188245',
 'mp-1188638',
 'mp-1189473',
 'mp-1190896',
 'mp-1190933',
 'mp-1191162',
 'mp-1191193',
 'mp-1191764',
 'mp-1191949',
 'mp-1192293',
 'mp-1192449',
 'mp-1192608',
 'mp-1192665',
 'mp-1192781',
 'mp-1192972',
 'mp-1193274',
 'mp-1193565',
 'mp-1193924',
 'mp-1194072',
 'mp-1194245',
 'mp-1194558',
 'mp-1194710',
 'mp-1201305',
 'mp-1204701',
 'mp-1205943',
 'mp-1206073',
 'mp-1206079',
 'mp-1206109',
 'mp-1206156',
 'mp-1206190',
 'mp-1206232',
 'mp-1206253',
 'mp-1206275',
 'mp-1206317',
 'mp-1206328',
 'mp-1206336',
 'mp-1206377',
 'mp-1206385',
 'mp-1206390',
 'mp-1206433',
 'mp-1206459',
 'mp-1206475',
 'mp-1206575',
 'mp-1206696',
 'mp-1206726',
 'mp-1206740',
 'mp-1206835',
 'mp-1206869',
 'mp-1206888',
 'mp-1206934',
 'mp-1206939',
 'mp-1206950',
 'mp-1206966',
 'mp-1207047',
 'mp-1207051',
 'mp-1207133',
 'mp-1207258',
 'mp-1207266',
 'mp-1207280',
 'mp-1207296',
 'mp-1207306',
 'mp-1207310',
 'mp-1207311',
 'mp-1207313',
 'mp-1207314',
 'mp-1207315',
 'mp-1207322',
 'mp-1207329',
 'mp-1207340',
 'mp-1207368',
 'mp-1207991',
 'mp-1208078',
 'mp-1208306',
 'mp-1208309',
 'mp-1208313',
 'mp-1208315',
 'mp-1208342',
 'mp-1208426',
 'mp-1208624',
 'mp-1208775',
 'mp-1209146',
 'mp-1209898',
 'mp-1209992',
 'mp-1210245',
 'mp-1210439',
 'mp-1210706',
 'mp-1210906',
 'mp-1210939',
 'mp-1211109',
 'mp-1211305',
 'mp-1211341',
 'mp-1211445',
 'mp-1211453',
 'mp-1211623',
 'mp-1212042',
 'mp-1212055',
 'mp-1212104',
 'mp-1212274',
 'mp-1212351',
 'mp-1212378',
 'mp-1212929',
 'mp-1213020',
 'mp-1213329',
 'mp-1213433',
 'mp-1213437',
 'mp-1213556',
 'mp-1213623',
 'mp-1213831',
 'mp-1214061',
 'mp-1214268',
 'mp-1214460',
 'mp-1214610',
 'mp-1214861',
 'mp-1214985',
 'mp-1215073',
 'mp-1215080',
 'mp-1215084',
 'mp-1215087',
 'mp-1215157',
 'mp-1215925',
 'mp-1219657',
 'mp-1219827',
 'mp-1219839',
 'mp-1221574',
 'mp-1221614',
 'mp-1221882',
 'mp-1222042',
 'mp-1222217',
 'mp-1224215',
 'mp-1224523',
 'mp-1224726',
 'mp-1225074',
 'mp-1225348',
 'mp-1229022',
 'mp-1238763',
 'mp-1245761',
 'mp-1247730',
 'mp-1247838',
 'mp-1247881',
 'mp-1248716',
 'mp-1251411',
 'mp-1255731',
 'mp-1258701',
 'mp-1259896',
 'mp-1262938',
 'mp-1264151',
 'mp-1330336',
 'mp-1337078',
 'mp-1337126',
 'mp-1339913',
 'mp-1347973',
 'mp-1361141',
 'mp-1371261',
 'mp-1371686',
 'mp-1372590',
 'mp-1376614',
 'mp-1376797',
 'mp-1378207',
 'mp-1381342',
 'mp-1385813',
 'mp-1386638',
 'mp-1386900',
 'mp-1391114',
 'mp-1392021',
 'mp-1392145',
 'mp-1393434',
 'mp-1393717',
 'mp-1399051',
 'mp-1400737',
 'mp-1404622',
 'mp-1404786',
 'mp-1405076',
 'mp-1405864',
 'mp-1406711',
 'mp-1406815',
 'mp-1414055',
 'mp-1416168',
 'mp-1425089',
 'mp-1444833',
 'mp-1445086',
 'mp-1517779',
 'mp-1517791',
 'mp-1518619',
 'mp-1519303',
 'mp-1519404',
 'mp-1520433',
 'mp-1521374',
 'mp-1522661',
 'mp-19485',
 'mp-20448',
 'mp-20644',
 'mp-20700',
 'mp-21383',
 'mp-21458',
 'mp-2272170',
 'mp-2279',
 'mp-23907',
 'mp-27884',
 'mp-2814',
 'mp-2840',
 'mp-37537',
 'mp-510140',
 'mp-510613',
 'mp-541126',
 'mp-542606',
 'mp-542621',
 'mp-542633',
 'mp-568714',
 'mp-574507',
 'mp-582399',
 'mp-582464',
 'mp-607826',
 'mp-640044',
 'mp-640458',
 'mp-647604',
 'mp-671957',
 'mp-672441',
 'mp-673165',
 'mp-673954',
 'mp-674158',
 'mp-675106',
 'mp-675126',
 'mp-675871',
 'mp-675893',
 'mp-675946',
 'mp-676985',
 'mp-677135',
 'mp-677260',
 'mp-677510',
 'mp-689577',
 'mp-694615',
 'mp-722045',
 'mp-724868',
 'mp-725071',
 'mp-725133',
 'mp-725152',
 'mp-725918',
 'mp-726838',
 'mp-727231',
 'mp-728349',
 'mp-728567',
 'mp-729608',
 'mp-730082',
 'mp-730563',
 'mp-731625',
 'mp-753915',
 'mp-756644',
 'mp-756779',
 'mp-763017',
 'mp-770668',
 'mp-778269',
 'mp-780121',
 'mp-781020',
 'mp-781462',
 'mp-787551',
 'mp-795937',
 'mp-796795',
 'mp-798767',
 'mp-799684',
 'mp-802090',
 'mp-802184',
 'mp-806032',
 'mp-806082',
 'mp-817993',
 'mp-820801',
 'mp-850880',
 'mp-861726',
 'mp-861952',
 'mp-862739',
 'mp-865177',
 'mp-866089',
 'mp-886266',
 'mp-888449',
 'mp-888651',
 'mp-908109',
 'mp-973155',
 'mp-976591',
 'mp-977365',
 'mp-978096',
 'mp-989467',
 'mp-989699',
 'mvc-11882']

Hi @CompRhys - I’m not finding systematic discrepancies between the task docs on MP and my rerun static calcs. On average, the total energies per atom differ by:

  • signed: -5.4 meV/atom
  • unsigned: 22.9 meV/atom

So there are obviously some differences, but nothing systematic. Likewise, the standard deviation of the total energies between old and new calcs is 67.5 meV/atom. I can trace some of the larger energy differences to different final magnetic configurations.
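For reference, a minimal sketch of how such summary statistics can be computed. The energies are made-up values, and two choices are assumptions rather than something stated above: the sign convention (new minus old) and taking the standard deviation over the per-structure differences:

```python
from statistics import mean, pstdev

# Hypothetical per-atom total energies in eV/atom (not real MP data).
old_energies = [-5.100, -4.200, -6.310, -3.050]
new_energies = [-5.090, -4.260, -6.300, -3.055]

# Differences in meV/atom, new minus old (the sign convention is an assumption).
diffs_mev = [1000 * (new - old) for old, new in zip(old_energies, new_energies)]

signed = mean(diffs_mev)                    # differences of opposite sign cancel
unsigned = mean(abs(d) for d in diffs_mev)  # mean absolute difference
spread = pstdev(diffs_mev)                  # population std of the differences

print(f"signed: {signed:.1f} meV/atom")
print(f"unsigned: {unsigned:.1f} meV/atom")
print(f"std: {spread:.1f} meV/atom")
```

A signed mean close to zero alongside a larger unsigned mean points to scatter rather than a uniform offset, which is the sense in which the differences are not systematic.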

Worth noting I ran these 1,767 MPIDs:
(df_comb["material_id_mp"].str.startswith("mp-109")) & (df_comb["spg_mp"]==71)

It’s hard for me to say why Alexandria’s energies differ so much for these without seeing their VASP inputs.

Thanks for checking! I’ll continue to look at them and report back if I can trace the source of the discrepancy.

Just to be clear: the structures are identical between Alexandria and MP, right? If the Alexandria set contains re-relaxed structures (starting from MP initial or final structures), then there’s a possibility of a different relaxation trajectory. This becomes likelier if there are pseudopotential discrepancies.

I’ll look into this more, since it’s quite suspicious that there’s such a marked discrepancy.

alex-examples.json (53.1 KB)
mp-matching-examples.json (8.0 KB)

So here are 3 corresponding pairs for MP and Alexandria. The structures in MP have volumes around 10x those of their Alexandria equivalents. Looking at the MP structures on the MP site, they don’t really match my expectations for the chemistry because of the large distances between the 1D chains (mp-1093584: CrReW2 (Orthorhombic, Immm, 71)), in contrast to


which at a surface level seem okay.
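A quick way to screen for this kind of mismatch is to compare volume per atom for each matched pair. The sketch below uses plain Python and made-up orthorhombic cells; the lattice values and atom counts are hypothetical and are not taken from the attached files. With pymatgen the same quantity is `structure.volume / len(structure)`.

```python
def cell_volume(lattice):
    """Volume of a cell from its three lattice vectors (scalar triple product)."""
    a, b, c = lattice
    cross = (
        b[1] * c[2] - b[2] * c[1],
        b[2] * c[0] - b[0] * c[2],
        b[0] * c[1] - b[1] * c[0],
    )
    return abs(sum(ai * xi for ai, xi in zip(a, cross)))


def volume_per_atom(lattice, n_atoms):
    """Cell volume divided by the number of sites in the cell."""
    return cell_volume(lattice) / n_atoms


# Hypothetical cells in angstrom with 4 atoms each (not the real entries).
mp_lattice = [[6.0, 0.0, 0.0], [0.0, 8.0, 0.0], [0.0, 0.0, 10.0]]
alex_lattice = [[3.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0]]

ratio = volume_per_atom(mp_lattice, 4) / volume_per_atom(alex_lattice, 4)
print(f"MP / Alexandria volume per atom: {ratio:.1f}")
```

Normalizing per atom keeps the comparison meaningful when one dataset stores a primitive cell and the other a conventional cell.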