Problem with uploading data to MPContribs

Hello,

I am encountering an error when uploading data to MPContribs:

I have the following script:

import pandas as pd
from pymatgen.core import Structure
from mpcontribs.client import Client

# API_KEY and the DataFrame df (one row per material) are defined elsewhere
client = Client(
    host="contribs-api.materialsproject.org",
    apikey=API_KEY,
    project="bader_charges"
)

client.init_columns({"task_id": None, "vacuum_charge": "e"})

contributions = [{
    "identifier": row['material_id'],
    "data": {
        "task_id": row['task_id'],
        "vacuum_charge": row['vacuum_charge'],
    },
    "tables": [pd.DataFrame({"charge": row['charge'], "atomic_volume": row['atomic_volume']})],
    "structures": [Structure.from_dict(row['structure'])],
} for _, row in df[:1].iterrows()]

# upload the contributions
client.submit_contributions(contributions)

My contributions look like the following, for example:

[{'identifier': 'mp-864733',
  'data': {'task_id': 'mp-2901424', 'vacuum_charge': 0.0},
  'tables': [      charge  atomic_volume
   0  10.947680      21.496675
   1  10.947656      21.499067
   2   7.349433      56.440158
   3   7.351823      56.433166
   4   7.351371      56.423598
   5   7.349040      56.433350
   6   7.351273      56.404646
   7   7.351724      56.414214],
  'structures': [Structure Summary
   Lattice
       abc : 10.794529822305464 10.794529822305464 3.781098
    angles : 90.0 90.0 120.00237694128245
    volume : 381.5448745437776
         A : 5.397071 -9.348449 0.0
         B : 5.397071 9.348449 -0.0
         C : -0.0 -0.0 3.781098
       pbc : True True True
   PeriodicSite: Mo (5.397, -3.117, 0.9453) [0.6667, 0.3333, 0.25]
   PeriodicSite: Mo (5.397, 3.117, 2.836) [0.3333, 0.6667, 0.75]
   PeriodicSite: I (5.397, -5.247, 2.836) [0.7806, 0.2194, 0.75]
   PeriodicSite: I (3.553, -2.052, 2.836) [0.4389, 0.2194, 0.75]
   PeriodicSite: I (7.241, -2.052, 2.836) [0.7806, 0.5611, 0.75]
   PeriodicSite: I (5.397, 5.247, 0.9453) [0.2194, 0.7806, 0.25]
   PeriodicSite: I (7.242, 2.052, 0.9453) [0.5612, 0.7806, 0.25]
   PeriodicSite: I (3.552, 2.052, 0.9453) [0.2194, 0.4388, 0.25]]}]

I get the following error:

0/1 processed -> retrying ...

0/1 processed -> retrying ...

0/1 processed -> retrying ...

bader_charges: Tried 3 times - abort.

It took 0.0min to submit 0/1 contributions.

With the following stack trace:
Traceback (most recent call last):

  File "/usr/local/lib/python3.11/site-packages/flask_mongorest/views.py", line 189, in _dispatch_request
    return super(ResourceView, self).dispatch_request(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/flask/views.py", line 188, in dispatch_request
    return current_app.ensure_sync(meth)(**kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/flask_mongorest/views.py", line 387, in post
    self.create_object(**kwargs)
  File "/usr/local/lib/python3.11/site-packages/flask_mongorest/views.py", line 413, in create_object
    if not self.has_add_permission(request, obj):
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/mpcontribs/api/contributions/views.py", line 168, in has_add_permission
    if obj.project.unique_identifiers and Contributions.objects(query).count():
                                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/mongoengine/queryset/base.py", line 123, in __call__
    raise InvalidQueryError(msg)
mongoengine.errors.InvalidQueryError: Not a query object: {'project': 'bader_charges', 'identifier': 'mp-864733'}. Did you intend to use key=value?
...

Could you please advise as to what I am doing wrong?

Thanks for reporting this @Martin_Siron1. We just discovered that bug a few days ago and are rolling out a deployment today that should fix the issue. I’ll update here when it’s out. Thanks!

We’ve just deployed a fix for this. Please let us know if the issue persists for you.

Hello,

I can confirm that I was able to upload data; however, the response message suggests that I may not have been able to:

Prepare: 100% 499/499 [00:00<00:00, 966.85it/s]

Submit: 0% 1/499 [01:01<8:32:24, 61.74s/it]

0/499 processed -> retrying ...

Submit: 0% 1/499 [00:17<2:25:43, 17.56s/it]

0/499 processed -> retrying ...

Submit: 0% 1/499 [00:17<2:24:09, 17.37s/it]

0/499 processed -> retrying ...

Submit: 0% 1/499 [00:17<2:24:17, 17.39s/it]

bader_charges: Tried 3 times - abort.

It took 2.0min to submit 0/499 contributions.

To me, this reads as if 0 contributions were uploaded; however, looking at the online version, it seems 297 out of 500 were uploaded. Not sure what happened to the other 203?

Granted, the error messages are not very illuminating 🙂 But it looks like something might be off with the format of some of the contributions. If you can narrow it down to a specific example, I should be able to provide guidance or identify the bug.
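One way to narrow it down is to scan the contributions locally before submitting. The sketch below is hypothetical: `find_suspect_contributions` is not part of mpcontribs-client, and its checks are assumptions (list values inside `data`, which the queryable component does not accept, plus NaN floats and missing identifiers), not MPContribs' actual validation rules.

```python
import math
from typing import Any, Dict, List


def find_suspect_contributions(contributions: List[Dict[str, Any]]) -> Dict[int, List[str]]:
    """Return {index: [problems]} for contributions that look malformed.

    The checks are illustrative assumptions, not MPContribs' own validation.
    """
    suspects: Dict[int, List[str]] = {}

    def walk(value: Any, path: str, problems: List[str]) -> None:
        # Recurse through nested dicts and flag values that are likely to be rejected.
        if isinstance(value, dict):
            for key, sub in value.items():
                walk(sub, f"{path}.{key}" if path else str(key), problems)
        elif isinstance(value, (list, tuple)):
            problems.append(f"list value at data.{path}")
        elif isinstance(value, float) and math.isnan(value):
            problems.append(f"NaN value at data.{path}")

    for i, contrib in enumerate(contributions):
        problems: List[str] = []
        identifier = contrib.get("identifier")
        if not isinstance(identifier, str) or not identifier:
            problems.append("missing or non-string identifier")
        data = contrib.get("data", {})
        if not isinstance(data, dict):
            problems.append("data is not a dict")
        else:
            walk(data, "", problems)
        if problems:
            suspects[i] = problems
    return suspects


contribs = [
    {"identifier": "mp-864733", "data": {"vacuum_charge": 0.0}},
    {"identifier": "mp-1184580", "data": {"charge": [8.338241, 11.747498]}},
]
print(find_suspect_contributions(contribs))  # {1: ['list value at data.charge']}
```

Submitting only the flagged (or only the unflagged) contributions in a small batch would then show whether the format is what trips up the upload.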

You also separately asked the following:

Some items under formula show up as an mp-ID even though clicking on them does lead to a material entry.

The formula column is filled automatically from the mp-id provided in the identifier column. It can happen that the internal config file that maps mp-ids to formulas is out of date. In general, or for these cases, you can explicitly set the formula field in your contributions to make sure that the appropriate formula shows up.
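A minimal sketch of setting that field, assuming `formula` is a top-level key of each contribution (next to `identifier` and `data`). `add_formulas` is a hypothetical helper, and the identifier-to-formula mapping is something you would build from your own data (with pymatgen, for example, from `structure.composition.reduced_formula`). The `MoI3` value corresponds to the Mo2I6 cell shown in the first post.

```python
from typing import Dict, List, Mapping


def add_formulas(contributions: List[Dict], formulas: Mapping[str, str]) -> List[Dict]:
    """Set an explicit top-level "formula" on each contribution whose identifier is known.

    Contributions without an entry are left unchanged, so the automatic lookup still applies.
    """
    result = []
    for contrib in contributions:
        updated = dict(contrib)  # shallow copy; the input list is not modified
        formula = formulas.get(contrib["identifier"])
        if formula is not None:
            updated["formula"] = formula
        result.append(updated)
    return result


contributions = [
    {"identifier": "mp-864733", "data": {"vacuum_charge": 0.0}},
    {"identifier": "mp-1184580", "data": {"vacuum_charge": 0.0}},
]
formulas = {"mp-864733": "MoI3"}  # illustrative mapping; build yours from your own data
result = add_formulas(contributions, formulas)
print(result[0]["formula"])  # MoI3
```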

And then task IDs are automatically linked as material mp-IDs, even though they would be better linked to a calculation rather than to that material ID?

That’s a fair point, but there’s no good way of distinguishing whether an mp-(\d+) string refers to a task or a material (other than pre-defined column names). For now, the interface simply links any string that looks like an mp-id to the corresponding material details page, in the hope that a user can navigate to the corresponding task from there.
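The ambiguity is easy to see with a regular expression: the material ID and the task ID from the first post both match `mp-(\d+)`, so a purely pattern-based linker cannot tell them apart. This is a hypothetical sketch of such a linker, not the MPContribs portal code; the materials URL pattern is an assumption based on the link used later in this thread.

```python
import re
from typing import Optional

# Anything shaped like "mp-<digits>" counts, whether it is a material or a task ID.
MP_ID = re.compile(r"mp-(\d+)")
MATERIALS_URL = "https://next-gen.materialsproject.org/materials/{}"  # assumed URL pattern


def materials_link(value: str) -> Optional[str]:
    """Link any mp-id-like string to a materials details page (hypothetical sketch)."""
    if MP_ID.fullmatch(value):
        return MATERIALS_URL.format(value)
    return None


material_id, task_id = "mp-864733", "mp-2901424"
print(materials_link(material_id))
print(materials_link(task_id))  # also linked as a material, which is the ambiguity above
```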

Finally, the table structure doesn’t make sense for this, but I wasn’t able to upload the charges, etc., as a list?

MPContribs doesn’t accept lists in its queryable data component, for indexing and performance reasons. There’s guidance in the docs on how to deal with lists (i.e., label the elements and convert the list to a dict).
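A minimal sketch of that conversion with a hypothetical helper `label_elements`. The `s0`, `s1`, ... labels are just one naming choice, and the assumption is that the nested dict ends up as dotted columns such as `charge.s0`. The values are the first three Bader charges from the table in the first post.

```python
from typing import Dict, Sequence, Union

Number = Union[int, float]


def label_elements(values: Sequence[Number], prefix: str = "s", start: int = 0) -> Dict[str, Number]:
    """Turn [a, b, c] into {"s0": a, "s1": b, "s2": c} (hypothetical helper)."""
    return {f"{prefix}{i}": v for i, v in enumerate(values, start=start)}


data = {
    "vacuum_charge": 0.0,
    # the list becomes a nested dict, presumably shown as columns charge.s0, charge.s1, ...
    "charge": label_elements([10.947680, 10.947656, 7.349433]),
}
print(data["charge"])  # {'s0': 10.94768, 's1': 10.947656, 's2': 7.349433}
```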

HTH

Hi @tschaume,

Thank you!

Re: formula, will do!!

That’s a fair point, but there’s no good way of distinguishing whether an mp-(\d+) string refers to a task or a material (other than pre-defined column names).

Is it best practice to not post an association to a task id? Or is there a URL datatype, and can I put the full URL with the task_id linking to the calculation page?

MPContribs doesn’t accept lists in its queryable data component, for indexing and performance reasons. There’s guidance in the docs on how to deal with lists (i.e., label the elements and convert the list to a dict).

For this, it would essentially mean lists of the same length. What should I do when lists vary in length because they are associated with site properties? The table still feels like the best alternative, but the plot associated with the table doesn’t make sense for site-specific properties. Or should I fully decorate the structure with all site properties?

Thank you!!

Is it best practice to not post an association to a task id? Can I put the full URL with the task_id linking to the calculation page?

Posting a link to a task is not discouraged. Putting the full URL to the task detail page should work. The interface tries to parse any URL such that it displays nicely.
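For example, a hypothetical helper that builds such a URL for the `data` field. The URL layout is copied from the link used later in this thread and is an assumption, not a documented API.

```python
import re

BASE_URL = "https://next-gen.materialsproject.org"  # taken from the example link in this thread
MP_ID = re.compile(r"mp-\d+")


def task_url(material_id: str, task_id: str) -> str:
    """Build <BASE_URL>/materials/<material-id>/tasks/<task-id> (hypothetical helper)."""
    for value in (material_id, task_id):
        if not MP_ID.fullmatch(value):
            raise ValueError(f"not an mp-id: {value!r}")
    return f"{BASE_URL}/materials/{material_id}/tasks/{task_id}"


print(task_url("mp-1184580", "mp-1952515"))
```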

What should I do when lists vary in length because they are associated with site properties? Or should I fully decorate the structure with all site properties?

If site properties need to be available for filtering (instead of just downloading), variability in length is not an issue and will only result in empty cells for contributions containing shorter lists. This is a valid approach if the number of sites doesn’t explode and has a reasonable maximum. The site index (or s1, s2, …, sN) could serve as the dict key in this case. It sounds like decorating the structure with the site properties is the better approach here, though. I might have to double-check that the MPContribs API doesn’t strip site properties from pymatgen structures. Feel free to try both approaches and let me know how it goes.
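A sketch of the filtering approach under these assumptions: per-site values are keyed `s0`, `s1`, ... under each property name, a structure with fewer sites simply has fewer keys (no `None` padding, since missing keys only show up as empty cells), and a hypothetical `max_sites` cap guards against the number of columns exploding. `site_data` is an illustrative helper; the values are those posted later in this thread for mp-1184580.

```python
from typing import Dict, List


def site_data(properties: Dict[str, List[float]], max_sites: int) -> Dict[str, Dict[str, float]]:
    """Convert per-site lists into nested dicts keyed by site index (hypothetical helper)."""
    nested: Dict[str, Dict[str, float]] = {}
    for name, values in properties.items():
        if len(values) > max_sites:
            # too many sites: these would need more columns than were initialized
            raise ValueError(f"{name}: {len(values)} sites > max_sites={max_sites}")
        # only existing sites get a key; shorter lists leave the later columns empty
        nested[name] = {f"s{i}": v for i, v in enumerate(values)}
    return nested


props = {
    "charge": [8.338241, 11.747498, 15.948809, 15.965453],
    "atomicvolume": [12.218051, 11.832801, 18.043842, 18.115342],
}
print(site_data(props, max_sites=20))
```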

Hi Patrick,

I cannot use this approach:

 The site index (or s1, s2, …, sN) could serve as dict key in this case.

I can’t do this without omitting some materials: MPContribs allows at most 160 columns, but the largest structure in my database has 264 sites. For now, to try both methods, I limited it to structures with at most 20 sites.
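The arithmetic behind that limit, assuming each site property needs one column per site, plus one column each for the task ID and the vacuum charge (the 160-column cap is the figure quoted above). The helper names are illustrative.

```python
MAX_COLUMNS = 160  # column limit mentioned in this thread


def columns_needed(n_scalar: int, n_site_props: int, n_sites: int) -> int:
    """Scalar columns plus one column per site for every site property."""
    return n_scalar + n_site_props * n_sites


def max_sites_that_fit(n_scalar: int, n_site_props: int, max_columns: int = MAX_COLUMNS) -> int:
    """Largest site count whose per-site columns still fit under max_columns."""
    return (max_columns - n_scalar) // n_site_props


# taskid + vacuumcharge as scalars, charge + atomicvolume per site
print(columns_needed(2, 2, 264))  # 530
print(columns_needed(2, 2, 20))   # 42
print(max_sites_that_fit(2, 2))   # 79
```

So with two site properties, structures above 79 sites would not fit under these assumptions, which is why a cut-off (here 20 sites) is needed for this approach.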

For the data approach, I cannot get the site data to upload:

# scalar columns
columns = {"taskid": None, "vacuumcharge": None}
# one column per site for the first 20 sites: charge.s0 ... charge.s19, etc.
columns.update({"charge.s{}".format(x): None for x in range(20)})
columns.update({"atomicvolume.s{}".format(x): None for x in range(20)})

client.init_columns(columns)

With a list of contributions like so:

{'identifier': 'mp-1184580',
 'data': {'taskid': 'https://next-gen.materialsproject.org/materials/mp-1184580/tasks/mp-1952515',
  'vacuumcharge': 0.0},
 'charge': {'s0': 8.338241,
  's1': 11.747498,
  's2': 15.948809,
  's3': 15.965453,
  's4': None,
  's5': None,
  's6': None,
  's7': None,
  's8': None,
  's9': None,
  's10': None,
  's11': None,
  's12': None,
  's13': None,
  's14': None,
  's15': None,
  's16': None,
  's17': None,
  's18': None,
  's19': None},
 'atomicvolume': {'s0': 12.218051,
  's1': 11.832801,
  's2': 18.043842,
  's3': 18.115342,
  's4': None,
  's5': None,
  's6': None,
  's7': None,
  's8': None,
  's9': None,
  's10': None,
  's11': None,
  's12': None,
  's13': None,
  's14': None,
  's15': None,
  's16': None,
  's17': None,
  's18': None,
  's19': None},
 'structures': [Structure Summary
  Lattice
      abc : 4.399411917362592 4.399411917362592 4.399411917362592
   angles : 59.99999999999999 59.99999999999999 59.99999999999999
   volume : 60.21003545068222
        A : 0.0 3.110854 3.110854
        B : 3.110854 -0.0 3.110854
        C : 3.110854 3.110854 -0.0
      pbc : True True True
  PeriodicSite: Hf (3.111, 3.111, 3.111) [0.5, 0.5, 0.5]
  PeriodicSite: Zn (0.0, 0.0, 0.0) [0.0, 0.0, 0.0]
  PeriodicSite: Rh (1.555, 1.555, 1.555) [0.25, 0.25, 0.25]
  PeriodicSite: Rh (4.666, 4.666, 4.666) [0.75, 0.75, 0.75]]}
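In the contribution above, `charge` and `atomicvolume` sit at the top level next to `data`, whereas the `charge.s{i}` and `atomicvolume.s{i}` columns presumably refer to keys inside `data`, which may be why the site data does not show up. A sketch of the nested layout, assuming dotted column names map to nested keys under `data`; `build_contribution` is a hypothetical helper, the `None` padding is dropped, and structures are left out to keep it self-contained.

```python
from typing import Any, Dict, List


def build_contribution(identifier: str, task_url: str, vacuum_charge: float,
                       charges: List[float], volumes: List[float]) -> Dict[str, Any]:
    """Put the per-site dicts inside `data` so they line up with columns like charge.s0."""
    return {
        "identifier": identifier,
        "data": {
            "taskid": task_url,
            "vacuumcharge": vacuum_charge,
            "charge": {f"s{i}": q for i, q in enumerate(charges)},
            "atomicvolume": {f"s{i}": v for i, v in enumerate(volumes)},
        },
    }


contribution = build_contribution(
    "mp-1184580",
    "https://next-gen.materialsproject.org/materials/mp-1184580/tasks/mp-1952515",
    0.0,
    [8.338241, 11.747498, 15.948809, 15.965453],
    [12.218051, 11.832801, 18.043842, 18.115342],
)
print(contribution["data"]["charge"])
```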

I can confirm, however, that the site-specific properties are retrievable using the client.get_structure() method. It is just not obvious that these properties are there. Perhaps it is better to store them as a string?!

Thanks for testing! Do you need, e.g., charge.s3 to be queryable, i.e., data__charge__s3__lt=3? If not, we should include these properties with the structure. I’m a little surprised that they go through, though, because I’m explicitly stripping them in the client. If the structures don’t come through as you’d like, you could also add the site properties as attachments.
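To illustrate what "queryable" buys you, here is a hypothetical local filter that interprets a key like `data__charge__s3__lt` (path segments joined by `__`, with an optional `lt`/`lte`/`gt`/`gte` suffix, in the style of mongoengine query keys) against plain contribution dicts. It only mimics the idea on the client side and is not how the MPContribs API evaluates queries; the identifiers and values are made up.

```python
import operator
from typing import Any, Dict, List

# Comparison suffixes in the style of mongoengine query keys.
OPS = {"lt": operator.lt, "lte": operator.le, "gt": operator.gt, "gte": operator.ge}


def matches(contribution: Dict[str, Any], key: str, value: Any) -> bool:
    """True if the field addressed by `key` exists and satisfies the comparison."""
    *path, last = key.split("__")
    if last in OPS:
        compare = OPS[last]
    else:  # no operator suffix: plain equality on the full path
        path.append(last)
        compare = operator.eq
    node: Any = contribution
    for part in path:
        if not isinstance(node, dict) or part not in node:
            return False  # e.g. site s3 missing for a structure with only 3 sites
        node = node[part]
    if node is None or isinstance(node, dict):
        return False
    return compare(node, value)


def filter_contributions(contributions: List[Dict[str, Any]], key: str, value: Any) -> List[Dict[str, Any]]:
    return [c for c in contributions if matches(c, key, value)]


contribs = [
    {"identifier": "example-1", "data": {"charge": {"s0": 1.2, "s1": 2.0, "s2": 0.5, "s3": 2.5}}},
    {"identifier": "example-2", "data": {"charge": {"s0": 8.3, "s1": 11.7, "s2": 15.9, "s3": 15.97}}},
    {"identifier": "example-3", "data": {"charge": {"s0": 0.1, "s1": 0.2, "s2": 0.3}}},
]
print([c["identifier"] for c in filter_contributions(contribs, "data__charge__s3__lt", 3)])
# ['example-1']
```

A site property stored only on the structure cannot be filtered this way, which is the trade-off behind the question of whether charge.s3 needs to be queryable.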