What is the procedure to contribute data to the Materials Project database?

Sai_Siva_Kumar_Pinne · August 1, 2023, 5:12am

Dear community members,

I have been working on plasma-enhanced chemical vapor deposited amorphous materials (in the form of thin films) to develop a process-structure-property database as part of my graduate research. Our lab will continue to contribute more data involving different material classes.

How do I contribute the data to the Materials Project database? Is anyone willing to help train us to upload the data?

I appreciate your help and inputs.

Thanks,
Sai

tschaume · August 1, 2023, 5:43am

Hi @Sai_Siva_Kumar_Pinne,

Thanks for reaching out! We developed MPContribs exactly for this purpose. You can find an introduction to its concepts and usage at https://mpcontribs.org. The datasets currently contributed by the community can be found in the MPContribs Explorer at https://contribs.materialsproject.org. Contributed data is also automatically integrated into our materials details pages. Feel free to familiarize yourself with it a bit, then create your own project by filling out the form on MPContribs. The newly created project will provide some code snippets to get you started with adding data to the project using the mpcontribs-client Python library (see mpcontribs-client on PyPI). To add more than 500 contributions and make the project public, you will need approval from the MP admins. Let us know when you’ve uploaded a few contributions. We’ll review them, give some feedback regarding the data structure, and approve the project if the data is in good shape.

HTH
Patrick

Sai_Siva_Kumar_Pinne · October 11, 2023, 4:06am

Hi @tschaume,

I appreciate your support and guidance. I successfully created a project (a-SiCN:H thin films) and tried to push some data to the database. I ran into a few issues with uploading data.

What kind of units does a column accept? I got an error with ‘Torr’ as the unit for pressure and ‘sccm’ as the unit for flow rate.
Can column headers have spaces between the characters? For example, ‘Substrate Temperature’?
Some of the cells in a column have no data. How can I skip those? Should I use NaN?
Is there any specific forum for FAQs on data contribution and writing code?

The error for the units is as follows:

```
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/flask_mongorest/views.py", line 189, in _dispatch_request
    return super(ResourceView, self).dispatch_request(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/flask/views.py", line 188, in dispatch_request
    return current_app.ensure_sync(meth)(**kwargs)
  File "/usr/local/lib/python3.10/site-packages/flask_mongorest/views.py", line 387, in post
    self.create_object(**kwargs)
  File "/usr/local/lib/python3.10/site-packages/flask_mongorest/views.py", line 416, in create_object
    self._resource.save_object(obj, force_insert=True, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/flask_mongorest/resources.py", line 1129, in save_object
    obj.save(signal_kwargs=signal_kwargs, **kwargs).reload()
  File "/usr/local/lib/python3.10/site-packages/mongoengine/document.py", line 407, in save
    signals.pre_save_post_validation.send(
  File "/usr/local/lib/python3.10/site-packages/blinker/base.py", line 300, in send
    result = receiver(sender, **kwargs)  # type: ignore[call-arg]
  File "/app/mpcontribs/api/contributions/document.py", line 296, in pre_save_post_validation
    document.data = remap(document.data, visit=make_quantities, enter=enter)
  File "/usr/local/lib/python3.10/site-packages/boltons/iterutils.py", line 1183, in remap
    visited_item = visit(path, key, value)
  File "/app/mpcontribs/api/contributions/document.py", line 254, in make_quantities
    q = get_quantity(str_value)
  File "/app/mpcontribs/api/contributions/document.py", line 98, in get_quantity
    return ureg.Measurement(*parts)
  File "/usr/local/lib/python3.10/site-packages/pint/measurement.py", line 56, in __new__
    inst = super().__new__(cls, mag, units)
  File "/usr/local/lib/python3.10/site-packages/pint/quantity.py", line 264, in __new__
    units = inst._REGISTRY.parse_units(units)._units
  File "/usr/local/lib/python3.10/site-packages/pint/registry.py", line 1202, in parse_units
    units = self._parse_units(input_string, as_delta, case_sensitive)
  File "/usr/local/lib/python3.10/site-packages/pint/registry.py", line 1439, in _parse_units
    return super()._parse_units(input_string, as_delta, case_sensitive)
  File "/usr/local/lib/python3.10/site-packages/pint/registry.py", line 1235, in _parse_units
    cname = self.get_name(name, case_sensitive=case_sensitive)
  File "/usr/local/lib/python3.10/site-packages/pint/registry.py", line 722, in get_name
    raise UndefinedUnitError(name_or_alias)
pint.errors.UndefinedUnitError: 'Torr' is not defined in the unit registry
```

Appreciate your help.

Sincerely,
Sai

tschaume · October 13, 2023, 12:34am

Hi @Sai_Siva_Kumar_Pinne,

Glad to hear that you were able to start a project and upload some data. You raise some good questions:

The units for the columns are defined by pint. The unit for pressure would be torr. I just added sccm to the unit registry as a shortcut for cm³/min.
The (currently) only allowed special/non-alphanumeric character in the names of the columns (= nested fields/keys in data = column headers) is the | (pipe) character. It’s a good way of indicating when a value is measured/calculated under specific conditions (e.g. conductivity|300K). Underscores are disallowed to encourage contributors to organize and nest their data instead of submitting a long flat list of columns. I’d recommend being as concise as possible with the column names and using the "other" field in the project info to provide a legend for the user. Use client.update_project({"other": {"<field/column>": "<description>", ...}}) to update/set the "other" field in the project info.
There’s no need to set missing values for a subset of contributions to a specific value. Simply make sure to omit the columns/keys in the contribution dictionary (skip the key).
This forum is probably still the best place to look up / ask specific questions about MPContribs and your dataset. I’ll add more info to the MPContribs documentation over time.
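As a rough illustration of the column-name and missing-value rules above, here is a minimal sketch of turning one flat spreadsheet row into a contribution's data dict. The helper names (clean_key, is_missing, to_data) are hypothetical, not part of mpcontribs-client, and the sanitizing rule (keep only letters, digits and |) is an assumption based on the description above. Grouping related columns into nested dicts is still preferable to a long flat list where it makes sense.

```python
import math
import re

# Assumption: only letters, digits and "|" survive; spaces, underscores,
# parentheses, etc. are stripped from column names.
_DISALLOWED = re.compile(r"[^0-9A-Za-z|]")


def clean_key(name):
    """Strip spaces, underscores and other special characters from a column name."""
    return _DISALLOWED.sub("", name)


def is_missing(value):
    """Treat None, blank strings and float NaN as missing cells."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return isinstance(value, str) and not value.strip()


def to_data(record):
    """Build a `data` dict, omitting keys whose values are missing (no NaN placeholders)."""
    return {
        clean_key(key): value
        for key, value in record.items()
        if not is_missing(value)
    }


# Hypothetical row; the empty refractive-index cell is simply dropped.
row = {
    "Substrate Temperature": "573 K",
    "Pressure": "0.5 torr",
    "SiH4 flow": "20 sccm",
    "Refractive Index": float("nan"),
}
print(to_data(row))
# {'SubstrateTemperature': '573 K', 'Pressure': '0.5 torr', 'SiH4flow': '20 sccm'}
```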

HTH

Sai_Siva_Kumar_Pinne · October 15, 2023, 11:52pm

Hi Patrick (@tschaume),

Thanks for your detailed description. Could you please clarify a few other questions regarding tables/attachments?

Are we restricted to a single column for tables? Can we have subcolumns within tables? From a contributor’s perspective, it makes more sense to contribute the various spectral data to different columns under tables, one per characterization technique. I imagine this might be difficult to maintain/process in the backend. Please clarify/correct me.
We have been trying to push several dataframes under the tables component and give a specific title to each table using the following code snippet (attributes). All the contributed tables show up with the title ‘table’. What are we doing wrong?

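```python
import os

import pandas as pd

# `tables` (list of CSV file names), `current_path`, `record` and `data`
# are defined elsewhere in our script.
new_tables = []
for table in tables:
    filename = table.strip()
    new_data = pd.read_csv(os.path.join(current_path, filename))
    table_name = os.path.splitext(filename)[0]  # file name without extension
    new_data.attrs["name"] = table_name
    new_data.attrs["title"] = table_name
    new_tables.append(new_data)
    print("table attributes are", new_data.attrs)

contrib = {
    "identifier": str(record["Films"]),
    "formula": record["MolecularFormula"],
    "data": data,
    "tables": new_tables,
    "attachments": [os.path.join(current_path, record["Attachments"])],
}
```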

With the contribs.get_table and display functions, we were able to access the data in the tables. But the plotly function treats all the columns in a table as y-axes and plots them against the row index as the x-axis. Would it be feasible to change the backend to treat the first column as the x-axis and the rest as y-axes? Or can we do something on our end to add an x/y descriptor to each column? Or is my idea unfeasible?

I accidentally executed "is_approved": false on the MPContribs API, but I could not undo it by setting "is_approved": true myself. I keep getting the following error. Could you please help me out?

```json
{
  "error": "401 Unauthorized: amorphousthinfilms is not approved yet."
}
```

As always, we appreciate your help and support!

Cheers,
Sai

tschaume · October 16, 2023, 7:47pm

Good questions!

I think #1 and #3 are, at their core, the same question. In #3 you already correctly describe how MPContribs handles tables and their columns: the first column (i.e. the index column of the dataframe) is treated as the x-axis, and all other columns are added as traces to the plotly graph, using the dataframe index for the x-axis values. Try explicitly setting the index for the dataframe:

```python
df.set_index("BindingEnergy(eV)", inplace=True)
```
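To make that mapping concrete, here is a stdlib-only sketch of the behavior described above. It is not MPContribs’ actual plotting code; table_to_traces and the sample CSV values are made up for illustration. The first column supplies the shared x values and every remaining column becomes one trace. In pandas terms, set_index() is what moves the energy column into that x position; without it, the energy column would be plotted as just another trace against the row numbers.

```python
import csv
import io


def table_to_traces(csv_text):
    """Illustrative only: first column -> shared x values, each other column -> one trace."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = [row for row in reader if row]  # skip blank lines
    x_name, y_names = header[0], header[1:]
    x = [float(row[0]) for row in rows]
    traces = []
    for col, name in enumerate(y_names, start=1):
        traces.append({"name": name, "x": x, "y": [float(row[col]) for row in rows]})
    return x_name, traces


# Made-up sample table: one x column followed by two spectra.
csv_text = """BindingEnergy(eV),Si2p,C1s
99.0,120.5,30.1
100.0,340.2,31.0
101.0,210.7,29.8
"""
x_label, traces = table_to_traces(csv_text)
print(x_label)                      # BindingEnergy(eV)
print([t["name"] for t in traces])  # ['Si2p', 'C1s']
```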

Don’t worry about #4. is_approved is false by default and only an admin (aka me) can change it. Once approved, you’ll be able to make your project public via client.make_public() and add more than 500 contributions.

#2 is a good catch! I just released a patch to the client. Please upgrade to mpcontribs-client==5.5.2. The following code example should now work in a Jupyter notebook:

```python
from mpcontribs.client import Client
import pandas as pd

client = Client(project="amorphousthinfilms")  # set MPCONTRIBS_API_KEY env var

df = pd.DataFrame(data={
    "Energy": [1, 2, 3], "XAS": [4, 5, 6], "XMCD": [3, 2, 1]
})
df.set_index("Energy", inplace=True)
df.index.name = "Energy [eV]"  # x-axis title
df.columns.name = "spectral type" # legend title
df.attrs["name"] = "Fe-XAS/XMCD"  # table name
df.attrs["title"] = "XAS and XMCD Spectra for Fe"  # graph title
df.attrs["labels"] = {"value": "a.u."}  # y-axis title

client.submit_contributions([{"identifier": "mp-3", "tables": [df]}])
contribs = client.query_contributions(fields=["tables"])
tables = contribs.get("data", [{}])[0].get("tables", [])
table = client.get_table(tables[0]["id"])
table.display()
```

Use the following snippet to preview the table before submission:

```python
from mpcontribs.client import Table

table = Table(df)
table.attrs = df.attrs
table.display()
```

Sai_Siva_Kumar_Pinne · October 17, 2023, 6:19am

Thank you, Patrick.

#1 My first question is actually different from the third one. Can we include subcolumns within the ‘Tables’ column that we see in the database (see the image below)? That way we could contribute specific data tables to their respective columns.

#2 The table titles issue is fixed. We could also assign the ‘BindingEnergy (eV)’ column as the index in the code. When we looked at the graphs that MPContribs generated, the x-axis label ‘Binding Energy (eV)’ was displayed only for some graphs, not all. I am unsure why. It is not a big issue, but I wanted to let you know.

Thanks,
Sai

tschaume · October 17, 2023, 9:25pm

Organizing tables into categories is an interesting idea. Unfortunately, it’s not supported right now and would take me some cycles to implement. For now, you’d have to add the category to the table names. One option would be to add FTIR/ as a folder-like prefix, e.g. set the table name to FTIR/A106-120-Q6. You can later use that prefix in a query to find the tables compiled under that category (FYI the dataframe attrs are queryable, too - see API docs):

```python
client.tables.queryTables(name__startswith="FTIR/").result()
```
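Before submitting, it can help to sanity-check the naming scheme locally. Below is a small sketch using hypothetical helpers (group_by_category and starts_with are not part of mpcontribs-client) that groups table names by their folder-like prefix and mimics the name__startswith filter on a plain list. The example names other than FTIR/A106-120-Q6 are made up.

```python
from collections import defaultdict


def group_by_category(table_names, sep="/"):
    """Group names such as 'FTIR/A106-120-Q6' by their folder-like prefix."""
    groups = defaultdict(list)
    for name in table_names:
        category, found, _ = name.partition(sep)
        if not found:  # no prefix in this name
            category = "uncategorized"
        groups[category].append(name)
    return dict(groups)


def starts_with(table_names, prefix):
    """Local stand-in for filtering tables with name__startswith=prefix."""
    return [name for name in table_names if name.startswith(prefix)]


names = ["FTIR/A106-120-Q6", "XPS/A106-120-Q6", "FTIR/sample-B", "notes"]
print(group_by_category(names))
# {'FTIR': ['FTIR/A106-120-Q6', 'FTIR/sample-B'], 'XPS': ['XPS/A106-120-Q6'], 'uncategorized': ['notes']}
print(starts_with(names, "FTIR/"))
# ['FTIR/A106-120-Q6', 'FTIR/sample-B']
```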

Thanks for pointing out the potential issue with the x-axis/index. I’ll take a closer look when I get the chance.

Sai_Siva_Kumar_Pinne · October 22, 2023, 8:04pm

Hi Patrick,

Thanks for your help; we figured out how to successfully push our data to MPContribs. I just wanted to bring a few issues to your attention.

#1 Sometimes the appearance of data on the contribs page is erratic. At times it converts units by itself (torr to mtorr, nm/s to pm/s). And the labels do not show up for some of the graphs; it could be an issue with fetching the data.

#2 We got an error while trying to push more than 10 tables in a contribution. Would you be able to increase the limit?
```
    client.submit_contributions(contributions, ignore_dupes=True)
  File "/opt/homebrew/lib/python3.11/site-packages/mpcontribs/client/__init__.py", line 1982, in submit_contributions
    raise MPContribsClientError(f"Too many {component} ({nelems} > {MAX_ELEMS})!")
mpcontribs.client.MPContribsClientError: Too many tables (15 > 10)!
```
#3 On the MPContribs API platform (PUT - update a project), I ran into an issue while adding information to the "other" section. I could only append data to the "other" info, but not delete the existing information. Is there any way to delete the existing information and add new information?

Appreciate your expertise and help.

Sincerely,
Sai

tschaume · October 23, 2023, 6:07pm

Wonderful! Thanks for reporting back @Sai_Siva_Kumar_Pinne !

#1 I’d have to take a closer look into the labels issue when I get the chance. As for the units, MPContribs will not do any conversion if you run client.init_columns() before submitting data. This pre-defines the order of the columns and their units. MPContribs will then ensure that the submitted units in each contribution are consistent with init_columns() and convert them to the requested unit only if needed. You might sometimes have to rerun init_columns() after submitting all contributions to ensure that the intended order is still there. init_columns() can be run at any point; however, it will throw an error if you try to change the default unit for a column after submission. That’s why pre-defining columns through init_columns() is good practice (I might make it mandatory in the future).
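One way to avoid typing the column/unit list by hand is to derive it from a representative contribution. The sketch below assumes that init_columns() accepts a flat dict mapping dot-separated column paths to unit strings, with "" for dimensionless numbers. That is my reading of the mpcontribs-client examples, so please double-check it against the client docs. The helper names are hypothetical, and non-numeric text columns are skipped here, so check the docs for how those should be declared.

```python
def split_quantity(value):
    """Split '0.5 torr' into (0.5, 'torr'); return None for non-numeric text."""
    parts = str(value).strip().split(maxsplit=1)
    if not parts:
        return None
    try:
        magnitude = float(parts[0])
    except ValueError:
        return None
    unit = parts[1] if len(parts) > 1 else ""  # "" assumed to mean dimensionless
    return magnitude, unit


def columns_from_data(data, prefix=""):
    """Flatten a nested `data` dict into {'dot.separated.path': unit} for numeric values.

    Assumption: init_columns() takes this flat, dot-separated form.
    """
    columns = {}
    for key, value in data.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            columns.update(columns_from_data(value, path))
        else:
            parsed = split_quantity(value)
            if parsed is not None:  # text values such as 'Si(100)' are skipped
                columns[path] = parsed[1]
    return columns


example = {
    "pressure": "0.5 torr",
    "flow": {"SiH4": "20 sccm", "NH3": "40 sccm"},
    "thickness": "250 nm",
    "substrate": "Si(100)",
}
print(columns_from_data(example))
# {'pressure': 'torr', 'flow.SiH4': 'sccm', 'flow.NH3': 'sccm', 'thickness': 'nm'}
# With a configured client (not run here): client.init_columns(columns_from_data(example))
```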

#2 Yes, if there’s no way for you to reasonably merge tables to get under the limit, I can increase it. You might also consider adding less important columns as simple attachments (see Attachment.from_data()). What’s the maximum number of tables you need?
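Here is a sketch of a pre-submission check for this limit, plus a way to merge spectra that share exactly the same x grid into one multi-column table (like the XAS/XMCD example earlier). The limit of 10 is taken from the error message above and may change; the helper names are hypothetical and not part of mpcontribs-client.

```python
MAX_TABLES = 10  # limit from the error message above; it may be raised later


def oversized(contributions, component="tables", limit=MAX_TABLES):
    """List (identifier, count) for contributions exceeding the per-contribution limit."""
    result = []
    for contrib in contributions:
        count = len(contrib.get(component, []))
        if count > limit:
            result.append((contrib.get("identifier"), count))
    return result


def merge_on_shared_x(series):
    """Merge {name: (x_values, y_values)} into (x_values, {name: y_values}).

    Only works when every series uses exactly the same x grid; otherwise a
    ValueError is raised (resampling onto a common grid is not attempted here).
    """
    for name, (x, y) in series.items():
        if len(x) != len(y):
            raise ValueError(f"{name}: x and y lengths differ")
    grids = {tuple(x) for x, _ in series.values()}
    if len(grids) != 1:
        raise ValueError("series do not share the same x values")
    x_values = list(next(iter(grids)))
    columns = {name: list(y) for name, (_, y) in series.items()}
    return x_values, columns


contribs = [{"identifier": "film-A", "tables": list(range(15))}, {"identifier": "film-B", "tables": [1, 2]}]
print(oversized(contribs))  # [('film-A', 15)]
```

The merged result can then be turned into a single table with pd.DataFrame(columns, index=x_values), setting the index name and attrs as in the XAS/XMCD example.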

#3 It’s best to run updates on the project info through client.update_project(). Try running client.update_project({"other": {}}) to reset the other field.

HTH

tsmathis · February 23, 2024, 12:38am

Thread closed due to inactivity, please open a new thread to address related issues.