HTTPS Connection Error

I am running the following code with my API key:

with MPRester(api_key="my api key") as mpr:
    docs1 = mpr.thermo.search(fields=["nsites", "composition", "volume", "symmetry", "formula_pretty", "material_id", "last_updated",
                                      "uncorrected_energy_per_atom", "energy_per_atom", "formation_energy_per_atom", "is_stable", "deprecated", "deprecation_reasons"])

I was running it a month ago with no problems, but for the last two days I have been getting this error:

MPRestError: HTTPSConnectionPool(host='api.materialsproject.org', port=443): Max retries exceeded with url: /materials/thermo/?_limit=1000&_fields=nsites%2Ccomposition%2Cvolume%2Csymmetry%2Cformula_pretty%2Cmaterial_id%2Clast_updated%2Cuncorrected_energy_per_atom%2Cenergy_per_atom%2Cformation_energy_per_atom%2Cis_stable%2Cdeprecated%2Cdeprecation_reasons&_skip=182000 (Caused by ReadTimeoutError("HTTPSConnectionPool(host='api.materialsproject.org', port=443): Read timed out. (read timeout=20)"))
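Read timeouts like this are often transient server-side hiccups. One workaround, not an official mp-api feature (the `retry_with_backoff` helper below is a hypothetical sketch), is to wrap the search call in a retry loop with exponential backoff:

```python
import time


def retry_with_backoff(func, max_attempts=5, base_delay=2.0, retry_on=(Exception,)):
    """Call func(), retrying with exponential backoff on the given exceptions."""
    for attempt in range(max_attempts):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts; re-raise the last error
            time.sleep(base_delay * 2 ** attempt)  # wait 2 s, 4 s, 8 s, ...


# Hypothetical usage against the thermo endpoint:
# with MPRester(api_key="my api key") as mpr:
#     docs1 = retry_with_backoff(
#         lambda: mpr.thermo.search(fields=["material_id", "formation_energy_per_atom"]),
#         retry_on=(Exception,),  # ideally narrow this to the client's timeout errors
#     )
```

This does not fix the underlying server issue, but it lets an overnight pull survive a few sporadic timeouts instead of dying partway through.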

Thanks for letting us know. Our API started seeing significantly increased traffic due to inefficient usage by another user about 15 hours ago. The issue should be resolved now. Sorry for the inconvenience.

Hi Patrick,

Thanks for your reply. I am still getting the same error today.

Hi Cassie,

Sorry for the trouble. Unfortunately, we can't reproduce the issue: we're able to retrieve all 342588 documents in about 6 minutes without a problem (@munrojm). For now, my only advice is to make sure you're running the latest mp-api client and to keep trying. If the issue persists, please let us know.

There are also some tips and tricks for large downloads here. HTH.

thanks,
Patrick

Hi Patrick,

Unfortunately, the issue is persisting. A couple of times a year I normally pull the fields ["nsites", "composition", "volume", "symmetry", "formula_pretty", "material_id", "last_updated", "uncorrected_energy_per_atom", "energy_per_atom", "formation_energy_per_atom", "is_stable", "structure", "theoretical"] from mpr.summary.search. I can still pull those right now with summary, but back in July Jason said that the new r2SCAN data was only available through the thermo endpoint and that you were working on releasing it to the summary endpoint with the next data release (to be fair, that might have happened already and I missed it). So I am trying to pull most of these fields from the thermo endpoint. I can pull from the thermo endpoint, but only for a few material IDs. If I try to pull everything, I get to about 41% and then hit the read timeout error:

MPRestError: HTTPSConnectionPool(host='api.materialsproject.org', port=443): Max retries exceeded with url: /materials/thermo/?_limit=1000&_fields=nsites%2Ccomposition%2Cvolume%2Csymmetry%2Cformula_pretty%2Cmaterial_id%2Clast_updated%2Cuncorrected_energy_per_atom%2Cenergy_per_atom%2Cformation_energy_per_atom%2Cis_stable&_skip=140000 (Caused by ReadTimeoutError("HTTPSConnectionPool(host='api.materialsproject.org', port=443): Read timed out. (read timeout=20)"))

I tried a bunch of different things to get it to work but still have had no luck.
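One way to make a pull like this more robust (a sketch, not an official recommendation; `chunked` is a hypothetical helper) is to fetch the lightweight list of material IDs once, split it into batches, and query the thermo endpoint per batch, so a single timeout only costs one batch instead of the whole download:

```python
def chunked(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


# Hypothetical usage: each batch can be retried independently on timeout.
# with MPRester(api_key="my api key") as mpr:
#     all_ids = [d.material_id for d in mpr.thermo.search(fields=["material_id"])]
#     docs = []
#     for batch in chunked(all_ids, 5000):
#         docs.extend(mpr.thermo.search(material_ids=batch,
#                                       fields=["material_id", "formation_energy_per_atom"]))
```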

Best Regards,

Cassie

Thank you for reporting back! We took a closer look and noticed that an index in our database was missing after a recent data patch. We double-checked your query on our end (elapsed time ~3.5min) and are confident that the timeout issue is fixed now. Please let us know if that isn’t the case for you. Thanks!

Hi Patrick, I am back again after a few months' delay. After your last message everything was working fine until now. I am once again getting a timeout error when using the thermo endpoint. Here is the error:

HTTPSConnectionPool(host='api.materialsproject.org', port=443): Read timed out.

Here is my code again:

with MPRester(api_key="PLEASE-DONT-SHARE-YOUR-APIKEY!!!!!") as mpr:
    docs1 = mpr.thermo.search(fields=["nsites", "composition", "volume", "symmetry", "formula_pretty", "material_id", "last_updated",
    "uncorrected_energy_per_atom", "energy_per_atom", "formation_energy_per_atom", "is_stable", "energy_type","entry_types"])
    #list_of_available_fields = mpr.thermo.available_fields
print("docs1 is done")

I have three questions:

  1. Can you help me with the timeout error?
  2. I have to pull basically your whole dataset a couple of times a year; is there a better way to do it?
  3. I am pulling some data from the summary endpoint, which is working fine, and some data from the thermo endpoint because I want the r2SCAN data. Is the r2SCAN data going to be added to the summary endpoint anytime soon?

Thanks for your help. Hope you are having a good start to your New Year.

Hi Cassie,

thanks for reaching out again.

  1. The timeout error should only be temporary. It can happen during the midnight hours (pacific time) when traffic from Asia is at its peak. We’ve also been fighting new scrapers, botnets, and abusive traffic to our website over the last couple of weeks :frowning: We are working on making these endpoints more resilient by integrating the mp-api client with our AWS OpenData repositories (see #2).
  2. If you’re only interested in downloading the entire thermo dataset (or as backup for any MP data retrieval), you can do so directly through our OpenData repos. See our docs and bucket browser for more info. The code snippet below is likely the fastest way to download all the thermo data for a specific database version. In the future, the mp-api client will hopefully make this seamless for our users.
  3. Yes, I’d expect the summary endpoint to almost always work fine since it is our most performant and most heavily used endpoint. I’m tagging @munrojm to comment on the r2SCAN data in the summary endpoint.
aws s3 cp --no-sign-request --recursive \
    s3://materialsproject-build/collections/2023-11-01/thermo/ \
    mp-thermo/
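Once the files are on disk, they can be parsed with the standard library alone. The sketch below assumes the build files are gzipped JSON-lines (one document per line, `*.jsonl.gz`); check the bucket browser for the actual layout of the collection you download, since file naming and format are assumptions here:

```python
import gzip
import json
from pathlib import Path


def load_jsonl_gz(path):
    """Yield one parsed JSON document per non-empty line of a gzipped JSON-lines file."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)


def load_thermo_dir(directory):
    """Load every *.jsonl.gz file under the downloaded mp-thermo/ directory."""
    docs = []
    for path in sorted(Path(directory).glob("*.jsonl.gz")):
        docs.extend(load_jsonl_gz(path))
    return docs
```

Streaming line by line like this keeps memory bounded per file, which matters when the full thermo collection is hundreds of thousands of documents.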

HTH
Patrick