Pulling Large Amounts of Data

Hey everyone,

What size of data query would your team need a heads-up for? I’d like to pull all structures and store them locally, and I just wanted to make sure this is safe to do.

It looks like MPRester already implements chunking, so can I just go ahead and query with criteria `{"task_id": {"$exists": True}}` to grab all structures? This will be after I do some testing on a smaller query (<200 structures).
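For reference, here is a minimal sketch of the bulk pull I have in mind, using the legacy pymatgen `MPRester` client. The function name, the chosen `properties`, and the `chunk_size` value are my own assumptions, and `"YOUR_API_KEY"` is a placeholder:

```python
def pull_all_structures(api_key, chunk_size=1000):
    """Pull every structure in one bulk query; MPRester pages the
    request internally according to chunk_size."""
    from pymatgen.ext.matproj import MPRester  # legacy API client

    with MPRester(api_key) as mpr:
        return mpr.query(
            criteria={"task_id": {"$exists": True}},  # matches every document
            properties=["task_id", "structure"],
            chunk_size=chunk_size,
        )

# docs = pull_all_structures("YOUR_API_KEY")  # then serialize locally
```

The idea is one call that the client chunks for me, rather than my own loop of small requests.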


Yep, this is fine!

In general, most API use is OK, and the firewall will rate-limit you automatically and temporarily if it becomes an issue. Generally speaking, fewer large API queries are preferable to many small ones (e.g. don’t put your API query inside a for loop).
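To illustrate the for-loop point with a hedged sketch (the material IDs below are hypothetical examples): instead of issuing one request per ID, batch the IDs into a single criteria document with `$in`:

```python
material_ids = ["mp-149", "mp-13", "mp-66"]  # hypothetical example IDs

# Avoid: one HTTP request per ID
# for mid in material_ids:
#     mpr.query(criteria={"task_id": mid}, properties=["structure"])

# Prefer: a single batched query using the MongoDB-style $in operator
batched_criteria = {"task_id": {"$in": material_ids}}
# mpr.query(criteria=batched_criteria, properties=["task_id", "structure"])
```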

For our new API, where larger data such as charge densities is available, we will have to be more careful, and will likely grant access to that kind of data on a per-user basis, but we’re still discussing our options internally.


Awesome, thanks! And yep, I’m making sure to minimize the number of database hits. :grin:

I’m not at the point of pulling the large VASP output files, so we’re good there. I am curious what you end up deciding here though – whether it’s per-user permissions or something like a universal time-out/query-size limit.

Also thank you for the quick response!