Connection to COD OPTIMADE API is very slow

The title is quite self-explanatory. The connection either responds slowly or times out after about 3-4 minutes. This does not happen with Materials Project. Is this a known issue? Is it a server issue, or does COD perhaps run a more exhaustive validation?

That is strange. I just tried to do a simple query and I received a response within a couple of seconds.

Which query did you use?
The query I used was:
https://www.crystallography.net/cod/optimade/v1/structures?filter=(elements HAS ALL “N”) AND (nelements=2)&response_fields=lattice_vectors,cartesian_site_positions,species,species_at_sites: ‘species_at_sites’

It is a rather long query:

https://www.crystallography.net/cod/optimade/v1/structures?filter=(elements HAS ALL “Co”, “Li”, “O”) AND NOT (elements HAS ANY “Ac”, “Ag”, “Al”, “Am”, “Ar”, “As”, “At”, “Au”, “B”, “Ba”, “Be”, “Bh”, “Bi”, “Bk”, “Br”, “C”, “Ca”, “Cd”, “Ce”, “Cf”, “Cl”, “Cm”, “Cr”, “Cs”, “Cu”, “Db”, “Dy”, “Er”, “Es”, “Eu”, “F”, “Fe”, “Fm”, “Fr”, “Ga”, “Gd”, “Ge”, “H”, “He”, “Hf”, “Hg”, “Ho”, “Hs”, “I”, “In”, “Ir”, “K”, “Kr”, “La”, “Lr”, “Lu”, “Md”, “Mg”, “Mn”, “Mo”, “Mt”, “N”, “Na”, “Nb”, “Nd”, “Ne”, “Ni”, “No”, “Np”, “Os”, “P”, “Pa”, “Pb”, “Pd”, “Pm”, “Po”, “Pr”, “Pt”, “Pu”, “Ra”, “Rb”, “Re”, “Rf”, “Rh”, “Rn”, “Ru”, “S”, “Sb”, “Sc”, “Se”, “Sg”, “Si”, “Sm”, “Sn”, “Sr”, “Ta”, “Tb”, “Tc”, “Te”, “Th”, “Ti”, “Tl”, “Tm”, “U”, “V”, “W”, “Xe”, “Y”, “Yb”, “Zn”, “Zr”)&response_fields=species,lattice_vectors,cartesian_site_positions,species_at_sites

I always had the feeling that this is brute-forcing a bit. However, the response times of this query and this one:

https://www.crystallography.net/cod/optimade/v1/structures?filter=(elements%20HAS%20ALL%20"Co",%20"Li",%20"O")&response_fields=species,lattice_vectors,cartesian_site_positions,species_at_sites

are virtually the same, and both are very long.

I just tested and not asking for response_fields makes it much faster, in any of the queries. Regarding the query you copied I get a 400 Bad request error.

Couple of things:

It is a rather long query:

https://www.crystallography.net/cod/optimade/v1/structures?filter=(elements HAS ALL “Co”, “Li”, “O”) AND NOT (elements HAS ANY “Ac”, “Ag”, “Al”, “Am”, “Ar”, “As”, “At”, “Au”, “B”, “Ba”, “Be”, “Bh”, “Bi”, “Bk”, “Br”, “C”, “Ca”, “Cd”, “Ce”, “Cf”, “Cl”, “Cm”, “Cr”, “Cs”, “Cu”, “Db”, “Dy”, “Er”, “Es”, “Eu”, “F”, “Fe”, “Fm”, “Fr”, “Ga”, “Gd”, “Ge”, “H”, “He”, “Hf”, “Hg”, “Ho”, “Hs”, “I”, “In”, “Ir”, “K”, “Kr”, “La”, “Lr”, “Lu”, “Md”, “Mg”, “Mn”, “Mo”, “Mt”, “N”, “Na”, “Nb”, “Nd”, “Ne”, “Ni”, “No”, “Np”, “Os”, “P”, “Pa”, “Pb”, “Pd”, “Pm”, “Po”, “Pr”, “Pt”, “Pu”, “Ra”, “Rb”, “Re”, “Rf”, “Rh”, “Rn”, “Ru”, “S”, “Sb”, “Sc”, “Se”, “Sg”, “Si”, “Sm”, “Sn”, “Sr”, “Ta”, “Tb”, “Tc”, “Te”, “Th”, “Ti”, “Tl”, “Tm”, “U”, “V”, “W”, “Xe”, “Y”, “Yb”, “Zn”, “Zr”)&response_fields=species,lattice_vectors,cartesian_site_positions,species_at_sites

If you want Li-Co-O structures, you could try

?filter=elements HAS ONLY "Li", "Co", "O" or ?filter=elements HAS ALL "Li", "Co", "O" and nelements=3 to achieve the same thing.

I just tested and not asking for response_fields makes it much faster, in any of the queries. Regarding the query you copied I get a 400 Bad request error.

I may be wrong, but I think the problem here is that COD does not store the “P1” simulation cell, but instead stores Wyckoff sites and symmetry operations, so when you request cartesian_site_positions it has to do some fairly heavy computations to provide them. I believe they are working on caching these for OPTIMADE, but it may explain why the queries take so long at the moment. The same applies to species and species_at_sites, which need the unfolded sites to populate the list.
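To illustrate the kind of computation I mean, here is a minimal sketch of unfolding symmetry-reduced sites into a full cell and converting them to Cartesian coordinates. This is not COD's actual code; the function names, the tolerance and the data layout are my own assumptions, and lattice_vectors is taken to be a list of three row vectors as in OPTIMADE.

```python
def _same_position(a, b, tol):
    """Compare two fractional positions modulo lattice translations."""
    return all(min(abs(x - y), 1.0 - abs(x - y)) < tol for x, y in zip(a, b))


def expand_sites(sites, symmetry_ops, tol=1e-4):
    """Apply every symmetry operation to every site and keep unique positions.

    sites: list of (species, (x, y, z)) in fractional coordinates.
    symmetry_ops: list of (rotation, translation) pairs, where rotation is a
    3x3 nested list and translation is a length-3 sequence.
    """
    expanded = []
    for species, frac in sites:
        for rot, trans in symmetry_ops:
            # Rotate, translate, then wrap back into the unit cell.
            new = tuple(
                (sum(rot[i][j] * frac[j] for j in range(3)) + trans[i]) % 1.0
                for i in range(3)
            )
            duplicate = any(
                s == species and _same_position(new, p, tol) for s, p in expanded
            )
            if not duplicate:
                expanded.append((species, new))
    return expanded


def to_cartesian(frac, lattice_vectors):
    """Convert fractional coordinates to Cartesian ones (row lattice vectors)."""
    return tuple(
        sum(frac[j] * lattice_vectors[j][i] for j in range(3)) for i in range(3)
    )
```

Every requested structure needs a loop like this over all of its sites and operations, plus the duplicate check, which is why I suspect it adds up unless the result is cached.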

Regarding the query you copied I get a 400 Bad request error.

Finally, this looks like a problem with the forums. If you don’t use a code block for the query, double-quotes " get turned into fancy ones “” that break URLs.
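One way to sidestep the quoting problem is to build the URL in code rather than copying it from a rendered post. The sketch below uses only the Python standard library; normalize_quotes and build_url are names I made up for illustration, and the base URL is the COD endpoint used earlier in this thread.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://www.crystallography.net/cod/optimade/v1/structures"

# Typographic quotes that forum software may substitute for plain ASCII ones.
FANCY_QUOTES = {"\u201c": '"', "\u201d": '"', "\u2018": "'", "\u2019": "'"}


def normalize_quotes(text):
    """Replace curly quotes with their ASCII equivalents."""
    for fancy, plain in FANCY_QUOTES.items():
        text = text.replace(fancy, plain)
    return text


def build_url(filter_string, response_fields=None):
    """Return a percent-encoded query URL for the given filter."""
    params = {"filter": normalize_quotes(filter_string)}
    if response_fields:
        params["response_fields"] = ",".join(response_fields)
    # quote_via=quote encodes spaces as %20 instead of "+".
    return BASE_URL + "?" + urlencode(params, quote_via=quote)
```

With this, a filter pasted with curly quotes and the same filter typed with straight quotes produce the same URL, with each double quote encoded as %22.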

I do not think the cartesian_site_positions are the problem. If I do a query without a filter but with response_fields https://www.crystallography.net/cod/optimade/v1/structures?response_fields=lattice_vectors,cartesian_site_positions,species,species_at_sites
the site does respond quickly. So I do not think that is the issue.

Databases are usually indexed to speed up performance, so queries on indexed properties are fast.
If you search for entries with, for example, nelements=3, you will notice it is much faster than when you look for a set of elements, as in your query. In that case, the database has to check many of the entries to see if they have the right set of elements, which takes much longer.

I do not think the cartesian_site_positions are the problem. If I do a query without a filter but with response_fields https://www.crystallography.net/cod/optimade/v1/structures?response_fields=lattice_vectors,cartesian_site_positions,species,species_at_sites
the site does respond quickly. So I do not think that is the issue.

Interesting, looks like you are right!

Thanks for the input. Regarding @JPBergsma's response, though, I still find little to no time difference between these two queries; both are still really slow.

https://www.crystallography.net/cod/optimade/v1/structures?filter=(elements%20HAS%20ALL%20"Co",%20"Li",%20"O")&response_fields=species,lattice_vectors,cartesian_site_positions,species_at_sites

http://www.crystallography.net/cod/optimade/v1/structures?filter=(elements%20HAS%20ALL%20"Co",%20"Li",%20"O")%20AND%20(nelements=3)&response_fields=species,lattice_vectors,cartesian_site_positions,species_at_sites

Lastly, regarding @ml-evs's comment about the query to obtain Li-Co-O structures (?filter=elements HAS ONLY "Li", "Co", "O" or ?filter=elements HAS ALL "Li", "Co", "O" and nelements=3): my intention is to find all phases in a composition range, including the edges of the phase diagram. That is, I specify which elements must appear in the phase and which may or may not appear. So if I mark Li and O as ‘sure’ elements and Co as a ‘may appear’ element, the query should return all phases containing Li and O (Li2O, Li2O2…) and also phases that include Co (LiCoO2…).

Ah, I’m with you. If you want all of the Li-Co-O phase space then HAS ONLY should do what you want (without the nelements=3 bit). Support for HAS ONLY is optional, and the suggestion in the spec is to do exactly what you did with the large negation list if a database does not support it (I see that the COD does not support it). It might just be simpler to split it into multiple queries for the pairs of edges (and single elements), but I appreciate that is not the most elegant solution…
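As a rough sketch of the "multiple queries" idea, the filters for each sub-system could be generated like this. subsystem_filters is a hypothetical helper, not part of any OPTIMADE client, and it relies only on HAS ALL and nelements.

```python
from itertools import combinations


def subsystem_filters(required, optional):
    """Return one OPTIMADE filter string per chemical sub-system.

    Each filter matches structures made of exactly the required elements
    plus one particular subset of the optional elements.
    """
    filters = []
    for k in range(len(optional) + 1):
        for extra in combinations(optional, k):
            elements = sorted(set(required) | set(extra))
            if not elements:
                continue  # skip the empty sub-system
            quoted = ", ".join(f'"{el}"' for el in elements)
            filters.append(
                f"elements HAS ALL {quoted} AND nelements={len(elements)}"
            )
    return filters
```

For example, subsystem_filters(["Li", "O"], ["Co"]) gives one filter for the Li-O edge and one for the Li-Co-O ternary, and the results of the separate queries can then be merged on the client side.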

I would expect the “long” query to be slow anyway, but as you and @JPBergsma have pointed out, there seems to be something else going on here too.

I was using nelements=3 to show that some queries are fast because the database has indexed them, i.e., it has already made a list of all the entries that have three elements. Unfortunately, the database has not done this for combinations of three elements. In that case, it probably uses one of the elements to make a pre-selection and then has to look at each entry to see whether the other elements are present (or absent, as in your original query). The database also returns the total number of matching entries, so it cannot stop after finding the first ten. Therefore, such a query takes much longer.
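A toy illustration of that guess, with a handful of made-up entries and a per-element index. I do not know how the COD backend is actually organised, so treat every name and structure here as an assumption.

```python
from collections import defaultdict

# Hypothetical toy entries: (entry id, set of elements).
ENTRIES = [
    (1, {"Li", "O"}),
    (2, {"Li", "Co", "O"}),
    (3, {"Na", "Cl"}),
    (4, {"Li", "Co", "O"}),
    (5, {"Co", "O"}),
]

# Index built ahead of time: element -> ids of the entries containing it.
element_index = defaultdict(set)
for entry_id, elements in ENTRIES:
    for element in elements:
        element_index[element].add(entry_id)
elements_by_id = dict(ENTRIES)


def count_has_all(required):
    """Return (number of matches, number of entries that had to be checked)."""
    required = list(required)
    # Pre-select with the index of the first element only.
    candidates = element_index[required[0]]
    # Every candidate must be inspected, because the total count is reported.
    matches = [i for i in candidates if set(required) <= elements_by_id[i]]
    return len(matches), len(candidates)
```

Here count_has_all(["Li", "Co", "O"]) has to inspect all three Li entries to find the two that match, whereas a single-element query can be answered from the index alone.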

OK, yes, including more elements does indeed make the query slower. I ran some extra tests (sorry for the long post).

Queries with response_fields and nelements

https://www.crystallography.net/cod/optimade/v1/structures?filter=(elements%20HAS%20%20ALL%20"Li")%20AND%20(nelements=3)&response_fields=lattice_vectors,cartesian_site_positions,species,species_at_sites (3.8 s)
http://www.crystallography.net/cod/optimade/v1/structures?filter=(elements%20HAS%20%20ALL%20"Li",%20"Co")%20AND%20(nelements=3)&response_fields=lattice_vectors,cartesian_site_positions,species,species_at_sites (1 min)
http://www.crystallography.net/cod/optimade/v1/structures?filter=(elements%20HAS%20%20ALL%20"Li",%20"Co",%20"O")%20AND%20(nelements=3)&response_fields=lattice_vectors,cartesian_site_positions,species,species_at_sites (1 min 5 s)

Queries with only response_fields

http://www.crystallography.net/cod/optimade/v1/structures?filter=(elements%20HAS%20%20ALL%20"Li")&response_fields=lattice_vectors,cartesian_site_positions,species,species_at_sites (7.2 s)
http://www.crystallography.net/cod/optimade/v1/structures?filter=(elements%20HAS%20%20ALL%20"Li","Co")&response_fields=lattice_vectors,cartesian_site_positions,species,species_at_sites (1 min 14 s)
http://www.crystallography.net/cod/optimade/v1/structures?filter=(elements%20HAS%20%20ALL%20"Li","Co","O")&response_fields=lattice_vectors,cartesian_site_positions,species,species_at_sites (1 min 11 s)

Queries with neither response_fields nor nelements

http://www.crystallography.net/cod/optimade/v1/structures?filter=(elements%20HAS%20%20ALL%20"Li") (1 s)
http://www.crystallography.net/cod/optimade/v1/structures?filter=(elements%20HAS%20%20ALL%20"Li","Co") (1 s)
http://www.crystallography.net/cod/optimade/v1/structures?filter=(elements%20HAS%20%20ALL%20"Li","Co",%20"O") (1.1 s)

Asking for response_fields definitely slows the response, and the time increases a lot when going from one to two elements in the search. However, any of the above queries takes less than 1 s with Materials Project.
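In case anyone wants to reproduce these timings, here is a minimal sketch of how they could be measured with the Python standard library. The function names and the 300 s timeout are arbitrary choices of mine, and the numbers above were not necessarily obtained this way.

```python
import time
from urllib.request import urlopen


def fetch(url, timeout=300):
    """Download the full response body; the timeout value is arbitrary."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def time_queries(urls, fetcher=fetch):
    """Return a list of (url, elapsed_seconds) pairs, one per query."""
    timings = []
    for url in urls:
        start = time.perf_counter()
        fetcher(url)
        timings.append((url, time.perf_counter() - start))
    return timings
```

The URLs passed in should already be percent-encoded (for example %22 for each double quote). A single run per query is noisy, so repeating each one a few times would give a fairer comparison.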

Perhaps you can contact the COD maintainers directly: [email protected]
They know more about how their backend works than I do.