Asking about pagination

Dear All,
I recently made the following query:
optimade-get https://nomad-lab.eu/prod/rae/optimade/ --use-async --filter 'nelements = 2 AND elements HAS ALL "Sc","H"' --max-results-per-provider 1000 --http-timeout 3000 --output-file bi-H-Sc.json
╭─────────────────────────────────────────────────────────────────────────────────╮
│ Performing query structures/?filter=nelements = 2 AND elements HAS ALL "Sc","H" │
╰─────────────────────────────────────────────────────────────────────────────────╯
Error: Provider 'https://nomad-lab.eu/prod/rae/optimade/' returned: ['RuntimeError: 500 -
https://nomad-lab.eu/prod/rae/optimade/v1/structures?filter=nelements+%3D+2+AND+elements+HAS+ALL+%22Sc%22%2C%22H%22&page_offset=460: ExtraData: unpack(b) received extra data.']
✓ nomad-lab.eu/prod/rae/optimade/v1/structures ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 460/460 0:00:32

The query matches 904 structures, but since the request fails at page_offset=460, I am unable to retrieve all the data.
In fact, as you can see, the optimade-get client paged through the results until it hit this error. The error message is actually a server-side error: optimade-get tried to pull the page with offset 460 and the server returned "500 ExtraData: unpack received extra data", probably due to a bad entry somewhere in NOMAD.

You can see the same error in the browser with page_offset=460 at https://nomad-lab.eu/prod/rae/optimade/v1/structures?filter=nelements+%3D+2+AND+elements+HAS+ALL+"Sc"%2C"H"&page_offset=460, or even with page_limit=1 at https://nomad-lab.eu/prod/rae/optimade/v1/structures?filter=nelements+%3D+2+AND+elements+HAS+ALL+"Sc"%2C"H"&page_offset=468&page_limit=1.
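
For reference, a small script along these lines (a sketch using the requests library; the scanned offset range is only a guess around the failing batch) can probe single-entry pages to find exactly which offsets trigger the 500:

import requests

# Probe single-entry pages around the failing batch to see exactly which
# offsets trigger the server-side 500 (the range below is only a guess).
params = {
    'filter': 'nelements = 2 AND elements HAS ALL "Sc","H"',
    'page_limit': 1,
}
for offset in range(460, 470):
    params['page_offset'] = offset
    response = requests.get(
        'https://nomad-lab.eu/prod/rae/optimade/v1/structures', params=params)
    print(offset, response.status_code)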

Your suggestions on how to address this through paging would be greatly appreciated.

Thank you for your assistance.
Dr. Tuoc Vu
Hanoi Univ. of Science and Technology

Hi @vnt,

Indeed, this looks like an issue with certain entries and our OPTIMADE implementation.

Until the problem is fixed, the best you can do is to paginate through our API in relatively small batches and, if an HTTP error is caught, retry each entry of the failing batch individually. I did not see a mechanism for recovering from errors in the optimade-get tool, but you can achieve the same result with a Python script. It would look something like this:

import requests
import json

base_url = 'https://nomad-lab.eu/prod/rae/optimade'
batch_size = 50
output_file = 'bi-H-Sc.json'

start = 0
i_batch = 0
data = []
# Base query: keep the filter URL fixed and append only the paging
# parameters for each request (note the v1 prefix in the endpoint).
query_url = f'{base_url}/v1/structures?filter=nelements+%3D+2+AND+elements+HAS+ALL+"Sc"%2C"H"'
limit = float('Inf')

while True:
    offset = start + i_batch * batch_size
    if offset >= limit:
        break
    batch_url = f'{query_url}&page_offset={offset}&page_limit={batch_size}'
    response = requests.get(batch_url)
    i_batch += 1
    msg = f'STATUS: {response.status_code}, BATCH: {offset}-{offset + batch_size}'
    if response.status_code >= 500:
        # The whole batch failed on the server side: retry its entries one
        # by one so that only the actually broken entries are skipped.
        print(msg)
        for i in range(batch_size):
            inner_offset = offset + i
            inner_url = f'{query_url}&page_offset={inner_offset}&page_limit=1'
            inner_response = requests.get(inner_url)
            if inner_response.status_code >= 500:
                print(f'FAILURE ON {inner_offset}')
            else:
                data.extend(inner_response.json()['data'])
    else:
        response_json = response.json()
        if limit == float('Inf'):
            # The first successful response reports the total number of
            # matching entries in meta['data_returned'].
            limit = response_json['meta']['data_returned']
            print(f'NUMBER OF ENTRIES: {limit}')
        print(msg)
        data.extend(response_json['data'])

with open(output_file, 'w') as fout:
    json.dump(data, fout, indent=2)
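
Once the script has finished, a quick sanity check (a minimal sketch assuming the output file above) is to compare the number of saved entries against the total reported by the API, 904 in your case; the difference corresponds to the offsets printed as FAILURE ON:

import json

# Compare the number of recovered entries against the total the API
# reported in meta['data_returned'] (904 for this query).
with open('bi-H-Sc.json') as fin:
    saved = json.load(fin)
print(f'Recovered {len(saved)} entries')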

Great, thanks!