My NOMAD upload is stuck - it has been processing data for 4 days already

Hello,

I tried to upload data into NOMAD. I divided the data into smaller portions (each <= 32 GB) and zipped each one of them.

I tried to upload the first one, but even after 4 days I still get the message "Processing …, 228/325 entries processed".
There has been no progress for 3 days.

Did I make a mistake by uploading other data into this upload before the first batch had finished processing?

I also tried to upload some new zip files as separate uploads, but the files don't get processed. I just get the message "uploaded successfully", but no processing follows.

This is the link to the upload that is stuck: https://nomad-lab.eu/prod/v1/gui/user/uploads/upload/id/KmkjkIrrQ4abpewUzQXieA

And this is the link to the upload that does not get processed: https://nomad-lab.eu/prod/v1/gui/user/uploads/upload/id/KN7aBQTRSJufy8sq3tt3TQ

I will process the upload KmkjkIrrQ4abpewUzQXieA again and monitor it more closely. I will let you know when I can tell you more.

These GROMACS files need quite some time to process. The default processing setup is not well suited for this; we should treat your files a bit differently and allocate more resources.

  • How many of these uploads, i.e. 32 GB zip files, do you have in total?
  • Please stop uploading for now.
  • I will set up something and then process the existing stuck uploads (EibfW8zERry-ezZIR0EQzA, KmkjkIrrQ4abpewUzQXieA).
  • Are EibfW8zERry-ezZIR0EQzA and KmkjkIrrQ4abpewUzQXieA the same?

The other upload KN7aBQTRSJufy8sq3tt3TQ only contains a single Pulling.tar.xz file, which NOMAD does not recognize.

Hi @mscheidgen, thank you for the help!

First of all, thank you for letting me know that NOMAD does not recognize .tar.xz files, so I have deleted [rJ55m9SvTaqtE1m2GsGLDw](https://nomad-lab.eu/prod/v1/gui/user/uploads/upload/id/rJ55m9SvTaqtE1m2GsGLDw) and [KN7aBQTRSJufy8sq3tt3TQ](https://nomad-lab.eu/prod/v1/gui/user/uploads/upload/id/KN7aBQTRSJufy8sq3tt3TQ).

In the meantime I will prepare the zip files (I will not upload them again until you tell me to).

EibfW8zERry-ezZIR0EQzA and KmkjkIrrQ4abpewUzQXieA contain the same data; the only difference is that *QzA was uploaded again today, because *QXieA had not finished yet.
It seems to me that today’s upload is progressing faster than the one I uploaded on Thursday.

The only other difference is that into Thursday's upload I loaded all the other files (the .tar.xz files) while its processing had not finished yet.

Let me answer your other questions: I think I have 12 zip files that are around 30 GB each, and 3 uploads that are between 5 and 15 GB.

EibfW8zERry-ezZIR0EQzA and KmkjkIrrQ4abpewUzQXieA are the same data; it is the first of the 12 *.zip files with roughly 32 GB of data each.

And after that I have 3 more zip files (5-15 GB).

We now have a dedicated NOMAD installation. This is the URL: https://nomad-lab.eu/prod/v1/util/gui/
It is the same as the official NOMAD that you used; the only difference is the util in the URL. This installation uses more resources for processing and is configured a bit differently (longer timeouts, etc.). It also has a higher limit for unpublished uploads. I suggest we upload and process everything before publishing anything.

The “util” installation is currently processing KmkjkIrrQ4abpewUzQXieA again. Roughly 100 of your entries take 1 h to process, so it should be done in roughly 2 h from now.

After KmkjkIrrQ4abpewUzQXieA has finished, you can continue to upload. Please do one at a time: wait until an upload has completed processing before starting the next. When you continue to upload files, please only upload to the “util” NOMAD. You can use the GUI, curl, or the API, it does not matter; just make sure util is in the respective URL.

I will try to keep an eye on it, but if some upload takes considerably more than 3 h, let me know. You will need a bit of patience, but we will get this done eventually.

@mscheidgen thanks for the help!

So basically I log in to the util version of NOMAD, and when processing has finished, I just click the “Drop files here or …” button and upload the next batch?

So basically there is no need to create several single-entry uploads and later merge them into one entry?

@mscheidgen I see one failure due to the failed processing of /#md.log.1#; however, this file should not be there, because my Python script was supposed to delete all GROMACS backup files ('#…#').
I don't know why that file is still there.
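For what it's worth, a minimal shell sketch of the kind of cleanup step that should catch such files (the directory name sim_data is a made-up stand-in, not a path from this thread):

```shell
# Demo stand-in for a data folder; GROMACS wraps backup file names in '#'.
mkdir -p sim_data
touch sim_data/md.log 'sim_data/#md.log.1#'

# Delete all backup files before zipping, so they never reach the upload.
find sim_data -type f -name '#*#' -delete

ls sim_data   # only md.log remains
```

Running this kind of check over each folder right before zipping would guarantee no stray `#…#` files slip into an archive.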

So if 324 entries got processed successfully and only one failed, would it be a problem to click “Publish” on this data? Or would I need to find that “wrong file” and delete it from the entry manually?

You could also publish with one failed entry in there. However, I suggest we ignore the failed entry for now and you just continue uploading one by one. Let's try to get the bulk of the data in there; we can publish everything (and/or fix these smaller problems) at the end.

@mscheidgen thanks for the answer! I don't think it is a problem to keep one failed entry (especially in this case, because I know that this file should not be there).

Right now I need to zip all the other folders into .zip format (instead of .tar.xz, which NOMAD could not recognize). It will take a few hours, so basically I can continue uploading tomorrow morning, and then I can update you on whether it takes 3-4 hours per upload to process.
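As a side note, re-packing a folder as .zip from the shell is a one-liner per folder. A small sketch, reusing the folder name Pulling from this thread as a stand-in:

```shell
# Demo stand-in for one of the real data folders.
mkdir -p Pulling
touch Pulling/topol.top

# Re-pack as .zip (NOMAD unpacks .zip, but not .tar.xz).
# Python's stdlib zipfile CLI is used here; the classic
# 'zip -r Pulling.zip Pulling' does the same if 'zip' is installed.
python3 -m zipfile -c Pulling.zip Pulling
```

A simple `for d in */; do …; done` loop over the folders would zip all of them in one go.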

I guess the current processing will finish by tomorrow (right now it is at 231/325 entries processed, so it looks good).

Unfortunately, the upload KmkjkIrrQ4abpewUzQXieA got stuck again at 240 entries. I made the config even more conservative and I am trying it again.

@mscheidgen yes, I saw that the upload started processing from 0 again and I was wondering what the issue was.

Do you think it would help if I created even smaller zip files?

In principle, smaller pieces are easier to manage, but let me work on it a bit more before we cut them further down.

@mscheidgen thanks.

Is this the correct command if I want to upload new data from the terminal:

curl -X POST 'http://nomad-lab.eu/prod/v1/api/v1/uploads?token=KmkjkIrrQ4abpewUzQXieA' -T Pulling.zip

So after token= I put the ID of the upload?

No, you cannot (and should not) determine the ID of new uploads yourself. Also, you are not using the “util” installation here.

You have to grab <your-personal-token> from the command listed on the upload page. It is linked to your account, not to any particular upload.

curl -X POST 'http://nomad-lab.eu/prod/v1/util/api/v1/uploads?token=<your-personal-token>' -T Pulling.zip

It is still a bumpy ride: the processing of K... is causing a lot of restarts on our processing workers. Something in the processing of these GROMACS entries seems to tear down the whole process. Anyhow, it seems the current settings allow the system to recover enough to process everything.

I suggest you continue with the uploads. Please make sure to use the URL that points to the util installation:

curl -X POST 'http://nomad-lab.eu/prod/v1/util/api/v1/uploads?token=<your-personal-token>' -T Pulling.zip

I will keep an eye on this.

Can I delete EibfW8zERry-ezZIR0EQzA?

I imagine the whole one-by-one process is very painful. Please just upload without waiting for the other uploads to process; the processing is queued anyway. It is more important to get all your files uploaded. I can still fix the processing of individual uploads once everything has been uploaded. The “util” installation has a higher threshold for unpublished uploads, so you can upload everything before publishing.

Let me know when you have uploaded everything. I will then fix any processing issues, and then you can publish everything. Btw, do you have a deadline for publishing? I imagine you might need a DOI for something?

@mscheidgen thanks for the help; yes, you can delete that other upload on the “slower NOMAD”.

I started uploading the next file; when it has finished uploading, I will upload the next one, and so on.

Right now there is no deadline for publishing, but it would be good if I could finish uploading this week, because this is my last week at this institute :).
Of course I can connect and upload remotely, or come by again sometime, but it is not convenient.

I think we will get this done this week. Just keep uploading your zip files, and one way or another we will find a solution to publish this week.

@mscheidgen thanks! I have uploaded all the zip files into a single entry.

It would be good if they were processed this week; they should be published in 1-2 months.