I tried to upload data into NOMAD. I divided the data into smaller portions (each at most 32 GB) and zipped each one of them.
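Splitting the data into portions of at most 32 GB can be planned before zipping. The sketch below uses hypothetical helpers, not part of NOMAD. It sums the file sizes of each top-level folder and packs whole folders into batches with a first-fit-decreasing heuristic. It assumes the limit means 32 GiB and measures uncompressed sizes, which stays on the safe side as long as compression does not enlarge the data. A single folder larger than the limit still has to be split by hand.

```python
from pathlib import Path

# Assumption: "32 GB" read as 32 GiB; use 32 * 10**9 for decimal gigabytes.
LIMIT_BYTES = 32 * 1024**3


def folder_size(folder: Path) -> int:
    """Total size in bytes of all regular files below folder (uncompressed)."""
    return sum(p.stat().st_size for p in folder.rglob("*") if p.is_file())


def plan_batches(sizes: dict[str, int], limit: int = LIMIT_BYTES) -> list[list[str]]:
    """Group folder names into batches whose summed size stays <= limit.

    First-fit decreasing: largest folders first, each into the first batch
    that still has room, otherwise into a new batch.
    """
    batches: list[list[str]] = []
    totals: list[int] = []
    for name, size in sorted(sizes.items(), key=lambda item: item[1], reverse=True):
        if size > limit:
            raise ValueError(f"{name} alone exceeds the limit and must be split further")
        for i, total in enumerate(totals):
            if total + size <= limit:
                batches[i].append(name)
                totals[i] += size
                break
        else:  # no existing batch had room for this folder
            batches.append([name])
            totals.append(size)
    return batches


def plan_for_root(root: Path, limit: int = LIMIT_BYTES) -> list[list[str]]:
    """Plan batches for the immediate subfolders of root."""
    sizes = {p.name: folder_size(p) for p in root.iterdir() if p.is_dir()}
    return plan_batches(sizes, limit)
```

Each inner list of the result is one future zip file. First-fit decreasing is not guaranteed to find the minimum number of batches, but it is simple and usually close.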
I uploaded the first one, but even after 4 days I still get the message: Processing …, 228/325 entries processed.
There has been no progress for 3 days.
Did I make a mistake by uploading other data into this upload before the first upload had finished processing?
I also tried to upload some new zip files as separate uploads, but the files don’t get processed. I just get the message "uploaded successfully, but no processing is continued."
These GROMACS files take quite some time to process. The default processing is not well suited for this; we should treat your files a bit differently and allocate more resources.
How many of these uploads (i.e. 32 GB zip files) do you have in total?
Please stop uploading for now.
I will set something up and process the existing stuck uploads (EibfW8zERry-ezZIR0EQzA, KmkjkIrrQ4abpewUzQXieA).
Are EibfW8zERry-ezZIR0EQzA and KmkjkIrrQ4abpewUzQXieA the same?
The other upload, KN7aBQTRSJufy8sq3tt3TQ, only contains a single Pulling.tar.xz file, which NOMAD does not recognize.
In the meantime, I will prepare the zip files (I will not upload them until you tell me to).
EibfW8zERry-ezZIR0EQzA and KmkjkIrrQ4abpewUzQXieA are the same. The only difference is that EibfW8zERry-ezZIR0EQzA was uploaded again today because KmkjkIrrQ4abpewUzQXieA had not finished yet.
It seems to me that today’s upload is progressing faster than the one I uploaded on Thursday.
The only difference is that, while the Thursday upload was still processing, I added all the other files (.tar.xz) to it.
To answer your other question: I think I have 12 zip files of around 30 GB each, plus 3 uploads of between 5 and 15 GB.
We now have a dedicated NOMAD installation. This is the URL: https://nomad-lab.eu/prod/v1/util/gui/
It is the same as the official NOMAD that you used; the only visible difference is “util” in the URL. This installation uses more resources for processing and is configured a bit differently (longer timeouts, etc.). It also has a higher limit for unpublished uploads. I suggest we upload and process everything before publishing anything.
The “util” installation is currently processing KmkjkIrrQ4abpewUzQXieA again. Roughly 100 of your entries take 1 h to process, so it should be done in about 2 h from now.
After KmkjkIrrQ4abpewUzQXieA has finished, you can continue to upload. Please upload one at a time and wait until each upload has completed processing before starting the next. When you continue, please upload only to the “util” NOMAD. You can use the GUI, curl, or the API, it does not matter; just make sure “util” is in the respective URL.
I will try to keep an eye on it, but if an upload takes considerably more than 3 h, let me know. You will need a bit of patience, but we will get this done eventually.
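The rule of thumb of roughly 100 entries per hour turns into a simple time estimate. This is a minimal sketch that assumes a constant rate, which worker restarts would break; the function name is hypothetical. At that rate a full 325-entry upload comes out at about 3.25 h, in line with the 3 h threshold, whereas an upload stuck at 228/325 for days is far outside it.

```python
# Rough throughput quoted in the thread for these GROMACS entries (an estimate, not a guarantee).
ENTRIES_PER_HOUR = 100.0


def estimate_remaining_hours(total_entries: int, processed: int = 0,
                             rate: float = ENTRIES_PER_HOUR) -> float:
    """Remaining processing time in hours, assuming a constant rate in entries per hour."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    remaining = max(total_entries - processed, 0)
    return remaining / rate


print(f"full upload: {estimate_remaining_hours(325):.2f} h")
print(f"from 228/325: {estimate_remaining_hours(325, 228):.2f} h")
```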
@mscheidgen I see one failure caused by the processing of /#md.log.1#. However, this file should not be there, because my Python script was supposed to delete all GROMACS backup files (the ones named '#...#').
I don’t know why that file is still there.
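GROMACS does not overwrite existing output files; it renames them to backups such as '#md.log.1#'. A cleanup step that first only reports what it finds makes it easy to check whether any backups survived before zipping. This is a minimal sketch, not the poster's script: `remove_gromacs_backups` is a hypothetical helper, it needs Python 3.9+, and it assumes every file whose name both starts and ends with '#' is a disposable backup.

```python
from pathlib import Path


def is_gromacs_backup(path: Path) -> bool:
    """GROMACS backup names are wrapped in '#', e.g. '#md.log.1#'."""
    name = path.name
    return len(name) > 2 and name.startswith("#") and name.endswith("#")


def remove_gromacs_backups(root: Path, dry_run: bool = True) -> list[Path]:
    """Return all backup files below root; delete them only if dry_run is False."""
    backups = sorted(p for p in root.rglob("*") if p.is_file() and is_gromacs_backup(p))
    if not dry_run:
        for path in backups:
            path.unlink()
    return backups
```

Calling it once with the default `dry_run=True` lists the candidates; a second call with `dry_run=False` deletes them.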
So if 324 entries were processed successfully and only one failed, it should not be a problem to click “Publish” on this data? Or would I need to find that “wrong file” and delete it from the upload manually?
You could also publish with one failed entry in there. However, I suggest we ignore the one failed entry for now and you just continue uploading one by one. Let’s try to get the bulk of the data in there. We can publish everything (and/or fix these smaller problems) at the end.
@mscheidgen thanks for the answer! I don’t think it is a problem to keep one failed entry (especially in this case, because I know that this file should not be there).
Right now I need to pack all the other folders as .zip (instead of .tar.xz, which NOMAD could not recognize). This will take a few hours, so I can continue uploading tomorrow morning, and then I can let you know whether processing takes 3-4 hours per upload.
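Packing each folder as .zip can be done with Python's standard `zipfile` module, which writes ZIP64 extensions when needed (`allowZip64` is on by default), as archives of around 30 GB require. The hypothetical `zip_folder` helper below also skips GROMACS '#...#' backups so they never reach NOMAD. It is a sketch under the assumption that the archive is written outside the folder being zipped, so an archive from an earlier run is not packed into the new one.

```python
import zipfile
from pathlib import Path


def zip_folder(folder: Path, zip_path: Path) -> int:
    """Write folder into zip_path and return the number of files stored.

    Entries are stored relative to folder's parent, so a folder named
    'Pulling' appears as 'Pulling/...' inside the archive.
    GROMACS '#...#' backup files are skipped.
    """
    files = sorted(p for p in folder.rglob("*") if p.is_file())
    count = 0
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED,
                         allowZip64=True) as zf:
        for path in files:
            if path.name.startswith("#") and path.name.endswith("#"):
                continue  # GROMACS backup such as '#md.log.1#'
            zf.write(path, arcname=str(path.relative_to(folder.parent)))
            count += 1
    return count
```

For example, `zip_folder(Path("Pulling"), Path("/scratch/zips/Pulling.zip"))` would produce the Pulling.zip used in the curl command; both paths here are placeholders.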
I guess the current processing will finish by tomorrow (right now 231/325 entries are processed, so it looks good).
It is still a bumpy ride: the processing of K... is causing a lot of restarts on our processing workers. Something in the processing of these GROMACS entries seems to tear down the whole process. Anyhow, the current settings seem to let the system recover enough to process everything.
I suggest you continue uploading. Please make sure to use the URL that points to the util installation:
curl -X POST 'https://nomad-lab.eu/prod/v1/util/api/v1/uploads?token=<your-personal-token>' -T Pulling.zip
I imagine the whole one-by-one process is very painful. Please just upload without waiting for the other uploads to finish processing; the processing is queued anyway. It is more important to get all your files uploaded. I can still fix the processing of individual uploads once everything has been uploaded. The “util” installation has a higher threshold for unpublished uploads, so you can upload everything before publishing.
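Uploading many zip files without the GUI can be scripted around the curl call. This sketch uses only Python's standard library and assumes the same endpoint and `token` query parameter as the curl command (over https), with the raw file sent as the request body, as `curl -T` does. The `application/octet-stream` content type, the helper names, and the `NOMAD_TOKEN` variable in the usage line are assumptions, not documented NOMAD API details. Files are sent one after another; each request ends when the server answers, and processing is queued on the server side.

```python
import urllib.parse
import urllib.request
from pathlib import Path

# The "util" installation from this thread; keep "util" in the URL.
UTIL_UPLOADS_URL = "https://nomad-lab.eu/prod/v1/util/api/v1/uploads"


def upload_url(token: str) -> str:
    """Endpoint with the personal token as 'token' query parameter, as in the curl call."""
    return f"{UTIL_UPLOADS_URL}?{urllib.parse.urlencode({'token': token})}"


def upload_zip(path: Path, token: str) -> int:
    """POST one zip file as the raw request body (like curl -T) and return the HTTP status."""
    size = path.stat().st_size
    with path.open("rb") as body:
        request = urllib.request.Request(
            upload_url(token),
            data=body,  # streamed from disk in blocks, not loaded into memory
            method="POST",
            headers={
                "Content-Length": str(size),  # avoids chunked transfer for a file body
                "Content-Type": "application/octet-stream",  # assumption, see lead-in
            },
        )
        # urlopen raises urllib.error.HTTPError on 4xx/5xx, which stops upload_all.
        with urllib.request.urlopen(request) as response:
            return response.status


def upload_all(folder: Path, token: str) -> None:
    """Upload every *.zip in folder, one after another, without waiting for processing."""
    for path in sorted(folder.glob("*.zip")):
        status = upload_zip(path, token)
        print(f"{path.name}: HTTP {status}")
```

Usage might look like `upload_all(Path("zips"), os.environ["NOMAD_TOKEN"])`, where the folder name and the environment variable are placeholders; reading the token from the environment keeps it out of the script and the shell history.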
Let me know when you have uploaded everything. I will then fix any processing issues, and then you can publish everything. By the way, do you have a deadline for publishing? I imagine you might need a DOI for something?
@mscheidgen thanks for the help. Yes, you can delete the other upload on the “slower NOMAD”.
I started uploading the next file; when it has finished uploading, I will upload the next one, and so on.
Right now there is no deadline for publishing, but it would be good if I could finish uploading this week, because this is my last week at this institute :).
Of course, I could connect remotely to upload, or come back at some point, but that is not convenient.