Uploads page keeps redownloading the uploads info

While trying to test uploading to the central NOMAD as part of the bug report "Publish upload to central NOMAD" does not work (Issue #17 · nomad-coe/nomad · GitHub), I’ve observed that the uploads page in our Oasis is not working properly. Some older published uploads do not show the “published” globe icon (even though they are published) but only a rotating circle, the page keeps asking the server for the uploads data over and over, and the "upload to central NOMAD" button is grayed out/inactive for these uploads:

134.130.64.226 - - [26/May/2021:07:27:47 +0000] "GET /nomad-oasis/api/uploads/VFiP3BXJTwyVpAVgpjv31w?page=1&per_page=10&order_by=tasks_status&order=1 HTTP/1.1" 200 186727 "https://oasis.mch.rwth-aachen.de/nomad-oasis/gui/uploads" "Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:86.0) Gecko/20100101 Firefox/86.0" "-"
134.130.64.226 - - [26/May/2021:07:27:47 +0000] "GET /nomad-oasis/api/uploads/OXk6jjQnQSmXwH-ug_DTsQ?page=1&per_page=10&order_by=tasks_status&order=1 HTTP/1.1" 200 162653 "https://oasis.mch.rwth-aachen.de/nomad-oasis/gui/uploads" "Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:86.0) Gecko/20100101 Firefox/86.0" "-"
134.130.64.226 - - [26/May/2021:07:27:47 +0000] "GET /nomad-oasis/api/uploads/kAXXbFFGQ--8m7u0QsZt0w?page=1&per_page=10&order_by=tasks_status&order=1 HTTP/1.1" 200 50518 "https://oasis.mch.rwth-aachen.de/nomad-oasis/gui/uploads" "Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:86.0) Gecko/20100101 Firefox/86.0" "-"
2021/05/26 07:27:47 [warn] 7#7: *12665 an upstream response is buffered to a temporary file /var/cache/nginx/proxy_temp/6/50/0000000506 while reading upstream, client: 134.130.64.226, server: oasis.mch.rwth-aachen.de, request: "GET /nomad-oasis/api/uploads/JbdlUTi8TTavwlUXDlro2A?page=1&per_page=10&order_by=tasks_status&order=1 HTTP/1.1", upstream: "http://172.26.0.6:8000/nomad-oasis/api/uploads/JbdlUTi8TTavwlUXDlro2A?page=1&per_page=10&order_by=tasks_status&order=1", host: "oasis.mch.rwth-aachen.de", referrer: "https://oasis.mch.rwth-aachen.de/nomad-oasis/gui/uploads"
134.130.64.226 - - [26/May/2021:07:27:47 +0000] "GET /nomad-oasis/api/uploads/W6pTuvAiSV-cfEd2cuLINA?page=1&per_page=10&order_by=tasks_status&order=1 HTTP/1.1" 200 199354 "https://oasis.mch.rwth-aachen.de/nomad-oasis/gui/uploads" "Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:86.0) Gecko/20100101 Firefox/86.0" "-"
134.130.64.226 - - [26/May/2021:07:27:47 +0000] "GET /nomad-oasis/api/uploads/H9yxAWF2TamhzKJCJrbbXQ?page=1&per_page=10&order_by=tasks_status&order=1 HTTP/1.1" 200 51051 "https://oasis.mch.rwth-aachen.de/nomad-oasis/gui/uploads" "Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:86.0) Gecko/20100101 Firefox/86.0" "-"
134.130.64.226 - - [26/May/2021:07:27:47 +0000] "GET /nomad-oasis/api/uploads/JbdlUTi8TTavwlUXDlro2A?page=1&per_page=10&order_by=tasks_status&order=1 HTTP/1.1" 200 185859 "https://oasis.mch.rwth-aachen.de/nomad-oasis/gui/uploads" "Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:86.0) Gecko/20100101 Firefox/86.0" "-"
134.130.64.226 - - [26/May/2021:07:27:47 +0000] "GET /nomad-oasis/api/uploads/M2paJhHaRlKGx7DQaQHf2g?page=1&per_page=10&order_by=tasks_status&order=1 HTTP/1.1" 200 176592 "https://oasis.mch.rwth-aachen.de/nomad-oasis/gui/uploads" "Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:86.0) Gecko/20100101 Firefox/86.0" "-"
134.130.64.226 - - [26/May/2021:07:27:48 +0000] "GET /nomad-oasis/api/uploads/VFiP3BXJTwyVpAVgpjv31w?page=1&per_page=10&order_by=tasks_status&order=1 HTTP/1.1" 200 186727 "https://oasis.mch.rwth-aachen.de/nomad-oasis/gui/uploads" "Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:86.0) Gecko/20100101 Firefox/86.0" "-"

This repeats over and over, with Firefox keeping two cores busy (and the server app container also sitting at around 25% CPU utilization). I’m not opening a bug yet, as there might be some issue with my database: I did some non-standard things previously (see "How to start re-processing of undetected entries"), so this might be fallout from that.

This is with an Oasis running the latest v0.10.4.
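
For reference, replaying one of these GUI requests by hand and looking at the tasks_status field in the response shows whether an upload is stuck in a non-terminal processing state. A minimal sketch of such a check (the base URL and upload id are taken from the log above; the token handling and the exact response fields are assumptions on my side):

import requests

base = "https://oasis.mch.rwth-aachen.de/nomad-oasis/api"
upload_id = "VFiP3BXJTwyVpAVgpjv31w"   # one of the upload ids from the log above
token = "<access-token>"               # assumed: the uploads endpoints need auth

resp = requests.get(
    f"{base}/uploads/{upload_id}",
    params={"page": 1, "per_page": 10, "order_by": "tasks_status", "order": 1},
    headers={"Authorization": f"Bearer {token}"},
)
resp.raise_for_status()
data = resp.json()
# assumed field names; adjust to whatever the response actually contains
print(data.get("tasks_status"), data.get("process_running"))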

Even if “publish upload to central NOMAD” is not successful, it should not cause the processing to get stuck. This is a bug. Looking at the code, though, I am not sure how this could happen.

You can reset the processing state of an upload from the CLI:

nomad admin uploads reset -- <upload-id>

This should remove the spinning wheel and allow you to issue new actions on the upload.
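
If several uploads are affected, the command can also be scripted. A minimal sketch, assuming the nomad CLI is available in the environment where you run it (e.g. inside the app container) and using the upload ids visible in your log:

import subprocess

upload_ids = [
    "VFiP3BXJTwyVpAVgpjv31w",
    "OXk6jjQnQSmXwH-ug_DTsQ",
    "kAXXbFFGQ--8m7u0QsZt0w",
]

for upload_id in upload_ids:
    # same reset command as above, applied to each affected upload
    subprocess.run(["nomad", "admin", "uploads", "reset", "--", upload_id], check=True)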

Actually, this is visible for all my older uploads, not only for the one where I tried the upload to central NOMAD. So something else triggered this, not the upload to central NOMAD itself; it’s just that I first noticed it on the upload where I had previously tried the upload to central NOMAD.

Resetting the upload state does not help to get rid of the processing state; it now just says "processed 0/107", but the upload is still published and all the entries are already parsed (see the attached screenshot).

Should I also re-process?

Maybe something went wrong in the re-processing. It would be good if we could reproduce this scenario. The processing is usually quite good at handling errors without getting into this state. However, the “discovering new entries” scenario is quite new and not super well tested.

Can you also try:

nomad admin uploads reset --with-calcs -- <upload-id>

This will also reset the underlying calculations, not just the upload itself. It looks as if all calcs were marked for reprocessing, but then never actually processed (hence 0/107).

If this does not work, we will need to look into your MongoDB to debug some more.
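
For reference, something along these lines can be used to inspect the processing state directly. This is only a sketch: the database name, the collection names (upload, calc) and the field names are assumptions about how the processing documents are stored and may need to be adjusted to your installation:

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["nomad_oasis"]   # assumed database name, check your configuration

upload_id = "<upload-id>"
upload = db["upload"].find_one({"_id": upload_id})
print("upload tasks_status:", upload.get("tasks_status"))

# count this upload's calculations per task status
for status in db["calc"].distinct("tasks_status", {"upload_id": upload_id}):
    n = db["calc"].count_documents({"upload_id": upload_id, "tasks_status": status})
    print(status, n)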

OK, so re-processing the upload with nomad admin uploads re-process -- <upload-id> fixes the upload state, and it now shows the “published” globe icon. I also found out what causes this bad state: it is the re-pack command. A while ago, after the global reprocessing and new-entry discovery, I ran a mass nomad admin uploads re-pack over the whole database, as discussed and suggested in How to start re-processing of undetected entries - #17 by mscheidgen

If I run re-pack on some published test upload, it will get switched to this “processing” state.
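
Roughly how I checked it, with the same caveats as in the request example further up (token and exact response fields may differ):

import subprocess
import requests

upload_id = "<test-upload-id>"   # a published upload you can afford to touch

# running re-pack on the published upload ...
subprocess.run(["nomad", "admin", "uploads", "re-pack", "--", upload_id], check=True)

# ... and afterwards the upload reports a pending/processing task status
resp = requests.get(
    f"https://oasis.mch.rwth-aachen.de/nomad-oasis/api/uploads/{upload_id}",
    headers={"Authorization": "Bearer <access-token>"},
)
print(resp.json().get("tasks_status"))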

Thanks for figuring this out. I’ll open a bug report and we will investigate the re-pack command some more.

The CLI re-pack command was resetting the processing state (including setting the tasks status to pending). The CLI was doing this before all upload-based processing operations. This is correct for a full re-processing of an upload (which iterates through all the tasks, starting from pending), but not for re-packing and similar operations that don’t use the tasks system.
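
To illustrate the pattern (a simplified sketch only, not the actual NOMAD code):

class Upload:
    def __init__(self, upload_id, tasks_status='SUCCESS'):
        self.upload_id = upload_id
        self.tasks_status = tasks_status

    def reset(self):
        # a full re-processing legitimately starts from a pending state
        self.tasks_status = 'PENDING'

def re_pack(upload):
    # repacks the stored files; does not go through the tasks system, so it
    # never moves tasks_status back to SUCCESS on its own
    pass

def cli_operation_buggy(upload, operation):
    upload.reset()            # reset before *every* operation
    operation(upload)         # after re-pack the upload now looks stuck

def cli_operation_fixed(upload, operation, uses_tasks):
    if uses_tasks:            # only reset for operations that re-run the tasks
        upload.reset()
    operation(upload)

upload = Upload('test')
cli_operation_buggy(upload, re_pack)
print(upload.tasks_status)    # 'PENDING' -> the GUI keeps polling and spinning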

The fix is merged into v0.10.4 and will be part of the next official release.

The fix is just so that re-pack does not break things again, right? However, to fix the current, already bad state, is there some simple way or do I need to run re-process on everything?

You are right. It is just to prevent the same thing happening in the future.

I just merged a fix into v0.10.4 that adds more parameters to nomad admin uploads reset. You should run reset --with-calcs --success on the uploads in question. Unfortunately, we have probably lost some information about failed calculations: there was also a bug where the reset command applied the reset to the calculations every time, setting all calculations to the pending state as well. The calculation failed/success state only appears in an upload’s list of entries; it won’t affect search, publication, etc. If the correct state is important to you, you will need to re-process (and thereby reproduce parser failures, etc.).
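
To see which uploads are still affected before deciding between a reset and a full re-process, a query along these lines could help (same caveats as in the MongoDB sketch above; the names and the exact status values are assumptions):

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["nomad_oasis"]   # assumed database name

# list uploads whose task status is not SUCCESS (status value assumed)
for upload in db["upload"].find(
        {"tasks_status": {"$ne": "SUCCESS"}}, {"_id": 1, "tasks_status": 1}):
    print(upload["_id"], upload["tasks_status"])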

OK, thanks. Yeah, I’ll probably do a global re-process soon to fix everything. I have a new parser in the works, so I had planned to do it anyway.