Rerunning normalizers for the whole database?

I was trying to figure out why 90% of the structures in our oasis have an “unavailable” system type and traced this down to the normalize - system_classification_with_clusters_threshold setting.

I increased it a bit and now I would like to rerun the normalizers for the whole database to get the correct system type for the previously uploaded structures. How can I do this?

There is no way to redo only the normalisation; you will also have to parse again.

You can “hop” into the app container (docker exec -ti nomad_oasis_app /bin/bash) and use our command line interface to trigger a reprocessing. The command nomad admin uploads re-process would reprocess everything. You can try --help on the uploads and re-process sub-commands to find more useful options, for example to filter for certain uploads or to process uploads in parallel. I would recommend starting with a single upload to get more comfortable: nomad admin uploads re-process -- <upload_id>.
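As a rough sketch of the steps above, the same commands can also be issued from the host through docker exec instead of an interactive shell. This assumes the container is named nomad_oasis_app as in the command above and that the nomad CLI is on the PATH inside it. The run, reprocess_upload and show_help helpers and the DRY_RUN switch are hypothetical names introduced only for this illustration; by default the commands are only printed, not executed.

```shell
#!/bin/sh
# Sketch only: wraps the commands from the post above.
# Assumption: the app container is called nomad_oasis_app.
CONTAINER="nomad_oasis_app"

# Print the command when DRY_RUN=1 (the default here), execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Show the available options of the uploads group and of re-process.
show_help() {
  run docker exec "$CONTAINER" nomad admin uploads --help
  run docker exec "$CONTAINER" nomad admin uploads re-process --help
}

# Re-process a single upload; $1 is the upload id (placeholder, not a real id).
reprocess_upload() {
  run docker exec "$CONTAINER" nomad admin uploads re-process -- "$1"
}
```

After sourcing this, reprocess_upload <upload_id> prints the command it would run; exporting DRY_RUN=0 first makes it actually execute. Starting with one small upload keeps the first run short.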

These aspects of NOMAD aren’t well documented yet. I guess with increased oasis usage, we have to do our homework here.

Thanks @mscheidgen, is there a way to monitor the re-processing when it is started from the command line, i.e., to be able to say what it is parsing at the moment? It looks like it has been processing some entries for more than an hour already, but I have no idea which ones (I had success with reprocessing some smaller uploads, but now it is running for an upload with 10000 entries).

Also, is it OK to stop (kill) the reprocessing command if it is stuck for a long time (I hope this doesn’t have any bad effect on the database)? I have tried “nomad admin uploads stop” with no luck.

Let’s say you did not use the --parallel parameter. The re-process command will trigger the reprocessing of one upload, wait until it is complete (then you get some printout), then trigger the next, and so on. If you use the --parallel parameter, it triggers multiple uploads at the same time. In any case, if you abort the re-process command, you only abort this waiting and triggering. The uploads that are already reprocessing will continue to do so.

You can use the nomad admin uploads --processing ls command to see which uploads are still processing. You can do this in a separate shell, for example.
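To keep an eye on a long reprocessing run, that listing command could be repeated at a fixed interval from the second shell. The following is a minimal sketch, assuming it runs inside the app container. The watch_processing function and the NOMAD_LS variable are hypothetical names for this example, and the output format of the listing is not assumed here.

```shell
#!/bin/sh
# Sketch only: repeat the listing command from the post above.
# Assumption: run this in a second shell inside the app container.
NOMAD_LS="${NOMAD_LS:-nomad admin uploads --processing ls}"

# watch_processing INTERVAL ROUNDS
# Prints a timestamped listing ROUNDS times, sleeping INTERVAL seconds between.
watch_processing() {
  interval="$1"
  rounds="$2"
  i=0
  while [ "$i" -lt "$rounds" ]; do
    echo "--- $(date '+%H:%M:%S') ---"
    # Intentionally unquoted so the command string splits into words.
    $NOMAD_LS
    i=$((i + 1))
    if [ "$i" -lt "$rounds" ]; then
      sleep "$interval"
    fi
  done
  return 0
}
```

For example, watch_processing 60 30 would print the list of processing uploads once a minute for about half an hour. Stopping this loop has no effect on the uploads themselves, since it only reads their status.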