Archiving whole uploads

Dear NOMAD developers,

one thing that doesn’t fit our current Oasis needs too well is the treatment of uploads. If I understand the process correctly, subdirectories are scanned recursively and their contents are only saved if a mainfile is detected and successfully parsed.

What we would need is to save the whole upload. Our users, for example, sometimes keep Excel sheets with calculations done on top of the DFT results, figures generated from the data, or random bash scripts used to generate the inputs, and these usually sit in some top-level directory where there is no DFT calculation and which is therefore not saved. This would also be very helpful for DFT or other data that we eventually want to parse but for which no parsers exist yet, so that it is saved and can be reparsed later, after we write the corresponding parsers.

I do understand that this is not optimal for the central NOMAD, as it could easily be misused and would use more space, but I believe such a feature could make sense for an Oasis (maybe as an opt-in configuration switch), with stricter access restrictions.

Is this something that could be achieved somehow with the current NOMAD Oasis version?

Thank you for your input. Unfortunately, there is not much that we can offer at the moment. As you already know, only files in the directories of DFT calculations are offered for download.

The files are saved, though. All uploaded files and their directory structure are preserved. You can use the API to access the files, if you know exactly which files these are. There is “just” no GUI to show them.
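For illustration, fetching a single raw file via the API could look roughly like the following minimal sketch. The base URL, the `/raw/{upload_id}/{path}` endpoint layout, and the bearer-token authentication are assumptions here and depend on the NOMAD version your Oasis runs; check the API documentation of your installation.

```python
import requests

# Hypothetical sketch: fetch one raw file from an upload by its path.
# The endpoint layout and auth scheme are assumptions and may differ
# between NOMAD versions.
base_url = "https://your-oasis.example.com/nomad-oasis/api"
upload_id = "your_upload_id"
file_path = "some/subdirectory/notes.xlsx"

response = requests.get(
    f"{base_url}/raw/{upload_id}/{file_path}",
    headers={"Authorization": "Bearer <your-access-token>"},
)
response.raise_for_status()

with open("notes.xlsx", "wb") as f:
    f.write(response.content)
```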

We are currently expanding the upload API, adding a file browser to the UI, and working on incremental uploads. But this will take some time and has to wait for the next major release. As part of this development, we will implement a basic convenience file browser as part of the API (like a simple file-access HTTP server). We could link this to the current GUI and provide it as an intermediate solution at an earlier point.

OK, this is actually better than I thought. As long as everything you upload is saved, it should be possible to rerun the parsers when we have the new ones (to regenerate the metadata), and all the uploaded files would still be accessible, right?

While some file browser would be nice, we don’t need anything that complicated. Right now, when you search for an entry and then select it, you have the option to download either the “raw uploaded files” or the “NOMAD archive files”. What would be sufficient for us is a third option, something like “whole raw upload”, to really download the whole upload containing the selected entry as it was uploaded by the original user (including all of the other DFT entries and random files inside).

Yes, exactly. With new or changed parsers, the uploads can be reprocessed to discover new files or update existing archives.
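As a rough sketch of what triggering such a reprocessing could look like from a script: the `/uploads/{upload_id}/action/process` endpoint name and the authentication below are assumptions, not confirmed parts of every NOMAD version’s API, so verify against the docs of your installation.

```python
import requests

# Hypothetical sketch: trigger reprocessing of an upload so that newly
# added parsers pick up previously unparsed files. Endpoint name and
# auth are assumptions; check your NOMAD version's API documentation.
base_url = "https://your-oasis.example.com/nomad-oasis/api/v1"
upload_id = "your_upload_id"

response = requests.post(
    f"{base_url}/uploads/{upload_id}/action/process",
    headers={"Authorization": "Bearer <your-access-token>"},
)
response.raise_for_status()
print(response.json())
```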

There is an upload download button on the “Your Data” page. But this won’t help you to download the files of others. We could add a “download whole upload” button to the raw files entry page. I don’t think we will add it to search results, though. Search results might span many uploads, and it becomes very hard for the user to ascertain how much data she is trying to download.

The download button on the “Your Data” page actually doesn’t do what we need. If I click on “download upload” and then select “raw uploaded files”, the result contains all of the detected DFT calculations that were uploaded. However, directories which contained some other files in the original upload, but no detected mainfile, are empty.

Adding a “download whole upload” button to the raw files entry page would be appreciated, but we would need it to really download the whole upload, not just the directories with detected mainfiles, as described above.

Yes, sorry, I was mistaken. The button does a search to get all entries of the upload, and then you get the same undesired behaviour. We will try to change this and add the other button (with the right semantics) in our next patch release.
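For reference, once such a button (or the underlying endpoint) exists, downloading the complete raw upload might look roughly like the following sketch. The URL layout and the `compress` parameter are assumptions modeled on the existing raw-file access, not a confirmed part of the current API.

```python
import requests

# Hypothetical sketch: download the complete raw upload (all files,
# including directories without detected mainfiles) as a single zip.
# The endpoint and its parameters are assumptions, not a confirmed
# part of the current API.
base_url = "https://your-oasis.example.com/nomad-oasis/api/v1"
upload_id = "your_upload_id"

with requests.get(
    f"{base_url}/uploads/{upload_id}/raw/",
    params={"compress": "true"},
    headers={"Authorization": "Bearer <your-access-token>"},
    stream=True,
) as response:
    response.raise_for_status()
    # Stream to disk in 1 MiB chunks to handle large uploads.
    with open(f"{upload_id}.zip", "wb") as f:
        for chunk in response.iter_content(chunk_size=1 << 20):
            f.write(chunk)
```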