The "Tabular Data" example upload not working?

fabian_li · July 16, 2024, 2:24pm

Hi all,

From the example uploads (right next to Create a new Upload) I chose the tabular data example. The entry should show some chemical elements and some info about them, but it’s basically empty.
Could you fix this example upload so I can adapt its schema to my own tabular files? I’ve already tried adapting the examples from the official docs (Schemas and plugins - Documentation), but those didn’t work either: nothing got parsed.
Best
Fabian

amgo · July 16, 2024, 2:58pm

Hi Fabian,

Indeed, the Tabular Data example is broken. I will look into it. In the meantime, you can check the docs here on writing custom schemas for tabular data.

Cheers,
Amir

fabian_li · July 19, 2024, 7:14am

Thank you very much. Just to clarify: I’m not really interested in the Tabular Data example upload itself. What I actually want is to parse tabular files with a schema in general. To do this I’ve tried some examples from the docs, such as Schemas and plugins - Documentation and Parse tabular data - Documentation, but none of them worked (nothing gets parsed), just like the Tabular Data example upload doesn’t parse anything.
So instead of fixing the Tabular Data example upload, I just need to know how to parse the following reactions.xlsx file:

Chemical Reaction         Energy in eV
A + B --> C                     1
D + E --> F                     2

So just 3 rows and 2 columns. Whether it gets parsed in column or row mode is not important. The section in which the parsed info gets stored is not important either; it could be, e.g., a custom subsection called Chemical Reactions under run.calculation, or under data.

It would be great if you could help me write this yaml schema. Thank you in advance!
Best
Fabian

amgo · July 23, 2024, 9:51am

Hi Fabian,

Just got back from vacation. I cannot attach any files here, but here is a simple YAML schema snippet that I just used to parse a data.xlsx file with the two columns Chemical Reaction and Energy in eV:

name: 'test tabular'
sections:
  Test_Tabular:
    base_sections:
      - nomad.datamodel.data.EntryData
      - nomad.parsing.tabular.TableData
    quantities:
      data_file:
        type: str
        default: data.xlsx
        m_annotations:
          tabular_parser:
            parsing_options:
              comment: '#'
            mapping_options:
              - mapping_mode: column
                file_mode: current_entry
                sections:
                  - '#root'
          browser:
            adaptor: RawFileAdaptor
          eln:
            component: FileEditQuantity
      Chemical_Reaction:
        type: str
        shape: ['*']
        m_annotations:
          tabular:
            name: "Chemical Reaction"
      Energy:
        type: np.float64
        shape: ['*']
        m_annotations:
          tabular:
            name: "Energy in eV"
data:
  m_def: Test_Tabular

It works fine on my side. Please let me know how it goes or if you have any trouble parsing the file.

Cheers
Amir

fabian_li · July 23, 2024, 12:43pm

Thanks for your help, I hope you had a nice vacation.

If you click on “reply” there’s a button “upload”. This is my data.xlsx file:
data.xlsx (5.1 KB)

If I upload my data.xlsx together with the yaml schema you gave above I get a failed upload with this error:
Unexpected error: “Could not parse YAML Schema defined here: /uploads/78X5vELYQTaSJ2No_1HGqQ/raw/reactions_schema.archive.yaml. Failed with the following parsing error: All collection items must start at the same column”. Please try again and let us know, if this error keeps happening.

amgo · July 23, 2024, 1:14pm

Thanks!

I have tested it in both our production and staging deployments and both parsed the excel file just fine.

The error you are getting does not say much about the source of the problem. If you are running a local Oasis, you can find more information on why the processing fails in the LOGS tab of the failed entry. Let me know what is logged there. In the meantime, you can also try the central prod/staging deployments to see if it fails there.

Cheers,
Amir

fabian_li · July 23, 2024, 4:31pm

It works now, thank you very much. The error message above was caused by indentation errors introduced when I copied your schema code into VSCode. After fixing those, a new error appeared about a missing name attribute, which I solved by adding definitions: above name: 'test tabular' in the schema and indenting name: beneath it.
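
For readers following along, the fix described above can be sketched as below. This is a reconstruction from the snippet earlier in the thread, not a file posted by either participant: the schema keys are nested under a top-level definitions: key, and the sketch assumes that the data: block stays at the top level, which the thread does not state explicitly.

```yaml
# Sketch only: the schema from earlier in the thread, wrapped in `definitions:`.
# Sibling keys and list items must start in the same column; misaligned items
# trigger "All collection items must start at the same column".
definitions:
  name: 'test tabular'
  sections:
    Test_Tabular:
      base_sections:
        - nomad.datamodel.data.EntryData
        - nomad.parsing.tabular.TableData
      quantities:
        data_file:
          type: str
          default: data.xlsx
          m_annotations:
            tabular_parser:
              parsing_options:
                comment: '#'
              mapping_options:
                - mapping_mode: column
                  file_mode: current_entry
                  sections:
                    - '#root'
            browser:
              adaptor: RawFileAdaptor
            eln:
              component: FileEditQuantity
        # One array-valued quantity per spreadsheet column; `name` under
        # `tabular` must match the column header in data.xlsx exactly.
        Chemical_Reaction:
          type: str
          shape: ['*']
          m_annotations:
            tabular:
              name: "Chemical Reaction"
        Energy:
          type: np.float64
          shape: ['*']
          m_annotations:
            tabular:
              name: "Energy in eV"
# Assumption: `data:` remains a top-level key, as a sibling of `definitions:`.
data:
  m_def: Test_Tabular
```

Using two spaces per nesting level throughout, and configuring the editor to insert spaces rather than tabs, should help avoid the copy-and-paste indentation problem.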

Best
Fabian

fabian_li · July 24, 2024, 7:27pm

Sorry to bother you again, but strangely it no longer works. I use the exact same data.xlsx that I uploaded above, and I use your code with definitions: added at the very top and with correct indentation. This worked yesterday (on the production server), but today it works neither on production nor on staging. NOMAD processing says “success”, but in the entry I get a red banner saying:
Unexpected error: "[object Object] (500)". Please try again and let us know, if this error keeps happening.
Refreshing/reloading doesn’t fix it, there are no errors in the logs, and clicking on the data tab simply loads forever. I was on Linux yesterday when it worked, and today I’m on Windows, but this shouldn’t make a difference…
If you are able to parse the data.xlsx, could you please upload your .archive.yaml instead of posting it so I can rule out copy&paste and indentation errors?
Best
Fabian

amgo · August 1, 2024, 9:14am

Hi Fabian,

Indeed, it was not working yesterday, but it should be working now.
Could you please give it another try on both the staging and the official deployment?

Sorry again, since my account is new, matsci does not allow me to upload any documents at the moment.

Cheers
Amir

fabian_li · August 20, 2024, 3:13pm

Hi Amir,

thanks for your help, it works now. However, I would like this upload to be findable by keywords/quantity values such as SEI diffusion under the general search field. Currently, I have to go to Explore and then ELN, where I can find the upload by searching for SEI diffusion. But if I search with the general search field under Explore and then Entries, the word SEI diffusion only gives “Unsupported query”. I’ve already removed the lines

eln:
  component: FileEditQuantity

from the archive.yaml file, but NOMAD still seems to classify the upload as ELN (because it’s still findable under Explore then ELN). Is there a way to change this?
Best
Fabian

amgo · September 6, 2024, 1:14pm

Hi Fabian,

Glad that it worked. Unfortunately, the central NOMAD currently supports searching quantities from custom schemas only under the ELN section (that is, quantities that are internally saved under the data section). I will try to see if there is any other way to search for your data in the global search, but in the meantime you may define a customized app for your own use case in your local Oasis (Write an app - Documentation).

Cheers
Amir

Please let me