Workflow for PDF processing

Hi,

I have experience in Python programming and I’m just starting to learn Fireworks. I need to develop a workflow to analyze PDF files. I have a command-line tool that generates an XML report about the PDF files. I have to analyze this report for errors and compare the generated XML file with another XML file that stores some parameters. If there are errors, I have to send an email listing them. If everything is OK, I generate a report and copy the PDF file to an “OK” folder.

My questions are:

  1. I believe I need to create custom Firetasks to analyze the XML files and perform the PDF analysis and the “OK” and “Not OK” actions. I understand that the file containing the custom FireTasks must be reachable by Python and Fireworks. With that in place, I can make a job template for this task with placeholders for the paths of the PDF and XML files and submit a job. Is that correct?

  2. The user will copy the PDF and XML files to a hot folder, and I’ll use the Python watchdog module to catch these files and copy them to a temporary folder. After that, I’d like to queue a job that will run only after the previous job concludes. How can I build this queue? Is it a good idea to store the queue in MongoDB?

Best regards,

Lorenzo

Hi Lorenzo

Regarding (1), yes, a custom FireTask would be the best route. There is a guide to help:

https://pythonhosted.org/FireWorks/guide_to_writing_firetasks.html

As for the locations of the PDF and XML files, they can be parameters that are input to the FireTask constructor or they can be set as part of the spec.

An alternate way would be to use a ScriptTask to run some external executable script that does the processing, e.g. with input flags for the input and output files. Then FireWorks is really just being used to call this script.
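A rough sketch of such an external script, using only the standard library, might look like the following. The flag names `--report` and `--out` and the `<error>` element are assumptions for illustration, not part of any existing tool.

```python
import argparse
import xml.etree.ElementTree as ET


def main(argv=None):
    parser = argparse.ArgumentParser(
        description="List the errors found in an XML report (sketch).")
    parser.add_argument("--report", required=True,
                        help="XML report produced by the PDF tool")
    parser.add_argument("--out", required=True,
                        help="text file that receives the error list")
    args = parser.parse_args(argv)

    # Assumption: each problem appears as an <error> element in the report.
    root = ET.parse(args.report).getroot()
    errors = [(e.text or "").strip() for e in root.iter("error")]

    with open(args.out, "w", encoding="utf-8") as handle:
        handle.write("\n".join(errors))

    # A non-zero status lets the caller tell "Not OK" from "OK".
    return 1 if errors else 0
```

To use it as an executable, the file would end with `sys.exit(main())` under the usual `if __name__ == "__main__":` guard. A ScriptTask could then call it, for example via `ScriptTask.from_str("python check_report.py --report report.xml --out errors.txt")`, where the script name and paths are placeholders.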

Regarding (2), without knowing too many details, it sounds like the best approach is to code a workflow that contains all the dependent jobs. If the downstream Fireworks are children of the upstream Fireworks, the downstream ones will not run until the upstream ones have finished. The workflow tutorials show how this is done, including states such as WAITING, which a Firework stays in until its parent completes. After that, you’ll still need to either (i) run rlaunch/qlaunch in rapidfire mode in an infinite loop, so that it periodically checks whether new jobs are ready and launches them, or (ii) use watchdog to run rlaunch/qlaunch in singleshot mode when it notices that a job should be run.

Best

Anubhav


On Fri, Mar 24, 2017 at 9:23 AM, Lorenzo Ridolfi [email protected] wrote:


