Hi,
I have a few questions about using additional stores in the JobFlow API.
Question 1
What are the possible ways to map `Job` outputs to their respective stores? The pattern I'm aware of from the documentation, and the one I've implemented myself, is to have a `Job` return a dictionary whose keys map values to specific stores. For example,
```python
def my_task(...):
    ...
    return {
        "doc_store": task_document,
        "trajectories": task_document.calculation_output.dcd_reports,
    }

Job(
    function=my_task,
    trajectories="trajectories",
)
```
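For completeness, here's roughly how I'm wiring this up end to end with the decorator form (a minimal, self-contained sketch: `MemoryStore` stands in for my real stores, and the returned data are placeholders):

```python
from jobflow import JobStore, job
from jobflow.managers.local import run_locally
from maggma.stores import MemoryStore

@job(trajectories="trajectories")  # same mapping as the Job(...) above
def my_task():
    # placeholders for the real task document and trajectory data
    task_document = {"energy": -1.0}
    trajectory_data = [[0.0, 0.0, 0.0], [0.1, 0.0, 0.0]]
    return {"doc_store": task_document, "trajectories": trajectory_data}

# the key in additional_stores must match the kwarg name passed to @job
store = JobStore(MemoryStore(), additional_stores={"trajectories": MemoryStore()})
run_locally(my_task(), store=store)
```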
The `Job` documentation for the kwargs states: “The argument name gives the additional store name and the argument value gives the type of data to store in that additional store.” When I read this, I took it to mean that I could return a nested schema and just specify the data type of the objects within the nested schema that should be sent to the additional store. For example,
```python
def my_task(...):
    ...
    return task_document

Job(
    function=my_task,
    trajectories=DCDReports,
)
```
where `TaskDocument` is a pydantic data model that points to another pydantic data model named `CalculationOutput`, which in turn points to `DCDReports`, the pydantic data model that I want to store in the additional store. I had tried this, but it didn't seem to work, so I wanted to check if I was misunderstanding the documentation or if I was doing something wrong.
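For reference, the shape of the nested schema looks like this (field contents are illustrative, not my actual models):

```python
from pydantic import BaseModel

class DCDReports(BaseModel):
    """The nested model I want routed to the additional store."""
    frames: list = []

class CalculationOutput(BaseModel):
    dcd_reports: DCDReports = DCDReports()

class TaskDocument(BaseModel):
    calculation_output: CalculationOutput = CalculationOutput()
```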
Question 2
When using additional stores, how do I make use of the `output_schema` attribute of the `Job`? The documentation states that the `output_schema` of the `Job` class is of type `BaseModel`. Assuming that you need to return a dictionary to make use of additional stores (e.g. the first code example above), how is it possible to use the `output_schema` validation, which expects a `BaseModel`, when the `Job` returns a `dict` in order to map data to the additional stores?
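Concretely, this is the combination I don't know how to express (a sketch using the models above; I'm not sure whether `output_schema` is even meant to be compatible with the dict-return pattern):

```python
from jobflow import job

# output_schema says the job produces a TaskDocument, but the
# additional-store pattern requires returning a plain dict,
# and I don't see how the two fit together.
@job(trajectories="trajectories", output_schema=TaskDocument)
def my_task():
    task_document = TaskDocument()
    return {
        "doc_store": task_document,  # the TaskDocument instance
        "trajectories": task_document.calculation_output.dcd_reports,
    }
```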
Question 3
I have a Flow
consisting of a list of Jobs
and one Flow
(e.g. Flow.jobs = [Job, Job, Flow, Job, Job]). All of the Jobs have been configured to write data to an additional store and it works as expected, except for the Jobs within the nested Flow
. For example, I’ve written a unit tests here, where I’ve commented out tests that are failing because I’m expecting Job output put into the document store to be linked with data put into the additional store via a uuid. The behavior I’m observing is the Job
s from the nested Flow
are not utilizing the additional store. Is this behavior expected or am I doing something wrong?
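A stripped-down version of the setup that reproduces what I'm describing (again with `MemoryStore` as a stand-in; the real test is linked above):

```python
from jobflow import Flow, JobStore, job
from jobflow.managers.local import run_locally
from maggma.stores import MemoryStore

@job(trajectories="trajectories")
def my_task():
    return {"doc_store": {"ok": True}, "trajectories": [[0.0, 0.0, 0.0]]}

inner = Flow([my_task(), my_task()])
outer = Flow([my_task(), my_task(), inner, my_task(), my_task()])

store = JobStore(MemoryStore(), additional_stores={"trajectories": MemoryStore()})
run_locally(outer, store=store)

# I expect every job's "trajectories" entry, including those from the two
# jobs inside `inner`, to land in the additional store linked by uuid;
# in practice only the four outer jobs' entries show up there.
```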
Thank you in advance!