Passing a numpy array to a Firetask

On Monday, February 6, 2017 at 10:01:04 AM UTC-8, jkuck wrote:

Hi,

I’d like to pass a numpy array to a custom Firetask through the Firetask’s fw_spec. The dictionary value in the fw_spec which corresponds to my numpy array is converted to the unicode type when it arrives in my Firetask. I could use the numpy.ndarray.tostring() and numpy.ndarray.fromstring() methods to get around this, but wanted to check whether a cleaner option exists.

Thanks,
Jonathan

On Monday, February 6, 2017 at 10:10:49 AM UTC-8, Anubhav Jain wrote:

Hi Jonathan,

When the FireTask is de-serialized, do you need it reconstructed back into a numpy array or is a Python list OK?

For example, we could easily modify the FWS serializer so that numpy arrays are stored as lists or nested lists. However, if we did this, the variable would load as a list when the FireTask is loaded again, and the FireTask would need a line of code to turn it back into a numpy array. Is this OK, or would you prefer that it de-serialize into a proper numpy array?

Anubhav
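
The list-based approach described above can be sketched as follows. This is only an illustration: it uses numpy and the standard json module as a stand-in for the store/load cycle, and the spec key `matrix` and the helper `run_task_body` are made-up names, not part of FireWorks.

```python
import json

import numpy as np

# Sender side: store the array as a nested list so it is JSON-friendly.
matrix = np.arange(6, dtype=float).reshape(2, 3)
spec = {"matrix": matrix.tolist()}

# Stand-in for the serialize/store/load cycle (an assumption for this sketch).
loaded_spec = json.loads(json.dumps(spec))


def run_task_body(fw_spec):
    """Hypothetical task body: the one line that rebuilds the array."""
    return np.asarray(fw_spec["matrix"])


restored = run_task_body(loaded_spec)
```

Nested lists keep the row structure, so the shape of the array survives the round trip without any extra bookkeeping.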

On Monday, February 6, 2017 at 10:29:27 AM UTC-8, jkuck wrote:

Hi Anubhav,

I realized that the tostring()/fromstring() methods don’t handle multidimensional arrays, so I followed your recommended procedure of converting the numpy array to a list of lists and then back to an array. I think this is fine. However, what would the recommended procedure be if I’d like to pass an object of arbitrary type, for instance an instance of a class I’ve written myself? Should I write tostring() and fromstring() methods myself, or does FireWorks have built-in functionality to handle this?

Thanks,

Jonathan
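
The limitation with multidimensional arrays can be seen in a short sketch. It uses `tobytes()` and `np.frombuffer()`, the byte-level counterparts of the `tostring()`/`fromstring()` calls mentioned above (a substitution made here, since `tostring()` is an older alias of `tobytes()`). The raw bytes carry neither shape nor dtype, so both have to be supplied separately.

```python
import numpy as np

a = np.arange(6, dtype=np.int64).reshape(2, 3)

# The byte string holds only the raw values: no shape, no dtype.
raw = a.tobytes()

# Reading it back yields a one-dimensional array; dtype must be given.
flat = np.frombuffer(raw, dtype=a.dtype)

# The original shape has to be stored elsewhere and re-applied by hand.
restored = flat.reshape(a.shape)
```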

On Monday, February 6, 2017 at 12:49:07 PM UTC-8, Anubhav Jain wrote:

Hi Jonathan,

The recommendation is that for most data types, the serialization/deserialization should be automatic. For numpy arrays, I will try to push something this Friday that at least gets them working (automatically) via auto conversion to nested lists. One of the packages that FireWorks depends on, monty, has a tool for this already so I will just use that.

For the more general version of the question, there are two ways to get serialization/deserialization working properly:

  1. Make sure that any non-primitive objects you put into a FireTask have to_dict() and from_dict() methods (i.e., are FWSerializable).

or

  2. For your custom FireTask, implement your own to_dict() and from_dict() methods for the overall FireTask, i.e., instead of inheriting those methods from FireTaskBase. This way you do not have to create those methods for individual objects.

But again, the expectation is that for common data types like numpy arrays, FireWorks will take care of it for you. We just haven’t implemented it yet, but hopefully we can do it soon.
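
The two options above can be sketched in plain Python. This sketch does not import FireWorks: `Calibration`, `PlainPoint`, and `PointTask` are made-up classes, `PointTask` is only a stand-in for a custom task, and the actual FWSerializable and FireTaskBase interfaces may impose further requirements that are not covered here.

```python
import json


class Calibration:
    """Option 1: a hypothetical object that serializes itself."""

    def __init__(self, offset, gains):
        self.offset = offset
        self.gains = list(gains)

    def to_dict(self):
        # Only JSON-friendly primitives go into the dictionary.
        return {"offset": self.offset, "gains": self.gains}

    @classmethod
    def from_dict(cls, d):
        return cls(offset=d["offset"], gains=d["gains"])


class PlainPoint:
    """A hypothetical object with no serialization methods of its own."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


class PointTask:
    """Option 2: a stand-in task that serializes its contents as a whole."""

    def __init__(self, point):
        self.point = point

    def to_dict(self):
        # The task unpacks the object itself, so PlainPoint needs no to_dict().
        return {"x": self.point.x, "y": self.point.y}

    @classmethod
    def from_dict(cls, d):
        return cls(PlainPoint(d["x"], d["y"]))


# Round trip through JSON as a stand-in for storage.
cal = Calibration(0.5, [1.0, 2.0])
cal2 = Calibration.from_dict(json.loads(json.dumps(cal.to_dict())))

task = PointTask(PlainPoint(1, 2))
task2 = PointTask.from_dict(json.loads(json.dumps(task.to_dict())))
```

The first option keeps serialization next to the data it describes; the second keeps it in one place at the cost of the task knowing the internals of every object it holds.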

On Monday, February 6, 2017 at 1:13:24 PM UTC-8, jkuck wrote:

Hi Anubhav,

Got it, thanks a lot!

Jonathan

On Friday, February 10, 2017 at 10:54:47 AM UTC-8, Anubhav Jain wrote:

Hi Jonathan,

I added a basic numpy-to-list serializer in FWS v1.4.1, so conversion from numpy->list for MongoDB should be automatic. Unfortunately, the reverse conversion (back from MongoDB to Python) will keep this as a list instead of a numpy array.

I wasn’t able to find a good way to use MontyEncoder from the “monty” package, which would handle both directions well. At some point we might try reworking FireWorks serialization, but this is at least a short-term solution.

Anubhav
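
For readers who want both directions, one common pattern is to tag the array when encoding and rebuild it when decoding. The sketch below uses only json and numpy. It is neither the FireWorks serializer nor monty's MontyEncoder, and the names `ARRAY_TAG`, `ArrayEncoder`, and `array_hook` are invented for illustration.

```python
import json

import numpy as np

ARRAY_TAG = "__ndarray__"  # made-up marker key for this sketch


class ArrayEncoder(json.JSONEncoder):
    """Encode numpy arrays as tagged dictionaries holding data and dtype."""

    def default(self, o):
        if isinstance(o, np.ndarray):
            return {ARRAY_TAG: o.tolist(), "dtype": str(o.dtype)}
        return super().default(o)


def array_hook(d):
    """Rebuild an array from a tagged dictionary; pass other dicts through."""
    if ARRAY_TAG in d:
        return np.array(d[ARRAY_TAG], dtype=d["dtype"])
    return d


spec = {"matrix": np.eye(2), "steps": 3}
text = json.dumps(spec, cls=ArrayEncoder)
loaded = json.loads(text, object_hook=array_hook)
```

With the one-way conversion described in the reply, the simpler alternative is to call `np.asarray()` on the list inside the task.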

Hi Anubhav,

Great, thanks a lot!

Jonathan
