export and warm_start in automatminer

Dear developers,
I found the tool automatminer extremely interesting and would like to use it in our own research.
I understand that automatminer uses the TPOT library as its backend to search the model space. TPOT provides some very convenient functions that we’d like to use but could not reach through automatminer. The first is the ‘export’ function (e.g. pipeline_optimizer.export('tpot_exported_pipeline.py')), which writes the optimized pipeline to a simple Python script. The second is the restart option (‘warm_start’), which lets the user ‘pause’ the computation at a certain point and later restart it from the same point. Could you kindly let me know how we can access these two functions from the automatminer interface? We’d like to export the best one or two pipelines to a Python script, and to pause and restart the optimization at will, as the ‘production’ level optimization can take quite a while, even with small datasets.
Looking forward to hearing from you.

Regards,

Arnab

Hey Arnab,

Thanks for your interest!

If you are looking for just the backend parts of TPOT, you can access the raw TPOT object in an automatminer pipeline through the MatPipe.learner.backend attribute. This is a raw TPOTClassifier/TPOTRegressor object and should expose the full API of the regular TPOT library (which is completely separate from automatminer). From this object you should be able to export the optimized pipeline.
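As a minimal sketch of that route: the helper below takes a fitted MatPipe, reads its learner.backend attribute, and calls TPOT's own export() on it. The name export_tpot_backend is hypothetical (it is not part of automatminer), and the sketch assumes the pipe has already been fit so that the backend holds an optimized pipeline.

```python
def export_tpot_backend(pipe, path="tpot_exported_pipeline.py"):
    """Write the TPOT pipeline held by a fitted MatPipe to a Python script.

    Hypothetical helper for illustration only; not part of automatminer.
    """
    # MatPipe.learner.backend is the raw TPOTClassifier/TPOTRegressor object.
    backend = pipe.learner.backend
    if not hasattr(backend, "export"):
        raise AttributeError(
            "learner backend has no export(); is it a TPOT object?"
        )
    # TPOT's export() writes the best pipeline it found as a standalone script.
    backend.export(path)
    return path


# Usage (assumes automatminer is installed and `pipe` is a MatPipe
# that has already been fit on your data):
# export_tpot_backend(pipe, "tpot_exported_pipeline.py")
```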

Pausing a pipeline is currently an open issue on the GitHub repo, but we haven’t gotten around to fixing it yet. If you have an idea for how best to go about this, we welcome code contributions. Please make a PR!
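In the meantime, plain TPOT (used outside automatminer) offers a partial workaround: an estimator created with warm_start=True reuses its population on later calls to fit(), so a long search can be split into shorter stages within one Python session. The sketch below assumes such an estimator is passed in. The helper name fit_in_stages is hypothetical, the state does not survive exiting the Python process, and whether this carries over cleanly to the TPOT object inside a MatPipe is untested here.

```python
def fit_in_stages(estimator, X, y, n_stages=3, on_stage_end=None):
    """Run a warm-start search as several shorter fit() calls.

    Hypothetical helper for illustration only. `estimator` is assumed to be
    a TPOT estimator constructed with warm_start=True.
    """
    if not getattr(estimator, "warm_start", False):
        raise ValueError("estimator must be created with warm_start=True")
    for stage in range(n_stages):
        # With warm_start=True, each call continues from the previous population.
        estimator.fit(X, y)
        if on_stage_end is not None:
            # Hook for inspecting or exporting the current best pipeline.
            on_stage_end(stage, estimator)
    return estimator


# Usage with plain TPOT (assumes tpot is installed and X_train, y_train exist):
# from tpot import TPOTRegressor
# tpot = TPOTRegressor(generations=5, population_size=20, warm_start=True)
# fit_in_stages(tpot, X_train, y_train, n_stages=3,
#               on_stage_end=lambda i, est: est.export(f"stage_{i}.py"))
```

Exporting after each stage keeps a usable pipeline on disk even if the session is interrupted before the last stage.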

On a separate note, as automatminer is still in an experimental phase (especially in regard to presets), we are not making any guarantees about the “production” preset. While it should work well enough, if it doesn’t (or if there is something you want to change about it, e.g., the optimization time), I’d recommend specifying your own custom MatPipe. We have some docs and tutorials for that in the works, but in the meantime, if you need help with that, feel free to ask on this board.

Thanks,

Alex

Note that for “Export”, what I described will only export the TPOT ML part of the pipeline, not the featurization, feature reduction, etc. We have not yet added “Export” functionality for full MatPipe pipelines, but I just added an issue for it on the repo.

(In reply to Alexander Dunn's message of Friday, September 13, 2019 at 12:20:52 PM UTC-7, above.)