Using get_preset_config('express') or get_preset_config('heavy') with ['structure'], AutoFeaturizer successfully runs StructureToOxidStructure and StructureToComposition, then hangs on CompositionToOxidComposition with massive memory usage. This happens despite my attempts to exclude it from AutoFeaturizer:
from automatminer.featurization import AutoFeaturizer
from automatminer import get_preset_config, MatPipe
config = get_preset_config('express')
config['autofeaturizer'] = AutoFeaturizer(preset='express', structure_col='structure', exclude=['CompositionToOxidComposition']) # attempt to exclude this feature
print(config['autofeaturizer'].featurizers['composition'][1])
OxidationStates(stats=['minimum', 'maximum', 'range', 'std_dev'])
del config['autofeaturizer'].featurizers['composition'][1] # attempt 2 to exclude this feature
pipe = MatPipe(**config)
pipe.fit(train_DF, target=target_col)
2020-10-29 10:36:33 INFO Problem type is: regression
INFO:automatminer:Problem type is: regression
2020-10-29 10:36:33 INFO Fitting MatPipe pipeline to data.
INFO:automatminer:Fitting MatPipe pipeline to data.
2020-10-29 10:36:33 INFO AutoFeaturizer: Starting fitting.
INFO:automatminer.featurization.core:AutoFeaturizer: Starting fitting.
2020-10-29 10:36:33 INFO AutoFeaturizer: composition column already exists, overwriting with composition from structure.
INFO:automatminer.featurization.core:AutoFeaturizer: composition column already exists, overwriting with composition from structure.
2020-10-29 10:36:33 INFO AutoFeaturizer: Guessing oxidation states of structures if they were not present in input.
INFO:automatminer.featurization.core:AutoFeaturizer: Guessing oxidation states of structures if they were not present in input.
StructureToOxidStructure: 100% 36609/36609 [00:12<00:00, 3050.23it/s]
StructureToComposition: 100% 36609/36609 [00:10<00:00, 3340.16it/s]
2020-10-29 10:36:59 INFO AutoFeaturizer: Guessing oxidation states of compositions, as they were not present in input.
INFO:automatminer.featurization.core:AutoFeaturizer: Guessing oxidation states of compositions, as they were not present in input.
CompositionToOxidComposition: 28% 10231/36609 [00:03<4:26:53, 1.65it/s]
(It then stays at that level indefinitely)
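Since the progress bar stalls at a specific row, one way to narrow this down is to time the per-row step with a budget and record which rows exceed it. The sketch below uses only the standard library. `time_each` and `stand_in` are hypothetical names introduced for illustration, not part of automatminer or matminer, and the sleep is a placeholder for the real oxidation-state guess.

```python
import threading
import time


def time_each(items, func, budget_s=5.0):
    """Call func(item) for each item; return a list of (index, seconds).

    seconds is None when the call did not finish within budget_s.
    """
    results = []
    for index, item in enumerate(items):
        done = threading.Event()

        def run(item=item, done=done):
            try:
                func(item)
            finally:
                done.set()  # signal completion even if func raised

        start = time.perf_counter()
        # Daemon thread: an abandoned call will not block interpreter exit,
        # but it keeps running (and using memory) in the background.
        threading.Thread(target=run, daemon=True).start()
        finished = done.wait(timeout=budget_s)
        elapsed = time.perf_counter() - start
        results.append((index, elapsed if finished else None))
    return results


def stand_in(value):
    """Hypothetical slow step: sleeps for one second on the string 'slow'."""
    if value == "slow":
        time.sleep(1.0)


timings = time_each(["a", "slow", "b"], stand_in, budget_s=0.2)
stalled = [index for index, seconds in timings if seconds is None]
print(stalled)
```

Here `items` would be the composition column from around row 10231 onward, and `func` would wrap whatever CompositionToOxidComposition calls per row. That is presumably pymatgen's `Composition.add_charges_from_oxi_state_guesses`, but this is an assumption that has not been checked against the matminer source. Because a timed-out thread is abandoned rather than killed, this is only suitable for testing a handful of rows, not the full 36609.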
This is run in JupyterHub with automatminer version 1.0.3.20200727.
So, two questions:
- Am I trying to remove this featurizer correctly? Why can’t I remove it?
- Why is it trying to infer oxidation states for the composition if it has already done so for the structure? (I don’t want to pass guess_oxistates=False because I want some oxidation states if possible.) I saw this, so I’m confused about why I’m running into this problem.