CompositionToOxidComposition hangs

Using get_preset_config('express') or get_preset_config('heavy') with ['structure'] results in AutoFeaturizer successfully running StructureToOxidStructure and StructureToComposition, then hanging on CompositionToOxidComposition with massive memory usage. This occurs despite my attempts to exclude it from AutoFeaturizer:

from automatminer.featurization import AutoFeaturizer
from automatminer import get_preset_config, MatPipe

config = get_preset_config('express')
config['autofeaturizer'] = AutoFeaturizer(
    preset='express',
    structure_col='structure',
    exclude=['CompositionToOxidComposition'],  # attempt to exclude this feature
)
print(config['autofeaturizer'].featurizers['composition'][1])

OxidationStates(stats=['minimum', 'maximum', 'range', 'std_dev'])

del config['autofeaturizer'].featurizers['composition'][1] # attempt 2 to exclude this feature
pipe = MatPipe(**config)
pipe.fit(train_DF, target=target_col)

2020-10-29 10:36:33 INFO Problem type is: regression
INFO:automatminer:Problem type is: regression
2020-10-29 10:36:33 INFO Fitting MatPipe pipeline to data.
INFO:automatminer:Fitting MatPipe pipeline to data.
2020-10-29 10:36:33 INFO AutoFeaturizer: Starting fitting.
INFO:automatminer.featurization.core:AutoFeaturizer: Starting fitting.
2020-10-29 10:36:33 INFO AutoFeaturizer: composition column already exists, overwriting with composition from structure.
INFO:automatminer.featurization.core:AutoFeaturizer: composition column already exists, overwriting with composition from structure.
2020-10-29 10:36:33 INFO AutoFeaturizer: Guessing oxidation states of structures if they were not present in input.
INFO:automatminer.featurization.core:AutoFeaturizer: Guessing oxidation states of structures if they were not present in input.
StructureToOxidStructure: 100%
36609/36609 [00:12<00:00, 3050.23it/s]

StructureToComposition: 100%
36609/36609 [00:10<00:00, 3340.16it/s]

2020-10-29 10:36:59 INFO AutoFeaturizer: Guessing oxidation states of compositions, as they were not present in input.
INFO:automatminer.featurization.core:AutoFeaturizer: Guessing oxidation states of compositions, as they were not present in input.
CompositionToOxidComposition: 28%
10231/36609 [00:03<4:26:53, 1.65it/s]

(It then stays at that level indefinitely)

This is run in JupyterHub with automatminer version 1.0.3.20200727.

So, two questions:

  1. Am I trying to remove this feature correctly? Why can’t I remove it?
  2. Why is it trying to infer oxidation states for the composition if it’s already done so for the structure? (I don’t want to pass guess_oxistates=False, because I want oxidation states where possible.) I saw this, so I’m confused about why I’m running into this problem.

Hey there,

So OxidationStates and CompositionToOxidComposition are actually different featurizers with different purposes. CompositionToOxidComposition is run automatically by automatminer regardless of whether OxidationStates is in the featurizer set. This can be disabled with the guess_oxistates argument to AutoFeaturizer.

From the docstring:

guess_oxistates (bool): If True, try to decorate sites with oxidation
            state.
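
As a quick diagnostic, here is a minimal sketch of switching that argument off to see whether the hang goes away. It reuses the preset and column name from the original post; build_pipe is a hypothetical helper name, not part of automatminer. Judging by the docstring, this presumably turns off oxidation-state guessing altogether, so treat it as a way to isolate the problem rather than a fix.

```python
# Keyword arguments for AutoFeaturizer. guess_oxistates=False is the switch
# described in the docstring above; the other values mirror the original post.
AUTOFEATURIZER_KWARGS = {
    "preset": "express",
    "structure_col": "structure",
    "guess_oxistates": False,
}


def build_pipe(kwargs=AUTOFEATURIZER_KWARGS):
    """Hypothetical helper: build a MatPipe whose AutoFeaturizer uses kwargs."""
    # Imported inside the function so the settings above can be inspected
    # even on a machine where automatminer is not installed.
    from automatminer import MatPipe, get_preset_config
    from automatminer.featurization import AutoFeaturizer

    config = get_preset_config("express")
    config["autofeaturizer"] = AutoFeaturizer(**kwargs)
    return MatPipe(**config)
```

Usage would then be pipe = build_pipe() followed by pipe.fit(train_DF, target=target_col), with train_DF and target_col defined as in the original post.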

Regardless though, this behavior is weird for two reasons:

  1. why it is trying to add oxidation states again if the structures already have them
  2. why it is hanging

Could you share a small sample of your data so I can debug on my side? Maybe you could open an issue on the repo so I don’t lose track of this issue here?

Thanks,
Alex

Hi Alex,

Here’s a colab link replicating the problem with a toy dataset.

Thanks,
Kirby