AutoFeaturizer stuck at SineCoulombMatrix

Hello Matminers!
I tried Automatminer (1.0.3.20200727) on my data with the ‘express’ preset. On one of the iterations, it got stuck at the SineCoulombMatrix featurization during the prediction step, after the pipeline had already been fit successfully. The log is below:

2020-11-06 09:04:45 INFO     Problem type is: regression
2020-11-06 09:04:45 INFO     Fitting MatPipe pipeline to data.
2020-11-06 09:04:45 INFO     AutoFeaturizer: Starting fitting.
2020-11-06 09:04:45 INFO     AutoFeaturizer: Adding compositions from structures.
2020-11-06 09:04:45 INFO     AutoFeaturizer: Guessing oxidation states of structures if they were not present in input.
2020-11-06 09:05:31 INFO     AutoFeaturizer: Guessing oxidation states of compositions, as they were not present in input.
2020-11-06 09:05:35 INFO     AutoFeaturizer: Will remove YangSolidSolution because it's fraction passing the precheck for this dataset (0.758291153584649) was less than the minimum (0.9)
2020-11-06 09:05:37 INFO     AutoFeaturizer: Will remove Miedema because it's fraction passing the precheck for this dataset (0.758291153584649) was less than the minimum (0.9)
2020-11-06 09:05:37 INFO     AutoFeaturizer: Guessing oxidation states of structures if they were not present in input.
2020-11-06 09:06:07 INFO     AutoFeaturizer: Will remove GlobalInstabilityIndex because it's fraction passing the precheck for this dataset (0.13738007101306943) was less than the minimum (0.9)
2020-11-06 09:06:07 INFO     AutoFeaturizer: Featurizer type bandstructure not in the dataframe to be fitted. Skipping...
2020-11-06 09:06:07 INFO     AutoFeaturizer: Featurizer type dos not in the dataframe to be fitted. Skipping...
2020-11-06 09:06:07 INFO     AutoFeaturizer: Finished fitting.
2020-11-06 09:06:07 INFO     AutoFeaturizer: Starting transforming.
2020-11-06 09:06:07 INFO     AutoFeaturizer: Featurizing with ElementProperty.
2020-11-06 09:06:22 INFO     AutoFeaturizer: Featurizing with OxidationStates.
2020-11-06 09:06:23 INFO     AutoFeaturizer: Featurizing with ElectronAffinity.
2020-11-06 09:06:24 INFO     AutoFeaturizer: Featurizing with IonProperty.
2020-11-06 09:06:26 INFO     AutoFeaturizer: Featurizing with DensityFeatures.
2020-11-06 09:06:35 INFO     AutoFeaturizer: Featurizing with GlobalSymmetryFeatures.
2020-11-06 09:06:41 INFO     AutoFeaturizer: Featurizing with EwaldEnergy.
2020-11-06 09:07:03 INFO     AutoFeaturizer: Featurizing with SineCoulombMatrix.
2020-11-06 09:07:22 INFO     AutoFeaturizer: Featurizing with StructuralComplexity.
2020-11-06 09:07:32 INFO     AutoFeaturizer: Featurizer type bandstructure not in the dataframe. Skipping...
2020-11-06 09:07:32 INFO     AutoFeaturizer: Featurizer type dos not in the dataframe. Skipping...
2020-11-06 09:07:32 INFO     AutoFeaturizer: Finished transforming.
2020-11-06 09:07:32 INFO     DataCleaner: Starting fitting.
2020-11-06 09:07:32 INFO     DataCleaner: Cleaning with respect to samples with sample na_method 'drop'
2020-11-06 09:07:32 INFO     DataCleaner: Replacing infinite values with nan for easier screening.
2020-11-06 09:07:32 INFO     DataCleaner: One-hot encoding used for columns ['crystal_system']
2020-11-06 09:07:32 INFO     DataCleaner: Before handling na: 26474 samples, 231 features
2020-11-06 09:07:33 INFO     DataCleaner: 0 samples did not have target values. They were dropped.
2020-11-06 09:07:33 INFO     DataCleaner: Handling feature na by max na threshold of 0.01 with method 'drop'.
2020-11-06 09:07:33 INFO     DataCleaner: These 1 features were removed as they had more than 1.0% missing values: {'avg anion electron affinity'}
2020-11-06 09:07:33 INFO     DataCleaner: After handling na: 26474 samples, 230 features
2020-11-06 09:07:33 INFO     DataCleaner: Finished fitting.
2020-11-06 09:07:33 INFO     FeatureReducer: Starting fitting.
2020-11-06 09:07:35 INFO     FeatureReducer: 89 features removed due to cross correlation more than 0.95
2020-11-06 11:40:54 INFO     TreeFeatureReducer: Finished tree-based feature reduction of 140 initial features to 45
2020-11-06 11:40:54 INFO     FeatureReducer: Finished fitting.
2020-11-06 11:40:54 INFO     FeatureReducer: Starting transforming.
2020-11-06 11:40:54 INFO     FeatureReducer: Finished transforming.
2020-11-06 11:40:54 INFO     TPOTAdaptor: Starting fitting.
2020-11-06 13:06:13 INFO     TPOTAdaptor: Finished fitting.
2020-11-06 13:06:13 INFO     MatPipe successfully fit.
2020-11-06 13:06:13 INFO     Beginning MatPipe prediction using fitted pipeline.
2020-11-06 13:06:13 INFO     AutoFeaturizer: Starting transforming.
2020-11-06 13:06:13 INFO     AutoFeaturizer: Adding compositions from structures.
2020-11-06 13:06:13 INFO     AutoFeaturizer: Guessing oxidation states of structures if they were not present in input.
2020-11-06 13:06:19 INFO     AutoFeaturizer: Guessing oxidation states of compositions, as they were not present in input.
2020-11-06 13:06:31 INFO     AutoFeaturizer: Featurizing with ElementProperty.
2020-11-06 13:06:33 INFO     AutoFeaturizer: Featurizing with OxidationStates.
2020-11-06 13:06:34 INFO     AutoFeaturizer: Featurizing with ElectronAffinity.
2020-11-06 13:06:35 INFO     AutoFeaturizer: Featurizing with IonProperty.
2020-11-06 13:06:36 INFO     AutoFeaturizer: Guessing oxidation states of structures if they were not present in input.
2020-11-06 13:06:37 INFO     AutoFeaturizer: Featurizing with DensityFeatures.
2020-11-06 13:06:39 INFO     AutoFeaturizer: Featurizing with GlobalSymmetryFeatures.
2020-11-06 13:06:40 INFO     AutoFeaturizer: Featurizing with EwaldEnergy.
2020-11-06 13:06:43 INFO     AutoFeaturizer: Featurizing with SineCoulombMatrix.

Because it simply hangs, I have no error message that could help. Interestingly, the code had no problem carrying out the SineCoulombMatrix featurization on a previous iteration (it took around 20 minutes).
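Since a silent hang leaves nothing to report, one possible way to get at least a stack trace is Python's standard-library `faulthandler` module, which can dump the traceback of every thread once a timeout has passed. The following is only a minimal sketch: the helper name `dump_stack_if_stuck` is made up for illustration, and `pipe` and `df_test` in the usage comment are hypothetical stand-ins for a fitted MatPipe and the dataframe being predicted. Worker processes started by the featurizers would not be covered, only the main Python process.

```python
import faulthandler
import sys
from contextlib import contextmanager


@contextmanager
def dump_stack_if_stuck(timeout_s, file=sys.stderr):
    """Dump all thread tracebacks to `file` if the block outlives timeout_s.

    Hypothetical helper for illustration. `file` must be a real file
    object (it needs a file descriptor), and it must stay open for the
    whole duration of the block.
    """
    # repeat=True writes a fresh dump every timeout_s seconds, so one can
    # see whether the code is still stuck at the same line later on.
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=file)
    try:
        yield
    finally:
        # Always cancel the watchdog, even if the block raised.
        faulthandler.cancel_dump_traceback_later()


# Usage sketch (assumed names, not taken from the log above):
#
# with dump_stack_if_stuck(600):           # 10 minutes
#     predictions = pipe.predict(df_test)  # hypothetical fitted MatPipe
```

If the dump keeps pointing at the same line, that line and the structure being processed at that moment would be a natural thing to include in a bug report.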

Best,
Peter

PS: I also ran into the issue described here: Error: Found array with 0 feature(s)

PPS: Here is my pip freeze:

ase==3.19.1
automatminer==1.0.3.20200727
certifi==2020.4.5.2
chardet==3.0.4
cycler==0.10.0
dataclasses==0.7
deap==1.3.1
decorator==4.4.2
future==0.18.2
idna==2.9
importlib-metadata==2.0.0
importlib-resources==3.3.0
joblib==0.15.1
kiwisolver==1.2.0
matminer==0.6.2
matplotlib==3.2.2
mkl-fft==1.1.0
mkl-random==1.1.1
mkl-service==2.3.0
monty==3.0.2
mpmath==1.1.0
networkx==2.4
numpy==1.18.5
packaging==20.4
palettable==3.3.0
pandas==1.0.4
Pint==0.16.1
plotly==4.8.1
PyDispatcher==2.0.5
pymatgen==2020.1.28
pymongo==3.11.0
pyparsing==2.4.7
python-dateutil==2.8.1
pytz==2020.1
PyYAML==5.1.2
requests==2.24.0
retrying==1.3.3
ruamel.yaml==0.16.10
ruamel.yaml.clib==0.2.0
scikit-learn==0.22.2
scipy==1.4.1
SISSOkit==0.2.2
six==1.15.0
skrebate==0.6
spglib==1.15.1
stopit==1.1.2
sympy==1.6
tabulate==0.8.7
TPOT==0.11.0
tqdm==4.51.0
update-checker==0.18.0
urllib3==1.25.9
zipp==3.4.0