Hello Matminers!
I tried out Automatminer (1.0.3.20200727) on my data using the ‘express’ setting. On one of the iterations, the pipeline fit successfully but then got stuck at the SineCoulombMatrix featurization during the prediction step. Below is the log:
2020-11-06 09:04:45 INFO Problem type is: regression
2020-11-06 09:04:45 INFO Fitting MatPipe pipeline to data.
2020-11-06 09:04:45 INFO AutoFeaturizer: Starting fitting.
2020-11-06 09:04:45 INFO AutoFeaturizer: Adding compositions from structures.
2020-11-06 09:04:45 INFO AutoFeaturizer: Guessing oxidation states of structures if they were not present in input.
2020-11-06 09:05:31 INFO AutoFeaturizer: Guessing oxidation states of compositions, as they were not present in input.
2020-11-06 09:05:35 INFO AutoFeaturizer: Will remove YangSolidSolution because it's fraction passing the precheck for this dataset (0.758291153584649) was less than the minimum (0.9)
2020-11-06 09:05:37 INFO AutoFeaturizer: Will remove Miedema because it's fraction passing the precheck for this dataset (0.758291153584649) was less than the minimum (0.9)
2020-11-06 09:05:37 INFO AutoFeaturizer: Guessing oxidation states of structures if they were not present in input.
2020-11-06 09:06:07 INFO AutoFeaturizer: Will remove GlobalInstabilityIndex because it's fraction passing the precheck for this dataset (0.13738007101306943) was less than the minimum (0.9)
2020-11-06 09:06:07 INFO AutoFeaturizer: Featurizer type bandstructure not in the dataframe to be fitted. Skipping...
2020-11-06 09:06:07 INFO AutoFeaturizer: Featurizer type dos not in the dataframe to be fitted. Skipping...
2020-11-06 09:06:07 INFO AutoFeaturizer: Finished fitting.
2020-11-06 09:06:07 INFO AutoFeaturizer: Starting transforming.
2020-11-06 09:06:07 INFO AutoFeaturizer: Featurizing with ElementProperty.
2020-11-06 09:06:22 INFO AutoFeaturizer: Featurizing with OxidationStates.
2020-11-06 09:06:23 INFO AutoFeaturizer: Featurizing with ElectronAffinity.
2020-11-06 09:06:24 INFO AutoFeaturizer: Featurizing with IonProperty.
2020-11-06 09:06:26 INFO AutoFeaturizer: Featurizing with DensityFeatures.
2020-11-06 09:06:35 INFO AutoFeaturizer: Featurizing with GlobalSymmetryFeatures.
2020-11-06 09:06:41 INFO AutoFeaturizer: Featurizing with EwaldEnergy.
2020-11-06 09:07:03 INFO AutoFeaturizer: Featurizing with SineCoulombMatrix.
2020-11-06 09:07:22 INFO AutoFeaturizer: Featurizing with StructuralComplexity.
2020-11-06 09:07:32 INFO AutoFeaturizer: Featurizer type bandstructure not in the dataframe. Skipping...
2020-11-06 09:07:32 INFO AutoFeaturizer: Featurizer type dos not in the dataframe. Skipping...
2020-11-06 09:07:32 INFO AutoFeaturizer: Finished transforming.
2020-11-06 09:07:32 INFO DataCleaner: Starting fitting.
2020-11-06 09:07:32 INFO DataCleaner: Cleaning with respect to samples with sample na_method 'drop'
2020-11-06 09:07:32 INFO DataCleaner: Replacing infinite values with nan for easier screening.
2020-11-06 09:07:32 INFO DataCleaner: One-hot encoding used for columns ['crystal_system']
2020-11-06 09:07:32 INFO DataCleaner: Before handling na: 26474 samples, 231 features
2020-11-06 09:07:33 INFO DataCleaner: 0 samples did not have target values. They were dropped.
2020-11-06 09:07:33 INFO DataCleaner: Handling feature na by max na threshold of 0.01 with method 'drop'.
2020-11-06 09:07:33 INFO DataCleaner: These 1 features were removed as they had more than 1.0% missing values: {'avg anion electron affinity'}
2020-11-06 09:07:33 INFO DataCleaner: After handling na: 26474 samples, 230 features
2020-11-06 09:07:33 INFO DataCleaner: Finished fitting.
2020-11-06 09:07:33 INFO FeatureReducer: Starting fitting.
2020-11-06 09:07:35 INFO FeatureReducer: 89 features removed due to cross correlation more than 0.95
2020-11-06 11:40:54 INFO TreeFeatureReducer: Finished tree-based feature reduction of 140 initial features to 45
2020-11-06 11:40:54 INFO FeatureReducer: Finished fitting.
2020-11-06 11:40:54 INFO FeatureReducer: Starting transforming.
2020-11-06 11:40:54 INFO FeatureReducer: Finished transforming.
2020-11-06 11:40:54 INFO TPOTAdaptor: Starting fitting.
2020-11-06 13:06:13 INFO TPOTAdaptor: Finished fitting.
2020-11-06 13:06:13 INFO MatPipe successfully fit.
2020-11-06 13:06:13 INFO Beginning MatPipe prediction using fitted pipeline.
2020-11-06 13:06:13 INFO AutoFeaturizer: Starting transforming.
2020-11-06 13:06:13 INFO AutoFeaturizer: Adding compositions from structures.
2020-11-06 13:06:13 INFO AutoFeaturizer: Guessing oxidation states of structures if they were not present in input.
2020-11-06 13:06:19 INFO AutoFeaturizer: Guessing oxidation states of compositions, as they were not present in input.
2020-11-06 13:06:31 INFO AutoFeaturizer: Featurizing with ElementProperty.
2020-11-06 13:06:33 INFO AutoFeaturizer: Featurizing with OxidationStates.
2020-11-06 13:06:34 INFO AutoFeaturizer: Featurizing with ElectronAffinity.
2020-11-06 13:06:35 INFO AutoFeaturizer: Featurizing with IonProperty.
2020-11-06 13:06:36 INFO AutoFeaturizer: Guessing oxidation states of structures if they were not present in input.
2020-11-06 13:06:37 INFO AutoFeaturizer: Featurizing with DensityFeatures.
2020-11-06 13:06:39 INFO AutoFeaturizer: Featurizing with GlobalSymmetryFeatures.
2020-11-06 13:06:40 INFO AutoFeaturizer: Featurizing with EwaldEnergy.
2020-11-06 13:06:43 INFO AutoFeaturizer: Featurizing with SineCoulombMatrix.
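To see how long each stage took before the hang, the timestamps in the log above can be differenced line by line. This is a minimal sketch using only the standard library; `stage_durations` is a hypothetical helper (not part of Automatminer), and the three sample lines are copied from the log above.

```python
from datetime import datetime


def stage_durations(log_lines):
    """Return (seconds_until_next_line, message) for each consecutive pair of log lines."""
    parsed = []
    for line in log_lines:
        # Each line looks like: "2020-11-06 09:07:03 INFO AutoFeaturizer: ..."
        date, time_of_day, _level, message = line.split(" ", 3)
        stamp = datetime.strptime(f"{date} {time_of_day}", "%Y-%m-%d %H:%M:%S")
        parsed.append((stamp, message))
    # Time attributed to a line is the gap until the next line was written.
    return [
        ((nxt[0] - cur[0]).total_seconds(), cur[1])
        for cur, nxt in zip(parsed, parsed[1:])
    ]


sample = [
    "2020-11-06 09:07:03 INFO AutoFeaturizer: Featurizing with SineCoulombMatrix.",
    "2020-11-06 09:07:22 INFO AutoFeaturizer: Featurizing with StructuralComplexity.",
    "2020-11-06 09:07:32 INFO AutoFeaturizer: Featurizer type bandstructure not in the dataframe. Skipping...",
]
for seconds, message in stage_durations(sample):
    print(f"{seconds:7.0f} s  {message}")
```

For these sample lines, which come from the fitting stage of this run, it reports 19 s for SineCoulombMatrix and 10 s for StructuralComplexity. The last line of a log never gets a duration, which is exactly the situation with the final SineCoulombMatrix entry above.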
Because it just hangs, there is no error message or traceback that could help. Interestingly, the code had no problem carrying out the SineCoulombMatrix featurization on a previous iteration (it took around 20 minutes).
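One way the hang could be narrowed down to individual inputs is to call the slow step one item at a time with a time limit. The following is only a sketch under assumptions: `find_slow_items` and `fake_featurize` are hypothetical names, and in practice `func` would have to wrap the featurizer's per-structure call, which is not tested here. It uses daemon threads, so a call that never returns is detected but not stopped; it keeps running in the background until the script exits.

```python
import threading
import time


def find_slow_items(func, items, timeout_s):
    """Call func(item) for each item; return indices whose call did not finish in timeout_s."""
    slow = []
    for index, item in enumerate(items):
        # Daemon thread: a call that never returns will not block interpreter exit.
        worker = threading.Thread(target=func, args=(item,), daemon=True)
        worker.start()
        worker.join(timeout_s)
        if worker.is_alive():
            # Still running after the time limit: candidate for the hang.
            slow.append(index)
    return slow


def fake_featurize(seconds):
    # Hypothetical stand-in for a per-structure featurization call.
    time.sleep(seconds)


print(find_slow_items(fake_featurize, [0.0, 1.0, 0.0], timeout_s=0.2))
```

With the stand-in function, only the second item exceeds the limit, so the script prints `[1]`. A call that raises an exception ends its thread early and is therefore not reported as slow; this sketch only separates "finished in time" from "still running".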
Best,
Peter
PS: I also ran into the same issue as described in the thread “Error: Found array with 0 feature(s)”.
PPS: Here is the output of pip freeze:
ase==3.19.1
automatminer==1.0.3.20200727
certifi==2020.4.5.2
chardet==3.0.4
cycler==0.10.0
dataclasses==0.7
deap==1.3.1
decorator==4.4.2
future==0.18.2
idna==2.9
importlib-metadata==2.0.0
importlib-resources==3.3.0
joblib==0.15.1
kiwisolver==1.2.0
matminer==0.6.2
matplotlib==3.2.2
mkl-fft==1.1.0
mkl-random==1.1.1
mkl-service==2.3.0
monty==3.0.2
mpmath==1.1.0
networkx==2.4
numpy==1.18.5
packaging==20.4
palettable==3.3.0
pandas==1.0.4
Pint==0.16.1
plotly==4.8.1
PyDispatcher==2.0.5
pymatgen==2020.1.28
pymongo==3.11.0
pyparsing==2.4.7
python-dateutil==2.8.1
pytz==2020.1
PyYAML==5.1.2
requests==2.24.0
retrying==1.3.3
ruamel.yaml==0.16.10
ruamel.yaml.clib==0.2.0
scikit-learn==0.22.2
scipy==1.4.1
SISSOkit==0.2.2
six==1.15.0
skrebate==0.6
spglib==1.15.1
stopit==1.1.2
sympy==1.6
tabulate==0.8.7
TPOT==0.11.0
tqdm==4.51.0
update-checker==0.18.0
urllib3==1.25.9
zipp==3.4.0