I attached a subset of a 24K dataset and the corresponding output. When I run the code below, I get a CV value of about -700, which doesn't make sense to me. The pipe.post_fit_df doesn't look right to me either. I was wondering if it could be due to the fractional numbers in some of the chemical formulas. Any thoughts on what is going on and what I can do about it? Thank you!! I hope that you have a Happy Holiday season!!
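For context on the negative score: assuming the reported CV value is an R² score (as the scoring = 'r2' setting suggests), a large negative number is mathematically possible, because R² has no lower bound. It drops below zero whenever the predictions are worse than always predicting the mean of the target. A minimal sketch in plain Python, with made-up numbers that are not taken from the superconductor data:

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical target values and predictions, for illustration only.
y_true = [10.0, 20.0, 30.0, 40.0]
good_pred = [12.0, 18.0, 33.0, 39.0]       # close to the targets
mean_pred = [25.0, 25.0, 25.0, 25.0]       # always predict the mean
bad_pred = [400.0, -350.0, 500.0, -420.0]  # wildly off

r2_good = r2_score(y_true, good_pred)
r2_mean = r2_score(y_true, mean_pred)
r2_bad = r2_score(y_true, bad_pred)
print(r2_good, r2_mean, r2_bad)
```

So a score near -700 would not by itself indicate a bug in the scoring; it would point to a model whose held-out predictions are far off, which fits a feature problem.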
import pandas as pd
import numpy as np
df = pd.read_csv('Documents/Superconductor24K.csv')
target = 'critical_temp'
RS = 29
timelimitmins = 180
print('timelimitmins = ', timelimitmins)
model_type = 'regression'
scoring = 'r2'
from automatminer.pipeline import MatPipe
# Fit a pipeline to training data to predict band gap
pipe = MatPipe()
2018-12-21 15:46:17 INFO Fitting MatPipe pipeline to data.
2018-12-21 15:46:17 INFO Running metaselector.
2018-12-21 15:46:17 INFO Featurizer type composition not in the dataframeto be fitted. Skipping…
2018-12-21 15:46:17 INFO Featurizer type structure not in the dataframeto be fitted. Skipping…
2018-12-21 15:46:17 INFO Featurizer type bandstructure not in the dataframeto be fitted. Skipping…
2018-12-21 15:46:17 INFO Featurizer type dos not in the dataframeto be fitted. Skipping…
2018-12-21 15:46:17 INFO Replacing infinite values with nan for easier screening.
2018-12-21 15:46:17 INFO One-hot encoding used for columns ['material']
2018-12-21 15:46:24 INFO Before handling na: 21263 samples, 15543 features
2018-12-21 15:46:25 INFO 0 samples did not have target values. They were dropped.
2018-12-21 15:46:25 INFO Handling na by max na threshold of 0.01.
2018-12-21 15:47:01 INFO After handling na: 21263 samples, 15543 features
pipe.post_fit_df.to_csv('PipePostFit.csv', sep='\t', encoding='utf-8')
subset_25k.xlsx (8.62 KB)
subset_PipePostFit.xlsx (8.66 KB)
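A note on the log above: the "Featurizer type composition not in the dataframe" lines, followed by "One-hot encoding used for columns ['material']" and 15543 features, suggest that the pipeline found no column it recognized as a composition and treated the formula strings as plain categories. The sketch below is not automatminer's internal code; it uses plain Python and hypothetical formula strings to show what one-hot encoding does to such a column: one new feature per unique string, with no chemical information in it.

```python
# Hypothetical formula strings, for illustration only.
formulas = [
    "Ba0.2La1.8Cu1O4",
    "Ba0.1La1.9Cu1O4",
    "Y1Ba2Cu3O7",
    "Ba0.2La1.8Cu1O4",  # repeat of the first formula
]

# One-hot encoding: one indicator column per unique string.
categories = sorted(set(formulas))
one_hot = [[1 if f == c else 0 for c in categories] for f in formulas]

for formula, row in zip(formulas, one_hot):
    print(formula, row)

# The first two formulas differ only slightly in Ba/La content,
# yet their encoded rows share no active column.
overlap = sum(a * b for a, b in zip(one_hot[0], one_hot[1]))
print("columns:", len(categories), "overlap between rows 0 and 1:", overlap)
```

With encoding like this, two chemically similar compounds look completely unrelated to the model, and a formula not seen in training has no column at all. If that is what happened here, the fractional numbers would not be the cause; the feature count would simply track the number of distinct formula strings.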
On Thursday, December 20, 2018 at 2:16:26 PM UTC-5, thomas heiman wrote:
Thank you!! Will do!
On Thursday, December 20, 2018 at 1:36:26 PM UTC-5, thomas heiman wrote:
Say I have built a model using:
pipe = MatPipe()
and then I want to predict the properties of some other data using:
predicted_df = pipe.predict(other_df, "Tc")
Will pipe.predict decorate other_df with descriptors, or do I have to do that separately? If so, how would I do that? Thank you!