On Thursday, December 20, 2018 at 1:36:26 PM UTC-5, thomas heiman wrote:
Hi,
Say I have built a model using:
pipe = MatPipe()
pipe.fit(train_df, "Tc")
and then I want to predict the properties of some other data using:
predicted_df = pipe.predict(other_df, "Tc")
Will pipe.predict decorate other_df with descriptors, or do I have to do that separately? If so, how would I do that? Thank you!
Yes, that is the way it should work! pipe.predict will featurize other_df for you. It will also apply the same feature reduction and automatically format other_df so it can be used with the model.
In other words, once you “fit” a MatPipe on a set of data, you can use it to apply the same sequence of operations (featurization, feature reduction, data cleaning, learning) to any other_df you have, provided it has the same general format. It should require no intervention from the user.
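To make the "fit once, reuse on any other_df" idea concrete, here is a toy sketch in plain Python. It is not automatminer's implementation: ToyPipe, its featurize method, and the max_na_frac parameter are hypothetical, rows are plain dicts instead of dataframes, and every step (a toy featurizer, dropping mostly-missing columns, keeping the feature most correlated with the target, a one-feature least-squares fit) is deliberately minimal. The point is only that predict() reuses what fit() learned rather than recomputing anything from the new data.

```python
from statistics import mean


def _corr(xs, ys):
    """Pearson correlation; returns 0.0 for a constant column."""
    mx, my = mean(xs), mean(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    scale = (sum((a - mx) ** 2 for a in xs) * sum((b - my) ** 2 for b in ys)) ** 0.5
    return cov / scale if scale else 0.0


class ToyPipe:
    """Hypothetical illustration of the fit/predict pattern (not automatminer's MatPipe)."""

    def __init__(self, max_na_frac=0.01):
        self.max_na_frac = max_na_frac  # assumed threshold for dropping sparse columns

    @staticmethod
    def featurize(row):
        # Toy "featurizer": expand one raw input into several descriptors.
        x = row.get("x")
        return {"x": x, "x_sq": None if x is None else x * x, "noise": row.get("noise")}

    def _clean(self, feats):
        # Keep only the fitted columns and fill gaps with the training means.
        return {c: self.fill[c] if feats[c] is None else feats[c] for c in self.columns}

    def fit(self, rows, target):
        feats = [self.featurize(r) for r in rows]
        ys = [r[target] for r in rows]
        # Data cleaning: keep columns whose fraction of missing values is small enough.
        self.columns = [
            c for c in feats[0]
            if sum(f[c] is None for f in feats) / len(feats) <= self.max_na_frac
        ]
        self.fill = {c: mean(f[c] for f in feats if f[c] is not None) for c in self.columns}
        cleaned = [self._clean(f) for f in feats]
        # Feature reduction: keep the single column most correlated with the target.
        self.feature = max(self.columns, key=lambda c: abs(_corr([f[c] for f in cleaned], ys)))
        # Learning: one-feature ordinary least squares.
        xs = [f[self.feature] for f in cleaned]
        mx, my = mean(xs), mean(ys)
        self.slope = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sum(
            (a - mx) ** 2 for a in xs
        )
        self.intercept = my - self.slope * mx
        return self

    def predict(self, rows):
        # Reuse the fitted featurizer, cleaning, selected feature, and model as-is.
        return [
            self.intercept + self.slope * self._clean(self.featurize(r))[self.feature]
            for r in rows
        ]
```

For example, with train = [{"x": 1, "y": 1}, {"x": 2, "y": 4}, {"x": 3, "y": 9}, {"x": 4, "y": 16}], ToyPipe().fit(train, "y") drops the entirely missing "noise" column, keeps "x_sq", and .predict([{"x": 5}]) returns [25.0]; the new row never needs to be featurized by hand.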
Let me know if you have any issues doing this with your dataframe, as we are still in the experimental stage of this project!
Thanks,
Alex
On Mon, Dec 24, 2018 at 4:44 PM thomas heiman wrote:
Hi Alex,
I attached a subset of a 24K dataset and the corresponding output. When I run the code below, I get a CV value of -700 something, which doesn't make sense to me. The pipe.post_fit_df doesn't look quite right to me either. I was wondering if it could be due to the fractional numbers in some of the chemical formulas… Any thoughts on what is going on and what I can do about it? Thank you!! I hope that you have a Happy Holiday season!!
# Fit a pipeline to training data to predict band gap
pipe = MatPipe()
pipe.fit(df, target)
2018-12-21 15:46:17 INFO Fitting MatPipe pipeline to data.
2018-12-21 15:46:17 INFO Running metaselector.
2018-12-21 15:46:17 INFO Featurizer type composition not in the dataframeto be fitted. Skipping…
2018-12-21 15:46:17 INFO Featurizer type structure not in the dataframeto be fitted. Skipping…
2018-12-21 15:46:17 INFO Featurizer type bandstructure not in the dataframeto be fitted. Skipping…
2018-12-21 15:46:17 INFO Featurizer type dos not in the dataframeto be fitted. Skipping…
2018-12-21 15:46:17 INFO Replacing infinite values with nan for easier screening.
2018-12-21 15:46:17 INFO One-hot encoding used for columns ['material']
2018-12-21 15:46:24 INFO Before handling na: 21263 samples, 15543 features
2018-12-21 15:46:25 INFO 0 samples did not have target values. They were dropped.
2018-12-21 15:46:25 INFO Handling na by max na threshold of 0.01.
2018-12-21 15:47:01 INFO After handling na: 21263 samples, 15543 features
On Thursday, December 20, 2018 at 2:16:26 PM UTC-5, thomas heiman wrote:
Hi Alex,
Thank you!! Will do!
Sincerely,
tom
On Tuesday, December 25, 2018 at 9:21:43 AM UTC-5, Alex Dunn wrote:
Hi Thomas,
If you are referencing the internal CV score of MatPipe (which you might be seeing through .digest), it is negative because it is given as negative MSE (mean squared error).
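For context, this matches the scikit-learn scorer convention (e.g. the "neg_mean_squared_error" scoring string), where error metrics are negated so that a larger score is always better. Assuming the reported value is that negated MSE, a score of about -700 means an MSE of about 700, i.e. an RMSE of roughly 26.5 in the units of the target (kelvin, if critical_temp is in K). The two helpers below are hypothetical stdlib illustrations, not part of automatminer:

```python
import math


def neg_mse(y_true, y_pred):
    """Negated mean squared error: closer to 0 is better, as in scikit-learn scorers."""
    errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return -sum(errors) / len(errors)


def neg_mse_to_rmse(score):
    """Convert a negative-MSE score back to RMSE, in the same units as the target."""
    return math.sqrt(-score)


# Hypothetical critical temperatures (K) and predictions, for illustration only.
print(neg_mse([10.0, 40.0, 90.0], [20.0, 30.0, 100.0]))  # -100.0
print(neg_mse_to_rmse(-700))  # about 26.46
```

Reading the score as an RMSE in kelvin is usually easier to judge than the raw negated MSE.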
Thanks,
Alex
Sincerely, tom
–
You received this message because you are subscribed to the Google Groups “matminer” group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
Yes, I have seen it! They have some excellent work there. We actually plan to eventually incorporate the ability to use some of their software in matminer and automatminer.
If you are interested in featurizing a dataframe, just use the AutoFeaturizer class.
from automatminer.featurization import AutoFeaturizer
af = AutoFeaturizer()
df = af.fit_transform(df, "your_target_property")
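On the fractional formulas: a composition featurizer starts by turning a formula string into element amounts, and many composition descriptors are built from the normalized atomic fractions, so non-integer amounts are not necessarily a problem in themselves. matminer's composition featurizers work on pymatgen Composition objects; the regex parser below is only a simplified, hypothetical stand-in (no parentheses, hydrates, or charges) to show the idea:

```python
import re

# One element symbol plus an optional integer or decimal amount, e.g. "La1.8" or "O".
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d+(?:\.\d+)?|\.\d+)?")
# Assumption: only flat formulas (no parentheses, hydrates, or charges) are supported.
_FORMULA = re.compile(r"(?:[A-Z][a-z]?(?:\d+(?:\.\d+)?|\.\d+)?)+")


def parse_formula(formula):
    """Return {element: amount}, accepting fractional amounts like 'Ba0.2La1.8CuO4'."""
    if not _FORMULA.fullmatch(formula):
        raise ValueError(f"unsupported formula: {formula!r}")
    amounts = {}
    for symbol, amount in _TOKEN.findall(formula):
        # A missing amount means one atom; repeated symbols are summed.
        amounts[symbol] = amounts.get(symbol, 0.0) + (float(amount) if amount else 1.0)
    return amounts


def element_fractions(formula):
    """Normalize amounts to atomic fractions, which sum to 1 regardless of scaling."""
    amounts = parse_formula(formula)
    total = sum(amounts.values())
    return {element: amount / total for element, amount in amounts.items()}
```

For example, element_fractions("Ba0.2La1.8CuO4") gives O a fraction of 4/7, the same as for the scaled integer formula "BaLa9Cu5O20".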
If you also want to automatically filter out NaNs and select only the most important features, I'd recommend also using the DataCleaner and FeatureReducer classes:
from automatminer.preprocessing import DataCleaner, FeatureReducer
dc = DataCleaner()
fr = FeatureReducer()
df = dc.fit_transform(df, "your_target_property")
df = fr.fit_transform(df, "your_target_property")
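The "Replacing infinite values with nan" and "Handling na by max na threshold of 0.01" lines in the log earlier in the thread hint at what this kind of cleaning involves. The function below is a hypothetical sketch of threshold-based NaN handling, not DataCleaner's actual code: it treats infinite values as missing, drops samples without a target, drops feature columns whose missing fraction exceeds max_na_frac, and then drops any sample that still has a missing value.

```python
import math


def _is_missing(value):
    # Treat None, NaN, and +/-inf as missing.
    return value is None or (isinstance(value, float) and not math.isfinite(value))


def clean_by_na_threshold(rows, target, max_na_frac=0.01):
    """Hypothetical NaN handling; returns (cleaned_rows, kept_feature_columns)."""
    # 1. Samples without a target value cannot be used for training.
    rows = [r for r in rows if not _is_missing(r.get(target))]
    if not rows:
        return [], []
    # 2. Drop feature columns whose fraction of missing values exceeds the threshold.
    features = [c for c in rows[0] if c != target]
    kept = [
        c for c in features
        if sum(_is_missing(r.get(c)) for r in rows) / len(rows) <= max_na_frac
    ]
    # 3. Drop samples that still contain a missing value in a kept column.
    cleaned = [{c: r.get(c) for c in kept + [target]} for r in rows]
    return [r for r in cleaned if not any(_is_missing(v) for v in r.values())], kept
```

In a tiny example, a threshold of 0.01 is very strict: a single missing value out of three samples is already 33%, so such a column would be dropped.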
For all three of these classes, you can pass init arguments relevant to the specifics of your problem.
Thanks,
Alex
On Wednesday, December 26, 2018 at 12:40:39 PM UTC-8, thomas heiman wrote:
Hi Alex,
I ran the code below and it is not featurizing the dataset… It is returning the same dataframe I started with… Any ideas on why it's not recognizing the chemical formulas? Thank you!!
from automatminer.featurization import AutoFeaturizer
af = AutoFeaturizer()
df1 = af.fit_transform(df, "critical_temp")
2018-12-26 15:35:07 INFO Running metaselector.
2018-12-26 15:35:07 INFO Featurizer type composition not in the dataframeto be fitted. Skipping…
2018-12-26 15:35:07 INFO Featurizer type structure not in the dataframeto be fitted. Skipping…
2018-12-26 15:35:07 INFO Featurizer type bandstructure not in the dataframeto be fitted. Skipping…
2018-12-26 15:35:07 INFO Featurizer type dos not in the dataframeto be fitted. Skipping…
On Tuesday, December 25, 2018 at 6:49:28 PM UTC-5, thomas heiman wrote:
Hi Alex,
Thank you!!!
Sincerely,
tom
On Wednesday, December 26, 2018 at 3:43:27 PM UTC-5, [email protected] wrote:
Hi thomas,
You need to specify the column you want to featurize; otherwise AutoFeaturizer has no way of knowing which column to use. You are featurizing compositions, so set composition_col="material" when you create the AutoFeaturizer.
The default for composition_col is just "composition", so you could also rename the "material" column of your dataframe to "composition".
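The skipping behavior is visible in the log: each featurizer type is looked up by column name, and any type whose column is missing is skipped. In the MatPipe log earlier in the thread, every type was skipped and the 'material' column was one-hot encoded instead, which is consistent with this. Below is a hypothetical sketch of that lookup (select_featurizer_types is illustrative, and the default names for the structure, bandstructure, and dos columns are assumptions), with both fixes applied. In automatminer terms they correspond to AutoFeaturizer(composition_col="material") or pandas' df.rename(columns={"material": "composition"}).

```python
# Assumed default column names; only "composition" is confirmed in this thread.
DEFAULT_COLUMNS = {
    "composition": "composition",
    "structure": "structure",
    "bandstructure": "bandstructure",
    "dos": "dos",
}


def select_featurizer_types(columns, column_for_type=None):
    """Hypothetical metaselector step: return ({type: column}, [skipped types])."""
    column_for_type = DEFAULT_COLUMNS if column_for_type is None else column_for_type
    selected, skipped = {}, []
    for ftype, col in column_for_type.items():
        if col in columns:
            selected[ftype] = col
        else:
            skipped.append(ftype)  # the "... not in the dataframe ... Skipping" case
    return selected, skipped


columns = ["material", "critical_temp"]
print(select_featurizer_types(columns))  # nothing selected, all four types skipped

# Fix 1: point the composition lookup at the existing column.
print(select_featurizer_types(columns, {**DEFAULT_COLUMNS, "composition": "material"}))

# Fix 2: rename the column to the default name instead.
renamed = ["composition" if c == "material" else c for c in columns]
print(select_featurizer_types(renamed))
```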
On Wednesday, December 26, 2018 at 3:18:05 PM UTC-8, Anubhav Jain wrote:
Maybe it would be clearer if composition_col were a required argument to the AutoFeaturizer init? Sometimes trying to be helpful by setting a default column name like "composition" just makes things more confusing.
On Wednesday, December 26, 2018 at 3:32:13 PM UTC-8, Anubhav Jain wrote:
Actually, looking at the code, I can see why composition_col (and the other column arguments, like structure_col) have default values.
As I expect other people to make the same mistake, probably the best thing is to raise a more informative exception if none of the expected column names are present in the dataframe (e.g., one that tells the user how to fix it)?
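A sketch of what such a check could look like, assuming the same lookup of featurizer types by column name. require_featurizable_column, its default column names, and its error message are hypothetical and not part of automatminer:

```python
# Assumed default column names, one per featurizer type seen in the log.
DEFAULT_COLUMNS = {
    "composition": "composition",
    "structure": "structure",
    "bandstructure": "bandstructure",
    "dos": "dos",
}


def require_featurizable_column(columns, column_for_type=None):
    """Hypothetical guard: raise a helpful error instead of silently skipping everything."""
    column_for_type = DEFAULT_COLUMNS if column_for_type is None else column_for_type
    present = {t: c for t, c in column_for_type.items() if c in columns}
    if not present:
        expected = ", ".join(f"{c!r} (for {t})" for t, c in column_for_type.items())
        raise ValueError(
            f"No featurizable column found: expected one of {expected}, "
            f"but the dataframe has columns {list(columns)}. Rename your input column to "
            "one of the expected names, or pass its name explicitly, e.g. "
            "composition_col='material'."
        )
    return present
```

Calling it with ["material", "critical_temp"] raises a ValueError that names the expected columns and the fix, while ["composition", "critical_temp"] passes and returns {"composition": "composition"}.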
Also, for the negative CV score, could the attribute name be designed to make it clear that this is a negative CV score?
I'm not sure what it's called now, but something like neg_score?
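One way to make the sign convention self-documenting is to put it in the name, mirroring scikit-learn's "neg_mean_squared_error" scorer name. CVResult below is a hypothetical container, not automatminer's digest format. It stores the raw negated score and exposes the positive MSE and RMSE as derived properties:

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CVResult:
    """Hypothetical CV summary whose field name states the sign convention."""

    neg_mean_squared_error: float  # higher (closer to 0) is better, as in scikit-learn

    @property
    def mean_squared_error(self) -> float:
        return -self.neg_mean_squared_error

    @property
    def root_mean_squared_error(self) -> float:
        # Same units as the target, which is usually easier to interpret.
        return math.sqrt(self.mean_squared_error)
```

For example, CVResult(-400.0).root_mean_squared_error is 20.0, so nobody has to guess whether a reported -400 is a bug.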