Decorating test data with descriptors

Hi,

Say I have built a model using:

pipe = MatPipe()
pipe.fit(train_df, "Tc")

and then I want to predict the properties of some other data using:

predicted_df = pipe.predict(other_df, "Tc")

Will pipe.predict decorate other_df with descriptors, or do I have to do that separately? If I do, how would I do that? Thank you!

Sincerely, tom

Hey tom,

Yes, pipe.predict will decorate other_df with descriptors for you; that is the way it should work! It will also apply the same feature reduction techniques and automatically format other_df so you can use it with the model.

In other words, once you “fit” a MatPipe on a set of data, you can use it to apply the same set of operations (featurization, feature reduction, data cleaning, learning) to any other_df you have (provided it has the same general format). It should require no intervention from the user.
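To illustrate the idea of a fitted pipeline replaying its steps on new data, here is a minimal pure-Python sketch. ToyPipe is a hypothetical stand-in written for this example; it is not automatminer's MatPipe, and its "model" simply predicts the mean of the training target. The point is only that everything learned during fit (which columns to keep, which values to impute) is reused unchanged in predict.

```python
from statistics import mean


class ToyPipe:
    """Hypothetical toy pipeline; NOT automatminer's MatPipe."""

    def fit(self, rows, target):
        # Learn which feature columns exist and their training means.
        self.features = sorted(k for k in rows[0] if k != target)
        self.means = {
            f: mean(r[f] for r in rows if r.get(f) is not None)
            for f in self.features
        }
        # Stand-in "model": always predict the mean training target.
        self.prediction = mean(r[target] for r in rows)
        return self

    def _prepare(self, row):
        # Replay the fit-time cleaning: same columns, same imputation values.
        return {
            f: row[f] if row.get(f) is not None else self.means[f]
            for f in self.features
        }

    def predict(self, rows, target):
        out = []
        for row in rows:
            prepared = self._prepare(row)
            prepared[target + " predicted"] = self.prediction
            out.append(prepared)
        return out


train = [{"x": 1.0, "Tc": 10.0}, {"x": 3.0, "Tc": 30.0}]
other = [{"x": None, "extra": 5}]  # missing value and an unseen column

pipe = ToyPipe().fit(train, "Tc")
predicted = pipe.predict(other, "Tc")
print(predicted)
```

The new row needs no manual preparation: the missing "x" is filled with the training mean and the unseen "extra" column is dropped, because predict reuses what fit learned.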

Let me know if you have any issues doing this with your dataframe, as we are still in the experimental stage of this project!

Thanks,

Alex


Hi Alex,

Thank you!! Will do!

Sincerely,

tom


Hi Alex,

I attached a subset of a 24K dataset and the corresponding output. When I run the code below, I get a CV value of about -700, which doesn’t make sense to me. The pipe.post_fit_df doesn’t look quite right to me either. I was wondering if it could be due to the fractional numbers in some of the chemical formulas… Any thoughts on what is going on and what I can do about it? Thank you!! I hope that you have a Happy Holiday season!!

Sincerely,

tom

import pandas as pd
import numpy as np
from automatminer.pipeline import MatPipe

df = pd.read_csv('Documents/Superconductor24K.csv')

# user inputs
target = 'critical_temp'
RS = 29
timelimitmins = 180
print('timelimitmins = ', timelimitmins)
model_type = 'regression'
scoring = 'r2'

# Fit a pipeline to the training data to predict the target
pipe = MatPipe()
pipe.fit(df, target)

2018-12-21 15:46:17 INFO Fitting MatPipe pipeline to data.
2018-12-21 15:46:17 INFO Running metaselector.
2018-12-21 15:46:17 INFO Featurizer type composition not in the dataframeto be fitted. Skipping…
2018-12-21 15:46:17 INFO Featurizer type structure not in the dataframeto be fitted. Skipping…
2018-12-21 15:46:17 INFO Featurizer type bandstructure not in the dataframeto be fitted. Skipping…
2018-12-21 15:46:17 INFO Featurizer type dos not in the dataframeto be fitted. Skipping…
2018-12-21 15:46:17 INFO Replacing infinite values with nan for easier screening.
2018-12-21 15:46:17 INFO One-hot encoding used for columns [‘material’]
2018-12-21 15:46:24 INFO Before handling na: 21263 samples, 15543 features
2018-12-21 15:46:25 INFO 0 samples did not have target values. They were dropped.
2018-12-21 15:46:25 INFO Handling na by max na threshold of 0.01.
2018-12-21 15:47:01 INFO After handling na: 21263 samples, 15543 features

pipe.digest()
display(pipe.post_fit_df)
pipe.post_fit_df.to_csv('PipePostFit.csv', sep='\t', encoding='utf-8')

subset_25k.xlsx (8.62 KB)

subset_PipePostFit.xlsx (8.66 KB)


Hi Thomas,

If you are referring to the internal CV score of MatPipe (which you might be seeing through .digest), it is negative because it is reported as negative MSE (mean squared error).
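As a quick illustration of the negative-MSE convention (the same one scikit-learn uses for its "neg_mean_squared_error" scorer, where higher is better), here is a small sketch. The helper names neg_mse and rmse_from_neg_mse are made up for this example and are not part of automatminer; the sample values are arbitrary.

```python
import math


def neg_mse(y_true, y_pred):
    """Negated mean squared error: values closer to 0 are better."""
    errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return -sum(errors) / len(errors)


def rmse_from_neg_mse(score):
    """Convert a negative-MSE score back to an RMSE in the target's units."""
    return math.sqrt(-score)


y_true = [29.0, 26.0, 19.0]
y_pred = [25.0, 26.0, 22.0]
score = neg_mse(y_true, y_pred)      # negative by construction
print(score, rmse_from_neg_mse(score))

# A score of -700 corresponds to an RMSE of sqrt(700)
print(rmse_from_neg_mse(-700.0))
```

Read this way, a score near -700 is an error magnitude of roughly 26 in the units of the target, not a broken metric.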

Thanks,

Alex

Alex Dunn

Graduate Student

UC Berkeley Materials Science

[email protected]

Hi Alex,

Is there any way that I could see the decorated data frame before I fit a model to it? Have you seen this: https://github.com/materialsvirtuallab/megnet ? Thank you!!

Sincerely,

tom

You received this message because you are subscribed to the Google Groups “matminer” group.

To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].

For more options, visit https://groups.google.com/d/optout.



Hi Thomas,

Yes I have seen it! They have some excellent work there. We plan on eventually incorporating the ability to use some of their software with matminer and automatminer actually.

If you are interested in featurizing a dataframe, just use the AutoFeaturizer class.

from automatminer.featurization import AutoFeaturizer

af = AutoFeaturizer()
df = af.fit_transform(df, "your_target_property")

If you also want to automatically filter the NaNs and select only the most important features, I’d recommend also using the DataCleaner and FeatureReducer classes:

from automatminer.preprocessing import DataCleaner, FeatureReducer

dc = DataCleaner()
fr = FeatureReducer()
df = dc.fit_transform(df, "your_target_property")
df = fr.fit_transform(df, "your_target_property")

For all 3 of the classes I described, you can give init arguments relevant to the specifics of your problem.

Thanks,

Alex


Hi Alex,

Thank you!!!

Sincerely,

tom


Hi Alex,

I ran the code below and it is not featurizing the dataset… It is returning the same dataframe I started with… Any ideas on why it’s not recognizing the chemical formulas? Thank you!!

Sincerely,

tom

import pandas as pd
import numpy as np
from automatminer.featurization import AutoFeaturizer

df = pd.read_csv('Documents/Superconductor24K.csv')

af = AutoFeaturizer()
df1 = af.fit_transform(df, "critical_temp")

2018-12-26 15:35:07 INFO Running metaselector.
2018-12-26 15:35:07 INFO Featurizer type composition not in the dataframeto be fitted. Skipping…
2018-12-26 15:35:07 INFO Featurizer type structure not in the dataframeto be fitted. Skipping…
2018-12-26 15:35:07 INFO Featurizer type bandstructure not in the dataframeto be fitted. Skipping…
2018-12-26 15:35:07 INFO Featurizer type dos not in the dataframeto be fitted. Skipping…

display(df1)

   material                critical_temp
0  Ba0.2La1.8Cu1O4         29.00
1  Ba0.1La1.9Ag0.1Cu0.9O4  26.00
2  Ba0.1La1.9Cu1O4         19.00
3  Ba0.15La1.85Cu1O4       22.00
4  Ba0.3La1.7Cu1O4         23.00
5  Ba0.5La1.5Cu1O4         23.00
6  Ba1La1Cu1O4             11.00


Hi Thomas,

You need to specify the column you want to featurize; otherwise AutoFeaturizer has no way of knowing which column to use. You are featurizing compositions, so set composition_col="material" when you create the AutoFeaturizer.

The default for composition_col is just "composition", so you could instead rename your dataframe’s "material" column to "composition".


Maybe it would be clearer if composition_col was a required argument into AutoFeaturizer init? Sometimes trying to be helpful by setting a default col name like “composition” just makes things more confusing.

Best,
Anubhav

Actually looking at the code, I can see why the composition_col (and other ones like structure_col) are set with default values.

As I expect other people to make the same mistake, probably the best thing is to throw a more informative exception if none of the expected column names are present in the data frame (e.g., tell the user what to do to fix it)?
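A rough sketch of what such a check could look like is below. The function check_featurizable and its arguments are hypothetical names invented for this illustration; this is not existing automatminer code, and it works on plain lists of column names so it can run on its own.

```python
def check_featurizable(columns, expected):
    """Raise a helpful error if none of the expected input columns exist.

    columns: column names present in the dataframe.
    expected: maps a constructor argument name to the column name it
        currently points to, e.g. {"composition_col": "composition"}.
    """
    found = [col for col in expected.values() if col in columns]
    if not found:
        hints = ", ".join(f"{arg}={col!r}" for arg, col in expected.items())
        raise ValueError(
            f"None of the expected input columns were found ({hints}). "
            f"Columns present: {sorted(columns)}. Rename the column that "
            "holds your inputs, or pass its name via the matching argument, "
            "e.g. composition_col='material'."
        )
    return found


# Hypothetical defaults, mirroring the argument names mentioned in this thread.
defaults = {"composition_col": "composition", "structure_col": "structure"}

message = ""
try:
    check_featurizable(["material", "critical_temp"], defaults)
except ValueError as err:
    message = str(err)
print(message)
```

With a dataframe whose only columns are "material" and "critical_temp", the error names both the arguments to change and the columns that are actually present, instead of silently returning the input unchanged.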

Best,
Anubhav

Also - for the negative CV score, can the attribute name be designed in such a way to make it clear that this is a negative CV score?

Not sure what it’s called now, but something like neg_score or something?

Best,
Anubhav

Hi Alex,

Thank you!!

Sincerely,

tom
