chemical formulas with fractional components

thomas_heiman · January 22, 2019, 5:36pm

I have a data set with roughly 25k compounds. About 20k of them have chemical formulas with fractional components like so:

Ba0.2La1.8Cu1O4
Ba0.1La1.9Ag0.1Cu0.9O4
Ba0.1La1.9Cu1O4
Ba0.15La1.85Cu1O4
Ba0.3La1.7Cu1O4
Ba0.5La1.5Cu1O4

When I ran the code below on the whole dataset, I did not get the oxidation states. However, when I separated out the compounds with "normal" chemical formulas, i.e. non-fractional components like these:

La2Ba4Cu6O14
Y1Ba2Cu3O

Nd2Ba3Cu5O
Sm1Ba2Cu3O
Gd2Ba3Cu5O
Y1Ba2Cu3O

and ran the code below again.

import pandas as pd
import numpy as np

df = pd.read_csv('Documents/Superconductor24K.csv')

from automatminer.featurization import AutoFeaturizer
af = AutoFeaturizer()
df1 = af.fit_transform(df, 'critical_temp')

I got oxidation states for each compound… Any ideas on what I could do so that all of the compounds get oxidation states? This ties into predicting the structure using pymatgen… Thank you!

Sincerely,

tom

Anubhav_Jain · February 8, 2019, 10:30pm

I believe the AutoFeaturizer uses the oxi_state_guesses() routine, which only works for integer oxidation states.
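One rough way to see the limitation, as a sketch and not a description of what oxi_state_guesses() does internally: fix the usual charges Ba2+, La3+ and O2- (an assumption made here for illustration, not something stated in this thread) and solve for the copper charge that balances Ba0.2La1.8Cu1O4. The helper name balancing_state is hypothetical.

```python
from fractions import Fraction


def balancing_state(amounts, fixed_states, unknown):
    """Average oxidation state `unknown` needs for a neutral formula unit."""
    fixed_charge = sum(
        Fraction(amounts[el]) * state for el, state in fixed_states.items()
    )
    return -fixed_charge / Fraction(amounts[unknown])


# Amounts are given as strings so Fraction reads the decimals exactly.
amounts = {"Ba": "0.2", "La": "1.8", "Cu": "1", "O": "4"}
# Assumed charges (illustrative only): Ba2+, La3+, O2-.
fixed_states = {"Ba": 2, "La": 3, "O": -2}

cu_state = balancing_state(amounts, fixed_states, "Cu")
print(float(cu_state))  # prints 2.2
```

Under those assumed charges, copper comes out at an average of +2.2, so no single integer state for Cu balances the formula as written.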

The two best solutions are likely:

1. Convert the compositions to integer compositions using pymatgen's get_integer_formula_and_factor() function. This could probably be a built-in conversion featurizer in matminer. After the conversion, you could run the code as normal.
2. Try a different oxidation state decoration routine. You could either add the oxidation states manually to the compositions or, if you have a structure, try the BVAnalyzer.
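To illustrate the idea behind the first option, here is a minimal plain-Python sketch of scaling a fractional formula up to integer amounts. It is not pymatgen's implementation: the function name to_integer_formula is hypothetical, it only handles flat formulas without parentheses, and its factor is simply defined so that the original amounts equal the integer amounts times the factor.

```python
import re
from fractions import Fraction
from functools import reduce
from math import gcd


def to_integer_formula(formula):
    """Return (integer_formula, factor) for a flat formula like 'Ba0.2La1.8Cu1O4'.

    The original amounts equal the integer amounts multiplied by `factor`.
    """
    amounts = {}
    for symbol, amount in re.findall(r"([A-Z][a-z]?)(\d*\.?\d*)", formula):
        # A missing amount means 1; Fraction parses decimal strings exactly.
        amounts[symbol] = amounts.get(symbol, Fraction(0)) + Fraction(amount or "1")

    # The least common multiple of the denominators clears every fraction.
    lcm = reduce(
        lambda a, b: a * b // gcd(a, b),
        (f.denominator for f in amounts.values()),
        1,
    )
    scaled = {s: int(f * lcm) for s, f in amounts.items()}

    # Divide out any common factor to get the smallest integer formula.
    common = reduce(gcd, scaled.values())
    integer_formula = "".join(f"{s}{n // common}" for s, n in scaled.items())
    return integer_formula, Fraction(common, lcm)


for formula in ["Ba0.2La1.8Cu1O4", "Ba0.15La1.85Cu1O4", "La2Ba4Cu6O14"]:
    print(formula, "->", to_integer_formula(formula))
```

With integer amounts, a mixed-valence assignment becomes possible: five Cu atoms per formula unit can carry different integer states where one Cu atom could not. Keep the factor if you later need to convert anything back to the original formula unit.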

ardunn · February 9, 2019, 12:05am

Hey thomas,

Are you still having this problem on newer versions of automatminer?

Thanks,

Alex

thomas_heiman · February 10, 2019, 1:37am

Hi Alex,

I am trying to run the following code with the newest Matminer and Automatminer, using data that has two fields, (1) composition and (2) Tc:

from automatminer.featurization import AutoFeaturizer

af = AutoFeaturizer()
df1 = af.fit_transform(df, "critical_temp")

I get the error below. I am trying to get as many features as I can and use a genetic-programming-based symbolic regression approach to see what I can learn. Any suggestions for a simple example would be greatly appreciated! When I used pipe.fit(df, "critical_temp"), it looked like it generated the oxidation states, but it seemed to eliminate them from the final data set, so I don't know yet if it worked for my data set. Once again, I appreciate any help!

Sincerely,

tom

AutomatminerError Traceback (most recent call last)
in ()
1 from automatminer.featurization import AutoFeaturizer
----> 2 af = AutoFeaturizer()
3 df1 = af.fit_transform(df, "critical_temp")

~\Anaconda3\lib\site-packages\automatminer\featurization\core.py in __init__(self, preset, featurizers, exclude, functionalize, max_na_frac, ignore_cols, ignore_errors, drop_inputs, guess_oxistates, multiindex, n_jobs, logger, composition_col, structure_col, bandstructure_col, dos_col)
93 " 'fast') or set featurizers manually.")
94 if not featurizers and not preset:
---> 95 raise AutomatminerError("Please specify set(s) of featurizers to "
96 "use either through the featurizers"
97 "argument or through the preset argument.")

AutomatminerError: AutomatminerError : Please specify set(s) of featurizers to use either through the featurizersargument or through the preset argument.

thomas_heiman · February 10, 2019, 1:55pm

Hi Alex,

I figured it out.

Sincerely,

tom

ardunn · February 11, 2019, 9:14pm

Good to hear!
