chemical formulas with fractional components

I have a data set with roughly 25k compounds. About 20k of them have chemical formulas with fractional components like so:

Ba0.2La1.8Cu1O4
Ba0.1La1.9Ag0.1Cu0.9O4
Ba0.1La1.9Cu1O4
Ba0.15La1.85Cu1O4
Ba0.3La1.7Cu1O4
Ba0.5La1.5Cu1O4

When I ran the code below on the whole dataset, I did not get the oxidation states. However, when I separated out the compounds with “normal” chemical formulas, i.e. non-fractional components like these:

La2Ba4Cu6O14
Y1Ba2Cu3O

Nd2Ba3Cu5O
Sm1Ba2Cu3O
Gd2Ba3Cu5O
Y1Ba2Cu3O

and ran the code below again,

import pandas as pd
import numpy as np
from automatminer.featurization import AutoFeaturizer

# Load the superconductor dataset.
df = pd.read_csv('Documents/Superconductor24K.csv')

# Featurize the data, with 'critical_temp' as the target column.
af = AutoFeaturizer()
df1 = af.fit_transform(df, 'critical_temp')

I got oxidation states for each compound. Any ideas on what I could do so that all of the compounds get oxidation states? This ties into predicting the structure using pymatgen. Thank you!

Sincerely,

tom

I believe the AutoFeaturizer uses pymatgen’s oxi_state_guesses() routine, which only works for compositions with integer amounts.
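To get a feel for why these formulas are awkward for an integer-based guess, here is a small charge-balance sketch in plain Python (no pymatgen). It assumes Ba is +2, La is +3 and O is -2; those values and the helper name balancing_state are assumptions for illustration only, not something taken from automatminer.

```python
from fractions import Fraction

# Assumed fixed oxidation states, for illustration only.
ASSUMED_STATES = {"Ba": 2, "La": 3, "O": -2}


def balancing_state(amounts, unknown):
    """Average oxidation state `unknown` must have for a neutral formula."""
    # Total charge carried by the elements whose states are assumed known.
    fixed_charge = sum(
        Fraction(amount) * ASSUMED_STATES[element]
        for element, amount in amounts.items()
        if element != unknown
    )
    # The unknown element has to cancel that charge, spread over its amount.
    return -fixed_charge / Fraction(amounts[unknown])


# Amounts are given as strings so Fraction parses the decimals exactly.
cu_state = balancing_state({"Ba": "0.2", "La": "1.8", "Cu": "1", "O": "4"}, "Cu")
print(cu_state, float(cu_state))
```

For Ba0.2La1.8Cu1O4 this gives an average Cu state of 11/5 = +2.2, so under these assumptions no single integer Cu state balances the formula as written.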

The two best solutions are likely:

  1. Convert the compositions to integer compositions using pymatgen’s get_integer_formula_and_factor() method. This could probably become a built-in conversion featurizer in matminer. After the conversion, you could run the code as normal.

  2. Try a different oxidation state decoration routine. You could either add the oxidation states to the compositions manually or, if you have a structure, try the BVAnalyzer.
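Option 1 can be sketched without pymatgen to show what the conversion does: scale the amounts by the least common multiple of their denominators, then divide out any common factor. The function name to_integer_formula and its output format are hypothetical. In practice pymatgen’s get_integer_formula_and_factor() would be the thing to call, and its formula string may be formatted differently.

```python
import re
from fractions import Fraction
from functools import reduce
from math import gcd


def to_integer_formula(formula):
    """Return (integer_formula, factor), where original = integer formula * factor."""
    amounts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?", formula):
        # Fraction parses decimal strings exactly, so "0.15" becomes 3/20.
        amounts[symbol] = amounts.get(symbol, Fraction(0)) + Fraction(number or "1")

    # Least common multiple of the denominators clears every fraction.
    scale = reduce(
        lambda a, b: a * b // gcd(a, b),
        (value.denominator for value in amounts.values()),
        1,
    )
    integers = {el: int(value * scale) for el, value in amounts.items()}

    # Dividing by the greatest common divisor gives the smallest integer formula.
    common = reduce(gcd, integers.values())
    integers = {el: n // common for el, n in integers.items()}

    text = "".join(f"{el}{n}" for el, n in integers.items())
    return text, Fraction(common, scale)


for formula in ["Ba0.2La1.8Cu1O4", "Ba0.15La1.85Cu1O4", "La2Ba4Cu6O14"]:
    print(formula, "->", to_integer_formula(formula))
```

With the first formula from the question this returns ('Ba1La9Cu5O20', Fraction(1, 5)), i.e. the integer formula corresponds to five formula units of Ba0.2La1.8Cu1O4.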


Hey thomas,

Are you still having this problem on newer versions of automatminer?

Thanks,

Alex


Hi Alex,

I am trying to run the following code with the newest Matminer and Automatminer, using data that has two fields, (1) composition and (2) Tc:

from automatminer.featurization import AutoFeaturizer

af = AutoFeaturizer()
df1 = af.fit_transform(df, "critical_temp")

I get the error below. I am trying to generate as many features as I can and then use a genetic-programming-based symbolic regression approach to see what I can learn. Any suggestions for a simple example would be greatly appreciated! When I used pipe.fit(df, "critical_temp"), it looked like it generated the oxidation states, but they seemed to be dropped from the final data set, so I don’t know yet whether it worked for my data. Once again, I appreciate any help!

Sincerely,

tom

AutomatminerError                         Traceback (most recent call last)
in ()
      1 from automatminer.featurization import AutoFeaturizer
----> 2 af = AutoFeaturizer()
      3 df1 = af.fit_transform(df, "critical_temp")

~\Anaconda3\lib\site-packages\automatminer\featurization\core.py in __init__(self, preset, featurizers, exclude, functionalize, max_na_frac, ignore_cols, ignore_errors, drop_inputs, guess_oxistates, multiindex, n_jobs, logger, composition_col, structure_col, bandstructure_col, dos_col)
     93                 " 'fast') or set featurizers manually.")
     94         if not featurizers and not preset:
---> 95             raise AutomatminerError("Please specify set(s) of featurizers to "
     96                                     "use either through the featurizers"
     97                                     "argument or through the preset argument.")

AutomatminerError: Please specify set(s) of featurizers to use either through the featurizersargument or through the preset argument.


Hi Alex,

I figured it out.

Sincerely,

tom


Good to hear!
