Automatminer - Autofeaturizing two different compositions

Hello,

I am wondering if one can featurize two different compositions at once using AutoFeaturizer.
To be more specific, I am interested in interfacial properties between metal pairs, and thus am trying to featurize both metal A and metal B to develop a model that can predict interfacial properties between metal A and metal B.

It seems that the class automatminer.featurization.core.AutoFeaturizer allows me to set the name of the column containing composition to be featurized, but it allows only one input.

Any suggestions would be appreciated! Thank you.

Hey @sokim,

This is an interesting use case.

Unfortunately, I don’t think Automatminer has an “out-of-the-box” solution for you, so you can’t just pass in columns for two metals and get a model out ;(

You also probably can’t just take the dataframe and run it through the same AutoFeaturizer twice, changing the input column each time. Both passes generate features with the same names, so the columns from the second pass would clash with those from the first.

I’d recommend running two AutoFeaturizers - one for each column. Then, make the feature names unique and you will have a feature dataframe as you intend.

df = 
       metal A      metal B  target
0  structure 1  structure 4  1.34
1  structure 2  structure 5  1.67
2  structure 3  structure 6  9.01

Separate them

df_A = 
       metal A  target
0  structure 1  1.34
1  structure 2  1.67
2  structure 3  9.01
df_B = 
       metal B  target
0  structure 4  1.34
1  structure 5  1.67
2  structure 6  9.01

Run AutoFeaturizer on each separately:

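from automatminer.featurization.core import AutoFeaturizer

# One featurizer per input column; drop_inputs=False keeps the original columns.
# (For composition inputs, the composition column argument would be set instead.)
af_A = AutoFeaturizer(preset="express", structure_col="metal A", drop_inputs=False)
af_B = AutoFeaturizer(preset="express", structure_col="metal B", drop_inputs=False)

# Featurize each frame, then prefix the generated feature names so they stay unique.
df_A = af_A.fit_transform(df_A, "target")
df_A = df_A.rename(columns={f: "A--" + f for f in af_A.features})
df_B = af_B.fit_transform(df_B, "target")
df_B = df_B.rename(columns={f: "B--" + f for f in af_B.features})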

Then you’ll have unique features which you can pandas merge together, or do whatever you want with:

df_A = 
       metal A  target  A--some feature
0  structure 1    1.34            68
1  structure 2    1.67           101
2  structure 3    9.01            12


df_B = 
       metal B  target  B--some feature
0  structure 4    1.34            -11
1  structure 5    1.67             41
2  structure 6    9.01            139
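
# Merge on the row index; a bare df_A.merge(df_B) would match on "target"
# and duplicate rows wherever target values repeat.
df_features = df_A.merge(df_B.drop(columns=["target"]), left_index=True, right_index=True)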

df_features = 
       metal A      metal B  target  A--some feature  B--some feature
0  structure 1  structure 4    1.34               68              -11
1  structure 2  structure 5    1.67              101               41
2  structure 3  structure 6    9.01               12              139

Note that this example shows only a single feature from each AutoFeaturizer; in reality each one will generate many.
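
If you want to try the prefix-and-combine step without running Automatminer, here is a minimal pandas-only sketch. The two frames are hypothetical stand-ins for the featurized output, `prefix_features` is a helper name I made up for illustration, and I repeated one target value on purpose to show why the rows are paired by index rather than by `target`.

```python
import pandas as pd

# Hypothetical stand-ins for the two featurized frames; real ones would come
# from AutoFeaturizer and hold many feature columns.
df_A = pd.DataFrame({
    "metal A": ["structure 1", "structure 2", "structure 3"],
    "target": [1.34, 1.34, 9.01],  # repeated target value, on purpose
    "some feature": [68, 101, 12],
})
df_B = pd.DataFrame({
    "metal B": ["structure 4", "structure 5", "structure 6"],
    "target": [1.34, 1.34, 9.01],
    "some feature": [-11, 41, 139],
})


def prefix_features(df, prefix, keep):
    """Prefix every column name except those listed in `keep` (illustrative helper)."""
    return df.rename(columns={c: prefix + c for c in df.columns if c not in keep})


df_A = prefix_features(df_A, "A--", keep=["metal A", "target"])
df_B = prefix_features(df_B, "B--", keep=["metal B", "target"])

# Pair row i of A with row i of B by joining on the row index.
df_features = df_A.join(df_B.drop(columns=["target"]))

# For comparison: a bare merge matches on the shared "target" column, so rows
# with equal targets are paired in every combination.
df_merged = df_A.merge(df_B)

print(df_features)
print(len(df_features), len(df_merged))
```

With this toy data the index join keeps the three original rows, while the bare merge returns five, because the two rows sharing a target of 1.34 get cross-paired.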