How to split material composition column to make sense for machine learning

Milan_Joshi · January 7, 2020, 5:19am

I have a column in my data, "Material composition", whose entries are

Ag30Cu40Zr30
Ag20Cu30Zr50
Cu50Zr50

etc. (Here the total composition adds up to 100%.)

My question is: I want to split this Material Composition column in a meaningful way, such that if I input elements in given proportions I am able to predict the response.

Logan_Ward · January 9, 2020, 7:24pm

Sorry for the slow reply.

Is it that you are looking to figure out how to change a string from “Cu50Zr50” to some object that captures that the material is half Cu and half Zr? You should not need to do any splitting on the string yourself. Check out the “Compile the Training Set” section of this notebook. It shows you how to take a list of composition strings, parse them to create Pymatgen Composition objects, and use those objects to compute other features.
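To make the parsing step concrete, here is a minimal plain-Python sketch of what turning “Cu50Zr50” into element fractions involves. It is an illustration only, not how Pymatgen does it: the function name `parse_composition` is made up for this example, and it assumes simple formulas with no parentheses. In the notebook the parsing is handled by Pymatgen's Composition class, which covers far more cases.

```python
import re

def parse_composition(formula):
    """Turn a string such as 'Cu50Zr50' into {'Cu': 0.5, 'Zr': 0.5}.

    Hypothetical helper for illustration; assumes no parentheses or charges.
    """
    # One capital letter plus an optional lowercase letter is an element symbol;
    # the optional number after it is the amount (a missing amount counts as 1).
    tokens = re.findall(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?", formula)
    amounts = {}
    for symbol, amount in tokens:
        amounts[symbol] = amounts.get(symbol, 0.0) + (float(amount) if amount else 1.0)
    # Normalise so the fractions sum to 1, whatever units the string used.
    total = sum(amounts.values())
    return {symbol: value / total for symbol, value in amounts.items()}

fractions = parse_composition("Ag30Cu40Zr30")
print(fractions)
```

Because the amounts are normalised, “Cu50Zr50” and “CuZr” end up as the same composition.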

Milan_Joshi · January 10, 2020, 4:40am

Is it that you are looking to figure out how to change a string from “Cu50Zr50” to some object that captures that the materials is half Cu and half Zr?
YES.

Once I create the composition object, is it necessary to featurize it using domain properties, or can we use a simple machine learning technique such as one-hot encoding to convert that object into a feature vector?

I do not know any materials science.

Logan_Ward · January 10, 2020, 4:03pm

You will have to convert that object into features before you can use it for machine learning.

That tutorial demonstrates one method for converting the features using domain knowledge, which should give you a good starting point for the dataset you are working with.
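For comparison, the simplest featurization that needs no domain knowledge is a vector of element fractions in a fixed element order, which is the closest analogue of one-hot encoding for compositions. The sketch below assumes the compositions have already been parsed into dictionaries of fractions; `ELEMENTS` and `fraction_vector` are names invented for this example. Domain-based features like those in the tutorial usually carry more information, so treat this as a baseline rather than a recommendation.

```python
# Hypothetical fixed element order; in practice, collect it from the training data.
ELEMENTS = ["Ag", "Cu", "Zr"]

def fraction_vector(fractions, elements=ELEMENTS):
    """Map {'Cu': 0.5, 'Zr': 0.5} to a list with one slot per known element."""
    unknown = set(fractions) - set(elements)
    if unknown:
        # An element the model never saw has no column to go into.
        raise ValueError(f"elements not in the feature list: {sorted(unknown)}")
    return [fractions.get(element, 0.0) for element in elements]

# The three compositions from the original question, as atomic fractions.
rows = [
    {"Ag": 0.3, "Cu": 0.4, "Zr": 0.3},
    {"Ag": 0.2, "Cu": 0.3, "Zr": 0.5},
    {"Cu": 0.5, "Zr": 0.5},
]
X = [fraction_vector(row) for row in rows]
print(X)
```

Every row has the same length and column order, which is what a machine learning library expects as input.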

Milan_Joshi · January 14, 2020, 5:41am

Okay! Thank you so much.

Milan_Joshi · January 16, 2020, 2:44pm

I went through the notebook you suggested. I have one more question: in the same notebook (formation.ipynb), if I want to predict ‘delta_e’ for a new test point, say the material composition Al2Cu1S1, how would I go about it?

ardunn · January 21, 2020, 10:39pm

Hey there,

Once you have featurized the data, you can build a machine learning model with your library of choice (iirc, in that notebook it is scikit-learn). See the “compiling an ML model” section of the notebook. You should be able to do something similar for your own use case. Depending on the property you are trying to predict, the results may not be accurate on your first try and might require some hyperparameter adjustment during validation.

If you are asking how to predict on new data once you have compiled your model, you need to run your new samples through essentially the same workflow. You can reuse the same objects you used to featurize/select features/train/etc. Just make sure the generated features of your new samples are the same as the ones used to train your ML model. I.e., if you did mymodel.fit(X, y) to train, you’d use mymodel.predict(X_new) to predict on new data.
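As a rough sketch of that workflow, the example below pushes the training formulas and the new formula through the same featurizing function before calling fit and predict. Everything in it is an assumption made for illustration: `featurize`, `ELEMENTS`, and `NearestNeighbourModel` are invented names, the model is a toy stand-in for whatever estimator you actually train, and the target numbers are placeholders, not real delta_e values.

```python
import re

ELEMENTS = ["Al", "Cu", "S", "Zr"]  # hypothetical: every element seen in training

def featurize(formula, elements=ELEMENTS):
    """Formula string -> atomic fractions in a fixed element order."""
    amounts = {}
    for symbol, amount in re.findall(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?", formula):
        if symbol not in elements:
            raise ValueError(f"{symbol} was not seen during training")
        amounts[symbol] = amounts.get(symbol, 0.0) + (float(amount) if amount else 1.0)
    total = sum(amounts.values())
    return [amounts.get(element, 0.0) / total for element in elements]

class NearestNeighbourModel:
    """Toy stand-in for a real estimator with the same fit/predict shape."""

    def fit(self, X, y):
        self.X_, self.y_ = list(X), list(y)
        return self

    def predict(self, X_new):
        def squared_distance(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))
        # Return the target of the closest training row for each new row.
        return [
            self.y_[min(range(len(self.X_)),
                        key=lambda i: squared_distance(self.X_[i], row))]
            for row in X_new
        ]

train_formulas = ["Cu50Zr50", "Al2S3", "Cu2S"]
train_targets = [0.0, -1.0, -0.5]  # placeholder numbers, NOT real delta_e values

mymodel = NearestNeighbourModel().fit([featurize(f) for f in train_formulas],
                                      train_targets)

# The new sample goes through exactly the same featurize() call.
X_new = [featurize("Al2Cu1S1")]
prediction = mymodel.predict(X_new)
print(prediction)
```

The key point is that `X_new` has the same columns, in the same order, as the training matrix; swapping in a real estimator should not change that part.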

P.S.
I might recommend automatminer, which will do a lot of these tedious steps for you!

Thanks,
Alex