I have a column in my data, "Material composition", whose entries are:
Ag30Cu40Zr30
Ag20Cu30Zr50
Cu50Zr50
etc. (Here the total composition adds up to 100%.)
My question: I want to split this Material composition column in a meaningful way, such that if I input elements in given proportions I am able to predict the response.
Is it that you are looking to figure out how to change a string from “Cu50Zr50” to some object that captures that the material is half Cu and half Zr? You need to do the splitting on the string itself. Check out the “Compile the Training Set” section of this notebook. It shows you how to take a list of composition strings, parse them to create Pymatgen Composition objects, and use those objects to compute other features.
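To make the string-splitting step concrete, here is a minimal sketch using only the standard library. It is not the notebook's code: the function name parse_composition is hypothetical, and it assumes flat formulas like the ones in the question (no parentheses or nested groups). Pymatgen's Composition class handles the general case more robustly.

```python
import re

# An element symbol (capital letter, optional lowercase letter)
# followed by an optional amount such as 30 or 12.5.
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?")


def parse_composition(formula):
    """Return {element: fraction} for a flat formula such as 'Ag30Cu40Zr30'.

    Hypothetical helper for illustration; assumes no brackets in the formula.
    """
    amounts = {}
    for symbol, amount in _TOKEN.findall(formula.strip()):
        # A missing amount is treated as 1 (e.g. the 'Cl' in 'NaCl').
        amounts[symbol] = amounts.get(symbol, 0.0) + float(amount or 1)
    total = sum(amounts.values())
    if total == 0:
        raise ValueError(f"could not parse any elements from {formula!r}")
    # Normalise so the fractions sum to 1, whatever units the string used.
    return {element: value / total for element, value in amounts.items()}


half_and_half = parse_composition("Cu50Zr50")
ternary = parse_composition("Ag30Cu40Zr30")
```

Because the amounts are normalised, it does not matter whether the strings are written as percentages (Cu50Zr50) or as stoichiometric counts (Al2Cu1S1).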
Is it that you are looking to figure out how to change a string from “Cu50Zr50” to some object that captures that the material is half Cu and half Zr?
YES.
Once I create a composition object, is it necessary to featurize it using domain properties, or can we use something simple like one-hot encoding to convert that object into a feature vector?
You will have to convert that object into features before you can use it for machine learning.
That tutorial demonstrates one method for converting the features using domain knowledge, which should give you a good starting point for the dataset you are working with.
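On the one-hot question, a small sketch may help show the difference. One-hot encoding the whole formula string gives every composition its own unrelated column, so a model cannot say anything about a composition it has not seen. A vector of element fractions is about the simplest encoding that avoids this. The fractions below are written out by hand (as a composition parser would produce them) and all variable names are illustrative only.

```python
# Element fractions for the three example entries, written out by hand.
samples = {
    "Ag30Cu40Zr30": {"Ag": 0.3, "Cu": 0.4, "Zr": 0.3},
    "Ag20Cu30Zr50": {"Ag": 0.2, "Cu": 0.3, "Zr": 0.5},
    "Cu50Zr50": {"Cu": 0.5, "Zr": 0.5},
}

# One-hot over whole formulas: one column per distinct string.
# Every row is equally far from every other row, so no chemistry is encoded.
formulas = list(samples)
one_hot = {
    formula: [1.0 if formula == other else 0.0 for other in formulas]
    for formula in formulas
}

# Element-fraction vector: one column per element, in a fixed sorted order.
# Similar compositions now end up close together in feature space.
elements = sorted({el for fractions in samples.values() for el in fractions})
fraction_vectors = {
    formula: [fractions.get(el, 0.0) for el in elements]
    for formula, fractions in samples.items()
}
```

The domain-knowledge features in the tutorial go a step further than this by describing each element with physical properties, which is usually what lets a model generalize to elements it has rarely seen.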
I went through the notebook you suggested. I have one more query: in the same notebook, formation.ipynb, how do I predict ‘delta_e’ for a new test point, say the material composition Al2Cu1S1?
Once you have featurized the data, you can build a machine learning model with your library of choice (iirc, in that notebook it is scikit-learn). See the “compiling an ML model” section of the notebook. You should be able to do something similar for your own use case. Depending on the property you are trying to predict, the results may not be accurate on your first try and might require some hyperparameter adjustment during validation.
If you are asking how to predict on new data once you have compiled your model, you need to run your new samples through essentially the same workflow. You can reuse the same objects you used to featurize/select features/train/etc. Just make sure the generated features of your new samples are the same as the ones used to train your ML model. I.e., if you did mymodel.fit(X, y) to train, you’d use mymodel.predict(X_new) to predict on new data.
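As a rough sketch of "same workflow, same features": the column order is fixed once at training time and reused for the new sample. Everything here is an assumption for illustration. NearestNeighbour is a toy stand-in for whatever scikit-learn estimator you actually train, featurize is a hypothetical helper, and the y values are placeholders, not real measurements.

```python
import math


class NearestNeighbour:
    """Toy stand-in for a real estimator with the same fit/predict shape."""

    def fit(self, X, y):
        self._X, self._y = list(X), list(y)
        return self

    def predict(self, X_new):
        # Return the response of the closest training row for each new row.
        return [
            self._y[min(range(len(self._X)),
                        key=lambda i: math.dist(row, self._X[i]))]
            for row in X_new
        ]


def featurize(fractions, columns):
    """Turn {element: fraction} into a vector ordered by the training columns."""
    unknown = set(fractions) - set(columns)
    if unknown:
        raise ValueError(f"elements not seen in training: {sorted(unknown)}")
    return [fractions.get(el, 0.0) for el in columns]


train = [
    {"Ag": 0.3, "Cu": 0.4, "Zr": 0.3},
    {"Ag": 0.2, "Cu": 0.3, "Zr": 0.5},
    {"Cu": 0.5, "Zr": 0.5},
]
y = [1.0, 2.0, 3.0]  # placeholder responses, NOT real data

# The column order is decided once, from the training set only.
columns = sorted({el for fractions in train for el in fractions})
X = [featurize(fractions, columns) for fractions in train]
mymodel = NearestNeighbour().fit(X, y)

# A new sample goes through the same featurize step with the same columns.
X_new = [featurize({"Cu": 0.45, "Zr": 0.55}, columns)]
prediction = mymodel.predict(X_new)
```

The check for unseen elements is a design choice: a new sample containing an element the model never trained on cannot be represented in the training columns, so failing loudly is safer than silently dropping it.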
P.S.
I might recommend automatminer, which will do a lot of these tedious steps for you!