ML in predicting the starting composition

Dear users,

I am new here. I am just starting out with ML in materials science.

I work with ceramic refractory materials: materials resistant to high temperatures, mostly applied in industrial high-temperature devices. Currently, I am working on a project concerning the development of a new generation of non-chrome, eco-friendly, intelligent refractory materials for the copper industry. I am planning to use Machine Learning to accelerate and optimize the design process. While searching the Internet, I found your work very informative for my idea.

My project has successfully passed to the next stage. If I manage to get it, I plan to work on ML.

In this project I will be looking for a correlation of the form:
starting composition (input) - final composition of the material (output).
The input and output data will be qualitative and quantitative analyses of XRD patterns of the materials before and after synthesis.
I am wondering which group of algorithms might fit my problem?

If you can give me any tips, I would be very grateful.

Greetings,
Ilona

Dear Ilona,

Congrats on your successful project so far!
Could you clarify some points? What part do you want to use ML for? What do you mean by input composition before synthesis? XRD gives you information about the crystal structure (and only indirectly about the composition). Are you only interested in predicting the composition or the (most stable) crystal structure?

Best,
Peter

Dear Peter,

Thank you for your reply.
By looking for the relation between starting composition and final composition, I would like to predict the optimal starting composition, and thus shorten the long experimental stage of material design and production.
Yes, you're right that the composition is an indirect result of XRD (both qualitative and quantitative), and I am thinking about it. In that case, the precision of the analysis is important.
How do you see this?

Greetings,
Ilona

Ilona,

I see. How large is your dataset? Is the dataset comprised of one chemical system or multiple? Is one synthesis technique with the same conditions used or multiple?

Best,
Peter

Dear Peter,

I am planning to conduct about 100 syntheses, so about 100 compounds will be obtained. Their chemical composition will be similar, in the sense that they will be compounds from the system Mg1-xFexAl2O4, where x=0-1. Additionally, Fe will be substituted by different metallic ions, also in gradually increasing amounts.

I would like to apply the same synthesis conditions throughout: the same temperature and gas atmosphere.

Greetings,
Ilona

Dear Ilona,
That should be enough data for conventional supervised learning algorithms (but is typically not enough for deep learning applications unless combined with some sort of transfer learning). What sounds challenging to me is how to create clear labels for the output data. I would imagine that your synthesis results in a structure with multiple phases present (phase segregation) - or do you get a homogeneous single-phase material out of it?
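
To make the small-data point a bit more concrete, here is a minimal sketch of how a simple supervised model could be checked with leave-one-out cross-validation on roughly 100 samples. Everything in it is an assumption for illustration: the data are synthetic (x stands in for the Fe fraction, y for the weight fraction of some phase), k-nearest-neighbour regression is just one possible conventional algorithm, and knn_predict and loo_mae are hypothetical helper names.

```python
import random

def knn_predict(train, x_query, k=3):
    """Average the targets of the k training points closest to x_query."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x_query))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def loo_mae(data, k=3):
    """Leave-one-out cross-validation: mean absolute error on held-out points."""
    errors = []
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]          # every sample except the i-th
        errors.append(abs(knn_predict(train, x, k) - y))
    return sum(errors) / len(errors)

# Synthetic stand-in for ~100 syntheses (made-up values, not measurements).
random.seed(0)
xs = [i / 99 for i in range(100)]                # x from 0 to 1
data = [(x, 0.3 * x * x + random.gauss(0, 0.01)) for x in xs]

# Compare a few neighbourhood sizes by their held-out error.
for k in (1, 3, 5, 9):
    print(k, round(loo_mae(data, k), 4))
```

With so few samples, holding out one synthesis at a time uses the data efficiently; the same loop would work with any other regressor in place of knn_predict.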
-Peter

Dear Peter,

I think that deep learning would require too much data here; I am aware of that. Therefore, I am planning to start with regression algorithms.

Yes, it is true that this is challenging, as the output is not a homogeneous composition. At the very beginning I will do a theoretical analysis of the phases that form under equilibrium conditions, using FactSage software for the calculations. However, I am still wondering how to label them. Generally, I will be looking for compositions that contain a specific amount of a specific phase (e.g. some low-temperature phase that may destroy the material's properties).
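
One possible way to think about the labels, sketched below under my own assumptions: treat the quantitative phase analysis of each sample as a set of weight fractions, use the fraction of the phase of interest as a regression target, and optionally derive a yes/no label from a threshold. The phase names, the numbers, the 5% threshold, and the helper names normalise and make_labels are all hypothetical.

```python
def normalise(phases):
    """Rescale phase amounts so that they sum to 1 (weight fractions)."""
    total = sum(phases.values())
    if total <= 0:
        raise ValueError("phase amounts must sum to a positive value")
    return {name: amount / total for name, amount in phases.items()}

def make_labels(phases, target_phase, threshold):
    """Return (regression target, binary label) for one sample.

    The regression target is the weight fraction of target_phase;
    the binary label says whether that fraction exceeds the threshold.
    """
    fraction = normalise(phases).get(target_phase, 0.0)
    return fraction, fraction > threshold

# Made-up quantitative phase analysis of one sample, in wt%.
sample = {"spinel": 88.0, "corundum": 8.0, "hematite": 4.0}

fraction, flagged = make_labels(sample, "hematite", threshold=0.05)
print(fraction, flagged)
```

The continuous fraction suits regression algorithms, while the thresholded flag would turn the same data into a classification problem.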

On the other hand, I would like to ask about structure prediction using ML: is it the case that the input is the XRD pattern and the output is e.g. the lattice parameter? What type of algorithms are used for that, and how many input patterns do you need to do so?

Warm greetings,
Ilona

Sounds good, Ilona!
Regarding the XRD, it is also not so easy to evaluate XRD patterns of a sample with multiple phases present (as peaks from different phases may overlap). However, there has been some work on applying ML to XRD patterns - not sure if it is applicable to your case, but you could give it a try:
https://www.nature.com/articles/s41524-019-0196-x
https://www.nature.com/articles/s41467-019-13749-3
Good luck with your research!
-Peter