Recording and questions for Jakoah Brgoch, "Finding Superhard Materials through Machine Learning"


Jakoah Brgoch, Associate Professor, University of Houston


Monday July 12th, 10am (USA/Pacific)


Superhard materials with a Vickers hardness above 40 GPa are essential in applications ranging from manufacturing to energy production. Finding new superhard materials has traditionally been guided by empirical design rules derived from classically known materials. However, the ability to quantitatively predict hardness remains a significant barrier in materials design. To address this challenge, we constructed an ensemble machine-learning model capable of directly predicting load-dependent hardness. The predictive power of our model was validated on eight unmeasured metal disilicides and a hold-out set of superhard materials. The trained model was then used to screen compounds in Pearson’s Crystal Data (PCD) set and combined with our recently developed machine-learning phase diagram tool to suggest previously unreported superhard compounds. Finally, industrial materials often experience tremendous heat during application; thus, we are building a method for predicting hardness at elevated temperatures.


A recording of this seminar is available here.


If you are unable to ask questions live, please feel welcome to ask any questions following the talk here and we will ask the speaker to check afterwards. Whether they will be able to answer questions or not depends on the speaker’s availability.

Questions answered live

Questions are numbered according to the order they came in. Only questions relevant to this talk specifically are shown.

  1. Is a high melting point one of the criteria for superhard materials?
  2. How transferable are these “handcrafted” features that you’re using for this problem? It seems like you need a new set of features for every problem you’re working on, which can be time-consuming.
  3. Crystalline vs. amorphous materials for hardness. What does your experience tell you regarding additional opportunities for discovery/existence of superhard, amorphous materials? Is crystallinity a prerequisite for superhard materials?
  4. Aren’t the data sets from the Materials Project all calculated at 0 K? Temperature affects hardness. Diamond starts to be noticeably softer at 1000 °C.
  5. Can you talk a little bit more about the feature reduction?
  6. You briefly discussed how it can be tricky to interpret the importance score assigned to individual properties, e.g. electron density and valence electron density showing up as important for B or G respectively. Is it possible that those properties are both highly correlated, and your feature selection procedure will choose one or the other based on which one has a (perhaps only slightly) stronger relationship with your target value?
  7. If we are able to synthesize a material after predicting it via ML, why isn’t it being used in industry in preference to less hard materials?

Additional questions asked

Our apologies to all who asked questions that we did not have time to address during the talk.

  1. For thin-film c-BN, what force should we apply to measure the hardness?
  2. Can you predict hardness properties based on microstructure formation as multiple phases are formed upon sintering?
  3. What sort of challenges come with applying ML to the discovery of materials that can have a huge impact on our society? For example, searching for new materials with high conversion of solar energy to electrical energy?
  4. You specifically sanitized to remove the theoretical compounds from the MP database. Do you think that your model would perform poorly using the theoretical compounds as a test set? It would be interesting to see how it performs.
  5. Reference slide 24 screening… Where does cubic C3N4 or BC2N fit on the graph?
  6. On slide 24, what is the difference between Al2O3 in the top right and bottom right?
  7. Is there a bias in the selection of materials where MP has run elastic tensor calculations? For example in how chemistries are selected or the size of the unit cells?
  8. Did you consider learning/predicting G^3/B^2 directly?
  9. What do you think about amorphous carbon obtained from fullerenes? Is it superhard? Can it be harder than diamond?
  10. If we want to predict the gravimetric capacity of a battery, based on known capacity data, which is very heterogeneous in nature, what range of RMSE is acceptable? For example, if the RMSE is around 40 to 45, is that acceptable? (Also, the training data set is very small.)
  11. The input for the SVM model includes features like cohesive energy; how are these features obtained: from a materials database like MP, or calculated from scratch using DFT?
  12. Rule of thumb for “number of samples” vs. “number of predictors” for a good, quantitative supervised ML model?
  13. How much experimental data did you train on? And how much computational data did you train on?
  14. Also, did you consider weighting the training cost to achieve higher accuracies in the higher hardness regions?
  15. Is there any important correlation between any tolerance factor and the hardness of the material?
  16. Great talk. More of a technical question: how do you deal with the different number of input parameters when you have binary vs. ternary materials?
  17. Structural defects (point, line) can play a big role in material screening. Any thoughts on that?
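
For readers unfamiliar with the quantity G^3/B^2 mentioned in additional question 8: it appears in a widely used empirical hardness estimate (Chen et al., 2011), H_V = 2 (k^2 G)^0.585 − 3 with k = G/B, where G is the shear modulus and B the bulk modulus in GPa, so that k^2 G = G^3/B^2. The sketch below assumes this is the relation the question refers to; the page itself does not say so, and it is not presented as the speaker's model. The function name and the diamond moduli are illustrative, approximate values chosen for this example.

```python
def chen_vickers_hardness(shear_gpa: float, bulk_gpa: float) -> float:
    """Estimate Vickers hardness (GPa) from elastic moduli.

    Uses the empirical relation H_V = 2 * (k^2 * G)^0.585 - 3 with k = G / B,
    where k^2 * G is the same quantity as G^3 / B^2.
    """
    if shear_gpa <= 0 or bulk_gpa <= 0:
        raise ValueError("Moduli must be positive (in GPa).")
    pugh_ratio = shear_gpa / bulk_gpa          # k = G / B
    g3_over_b2 = pugh_ratio ** 2 * shear_gpa   # k^2 * G = G^3 / B^2
    return 2.0 * g3_over_b2 ** 0.585 - 3.0


# Illustrative, approximate moduli for diamond (assumed values, in GPa).
hv_diamond = chen_vickers_hardness(shear_gpa=535.0, bulk_gpa=443.0)
print(f"Estimated H_V for diamond: {hv_diamond:.1f} GPa")
```

An estimate of this kind depends only on the elastic moduli, so it carries no information about indentation load or temperature, both of which the abstract identifies as things the machine-learning model is meant to capture.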