Recording and questions for Zachary Ulissi, "Accelerating Catalyst Discovery Using General Datasets and Graph Neural Networks"


Zachary W. Ulissi, Assistant Professor of Chemical Engineering and Materials Science and Engineering, Carnegie Mellon University


Wednesday August 18th, 10am (USA/Pacific)


Machine learning accelerated catalyst discovery efforts have seen much progress in the last few years. Datasets of computational calculations have improved, models to connect surface structure with electronic structure or adsorption energies have become more sophisticated, and active learning exploration strategies are becoming routine in discovery efforts. However, several large challenges remain: to date, models have had trouble generalizing to new materials or reaction intermediates, and applying these methods requires significant training. I will briefly introduce the Open Catalyst Project and the Open Catalyst 2020 dataset, a collaborative project to span surface composition, structure, and chemistry and enable a new generation of deep machine learning models for catalysis. I will then discuss initial results for state-of-the-art deep graph convolutional models and significant recent progress from others in the community, much of which is likely to improve models in related materials science areas. As an example application, I will show how these efforts are already assisting in material development for water, in collaboration with Anubhav Jain (LBL).


A recording of this seminar is available here.


If you are unable to ask questions live, please feel welcome to ask any questions following the talk here and we will ask the speaker to check afterwards. Whether they will be able to answer questions or not depends on the speaker’s availability.

Questions answered live

Questions are numbered according to the order they came in. Only questions relevant to this talk specifically are shown.

Number Question
1 Have you considered using OPTIMADE (a single API for lots of materials databases)?
2 In the reaction networks study, how can we know which path is indeed the correct one? Equivalently, how do we get the ground truth data for training machine learning models in this case? If the energies are the targets for model training, does that mean that the model has to be extremely accurate to make the correct prediction?
3 Will the material described be sent out, so that I can study it again?
6 Every type of molecule has some kind of preference for sites on various surface facets. How is that automated? How do you make sure that in your models, let's say, CO always sits at fcc/hcp/etc. and has not shifted to a neighboring location?
7 In slide 15, do you mean ML-based interatomic potentials as models? If not, could you describe a little bit more?
8 Q2. Can you also get DOS/electronic density and other properties from your ML potentials (instead of just energies and forces)?
10 Did you check the accuracy of the ML code with a data set far from the training set? If so, how accurate was it?
11 In your opinion, will it be possible in the long term to predict kinetic parameters or transition state energies directly (i.e., skipping the PES generation and modelling part)?
12 Have you tried to use ML methods to identify other descriptors in the CO2 reduction processes?
13 What about incorporating the competitive aspect into matbench? I.e. add OCP tasks as matbench benchmarking tasks in addition to the OCP leaderboard
14 Can Prof. Ulissi comment on whether metal alloy solid solutions will behave differently in catalytic processes (CO2 reduction, HER, OER) compared with intermetallics with ordered metal sites? If so, where will the difference come from?
20 Regarding the OC20 competition - great idea! Do you think there should be a separate competition / “prize” that is restricted to small models (i.e., ones that can be trained w/o needing a ton of computing horsepower)? Otherwise a lot of ML can largely move in the direction of who can apply the most computing …
22 Is there any specific advantage of using GNNs for ML modeling of catalysts? Also, which method is used to implement force-based ML models, and are they better than energy-based ML models for catalysts?
23 Do you see power in incorporating experimental data sets, whether existing in the literature (mining) or produced perhaps in high-throughput efforts, to augment or improve learning efforts based on a 100% computational approach?
31 Frequently, surface defects can make the situation much more complicated, right? Any thoughts on how we can tackle that?