Tips/Tricks for including Transition States in ICET?

Hi,
I am not sure whether icet is the ideal program for my purpose. Let me lay out what I have and what I want to do:

I have a system with a backbone that is kept constant and just one sublattice A. This sublattice A is either occupied with Li or vacancies (X). The composition varies from 100% of the sites occupied with Li down to 0%. I was able to enumerate structures, and used those to fit a CE based on the heat of mixing w.r.t. the two end phases (100% and 0% Li). The predicted heat of mixing looks reasonable, all good.

My actual aim now is to use icet in combination with custom Python code to perform kinetic Monte Carlo (kMC) simulations. To do this, we need to know the migration barriers for Li to jump from one site to a neighboring vacant site. The barrier will mostly depend on the local Li configuration and local Li content (and hopefully less on anything global outside our cutoffs). Therefore, I performed a couple of nudged elastic band (NEB) simulations with different initial and final Li configurations at various compositions. Then, I tried to fit a new CE, this time including the energies of the transition states (whose “sites” I have simply placed halfway between two next-neighbor sites on the A sublattice), and again trained the CE on the heat of mixing. The training set contained both the structures above without transition states and the structures with transition states.

Unfortunately, because every Li site now has 3 additional transition states connecting it to the neighboring sites, the number of sites per primitive cell on the A sublattice has increased from 1 to 4. This increases the number of clusters from ~100 (of which only a third are non-zero after fitting) to ~8000 (of which roughly half are non-zero after fitting). I suppose the huge increase in clusters is caused by the many more options to form clusters between the original sites and the new transition states, as well as between transition states only. However, any cluster containing more than 1 transition state should not be important, because my training database from the NEB calculations only contains individual jumps. There will never be a structure in my database where two transition states are occupied at the same time, and I will also not consider such jump processes later in the kMC simulations.

I have a couple of questions related to my approach:

  • Is the absence of any training data for clusters between transition states responsible for the large number of vanishing parameters?
  • I know about ClusterExpansion.prune() to get rid of orbits with parameters of zero, but is there a way to exclude/delete clusters/orbits with more than 1 transition state beforehand? If I always need to take all clusters/orbits into account, then setting up the ClusterSpace, getting the fit data and performing the fitting takes a very long time. This is restrictive for a hyperparameter scan.
  • Do you think it still makes sense to train on the heat of mixing? In principle, if the heat of mixing is correct, and the system size and the energies of the end phases at 100% and 0% Li are known, one can always recalculate the total energy of the system at hand from the predicted heat of mixing. Doing so for the initial state and the transition state, we can then get the barrier to use in the kMC simulations. However, if the system size is large, are even small errors in the heat of mixing likely to screw up the barriers? At least without extensive testing this seems to be the case, and my predicted barriers scatter considerably around the original ones.
  • Are there any other ideas on how to approach barriers using icet?

Any comments would be helpful, and a related section could be worth adding to the “Advanced Topics” section of the icet documentation website.

Thanks a lot in advance,
Marcel

Hi Marcel

Some short comments regarding your questions:

Is the absence of any training data for clusters between transition states responsible for the large number of vanishing parameters?

That should be the case if you are using regression with regularization and pruning (as in, e.g., ARDR or RFE). These algorithms will “realize” that certain parameters are poorly constrained by the training data and will effectively set them to zero.

I know about ClusterExpansion.prune() to get rid of orbits with parameters of zero, but is there a way to exclude/delete clusters/orbits with more than 1 transition state beforehand? If I always need to take all clusters/orbits into account, then setting up the ClusterSpace, getting the fit data and performing the fitting takes a very long time. This is restrictive for a hyperparameter scan.

We are currently working on a tutorial on how to work with orbits/clusters on a more advanced level that goes beyond this older tutorial. There have also been related requests/questions recently (see here) regarding customizing cluster spaces. While we are working on improving the interface, we don’t have anything ready yet. You are of course welcome to contribute. In the meantime, you could use the get_coordinates_of_representative_cluster() method to parse the orbits.
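As a rough illustration of what "parsing the orbits" could look like, here is a minimal sketch in plain Python. It assumes that the coordinates of each representative cluster have already been collected (for example with the method named above) and converted to fractional coordinates of the primitive cell. The function names, the toy cell and the example clusters are hypothetical and not part of icet; how the flagged orbits are then excluded from the cluster space is not shown.

```python
def same_site(a, b, tol=1e-4):
    """True if fractional positions a and b differ only by a lattice translation."""
    return all(abs((x - y) - round(x - y)) < tol for x, y in zip(a, b))


def count_ts_sites(cluster, ts_sites, tol=1e-4):
    """Number of sites in `cluster` that sit on a transition-state position."""
    return sum(any(same_site(p, t, tol) for t in ts_sites) for p in cluster)


def orbits_with_multiple_ts(clusters, ts_sites, max_ts=1):
    """Indices of representative clusters with more than `max_ts` TS sites."""
    return [i for i, c in enumerate(clusters)
            if count_ts_sites(c, ts_sites) > max_ts]


# Hypothetical toy cell (assumption): one regular A site at the origin and
# three transition-state "sites" halfway along each lattice vector.
ts_sites = [(0.5, 0.0, 0.0), (0.0, 0.5, 0.0), (0.0, 0.0, 0.5)]

# Hypothetical representative clusters, one per orbit, in fractional coordinates.
clusters = [
    [(0.0, 0.0, 0.0)],                                    # regular singlet
    [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0)],                   # regular site + 1 TS
    [(0.5, 0.0, 0.0), (0.0, 0.5, 0.0)],                   # 2 TS sites
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],                   # regular pair
    [(0.5, 0.0, 0.0), (1.5, 0.0, 0.0)],                   # 2 TS sites (periodic images)
    [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (1.0, 0.5, 0.0)],  # triplet with 2 TS sites
]

# Orbit indices whose representative cluster contains more than one TS site.
flagged = orbits_with_multiple_ts(clusters, ts_sites)
```

The comparison is done modulo lattice translations, so a cluster that extends into a neighboring cell is still recognized as containing transition-state sites.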

Do you think it still makes sense to train on the heat of mixing? In principle, if the heat of mixing is correct, and the system size and the energies of the end phases at 100% and 0% Li are known, one can always recalculate the total energy of the system at hand from the predicted heat of mixing. Doing so for the initial state and the transition state, we can then get the barrier to use in the kMC simulations. However, if the system size is large, are even small errors in the heat of mixing likely to screw up the barriers? At least without extensive testing this seems to be the case, and my predicted barriers scatter considerably around the original ones.

We haven’t built CE models for KMC in our group, so our experience here is limited. It would require more insight into the setup of your model for me to give a sensible answer.
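For orientation, the back-calculation described in the question can be written out as a short sketch. It assumes that the heat of mixing is expressed per regular A site and that the initial state and the transition state have the same Li content, so that the end-phase reference cancels in the barrier. The function names and all numbers are hypothetical and do not come from the model discussed here.

```python
def total_energy(h_mix_per_site, n_sites, x_li, e_li_per_site, e_vac_per_site):
    """Total energy rebuilt from a per-site heat of mixing.

    E_total = N * (h_mix + x * E_Li + (1 - x) * E_vac), where E_Li and E_vac
    are the per-site energies of the 100% and 0% Li end phases.
    """
    reference = x_li * e_li_per_site + (1.0 - x_li) * e_vac_per_site
    return n_sites * (h_mix_per_site + reference)


def barrier_from_mixing(h_mix_ts, h_mix_is, n_sites):
    """Barrier E_TS - E_IS; the end-phase reference cancels at equal composition."""
    return n_sites * (h_mix_ts - h_mix_is)


# Hypothetical numbers (assumption): 64 A sites, heats of mixing in eV per site.
n_sites = 64
h_is, h_ts = -0.050, -0.045
barrier = barrier_from_mixing(h_ts, h_is, n_sites)

# A per-site error on the transition state alone is multiplied by the number
# of sites when it is converted into a barrier.
per_site_error = 0.001  # eV per site, hypothetical
barrier_shift = barrier_from_mixing(h_ts + per_site_error, h_is, n_sites) - barrier
```

In this form, any part of the per-site prediction error that does not cancel between the two states is scaled by the number of sites, which is one way to see why barriers can scatter more than the heat of mixing itself. Whether the errors cancel in practice depends on the model and would need to be tested.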