Using openKIM to simulate substitutional and interstitial alloys with the QC or DMD framework

Hello everyone,
I am a doctoral student at ETH Zurich working on the Quasi-Continuum (QC) method. Our group intends to use the openKIM API and its potentials to simulate materials with substitutional and/or interstitial impurities. However, we use the framework introduced by Venturini et al., in which each lattice (or interstitial) site has a statistical species occupation (instead of being assigned entirely to one species), which is updated every step using a master equation for mass diffusion.

As of now, the KIM API does not support these species atomic concentrations as an input, and the API with the Portable Models is well suited for MD calculations, where each atom belongs to one species and there is no species concentration involved.

To be able to deal with this type of a modeling framework, the KIM API needs to be extended to accept atomic concentrations as a user input, and new portable models will need to be introduced, which interpret these atomic concentrations correctly and use them to compute the potential, forces, virial etc.

As a first thought, I believe this can be achieved by introducing another object of the ‘ComputeArgumentName’ class (say ‘particleAtomicFractions’) and then using the ‘SetArgumentPointer’ function in the ‘ComputeArgumentsImplementation’ class to accept it as a user input. Also, the status of this input could be kept as ‘optional’ instead of ‘requiredByAPI’, so that only the corresponding portable models interpret it, while other models used for MD calculations can ignore it.

I would be grateful for suggestions from others in the community on how this could be achieved in a smooth and seamless fashion.

Thanks for your time.
Shashank Saxena.

Hi Shashank, Thanks for your post.

This is an interesting “alloy modeling” approach and I, personally, am interested in performing this type of simulation. The key, from the kim-api perspective, will be to find the right level of abstraction and generality to use for this feature.

Two questions come to mind at the moment:

  1. You use terms like “lattice” or “interstitial” site. Is this modeling approach limited to periodic systems? Would it make sense to perform a simulation of a finite collection of particles as a “boundary value problem”? Or, would this be considered outside the theoretical framework, which should be limited to simulation of “small unit-cell” periodic crystalline systems? If this sort of Model should be limited to periodic systems, that is more problematic for the kim-api which doesn’t fundamentally define the concept of periodic boundary conditions…

  2. Your suggestion for a particleAtomicFractions argument is a reasonable first thought for supporting this feature. However, I’m not sure what that argument should contain. Currently all the ComputeArgumentName arguments are associated with a homogeneous data array of one of the DataType types (Integer or Double). In this case, we need, effectively, a mapping from “SpeciesCode” (or possibly SpeciesName) to a double (the fractional occupation). Further, we would want to ensure that the specified occupation is complete (adds up to 1), I think.
    Support for this sort of more complex data-structure as an argument could be complicated to add to the kim-api. We’ll want to explore possibilities for encoding the particleAtomicFractions that might be simpler to handle in the cross-language setting required by the kim-api. I’m not sure there is a simple answer to this issue.

It would be great to explore these ideas/issues here and to get the community to engage with the discussion and design decisions. Thanks for getting the ball rolling.


Hi Ryan, Thank you for your reply!

  1. I don’t think that this approach is limited to only periodic systems. In general, we could have any finite collection of particles, which are placed at sites which I am referring to as lattice sites (and the positions of which we solve for by energy minimization).
  • On these lattice sites, two or more substitutional alloying species could be present with their respective concentrations (and these concentrations here must add up to 1). So, every lattice site is a combination of multiple species and not one species in particular. Therefore, we would need only the ‘particleAtomicFractions’ for each site and not the ‘speciesCode’. This is the substitutional alloying framework which differs a bit from the interstitial one.

  • For interstitial alloying, we also have to consider sites other than our usual lattice sites. These are the sites that smaller particles (e.g. Li in Si) may occupy, and I am referring to them as the interstitial sites. The diffusing species is allowed to occupy only these interstitial sites (and not the lattice sites, which are always occupied only by the main species). Hence, there is a concentration of the diffusing species at the interstitial sites, which could be anything between 0 and 1. In crystalline periodic lattice systems, we know the positions of these interstitial sites for a uniform lattice, and these are passed as an initial guess to the energy minimization. However, we then solve for the positions of both the lattice and the interstitial sites. Hence, here we would need both the ‘particleAtomicFractions’ and the ‘speciesCode’ for each site to find forces (as we now differentiate by allowing interstitial sites to be filled only by the diffusing species and the lattice sites to be filled only by the main species).

  2. In my view, the ‘particleAtomicFractions’ could be another homogeneous data array of type Double, and its interpretation would depend on the portable model one uses. For example, when simulating a substitutional alloy with 2 species, ‘particleAtomicFractions’ would be passed as a flattened Nx2 array (N being the total number of lattice sites). On the other hand, when simulating an interstitial alloy with two species, ‘particleAtomicFractions’ would be passed as an Nx1 array (N being the total number of lattice and interstitial sites). I agree with you that this might not be the simplest thing to do in the cross-language framework of KIM.
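
To make the flattened layout concrete, here is a minimal sketch in C++. It assumes a row-major layout (one row per site, one column per species slot); that ordering, and the helper names fractionIndex and sitesAreComplete, are assumptions made for illustration and are not part of the kim-api. The completeness check (fractions at a site add up to 1) applies to the substitutional case only; for the Nx1 interstitial case each entry would simply be a value between 0 and 1.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the kim-api): index into a row-major,
// flattened N x numSlots array, where row = site and column = species slot.
inline std::size_t fractionIndex(std::size_t site, std::size_t slot,
                                 std::size_t numSlots)
{
  assert(slot < numSlots);
  return site * numSlots + slot;
}

// Returns true if every site's fractions lie in [0, 1] and sum to 1
// within the tolerance tol (substitutional case).
bool sitesAreComplete(std::vector<double> const & fractions,
                      std::size_t numSlots, double tol = 1e-12)
{
  if (numSlots == 0 || fractions.size() % numSlots != 0) return false;
  std::size_t const numSites = fractions.size() / numSlots;
  for (std::size_t i = 0; i < numSites; ++i)
  {
    double sum = 0.0;
    for (std::size_t s = 0; s < numSlots; ++s)
    {
      double const x = fractions[fractionIndex(i, s, numSlots)];
      if (x < 0.0 || x > 1.0) return false;
      sum += x;
    }
    if (std::fabs(sum - 1.0) > tol) return false;
  }
  return true;
}
```

For two sites with two species slots each, the array {0.45, 0.55, 0.05, 0.95} would pass the check, and the fraction of slot 0 at site 1 sits at index 2.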

Looking forward to more discussion.
Shashank Saxena.

Hi Shashank,


  1. So, I don’t see a need to distinguish between lattice sites and interstitial sites in the way you do. If you want to do interstitial alloying as you describe, I think you can just set the atomic fractions for “sites” to all zeros except for the pure element of the site. Unless I’m missing something?

  2. I see a couple of challenges with the scheme you propose.
    a) The kim-api does not establish an ordering of the species that a PM supports. Instead, it allows the PM to specify a unique speciesCode integer to designate each supported species. These integers are not required to be sequential.
    This means we would have to establish a definition for the mapping between supported species and the corresponding column of the particleAtomicFractions array.
    b) There is no convenient way to determine the number of species a PM supports. It can be determined by looping over all species defined by the kim-api and using the GetSpeciesSupportAndCode routine.
    Thus, it would be cumbersome for a Simulator to determine the necessary size of the particleAtomicFractions array.
    c) Some PMs support a large number of species.
    This would mean that your proposed particleAtomicFractions array would become very large and require a significant amount of memory. I think we’ll likely need to consider some sort of “sparse” specification of the atomic fraction information.
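
One possible shape for such a “sparse” specification is sketched below in C++. This is only an illustration under stated assumptions: the SparseFractions container and its member names are hypothetical, and the layout (per-site offsets into parallel arrays of species codes and fractions, similar to compressed sparse row storage) is one option among several. Species codes are stored as given, so they do not need to be sequential.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical container (not part of the kim-api): for each site, only the
// nonzero (speciesCode, fraction) pairs are stored.
struct SparseFractions
{
  std::vector<std::size_t> siteOffsets{0};  // size is numSites() + 1
  std::vector<int> speciesCodes;            // species code of each entry
  std::vector<double> fractions;            // fraction of each entry

  // Append one site; entries with a zero fraction are dropped.
  void addSite(std::vector<std::pair<int, double>> const & entries)
  {
    for (auto const & e : entries)
    {
      if (e.second == 0.0) continue;
      speciesCodes.push_back(e.first);
      fractions.push_back(e.second);
    }
    siteOffsets.push_back(speciesCodes.size());
  }

  std::size_t numSites() const { return siteOffsets.size() - 1; }

  // Fraction of the given species code at a site; 0 if it is not stored.
  double fraction(std::size_t site, int code) const
  {
    assert(site < numSites());
    for (std::size_t k = siteOffsets[site]; k < siteOffsets[site + 1]; ++k)
    {
      if (speciesCodes[k] == code) return fractions[k];
    }
    return 0.0;
  }
};
```

As a rough illustration with made-up numbers: a dense array of doubles for 10^6 sites and a PM supporting 100 species would take 8 x 10^8 bytes (about 800 MB), whereas storing two nonzero entries per site in the layout above needs 2 x 10^6 doubles and 2 x 10^6 integer codes plus the per-site offsets.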

I don’t really have anything better to suggest at the moment.

One thought that might be worth consideration is:

The kim-api could be extended to support a mode where species names could be arbitrary strings (perhaps with a limited character set) defined by the PM. (This would be a useful feature for supporting the “atom-types” of “bonded force-fields”.) Then, an additional convention could be established for alloy potentials of the type being considered here. For example, a Simulator could pass a string like “Ni51.3Ti48.7” to GetSpeciesSupportAndCode and the PM could parse this string to establish the atomic fractions of interest. Then, the PM could assign a unique speciesCode to this set of atomic fractions for the Simulator to use as usual.

Probably we would want to create some helper routines as part of the kim-api to facilitate PMs parsing such strings and managing storage and lookup of the species codes to atomic fractions specification.
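
A helper of this kind might look roughly like the following C++ sketch. Everything here is an assumption for illustration: the function name parseAlloyString is hypothetical, the string format (element symbol followed by a percentage, with an optional underscore between entries) is only one possible convention, and no such routine exists in the kim-api today.

```cpp
#include <cassert>
#include <cctype>
#include <cmath>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <utility>
#include <vector>

// Hypothetical parser for strings such as "Ni51.3Ti48.7" or "Ni45.0_Ti55.0".
// On success, fills `out` with (element symbol, fraction in [0, 1]) pairs and
// returns true. Returns false on malformed input or if the percentages do
// not add up to 100.
bool parseAlloyString(std::string const & s,
                      std::vector<std::pair<std::string, double>> & out)
{
  out.clear();
  std::size_t i = 0;
  double total = 0.0;
  while (i < s.size())
  {
    if (s[i] == '_') { ++i; continue; }  // optional separator
    // Element symbol: one uppercase letter plus optional lowercase letters.
    if (!std::isupper(static_cast<unsigned char>(s[i]))) return false;
    std::size_t j = i + 1;
    while (j < s.size() && std::islower(static_cast<unsigned char>(s[j]))) ++j;
    std::string const symbol = s.substr(i, j - i);
    // Percentage: digits with at most one decimal point.
    std::size_t k = j;
    bool seenDot = false;
    while (k < s.size()
           && (std::isdigit(static_cast<unsigned char>(s[k]))
               || (s[k] == '.' && !seenDot)))
    {
      if (s[k] == '.') seenDot = true;
      ++k;
    }
    if (k == j) return false;  // symbol without a number
    double const percent = std::strtod(s.substr(j, k - j).c_str(), nullptr);
    out.emplace_back(symbol, percent / 100.0);
    total += percent;
    i = k;
  }
  assert(i == s.size());  // the whole string was consumed
  return !out.empty() && std::fabs(total - 100.0) < 1e-6;
}
```

With this convention, "Ni51.3Ti48.7" would yield the pairs (Ni, 0.513) and (Ti, 0.487), while a string whose percentages do not sum to 100 would be rejected.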


Hi Ryan,
Thank you for your reply!

  1. I agree that we can do so to make it more general and not distinguish between lattice and interstitial sites.

  2. It is correct that a mapping would then need to be established, as you said, and that the concentration matrix would be huge for many species. The issue of the simulator counting the number of species supported by a PM could be solved by adding another routine (say ‘GetNumberOfSpecies’) to the API, which each PM would then implement.

Your suggestion of extending the Species Names to be arbitrary strings seems to be helpful. However, one would then be able to simulate only uniform alloy concentrations for now (because Ni and Ti will have concentrations of 0.513 and 0.487 on every site). Am I interpreting this correctly?

Shashank Saxena.

Hi Shashank,

  1. good.

  2. Yes, adding such a convenience routine is possible. If needed, we can do that. However, if it can be avoided, that would be best. (Adding it would be a violation of the “Minimal Completeness Principle”. It is already possible (although, inconvenient) to compute the number of supported species. Sometimes, it makes sense to violate the principle, but there should really be a good reason to do so…)

  3. No, my suggestion for a string-based atomic-fractions specification would allow for an arbitrary number of atomic-fractions to be used. I’m thinking like this: Each time that a Simulator calls the GetSpeciesSupportAndCode() routine with a distinct atomic-fraction string, if the Model can support the given combination, then it will add that atomic-fraction specification to a list and assign it a unique integer code (for the current execution) that the Simulator can then use in the particleSpeciesCodes argument.

So, suppose a Simulator wants to do a simulation with “MyAlloyModel” and use two distinct types of sites: (a) 45% Ni and 55% Ti and (b) 5% Ni and 95% Ti. Then, it would do something along the lines of:

(I’m doing this quickly; the syntax is not really correct)

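int support_1, code_1, support_2, code_2;
M.GetSpeciesSupportAndCode("Ni45.0_Ti55.0", &support_1, &code_1);
M.GetSpeciesSupportAndCode("Ni5.0_Ti95.0", &support_2, &code_2);
int pSpeciesCodes[N];  // species codes are integers; N assumed even
for (int i = 0; i < N/2; ++i) {
   pSpeciesCodes[i] = code_1;        // first half of the sites: type (a)
   pSpeciesCodes[N/2 + i] = code_2;  // second half of the sites: type (b)
}
A.SetArgumentPointer(KIM::ComputeArgumentName::particleSpeciesCodes, pSpeciesCodes);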

This would assign the first half of the sites to fraction type (a) and the second half to type (b). Hopefully, this is somewhat clearer.
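
On the Model side, the bookkeeping described above (assign a unique integer code the first time a distinct atomic-fraction string is seen, and return the same code afterwards) could be sketched as follows in C++. The class AtomicFractionRegistry and the function assignHalves are hypothetical names used only for this illustration; they are not part of the kim-api, and a real PM would also need to check whether it supports the requested combination.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical PM-side registry (not part of the kim-api): maps each distinct
// atomic-fraction string to an integer code valid for the current execution.
class AtomicFractionRegistry
{
 public:
  // Returns the code for `spec`, assigning a new one on first use.
  int getCode(std::string const & spec)
  {
    auto const it = codes_.find(spec);
    if (it != codes_.end()) return it->second;
    int const code = static_cast<int>(specs_.size());
    specs_.push_back(spec);
    codes_[spec] = code;
    return code;
  }

  // Looks up the atomic-fraction string that a code stands for.
  std::string const & spec(int code) const
  {
    assert(code >= 0 && static_cast<std::size_t>(code) < specs_.size());
    return specs_[static_cast<std::size_t>(code)];
  }

  std::size_t size() const { return specs_.size(); }

 private:
  std::map<std::string, int> codes_;
  std::vector<std::string> specs_;
};

// Simulator-side fill: first half of the sites get codeA, the rest codeB.
std::vector<int> assignHalves(std::size_t n, int codeA, int codeB)
{
  std::vector<int> codes(n, codeB);
  for (std::size_t i = 0; i < n / 2; ++i) codes[i] = codeA;
  return codes;
}
```

Note that the registry holds one entry per distinct string, so its size is set by the number of different site types in the simulation rather than by the number of sites.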

Hi All,

Here is a suggestion for how to add support for alloy potentials that would also generalize to other cases such as coarse-grained models, united atom models used for polymers, and so on.

Add a method to the KIM API to allow simulators and/or models to extend the list of supported particle species names at run time, and to associate data (in the form of key-value pairs) with the new species. Different applications could define conventions on the associated data. For example, for alloy potentials, a simulator could define a new species “NiAl” with associated data,

"type" : "alloy_potential"
"species" : {"Ni", 0.5}
"species" : {"Al", 0.5}

Here the “type” key specifies the type of potential that this species is associated with, and
the “species” key is recognized by alloy potentials and required to contain the chemical element and fraction.

This supports any possible alloying, and as noted above, extends to other types of nonstandard potential forms.
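
As a rough sketch of what such a run-time species table might look like in C++ (purely illustrative: SpeciesData and RuntimeSpeciesTable are hypothetical names, not kim-api types, and the repeated “species” key is represented here as a list of element and fraction pairs, which is an assumption about the intended convention):

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical associated data for one run-time species.
struct SpeciesData
{
  std::string type;  // e.g. "alloy_potential"
  std::vector<std::pair<std::string, double>> species;  // (element, fraction)
};

// Hypothetical table of species names defined at run time.
class RuntimeSpeciesTable
{
 public:
  // Registers a new species name; returns false if it is already defined.
  bool define(std::string const & name, SpeciesData const & data)
  {
    assert(!name.empty());  // species names must be non-empty
    return table_.emplace(name, data).second;
  }

  // Returns the associated data, or nullptr if the name is unknown.
  SpeciesData const * find(std::string const & name) const
  {
    auto const it = table_.find(name);
    return it == table_.end() ? nullptr : &it->second;
  }

 private:
  std::map<std::string, SpeciesData> table_;
};
```

A simulator would define “NiAl” once with type “alloy_potential” and the pairs (Ni, 0.5) and (Al, 0.5), and an alloy potential would look the name up to recover the fractions.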


Thank you for your replies!

Dear Prof. Elliott,
Thanks for making your thought process clearer. I understand it better now. So, in the most general case, when every site can have a different composition (can be of a different type), every site will have a different code assigned to it. And every such code will correspond to a set of atomic fractions (which will have to be stored in some sort of data structure or array).

This array could then also take up a huge amount of space when the number of species is large and every site has a different code.

Dear Prof. Tadmor,
I understand your suggestion as well. But again, it could be so that we are simulating N (a large number of) sites, all with the name ‘NiAl’ but having different data associated with them (due to different atomic fractions for each), which could result in large space occupation to store this data, as previously pointed out by Prof. Elliott.

I am not very skilled as a programmer, but as far as I can tell, the space occupied by the atomic fractions array is something we probably can’t avoid, because we are associating new information (pertaining to every species in the model) with every site.

Please let me know what you think.

Shashank Saxena.

Hi Shashank,

The idea is that you would create different particles for any fractions as needed. My example of “NiAl” was for a half/half ratio. If there was another particle which was 0.2 Ni and 0.8 Al, it could be called something like “Ni_0.2Al_0.8”. This would work as long as it wasn’t necessary to support an excessively large number of different possible ratios. If that’s the case, then this might not be a good approach.



I think the discussion is making some progress. Great.

A couple of questions that come to mind.

  • Is it likely that a simulation would “evolve” the site fractions in some way? Or would the list of site fractions for a simulation be fixed at the beginning? If an evolution is possible, what sort of information would be used to determine updates?

  • Do you already have a (non-kim) code that implements models of this type and can perform simulations? If so, could you share (privately, if not publicly) it and some example simulations? Explicit use-cases are often very helpful.