Specifying parameters in OpenKIM (relates to Tersoff_LAMMPS_Erhart_Albe_CSi__MO_903987585848)

Dear Ryan and Tobias,

Jacob and I are trying to use the new version of Tersoff_LAMMPS_Erhart_Albe_CSi__MO_903987585848 from https://github.com/t-brink/kim-tersoff . In particular, we are trying to set the parameters. The model has a number of parameters defined in Tersoff_LAMMPS.kim.tpl as
PARAM_FREE_A double energy [numberParticleTypes,numberParticleTypes,numberParticleTypes]

i am almost sure that should instead be
PARAM_FREE_A double energy [numberOfSpecies,numberOfSpecies,numberOfSpecies]

but even if I change that, it does not work. My simulator gets the shape [0,0,0]. I am not entirely sure where the bug is. Who is responsible for allocating these arrays? Is it the model, the simulator or the KIM API? It looks like KIM_API_allocate() will do it, but again I am not sure who is responsible for calling that function. In the current implementation of Asap and Tersoff_LAMMPS_Erhart_Albe_CSi__MO_903987585848, no-one seems to do it.

Finally, the documentation of KIM_API_allocate is a bit cryptic. It does not state when it should be called and by who. And subpoint (1) and (3) seem to me to be in conflict: Is the shape of a MODEL_PARAMETER such as the one above set to [nSpecies, nSpecies, nSpecies] in step (1) and then reset to [0, nSpecies, nSpecies] in step (3)? But maybe that function should not be used at all - should the model instead manually set the shapes of all parameter arrays? I assume that would make sense if the array is anyway allocated by the model.

Assuming that the model needs to inform KIM about the shape of these arrays, I tried to add calls to KIM_API_set_shape in the model, setting the shape to (2, 2, 2). That almost worked, the shape became (0, 2, 2), so somehow the KIM API overrrules the shape. This was apparently due to numberOfSpecies not being set. Asking for it from the simulator side returned 1, but apparently the model thought it was 0. Adding numberOfSpecies to the the KIM_API_setm_data call fixed this (hmm, I should add such a call to my own models, too).

Assuming that this is the correct thing to do, I am attaching a patch to the model. Tobias: The patch contains an ugly hack that you may not want in your code. numberOfSpecies is set in a function from the local variable n_spec. Since giving the KIM API the address of a local variable is a recipe for disaster, I allocate a single integer on the heap and pass that address instead. It is never deallocated! You may want to pass another address to KIM API, or add some deallocation.

Best regards

Jakob

parameterfix.patch (4.86 KB)

Dear Ryan and Tobias,

Jacob and I are trying to use the new version of
Tersoff_LAMMPS_Erhart_Albe_CSi__MO_903987585848 from
https://github.com/t-brink/kim-tersoff . In particular, we are trying
to set the parameters. The model has a number of parameters defined in
Tersoff_LAMMPS.kim.tpl as
     PARAM_FREE_A double energy
[numberParticleTypes,numberParticleTypes,numberParticleTypes]

i am almost sure that should instead be
     PARAM_FREE_A double energy
[numberOfSpecies,numberOfSpecies,numberOfSpecies]

This is due to a change of the KIM API, I believe. In the past it was called "numberParticleTypes" instead of "numberOfSpecies". I will fix this in git, thanks.

Assuming that this is the correct thing to do, I am attaching a patch to
the model. Tobias: The patch contains an ugly hack that you may not
want in your code. numberOfSpecies is set in a function from the local
variable n_spec. Since giving the KIM API the address of a local
variable is a recipe for disaster, I allocate a single integer on the
heap and pass that address instead. It is never deallocated! You may
want to pass another address to KIM API, or add some deallocation.

This seems weird to me because I get the value of the n_spec variable via the KIM_API_get_num_model_species() API call. Looking at the source code (1.6, so a bit out of date but I don't think this has changed), the number of species supported by a model is knowledge that the KIM API has and that it gets from the descriptor file of the model. If we have to pass it back to KIM, that seems like a bug. Also it worked for me to let KIM initialize the parameter arrays, we should not have to do it ourselves in a model driver, I believe.

One idea could be that this is related to the way you initialize your model: I only ever started from a given model (Tersoff_LAMMPS_Erhart_Albe_CSi__MO_903987585848) and then changed the parameters. This worked for me in the past, haven't re-tested with newer KIM versions. If you start from scratch, perhaps the number of species is not initialized correctly? How to you initialize the model?

Cheers,

Tobias

This seems weird to me because I get the value of the n_spec variable via the KIM_API_get_num_model_species() API call. Looking at the source code (1.6, so a bit out of date but I don't think this has changed), the number of species supported by a model is knowledge that the KIM API has and that it gets from the descriptor file of the model. If we have to pass it back to KIM, that seems like a bug.

Yeah, that is right. I think Tobias's idea about the initialization of the model is a good place to look.

Also it worked for me to let KIM initialize the parameter arrays, we should not have to do it ourselves in a model driver, I believe.

Actually, no. I'm pretty sure that the allocate() function does not do anything with the parameters. The model driver needs to allocate memory for these parameters. (Of course, if they are all scalars then you could get away with storing their values in the space meant to store the pointer to an allocated chunck, but this would not be technically correct and would lead to problems.)

Ryan

Hi Jakob,

Thanks for the email. I appreciate any reports of confusing/wrong/inconsisten documentation. I'll try to review and update this and send some comments when I can get a chance. In short, I believe, the allocate() function never allocates parameter memory. This is the job of the model (often really the model driver).

Ryan

This seems weird to me because I get the value of the n_spec variable via the KIM_API_get_num_model_species() API call. Looking at the source code (1.6, so a bit out of date but I don't think this has changed), the number of species supported by a model is knowledge that the KIM API has and that it gets from the descriptor file of the model. If we have to pass it back to KIM, that seems like a bug. Also it worked for me to let KIM initialize the parameter arrays, we should not have to do it ourselves in a model driver, I believe.

Yes, I was surprised too. It looks like getting the number of species from KIM_API_get_num_model_species() or KIM_API_get_num_sim_species() are not the same thing as setting numberOfSpecies. Is this because the KIM API overspecifies things, or is it because KIM_API_get_num_model_species() tells the maximal number of species the model supports, and numberOfSpecies specifies the number actually used? At least, directly querying the numberOfSpecies variable did not give the right result with your model. I have not tried my own models yet, but I suspect they have the same problem, since I initialise in the same way.

Now, not having numberOfSpecies set correctly does not seem to be a serious problem, except for the shape returned by KIM_API_get_shape().

One idea could be that this is related to the way you initialize your model: I only ever started from a given model (Tersoff_LAMMPS_Erhart_Albe_CSi__MO_903987585848) and then changed the parameters. This worked for me in the past, haven't re-tested with newer KIM versions. If you start from scratch, perhaps the number of species is not initialized correctly? How to you initialize the model?

I initialize in the same way. Is there another way? Can you use a model driver directly?

Best regards

Jakob

Hi All,

So, I'll try to consicely state the current behavior:

1) KIM_API_get_num_model_species() returns the number of "spec" lines in the model's kim descriptor file.

2) KIM_API_get_num_sim_species() returns the number of "spec" lines in the simulator's kim descriptor file.

3) nubmerOfSpecies is an argument stored in the KIM API object that specifies the number of distinct species in the configuration represented by the data stored in the KIM API object. In principle, this is different than either of the above two values (although it must be less than or equal to both). The simulator is responsible for allocating and registering memory in which to store this value as well as setting this value. The KIM_API_allocate() routine is a convenience routine provided for simulators to use to help allocate and register all the memory that simulators are responsible for.

I'll leave it here and see what questions you still have.

Ryan

Finally, the documentation of KIM_API_allocate is a bit cryptic. It does not state when it should be called and by who.

Yes, this could be clearer. This routine is really only for simulators. Model's should definitely not use it.

And subpoint (1) and (3) seem to me to be in conflict: Is the shape of a MODEL_PARAMETER such as the one above set to [nSpecies, nSpecies, nSpecies] in step (1) and then reset to [0, nSpecies, nSpecies] in step (3)?

Correct. The zero in the first value then indicates that the array has not been allocated and registered. This value is set appropriately by a call to KIM_API_set_data() using the other extent values and the size parameter of the KIM_API_set_data() routine.

But maybe that function should not be used at all - should the model instead manually set the shapes of all parameter arrays? I assume that would make sense if the array is anyway allocated by the model.

Yeah, I see now why this is confusing...

The model should not assume that KIM_API_allocate() has been called. So, it should assume that the entire shape array needs to be set using KIM_API_set_shape() and that the parameter memory should be allocated and registered using KIM_API_set_data().

Ryan

I'll re-check my code and the docs. I'll let you know about any updates and bug fixes. It's very useful to get some actual testing and feedback on my code, so please keep it coming.

Dear Ryan,

Thank you again for your always helpful comment. Below are some comments to your comments, but also some thoughts / doubts about the API in general.

Finally, the documentation of KIM_API_allocate is a bit cryptic. It does not state when it should be called and by who.

Yes, this could be clearer. This routine is really only for simulators. Model's should definitely not use it.

OK. My ASAP simulator does not call KIM_API_allocate. Should it do that?

In fact, it is not very clear who is responsible for allocating what in the KIM API. Which quantities should be allocated by the model, which should be allocated by the simulator, and which are allocated automatically during matching?

My interpretation is that quantities such as numberOfSpecies are set by the simulator, in a call like this:

  model->setm_data(&x, 3*4,
      "numberOfParticles", 1, &nSize, 1,
      "numberOfSpecies", 1, &nSpecies, 1,
      "cutoff", 1, &cutoff, 1);

Is that correct? Or should the simulator use KIM_API_allocate() ?

I am also in doubt about the correct way to use numberOfSpecies. It is my impression that KIM_API_get_num_model_species() can be called by the simulator to learn the maximal number of species supported by the model. Similarly, KIM_API_get_num_sim_species() will tell the model the maximal number of species supported by the simulator. Or will it tell the actual number of species in the simulation? Finally, numberOfSpecies would be the actual number of species in the simulation, is that correct?

But then what about parameters defined as

PARAM_FREE_a …. [numberOfSpecies, numberOfSpecies] ?

should the model make sure that only parameters for the species actually present in the simulation are available in this array? Sometimes a model would internally contain arrays that are dimensions [max_numberOfSpecies, max_numberOfSpecies]. It it the responsibility of the model to extract the right rows and columns and present to the simulator? (and copy it the other way when reinitialized)

And who is responsible for allocating the memory for these arrays; the model or the simulator? In Tersoff_LAMMPS_Erhart_Albe_CSi__MO_903987585848 the model does this using KIM_API_setm_data; and also sets the shape of the arrays. But if numberOfSpecies in the dimensions really means the *actual* number of species in the simulation, then I guess only the simulator can allocate and reshape these arrays. And what happens if the simulator changes numberOfSpecies and reinitializes the model? Then who is responsible for reallocating the storage, for assuring that the shapes are correct, and most importantly: for filling in values in any new columns/rows in the parameter arrays? The model cannot do this, because the whole point of reinitializing the model may be that the simulator has changed some of these values. But the simulator cannot do it either, maybe it has only changed numberOfSpecies because a new species was introduced in the simulation; the simulator may not know what the default parameters for that species should be.

A general comment on the documentation, now that I have been using the KIM API for a while. The “low level documentation” is excellent: KIM_API_Description.txt describes every API call in great detail. The only thing missing is a field for each call indicating if the call is intended to be used by a model, by a simulator, or by both. But we are missing some kind of “high level documentation”, a "best practices" or a “how to" for writers of models and simulators. What is the model supposed to do during initialization, during calculation, during reinitialization and during shutdown? Similarly for the simulator.

With my best regards

Jakob

Hi Ryan,

As you can probably see from my previous email, I had overlooked this email. So the simulator should allocate most stuff, probably using KIM_API_allocate.

But to summarize my main doubt from the previous email: what about the parameters (PARAM_FREE_XXX and friends). It sounds like KIM_API_allocate does not allocate these, so I assume the model should do it. Should it read numberOfSpecies, read the actual species, allocate the arrays, and fill in the values for the relevant species?

And what if the simulator wants to change some of the values in this array? If the model reallocates it during reinitialization, then the simulator cannot pass values in these arrays. If the simulator has not changed numberOfSpecies then I guess there is no problem, but what if it has changed?

Best regards

Jakob

A general comment on the documentation, now that I have been using the KIM API for a while. The “low level documentation” is excellent: KIM_API_Description.txt describes every API call in great detail. The only thing missing is a field for each call indicating if the call is intended to be used by a model, by a simulator, or by both. But we are missing some kind of “high level documentation”, a "best practices" or a “how to" for writers of models and simulators. What is the model supposed to do during initialization, during calculation, during reinitialization and during shutdown? Similarly for the simulator.

Dear Jakob,

Thanks for your thoughtful comments and suggestions! You active use of the KIM API and regular feedback are extremely helpful and greatly appreciated.

I like your idea of adding a field regarding the intended user (model or simulator) for each of the KIM API routines. I'll try to do that for the next release.

I agree that we are missing the "high level documentation" and your comment/request is timely and provides good motivation. We have been planning to write a paper on the KIM API that would be focused at this level. As usual, other responsibilities have delayed progress in this area. However, I am starting to see the light at the end of the tunnel (or at least a window along the way) and I expect to be able to devote time to this activity a little later this summer.

I hope you can bear with me; And in the mean time, I'm happy to act as the KIM API Oracle, in lieu of such a document, on these matters via this mailing list. :slight_smile:

Hi All,

So, I'll try to consicely state the current behavior:

1) KIM_API_get_num_model_species() returns the number of "spec" lines in the model's kim descriptor file.

2) KIM_API_get_num_sim_species() returns the number of "spec" lines in the simulator's kim descriptor file.

3) nubmerOfSpecies is an argument stored in the KIM API object that specifies the number of distinct species in the configuration represented by the data stored in the KIM API object. In principle, this is different than either of the above two values (although it must be less than or equal to both). The simulator is responsible for allocating and registering memory in which to store this value as well as setting this value. The KIM_API_allocate() routine is a convenience routine provided for simulators to use to help allocate and register all the memory that simulators are responsible for.

I'll leave it here and see what questions you still have.

Ryan

Hi Ryan,

As you can probably see from my previous email, I had overlooked this email. So the simulator should allocate most stuff, probably using KIM_API_allocate.

Hi Jakob, No problem. It is great to have your "outsider" view on all of this. You are identifying some aspects that we have been blind to. As a result of our not fully considering all of these issues, I see that there are a number of ways in wich the current definitions are conventions are confusing and ambiguous. Ultimately, this may lead to revisions of the KIM API to improve, clarify, and simplify its handling of parameters. For now, I'll try to describe the current system and best practice behavior within that system.

Also, if you have not already noticed, there is a lot of good information about who allocates/controls what/when in the "standard.kim" file.

But to summarize my main doubt from the previous email: what about the parameters (PARAM_FREE_XXX and friends). It sounds like KIM_API_allocate does not allocate these, so I assume the model should do it.

Yes.

Should it read numberOfSpecies, read the actual species, allocate the arrays, and fill in the values for the relevant species? And what if the simulator wants to change some of the values in this array? If the model reallocates it during reinitialization, then the simulator cannot pass values in these arrays. If the simulator has not changed numberOfSpecies then I guess there is no problem, but what if it has changed?

This is something that was not clearly thoughtout, and so there is ambiguity in my own mind. However, now that the issue is clear, I'll try to create a definition of expected behavior.

First, I should note that is somewhat odd for a Model's parameters to depend on anything but the number of species supported by the *Model*. Thus, a parameter being defined with shape [numberOfSpecies] is really ill-defined. It would be more reasonable/common to have the explicit number of species supported by the Model listed in the shape, or if a model driver supports a variable number of speices, then a shape of the form [:] would be appropraite.

Second, I believe that a Model's parameters should not depend in any way on the information provided by the Simulator (in its kim descriptor file or in any of the input/output arguments stored in the KIM API object). In fact, in "standard.kim" there is a comment to the effect "...this storage should not depend on the content provided by the Simulator."

Now, the Model needs to allocate and set its parameters when KIM_API_model_init() is called (by the simulator). At the time of this call, only the "cutoff" argument is required to be allocated and registered (by the simulator) within the KIM API object. Thus, the model cannot expect to be able to access the "numberOfSpecies" argument at this time.

Until this discussion thread, I have been working under the (unconsious) tacit assumption that `numberOfSpecies' was identical to the value returned by KIM_API_get_num_sim_species(). (In fact, this api routine was only added at v1.6.0) However, the definition (given in "standard.kim") clearly says that `numberOfSpecies' is supposed to be the number of species in the configuration. This can certainly be different (less than) from the number of species listed in the simulator's kim descriptor file. So, it makes sense to allow this. Again, however, I don't see any situation in which a Model's parameters should depend on this value.

According to the documentation in "standard.kim" for parameters, the Model must allocate its parameters in its model_init() function and deallocate them in the model_destroy() function. It is not explicitly stated, but implied, that the memory should not be reallocated anywhere else. However, I can see that this policy should be reisited and documented. I can imagine a reasonable need for reallocation of the memory associated with a model's parameters. However, I believe this should only be allowed within the model_reinit() function. If that seams reasonable, then there should be no problem. The model would need to account for any changes to its parameter values before it reallocates the memory for these parameters. The main implication here would be that the simulator would have to update its pointers to the parameter memory locations after every call to KIM_API_model_reinit(), instead of being able to get them once and then assume that they are always valid.

I'm open to either policy, let me know what you think.

Ryan

But to summarize my main doubt from the previous email: what about the
parameters (PARAM_FREE_XXX and friends). It sounds like
KIM_API_allocate does not allocate these, so I assume the model should
do it.

Yes.

Looking back at my code, I do of course allocate memory for the PARAM_FREE_* variables using the number of species from KIM_API_get_num_model_species(). As far as I understand from this discussion, this should be the correct way to do it.

Should it read numberOfSpecies, read the actual species, allocate the
arrays, and fill in the values for the relevant species? And what if
the simulator wants to change some of the values in this array? If
the model reallocates it during reinitialization, then the simulator
cannot pass values in these arrays. If the simulator has not changed
numberOfSpecies then I guess there is no problem, but what if it has
changed?

This is something that was not clearly thoughtout, and so there is
ambiguity in my own mind. However, now that the issue is clear, I'll
try to create a definition of expected behavior.

First, I should note that is somewhat odd for a Model's parameters to
depend on anything but the number of species supported by the *Model*.
Thus, a parameter being defined with shape [numberOfSpecies] is really
ill-defined. It would be more reasonable/common to have the explicit
number of species supported by the Model listed in the shape, or if a
model driver supports a variable number of speices, then a shape of the
form [:] would be appropraite.

OK, I was working under the assumption that numberOfSpecies is the same as the value returned by KIM_API_get_num_model_species(), which is obviously wrong.

So, what I am supposed to do now is to use lines like

PARAM_FREE_A double energy [:,:,:]

and use KIM_API_set_shape() to communicate the correct shape to the KIM API. Is this correct?

If so, that may already be the solution to the problems Jakob was having. In that case I will upload a fixed version ASAP.

Second, I believe that a Model's parameters should not depend in any way
on the information provided by the Simulator (in its kim descriptor file
or in any of the input/output arguments stored in the KIM API object).
In fact, in "standard.kim" there is a comment to the effect "...this
storage should not depend on the content provided by the Simulator."

I agree.

According to the documentation in "standard.kim" for parameters, the
Model must allocate its parameters in its model_init() function and
deallocate them in the model_destroy() function. It is not explicitly
stated, but implied, that the memory should not be reallocated anywhere
else. However, I can see that this policy should be reisited and
documented. I can imagine a reasonable need for reallocation of the
memory associated with a model's parameters. However, I believe this
should only be allowed within the model_reinit() function. If that
seams reasonable, then there should be no problem. The model would need
to account for any changes to its parameter values before it reallocates
the memory for these parameters. The main implication here would be
that the simulator would have to update its pointers to the parameter
memory locations after every call to KIM_API_model_reinit(), instead of
being able to get them once and then assume that they are always valid.

Hmm, I see two reasons to change that memory:

a) The number of species supported by the model changes (i.e. the output of get_num_model_species() changes). I don't know if this is a good idea. How would this be communicated? What would be the use case? I suppose many models are not prepared to handle something like this.

b) The model may need more or less memory when parameters change. This may be a tabulated potential, where the "resolution" of the tabulated values may change. I'm not sure if something like this is needed, but it would at least only impact those models which make use of it.

Is there anything else I may be missing? I think it is perfectly fine if it only impacts those models which have a genuine need for that. The impact on existing simulators may be minimal, I guess not many actually use model_reinit() at the moment anyway.

Tobias

OK, I was working under the assumption that numberOfSpecies is the same as the value returned by KIM_API_get_num_model_species(), which is obviously wrong.

So, what I am supposed to do now is to use lines like

PARAM_FREE_A double energy [:,:,:]

and use KIM_API_set_shape() to communicate the correct shape to the KIM API. Is this correct?

Right.

If so, that may already be the solution to the problems Jakob was having. In that case I will upload a fixed version ASAP.

Hmm, I see two reasons to change that memory:

a) The number of species supported by the model changes (i.e. the output of get_num_model_species() changes). I don't know if this is a good idea. How would this be communicated? What would be the use case? I suppose many models are not prepared to handle something like this.

I don't really see how this could happen. Currently it cannot happen. The return value of get_num_model_species() is fixed based on the model's kim descriptor file. We would say it is a different model if the number of species it supports changes...

b) The model may need more or less memory when parameters change. This may be a tabulated potential, where the "resolution" of the tabulated values may change. I'm not sure if something like this is needed, but it would at least only impact those models which make use of it.

Yes, that is a possibility.

Is there anything else I may be missing? I think it is perfectly fine if it only impacts those models which have a genuine need for that. The impact on existing simulators may be minimal, I guess not many actually use model_reinit() at the moment anyway.

I can't think of others, but I'm sure someone will come up with a viable use case.

I think we should go with allowing the Modle to reallocate its parameter memory during the model_reinit() routine, but it should pay attention to any changes to the parameter values before it does so.

Cheers,

Ryan

OK, I uploaded a new version to github which should behave correctly for the published parameters.

Jakob: could you please check if it works for you now?

Cheers,

Tobias

Dear Ryan and Tobias,

Sorry for dropping out of the conversation so abruptly - I went on holiday with my family, with sporadic internet. :slight_smile:

I will look at this during the coming weeks, and test it. It looks like Asap may not be handling all corner cases correctly when it comes to memory allocation, I will have to fix that first.

Best regards

Jakob