KIM API to be SemVer compliant (starting with v1.6.0)

Dear KIM Community,

I am happy to announce that, starting with v1.6.0, the KIM API package will conform to the Semantic Versioning Specification (semver.org).

This means that very specific meanings are attached to the MAJOR, MINOR, and PATCH version numbers.

In short, increments in the PATCH number indicate bug fixes that are backwards-compatible and do not change the API. Increments to the MINOR version number indicate that new functionality has been added in a backwards-compatible manner. Increments to the MAJOR version number indicate that incompatible API changes have been introduced.
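The three rules above can be sketched as a small C function. This is only an illustration of the SemVer increment rules; the type and function names (`semver`, `change_kind`, `semver_bump`) are invented for this example and are not part of the KIM API.

```c
/* Kinds of change a release can contain (hypothetical names). */
typedef enum {
    CHANGE_BUG_FIX,      /* backwards-compatible fix, API unchanged     */
    CHANGE_NEW_FEATURE,  /* backwards-compatible addition to the API    */
    CHANGE_INCOMPATIBLE  /* incompatible API change                     */
} change_kind;

typedef struct {
    int major;
    int minor;
    int patch;
} semver;

/* Return the version that a release containing `kind` should carry.
 * Per SemVer, lower-order fields reset to 0 when a higher one is bumped. */
semver semver_bump(semver v, change_kind kind)
{
    switch (kind) {
    case CHANGE_INCOMPATIBLE:
        v.major += 1;
        v.minor = 0;
        v.patch = 0;
        break;
    case CHANGE_NEW_FEATURE:
        v.minor += 1;
        v.patch = 0;
        break;
    case CHANGE_BUG_FIX:
        v.patch += 1;
        break;
    }
    return v;
}
```

For example, starting from 1.6.0, a bug fix would give 1.6.1, a new backwards-compatible feature 1.7.0, and an incompatible API change 2.0.0.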

It is the hope of the KIM API development team that conformance to the SemVer specification will help the KIM Community better understand and adapt to the ongoing development of the KIM API package and the openkim.org framework.

As always, if you have any questions or concerns, we are happy to respond to inquiries sent to this mailing list.

Sincerely,

Ryan Elliott

So, now that I have promised SemVer conformance, almost immediately, I run up against an ambiguity: API (application programming interface) vs. ABI (application binary interface) compatibility.

The difference is between compatibility of source code (API) and compatibility of compiled shared libraries (ABI). In nearly all cases, an incompatible API change also brings an incompatible ABI change. However, it is possible to make an incompatible ABI change that corresponds to a COMPATIBLE API change.

SemVer does not say anything about ABI. It only mentions API. Thus, one can apply SemVer to the API or the ABI for deciding when to increment the MAJOR version number.

* Option 1: increment the MAJOR version number upon incompatible ABI changes.

* Option 2: increment the MAJOR version number upon incompatible API changes.

Option 1 ensures that a user can compile and install Minor and Patch releases of the same Major version and simply expect their preexisting compiled KIM Models and Simulators to "just work".

Option 2 only ensures that if the user compiles and installs the new KIM API version and then recompiles their existing KIM Models and Simulators, everything will "just work" (without changing any source code).

Option 1 would be most convenient, but would lead to more rapid increments of the MAJOR version number. Also, it can be quite difficult to correctly identify when the ABI changes incompatibly, and thus the chance of releasing versions of the KIM API with "incorrect" SemVer versions increases.

Option 2 makes it easier to assign SemVer versions correctly and will result in fewer MAJOR version increments. However, it may require users to recompile their Models and Simulators more often.

Based on the above, I suggest that the KIM API package follow "Option 2". That is, the versioning will strictly follow the SemVer Specification as applied to the API. ABI incompatibilities will not be strictly accounted for by the versioning scheme.

I would be very interested in the KIM community's thoughts on this issue. I'm willing to go with "Option 1" if there are enough people with good arguments who want it.

So, please reply with your comments, thoughts, and suggestions.

Thanks,

Ryan

Hi Ryan,

I think you have made a good case for adopting your "option 2". But, like you, I'd be interested to hear from others in the community.

Cheers,

Ellad

Hi Ryan - can you give an example of what kind of change would break the ABI but not the API?

Noam

Sure,

Here are two, somewhat different scenarios:

1) Adding a new member variable to a structure. If this new member variable is for internal use only by the KIM API code, then it can be considered as NOT part of the API. That is, the KIM API user need not know anything about it in order to write their code in conformance with the KIM API. However, such a change will alter the sizeof() result for the structure and render (in principle) compiled versions of anything using the old definition incompatible with anything compiled with the new definition.

In most cases these sorts of problems can be eliminated by carefully keeping all "private" KIM API details out of the public headers. If done right, any changes to the KIM API public header files will indicate a change in the API. I intend to set things up in this way. Thus, this sort of scenario should not be a real problem, in practice.
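A minimal C sketch of scenario 1 follows. It first shows two hypothetical layouts of the same structure, before and after an internal member is added, and then one common way of keeping such details out of a public header: an opaque handle. All names here (`model_v_old`, `model_v_new`, `model_handle`, and the `model_*` functions) are invented for illustration; this is not the actual KIM API layout, and the opaque-handle pattern is only one possible way to achieve what is described above.

```c
#include <stdlib.h>

/* Layout exposed in a public header by a hypothetical older release. */
struct model_v_old {
    int    number_of_species;
    double cutoff;
};

/* The same structure in a hypothetical newer release, with one member
 * added for internal bookkeeping. User code never mentions it, so the
 * API is unchanged, but sizeof() differs and so does the ABI. */
struct model_v_new {
    int    number_of_species;
    double cutoff;
    void  *internal_cache;
};

/* Opaque-handle alternative: a public header would contain only these
 * declarations, so callers never depend on the structure's size. */
typedef struct model_handle model_handle;
model_handle *model_create(int number_of_species, double cutoff);
double        model_get_cutoff(const model_handle *m);
void          model_destroy(model_handle *m);

/* The definition would live in the library's private source file and
 * can gain members without affecting already-compiled callers. */
struct model_handle {
    int    number_of_species;
    double cutoff;
    void  *internal_cache;
};

model_handle *model_create(int number_of_species, double cutoff)
{
    model_handle *m = malloc(sizeof *m);
    if (m != NULL) {
        m->number_of_species = number_of_species;
        m->cutoff = cutoff;
        m->internal_cache = NULL;
    }
    return m;
}

double model_get_cutoff(const model_handle *m)
{
    return m->cutoff;
}

void model_destroy(model_handle *m)
{
    free(m);
}
```

With the first pair of structures, code compiled against the old definition allocates and copies too few bytes for the new one. With the opaque handle, only the library ever calls sizeof on the structure, so an added private member changes neither the API nor the ABI seen by callers.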

2) In the KIM API, the build system does a lot of work. In particular, it creates source code for Parameterized Models and embeds ".kim" files and parameter files into the shared libraries. Some of the details for how this is done could technically be considered part of the API. However, since these details are not formally documented, and the general user of the KIM API does not need to know about them, I view these details as NOT part of the API, and therefore, they (I guess) become part of the ABI. (I also consider these to be details that I am free to change at any time, as long as they do not require any changes to existing model and simulator code. That is, they are changes that may require recompilation, but not source code changes.)

So, as a particular example, I've been working on changes to the build system in order to start using the "xxd" utility for embedding ".kim" files into the shared libraries. This has caused a change in some of the library symbols that the KIM API uses internally to find the ".kim" strings.

Thus, a Model compiled with v1.6.4 and linked with v1.7.0 (to be released sometime in the future) will fail to work properly because v1.6.4 uses one set of symbols to name the ".kim" string, but v1.7.0 expects a different set of symbols.
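The failure mode can be sketched in C with a simulated symbol table. The symbol names below (`model_kim_str`, `model_kim_xxd`) are made up for this example, since the thread does not give the real ones, and the lookup function only imitates what a dynamic-loader call such as dlsym() does for a real shared library.

```c
#include <stddef.h>
#include <string.h>

/* One exported symbol: a name and the string it refers to. */
typedef struct {
    const char *name;
    const char *data;
} symbol;

/* Symbols exported by a Model built with the older scheme
 * (hypothetical symbol name). */
static const symbol old_model_symbols[] = {
    { "model_kim_str", "contents of the .kim file" },
};

/* Look up `name` in a symbol table. Returns NULL when the symbol is
 * absent, much as dlsym() does for a real shared library. */
const char *find_symbol(const symbol *table, size_t count, const char *name)
{
    for (size_t i = 0; i < count; ++i) {
        if (strcmp(table[i].name, name) == 0) {
            return table[i].data;
        }
    }
    return NULL;
}
```

A library that looks for the older name finds the embedded string, while one that looks for a different name (say `model_kim_xxd`) gets NULL back, even though the Model's source code is identical in both cases. Recompiling the Model with the newer build system regenerates the symbol under the expected name.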

(So, in a sense the question is: Should this sort of change cause a version increment from v1.6.4-->v2.0.0 or from v1.6.4-->v1.7.0?)

I hope that is all clear; if not, ask for clarification!

Cheers,

Ryan

Thanks for the example. My first response is that the question of which digit should indicate ABI vs. API changes is secondary to what information the user/developer will need. In particular, there can be as many as 3 types of changes:

  1. change nothing in the code (neither ABI nor API changes; either purely internal or just parameters).
  2. need to recompile the code (ABI/API changes)
  3. (possibly) need to change source code to conform with new API (API changes).

It’s clear that labeling type 3 is easy, and important. Option 2 provides that, by incrementing the major version number. Option 1 does not give an obvious way to distinguish API from ABI changes, and I therefore like it less.

If you could distinguish 1 from 2, it would be nice, because it would save the hassle of needless recompiles, but it sounds like you’re worried about changing the ABI and not noticing, and therefore forgetting to change the version in the way that labels it as an ABI change.

My conclusion is that I prefer option 2. That way I can tell changes that require checking the source for compatibility (via a change in major version). I’d then suggest that any change that breaks (or might break?) the ABI require a minor version change, and restrict patch level changes to those that definitely don’t break the ABI.
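Read from the user's side, this proposal amounts to a simple rule for what to do after upgrading. The C sketch below takes the conservative reading that any Minor change might break the ABI; the type and function names (`version_triple`, `upgrade_action`, `required_action`) are invented for the example.

```c
typedef struct {
    int major;
    int minor;
    int patch;
} version_triple;

/* What a user may need to do with existing Models and Simulators. */
typedef enum {
    ACTION_NONE,         /* existing binaries keep working              */
    ACTION_RECOMPILE,    /* rebuild, but no source edits are needed     */
    ACTION_CHECK_SOURCE  /* source may need changes for the new API     */
} upgrade_action;

/* Compare the version a binary was built against with the version now
 * installed, assuming Major marks API breaks and Minor marks (possible)
 * ABI breaks, with Patch releases guaranteed ABI-compatible. */
upgrade_action required_action(version_triple built_against,
                               version_triple installed)
{
    if (installed.major != built_against.major) {
        return ACTION_CHECK_SOURCE;
    }
    if (installed.minor != built_against.minor) {
        return ACTION_RECOMPILE;
    }
    return ACTION_NONE;
}
```

Under this rule a Model built against v1.6.4 would need only a recompile for v1.7.0, nothing at all for another v1.6.x patch release, and a source check for v2.0.0.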

Hi Noam,

I think your suggestion is quite reasonable. SemVer allows for the Minor version to be incremented in such situations: "It MAY be incremented if substantial new functionality or improvements are introduced within the private code."

I would be fine with incrementing the Minor version in this way.

Thanks,

Ryan