Performance issues with the Tersoff model driver

brink · April 12, 2017, 2:21pm

Hi!

The Tersoff_LAMMPS__MD_077075034781 model driver is a direct port from the implementation in LAMMPS. Comparing the performance of the native LAMMPS implementation with the KIM version via the KIM module in LAMMPS, I see that the native version can be 2 to 3 times faster in serial calculations. In single-node MPI calculations, the KIM version can catch up a bit.

It is not entirely clear to me how this happens. Some preliminary profiling shows that most of the time is spent inside the actual compute function, which should be more or less the same as in LAMMPS.

This needs more testing, but before I dive in deeper: are such performance issues expected when using KIM models in LAMMPS? Has anybody benchmarked these things?

Thanks for any input,

Tobias

relliott · April 12, 2017, 2:42pm

Hi Tobias,

This is not our typical experience. We've found that generally the KIM API introduces very little overhead (at least in situations similar to the one you indicate). For example, we've seen that the KIM EAM model driver is actually a bit faster than the native implementation within LAMMPS....

We'd have to dig into the details to see why you are finding such a significant slow-down in this case....

Ryan

brink · April 12, 2017, 3:22pm

> Hi Tobias,
>
> This is not our typical experience. We've found that generally the KIM
> API introduces very little overhead (at least in situations similar to
> the one you indicate). For example, we've seen that the KIM EAM model
> driver is actually a bit faster than the native implementation within
> LAMMPS....

OK, thanks. I will first try to verify the correctness of the model before looking further into the performance. I might even upload the model driver as-is, with the outstanding bugs fixed, before trying to improve performance.

I guess I'd have to do more detailed profiling of both this code and the native code to get to the bottom of this. It may simply be a bug introduced during porting.

> We'd have to dig into the details to see why you are finding such a
> significant slow-down in this case....

I'll get back to the list as soon as I find out more.

brink · April 20, 2017, 3:34pm

Hi Ryan,

It seems the origin of the slowdown is rather simple: even in the Tersoff potential, some of the two-body terms can be computed with a half iteration that visits each pair only once. I did not implement that when I started porting and then forgot about it.
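A minimal sketch of the half iteration for the pair part, written in Python for brevity rather than as driver code. It assumes every particle is contributing and that the full neighbor list is symmetric (j is in i's list exactly when i is in j's). `phi` is a hypothetical stand-in for the Tersoff repulsive pair term, and only the energy is shown; in a real driver the pair force would be added to both i and j during the same visit.

```python
import math


def pair_energy_full(positions, neighbors, phi):
    """Pair energy from a full neighbor list: every pair is visited twice
    (i -> j and j -> i), so each visit contributes half of phi(r_ij)."""
    energy, evaluations = 0.0, 0
    for i, neigh in enumerate(neighbors):
        for j in neigh:
            energy += 0.5 * phi(math.dist(positions[i], positions[j]))
            evaluations += 1
    return energy, evaluations


def pair_energy_half(positions, neighbors, phi):
    """Same energy, but each pair is evaluated once, from its lower index."""
    energy, evaluations = 0.0, 0
    for i, neigh in enumerate(neighbors):
        for j in neigh:
            if j < i:
                continue  # already handled while visiting particle j
            energy += phi(math.dist(positions[i], positions[j]))
            evaluations += 1
    return energy, evaluations


def phi(r):
    """Hypothetical stand-in for the Tersoff repulsive pair term."""
    return math.exp(-r)


# Three particles, all contributing, with a symmetric full neighbor list.
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.5, 0.0)]
neighbors = [[1, 2], [0, 2], [0, 1]]

e_full, n_full = pair_energy_full(positions, neighbors, phi)
e_half, n_half = pair_energy_half(positions, neighbors, phi)
print(n_full, n_half)  # 6 3
```

Both loops give the same energy, but the second evaluates each pair once, which is where the saving in the pair part comes from.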

I played a bit with a test implementation of that scheme and it looks promising. I just need to optimize and test it a bit more, including with MPI.

I also need to convince myself that simply testing for (i < j) or (j < i) is correct in all cases involving ghost atoms, or whether that introduces an error.
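When ghost atoms are present, a plain (i < j) test is not enough. Below is a sketch of one way to handle it, assuming the kim-api-v1 NEIGH_PURE_F convention that ghost (non-contributing) particles have empty neighbor lists while contributing particles also list their ghost neighbors. The per-pair weight and the helper names are illustrative, not the driver's actual code; the same weight would scale the pair force applied to both i and j, which keeps forces on ghost atoms consistent with the full-list result.

```python
import math


def half_pair_weight(i, j, contributing):
    """Weight of the pair term when visiting neighbor j of particle i.

    Assumes NEIGH_PURE_F-style lists: contributing particles list all
    neighbors (including ghosts), ghost particles have empty lists.
    """
    if contributing[j]:
        # The pair also appears in j's list, so count it once, from the
        # lower index, with the full pair energy (i's half plus j's half).
        return 1.0 if i < j else 0.0
    # j is a ghost: the pair is seen only from i, and only i's half of
    # the pair energy belongs to the system. Testing i < j alone would
    # either drop this pair (ghost index below i) or double it (above i).
    return 0.5


def pair_energy_weighted(positions, neighbors, contributing, phi):
    """Pair energy of the contributing particles, one weighted visit per entry."""
    energy = 0.0
    for i, neigh in enumerate(neighbors):
        for j in neigh:
            w = half_pair_weight(i, j, contributing)
            if w:
                energy += w * phi(math.dist(positions[i], positions[j]))
    return energy


positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.5, 0.0)]
contributing = [False, True, True]  # particle 0 is a ghost with the lowest index
neighbors = [[], [0, 2], [0, 1]]    # ghosts have empty lists
print(half_pair_weight(1, 0, contributing))  # 0.5
```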

Once that is figured out, I could upload a new version with correct forces on ghost atoms and the published parameters.

Cheers,

Tobias

relliott · April 23, 2017, 2:50pm

Hi Tobias,

Yeah, that can certainly add overhead!

Once you are digging into it, feel free to send me questions (on or offline) about the implementation details for handling all the various NBC cases.... The only particular thing that jumps to mind for the moment is to make sure that you also pay attention to the numberContributing parameter when using a half neighbor list. (But maybe the model driver doesn't support that mode at all.)

Also, I'll note that these sorts of complications and difficulties have led us to use simplicity as a guiding principle for the upcoming kim-api-v2.0.0. There we are restricting to just a single mode (essentially equivalent to the v1.Y.Z NBC of NEIGH_PURE_F). So, as time constraints become a factor, we would suggest focusing your efforts on getting a NEIGH_PURE_F implementation working and leaving other NBCs to later, as time permits.

Cheers,

Ryan

brink · April 24, 2017, 11:11pm

Hi Ryan!

Keeping this on-list for the moment as it may be interesting for others,
too. I am already focusing on NEIGH_PURE_F with Neigh_LocaAccess because
I followed the KIM v2 discussion some time ago.

Neigh_IterAccess is going away, yes?

I ran into some trouble, namely that I need to know if atom i's neighbor
atom j is a ghost atom. In KIM's half neighbor list modes, there is
numberContributingParticles. Now, the Tersoff potential is impossible to
implement completely using half lists and therefore I do not support
this mode. But even if I want to implement part of the algorithm as a
half list, I need something like numberContributingParticles.
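To see why the full list is needed, consider the bond-order argument zeta_ij of the Tersoff form used in LAMMPS: it sums over every other neighbor k of i. The simplified direct evaluation below takes `fc`, `g`, `lam3` and `m` as plain arguments (an assumption, not the driver's parameter handling) and shows that, evaluated this way, a half list drops the k < i terms.

```python
import math


def zeta(i, j, positions, neighbors, fc, g, lam3=0.0, m=3):
    """Tersoff bond-order argument
        zeta_ij = sum_{k in N(i), k != j} fc(r_ik) * g(cos theta_ijk)
                  * exp((lam3 * (r_ij - r_ik))**m)
    evaluated directly from the neighbor list of particle i."""
    xi = positions[i]
    d_ij = [a - b for a, b in zip(positions[j], xi)]
    r_ij = math.hypot(*d_ij)
    total = 0.0
    for k in neighbors[i]:  # every neighbor k of i is needed, not only k > i
        if k == j:
            continue
        d_ik = [a - b for a, b in zip(positions[k], xi)]
        r_ik = math.hypot(*d_ik)
        cos_theta = sum(a * b for a, b in zip(d_ij, d_ik)) / (r_ij * r_ik)
        total += fc(r_ik) * g(cos_theta) * math.exp((lam3 * (r_ij - r_ik)) ** m)
    return total


# Four particles that are all mutual neighbors.
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
full_list = [[1, 2, 3], [0, 2, 3], [0, 1, 3], [0, 1, 2]]
half_list = [[1, 2, 3], [2, 3], [3], []]  # each pair stored once, from the lower index


def unit(_):
    return 1.0  # trivial fc and g, so zeta just counts the k terms


print(zeta(1, 2, positions, full_list, unit, unit))  # 2.0 (k = 0 and k = 3)
print(zeta(1, 2, positions, half_list, unit, unit))  # 1.0 (k = 0 is missing)
```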

If locator mode is used for get_neigh, I can use that to see if j has
zero neighbors. This is the current implementation, but it is not so
nice and I'm not sure about the performance. So far, the correct half
list for the pair part has not given me any speedup. Also, LAMMPS
currently prefers Neigh_IterAccess if it is available. What do you
think?

numberContributingParticles is not currently available with NEIGH_PURE_F,
correct?

Will it (or sth similar) be available in the v2 API?

Thanks for any answers,

Tobias


relliott · April 25, 2017, 6:13pm

> Hi Ryan!
>
> Keeping this on-list for the moment as it may be interesting for others,
> too. I am already focusing on NEIGH_PURE_F with Neigh_LocaAccess because
> I followed the KIM v2 discussion some time ago.

Great!

> Neigh_IterAccess is going away, yes?

Correct!

> I ran into some trouble, namely that I need to know if atom i's neighbor
> atom j is a ghost atom. In KIM's half neighbor list modes, there is
> numberContributingParticles. Now, the Tersoff potential is impossible to
> implement completely using half lists and therefore I do not support
> this mode. But even to implement only part of the algorithm with a half
> list, I need something like numberContributingParticles.

> If locator mode is used for get_neigh, I can use that to see if j has
> zero neighbors.

Right. This is really the only way within kim-api-v1.Y.Z.

> This is the current implementation, but it is not so
> nice and I'm not sure about the performance. So far, the correct half
> list for the pair part has not given me any speedup.

Maybe it is worth creating a list of zero-neighbor particles at the beginning of the compute routine. Not sure if this will help, but it seems like the best option.
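The suggestion of building the zero-neighbor information once at the start of the compute routine could look like the following sketch. `get_neighbors` is a hypothetical callable standing in for a locator-mode `get_neigh` query; it is not the KIM API signature.

```python
def contributing_flags(num_particles, get_neighbors):
    """Build a per-particle contributing flag once per compute call.

    `get_neighbors(p)` is a hypothetical callable standing in for a
    locator-mode neighbor query; it returns the neighbor indices of p.
    Under kim-api-v1 NEIGH_PURE_F, non-contributing (ghost) particles
    have zero neighbors, so an empty list marks a ghost.
    """
    return [len(get_neighbors(p)) > 0 for p in range(num_particles)]


lists = {0: [1], 1: [0, 2], 2: []}  # particle 2 is a ghost
flags = contributing_flags(3, lambda p: lists[p])
print(flags)  # [True, True, False]
```

The pair loop can then test `flags[j]` instead of querying j's neighbor list for every pair. A contributing particle that happens to have no neighbors is also flagged False; this appears harmless for the pair terms, since no contributing particle lists it as a neighbor.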

> Also, LAMMPS currently prefers Neigh_IterAccess if it is available. What do you think?

Yes, I believe that is true. My understanding is that this is preferred because it allows the use of "Hybrid" potential behavior. We have decided that it is preferable to forgo direct compatibility with that feature in favor of the simplicity of having a single mode of neighbor list access. (If it became important, it would probably be possible to update the lammps/kim interface to work around this and still allow the use of KIM models with the Hybrid features. However, we have no plans to implement such a feature at this time.)

> numberContributingParticles is not currently available with NEIGH_PURE_F,
> correct?

Correct. You need to get the list for a particle and see if it is of length zero or not.

> Will it (or sth similar) be available in the v2 API?

v2 will introduce a new integer array containing 1/0 for each particle to indicate whether the particle is contributing or not. The neighbor lists will no longer provide this sort of information. Instead, the neighbor lists will be required to be consistent with what the model would generate all on its own. (That is, if a particle is within the neighbor list cutoff range, it must be listed in the neighbor list. This is true for contributing and non-contributing particles.)
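The v2 requirement described here can be stated as a brute-force check: for each contributing particle, every other particle within the cutoff, contributing or not, must appear in its list. `missing_neighbors` is a hypothetical testing helper, not part of kim-api, and it treats "within" as r < cutoff, which is an assumption about the boundary.

```python
import math


def missing_neighbors(positions, contributing, neighbors, cutoff):
    """Return the (i, j) pairs within `cutoff` that the supplied lists omit.

    Only contributing particles need a list, but every particle j,
    contributing or not, within the cutoff of i must appear in it.
    "Within" is taken here as r_ij < cutoff (an assumption).
    """
    missing = []
    for i, xi in enumerate(positions):
        if not contributing[i]:
            continue
        listed = set(neighbors[i])
        for j, xj in enumerate(positions):
            if j != i and math.dist(xi, xj) < cutoff and j not in listed:
                missing.append((i, j))
    return missing


positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
contributing = [1, 1, 0]  # v2-style 1/0 flags; particle 2 is non-contributing
print(missing_neighbors(positions, contributing, [[1], [0], []], 2.5))
# [(1, 2)]: the non-contributing particle at distance 2.0 from particle 1 must be listed
```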

Cheers,

Ryan