I was trying to measure the performance of my MD code using OpenKIM versus the same potential in “native” mode, but got some really strange results. I am using the Intel compiler suite (version 13.0.1) for two reasons. First, Asap is usually compiled with the Intel compiler, as it gives a significant performance boost to this code; apparently the Intel compiler is better at vectorizing tight loops containing exponential functions. Second, our gfortran is too old for OpenKIM.
I saw that using the OpenKIM interface costs approximately a factor of two in performance. That does not seem reasonable, and it is not consistent with what I see on my Mac. Forcing OpenKIM to compile with the same compiler options that I use for Asap only gave a slight improvement.
Profiling the running code with operf immediately revealed the problem. The code was spending most of its time in exp.L, whereas Asap spends its time in XXX, which is the vectorized version.
The Intel compiler clearly compiles the vectorized code (I enabled -vec-report1, so it reports when it does so). I know that the compiler generates more than one code path when it does this, so there is a fallback code path for CPUs that do not support SSE instructions. My best guess is that the “wrong” code path is somehow being taken in the OpenKIM model. It is not simply that loading a shared object causes this problem, since Asap is normally loaded into Python as a shared object without the problem appearing. Perhaps the “magic” that OpenKIM does to hide symbols and protect against name collisions in the loader is interfering with the code-path selection. Do any of you have any insight to share?
I am aware that this is a highly exotic problem, and it only affects performance. So please do not spend a lot of time on this.
PS. I am considering compiling a modern GCC on the cluster. Which version should I choose, 4.8.3 or 4.9.2?