I was trying to measure the performance of my MD code using OpenKIM versus the same potential in “native” mode, but got some really strange results. I am using the Intel compiler suite (version 13.0.1) for two reasons. First, Asap is usually compiled with the Intel compiler, as it gives a significant performance boost to this code; apparently the Intel compiler is better at vectorizing tight loops containing exponential functions. Second, our gfortran is too old for OpenKIM.
I saw that using the OpenKIM interface costs approximately a factor of two in performance. That does not seem reasonable, and it is not consistent with what I see on my Mac. Forcing OpenKIM to compile with the same compiler options that I use for Asap only gave a slight improvement.
Profiling the running code with operf immediately revealed the problem. The code was spending most of its time in exp.L, whereas Asap spends its time in XXX, which is the vectorized version.
The Intel compiler clearly compiles the vectorized code (I enabled -vec-report1, so it reports when it does so). I know that the compiler generates more than one code path when it does this, so there is a fallback code path for CPUs that do not support SSE instructions. My best guess is that the “wrong” code path is somehow being taken in the OpenKIM model. It is not simply that loading a shared object causes this problem, since Asap is normally loaded into Python as a shared object without the problem appearing. Perhaps the “magic” that OpenKIM does to hide symbols and protect against name collisions in the loader is interfering with the code-path selection. Do any of you have any insight to share?
I am aware that this is a highly exotic problem, and it only affects performance. So please do not spend a lot of time on this.
PS. I am considering compiling a modern GCC on the cluster. Which version should I choose, 4.8.3 or 4.9.2?