Erratic behavior of USER-INTEL package on Intel Xeon Phi 5120D coprocessor

I am having problems running simulations on an Intel Xeon Phi coprocessor. The simulations show random jumps to very high energy, with a steadily increasing temperature, before crashing. Is this a bug, or am I doing something wrong?

Below is the output from a run of the provided test input (src/USER-INTEL/TEST/in.intel.rhodo):

lmp_intel_coprocessor -sf intel -pk intel 1 -in in.intel.rhodo > intel.out

Step TotEng E_vdwl E_coul Lx Ly Lz Pxx Pyy Pzz Temp

143 -582081.48 -54926.72 3247060.5 220 154 145.4442 -486.61618 -265.03283 -646.71286 332.67398
144 -582085.66 -54919.015 3246710.5 220 154 145.4441 -459.97364 -251.02358 -619.5491 333.01217
145 348494.99 22136.739 3308904.5 220 154 145.444 5747.701 5763.0662 5812.8833 328.94849
146 -603841.96 -65891.658 3223212.5 220 154 145.44391 -1597.9318 -1423.882 -2467.4327 327.22903
147 -603847.42 -65893.53 3222876.2 220 154 145.44382 -1572.8269 -1409.3345 -2435.4237 327.03104

246 -269220.77 -64449.766 3203370.4 220 154 145.4322 -761.18156 -918.2328 -760.13833 570.53238
247 1094824.7 49365.188 3322970.7 220 154 145.43208 6660.9286 7325.2165 9994.2725 567.99309
248 10039.016 -61661.367 3207140.9 220 154 145.43195 -560.88192 -560.63403 -6.2621816 564.93745

273 8134.7597 -58289.568 3214385.6 220 154 145.42879 -601.09257 -445.73613 -69.66754 581.87608
274 2331376.4 1842946 3331436.6 220 154 145.42867 2.1861302e+59 5.8636096e+58 -3.536433e+57 0
ERROR on proc 5: Out of range atoms - cannot compute PPPM (…/pppm.cpp:1918)
ERROR on proc 4: Out of range atoms - cannot compute PPPM (…/pppm.cpp:1918)
ERROR on proc 3: Out of range atoms - cannot compute PPPM (…/pppm.cpp:1918)
ERROR on proc 2: Out of range atoms - cannot compute PPPM (…/pppm.cpp:1918)
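To find such jumps in a long log without scanning it by eye, a minimal Python sketch like the one below can parse the thermo rows and the ERROR lines. It assumes the thermo columns are exactly those in the header above and that thermo output is printed every step (raise `max_gap` otherwise). The function names and the 50% relative threshold are illustrative choices, not part of LAMMPS.

```python
import re
from typing import Dict, List

# Column names copied from the thermo header of the run above.
COLUMNS = ["Step", "TotEng", "E_vdwl", "E_coul", "Lx", "Ly", "Lz",
           "Pxx", "Pyy", "Pzz", "Temp"]


def parse_thermo(text: str) -> List[Dict[str, float]]:
    """Return one dict per thermo row; headers, ERROR lines, etc. are skipped."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != len(COLUMNS):
            continue
        try:
            values = [float(f) for f in fields]
        except ValueError:
            continue  # e.g. the "Step TotEng ..." header itself
        row = dict(zip(COLUMNS, values))
        row["Step"] = int(row["Step"])
        rows.append(row)
    return rows


def find_energy_spikes(rows: List[Dict[str, float]], rel_tol: float = 0.5,
                       max_gap: int = 1) -> List[int]:
    """Steps where TotEng rises by more than rel_tol * |previous TotEng|.

    Only rows at most max_gap steps apart are compared, so gaps between
    excerpts are ignored; set max_gap to the thermo interval if it is > 1.
    """
    spikes = []
    for prev, cur in zip(rows, rows[1:]):
        if cur["Step"] - prev["Step"] > max_gap:
            continue
        jump = cur["TotEng"] - prev["TotEng"]
        if jump > rel_tol * max(abs(prev["TotEng"]), 1.0):
            spikes.append(int(cur["Step"]))
    return spikes


def error_ranks(text: str) -> List[int]:
    """MPI ranks that printed an 'ERROR on proc N: ...' line, sorted."""
    ranks = []
    for line in text.splitlines():
        match = re.match(r"ERROR on proc (\d+):", line.strip())
        if match:
            ranks.append(int(match.group(1)))
    return sorted(ranks)
```

Applied to the excerpt above as a string `text`, `find_energy_spikes(parse_thermo(text))` reports steps 145, 247 and 274, and `error_ranks(text)` reports ranks 2 to 5.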

Similar behavior was observed in a hybrid run (output file intel_omp_hybrid.out):
lmp_intel_coprocessor -sf hybrid intel omp -pk intel 1 omp 2 -pk omp 2 -in in.intel.rhodo > intel_omp_hybrid.out
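When scripting runs (for example, sweeping the OpenMP thread count), it can help to assemble the hybrid command above as an argument list so that each switch is explicit. The sketch below only reproduces the switches already used in that command; the function name and its input validation are hypothetical additions.

```python
from typing import List


def hybrid_intel_omp_args(input_file: str, n_phi: int, omp_threads: int,
                          exe: str = "lmp_intel_coprocessor") -> List[str]:
    """Argument list for a hybrid intel/omp run, in the form used above."""
    if n_phi < 0 or omp_threads < 1:
        raise ValueError("need n_phi >= 0 and omp_threads >= 1")
    return [
        exe,
        # One -sf switch naming both accelerator variants.
        "-sf", "hybrid", "intel", "omp",
        # Settings for the intel package: coprocessor count, then OpenMP threads.
        "-pk", "intel", str(n_phi), "omp", str(omp_threads),
        # Settings for the omp package: OpenMP threads.
        "-pk", "omp", str(omp_threads),
        "-in", input_file,
    ]
```

`" ".join(hybrid_intel_omp_args("in.intel.rhodo", 1, 2))` gives the command above without the output redirection, and the list itself can be passed directly to Python's `subprocess.run`.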

Details:
LAMMPS version: 14th May 2016
Intel compilers: 15.0.1.133
CPU: Intel Ivy Bridge, 2.4 GHz (exact model not yet known)
Coprocessor: Intel Xeon Phi Coprocessor 5120D
The default Makefile.intel_coprocessor was used, modified only to set the appropriate compilers (–mpiicc ++ CC) and to add an environment variable (setenv CRAYPE_LINK_TYPE dynamic); without the dynamic link type, the build failed.
The output files are attached, if anyone is interested.

ASIDE:
The documentation for older versions of the LAMMPS USER-INTEL package (http://web.archive.org/web/20150331065553/http://lammps.sandia.gov/doc/accelerate_intel.html) says: “If LAMMPS was also built with the USER-OMP package, you must also use the package omp command to enable that package, unless the “-sf intel” or “-pk omp” command-line switches were used.” I think I misinterpreted this and used to run hybrid jobs as:
lmp_executable -sf intel -sf omp -pk intel 1 -pk omp $OMP_NUM_THREADS -in input.in
I am curious what exactly the above invocation achieves.
In the output files I can see some data transfer between the host processor and the coprocessor at the beginning and end of the output, but that is all. There is no section that says:

I am able to reproduce this issue with the 15.0.1 compiler (I don’t think I had ever used this version previously). Are you able to use a different version of the compiler? I don’t see the issue with the other versions I have.

Also, plz note that for performance you will not want to run w/ a single MPI task for each coprocessor.

Best, - Mike
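Since the outcome apparently depends on the compiler release, a small check can sort the compiler modules available on a cluster against what is stated in this thread: the USER-INTEL documentation quoted in the reply below recommends 14.0.1.106 and supports 15.0.1.133 and later, and 15.0.1 is the release that reproduced the problem here. This is a sketch; the labels and names are hypothetical and it encodes nothing beyond those statements.

```python
from typing import Tuple

# Versions quoted from the USER-INTEL documentation in this thread.
RECOMMENDED = (14, 0, 1, 106)
FIRST_SUPPORTED_15 = (15, 0, 1, 133)
# Release reported in this thread to reproduce the energy jumps.
PROBLEM_RELEASE = (15, 0, 1)


def parse_version(text: str) -> Tuple[int, ...]:
    """'13.1.3.192' -> (13, 1, 3, 192), so versions compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))


def classify(version: str) -> str:
    v = parse_version(version)
    if v == RECOMMENDED:
        return "recommended"
    if v >= FIRST_SUPPORTED_15:
        if v[:3] == PROBLEM_RELEASE:
            return "supported, but issue reproduced in this thread"
        return "supported"
    return "not listed"
```

Comparing integer tuples rather than strings avoids ordering mistakes such as "9.0" sorting after "13.1".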

We do have Intel compiler version 13.1.3.192, but I did not try it because the docs say "The recommended version of the Intel(R) compiler is 14.0.1.106. Versions 15.0.1.133 and later are also supported." Sadly, getting version 14 would probably mean convincing the system administrator, which would be difficult. I can give version 13 a shot if that would be useful.

"Also, plz note that for performance you will not want to run w/ a single MPI task for each coprocessor." As in? I ran multiple MPI ranks on the host processor; I omitted that part of the command just for legibility. Or are some additional modifications required on the coprocessor side?
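As a sketch of the kind of full command the exchange above is about, the function below launches several MPI tasks on the host, each started with the same `-pk intel` setting so that they share the coprocessor(s). This is an illustration under assumptions: `mpirun -np` stands in for whatever launcher the site uses (a Cray system likely has its own launcher and rank-count option), the rank count in the test is arbitrary, and the function name is hypothetical. It does not address whether further coprocessor-side settings are needed.

```python
from typing import List, Sequence


def offload_launch_args(n_ranks: int, n_phi: int, input_file: str,
                        exe: str = "lmp_intel_coprocessor",
                        launcher: Sequence[str] = ("mpirun", "-np")) -> List[str]:
    """Start n_ranks MPI tasks, all using '-sf intel -pk intel n_phi'.

    launcher is a placeholder (program plus its rank-count option);
    replace it with the launcher used on the local system.
    """
    if n_ranks < 1 or n_phi < 1:
        raise ValueError("need at least one MPI rank and one coprocessor")
    return [*launcher, str(n_ranks),
            exe, "-sf", "intel", "-pk", "intel", str(n_phi),
            "-in", input_file]
```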