Bug?

Hi,

I was trying to run a very simple LAMMPS input file (attached) for a UO2 lattice with an oxygen vacancy.

The potential parameters are correct.

With the exact same input file, I submitted 3 different LAMMPS jobs: 1. regular LAMMPS run, 2. with USER-OMP enabled, 3. with Kokkos OpenMP enabled.
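For anyone reproducing this, the three jobs would typically be launched with command lines like those assembled below. This is only a sketch: the executable name `lmp`, the thread count, and the helper name `build_commands` are assumptions and are not taken from the attached files. The `-sf`, `-pk`, and `-k` switches are the standard LAMMPS command-line options for selecting accelerator packages.

```python
def build_commands(input_file="in.uo2", exe="lmp", threads=4):
    """Return argv lists for a plain, a USER-OMP, and a Kokkos OpenMP run.

    exe and threads are placeholders (assumptions); adjust to the local build.
    """
    base = [exe, "-in", input_file]
    return {
        # 1. regular run, no accelerator package
        "regular": list(base),
        # 2. USER-OMP: apply the "omp" suffix and set the thread count
        "user-omp": base + ["-sf", "omp", "-pk", "omp", str(threads)],
        # 3. Kokkos with the OpenMP backend: enable Kokkos, set threads, apply "kk" suffix
        "kokkos": base + ["-k", "on", "t", str(threads), "-sf", "kk"],
    }
```

Each list can be passed to `subprocess.run(...)` or joined with spaces and pasted into a job script.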

For the first two cases (regular and USER-OMP), the job finished without any error. The Kokkos run, however, terminated with the error message “Non-numeric pressure - simulation unstable”. The pressure became large and negative, so my guess is that a division involving a very large number then produced the NaN.

Okay, this suggests an instability caused by one or more input parameters (though I am pretty sure the potential parameters are correct). I changed the npt parameters (mainly the pressure-related ones) a bit, but nothing changed.

But, why is this selective to Kokkos only?

My concern is this: if the input is “meaningless”, it should produce an error for the regular and USER-OMP versions too, not only for the Kokkos version. Either all of the above runs should fail, or all of them should run fine. The outcome should not be selective.

Input file and output for all of the above runs are attached herewith.

I tried both the 2019 and 2020 LAMMPS versions, built with Intel 2019u5 and MKL.

Could there be a bug?

Thanks,
Prithwish

output-kokkos.txt (25.1 KB)

output-regular.txt (3.15 KB)

output-user-omp.txt (3.39 KB)

in.uo2 (1.49 KB)

This is not a bug. You just got lucky with two of the simulations and unlucky with one of them. If you run 100 independent simulations you will probably see a certain fraction of them fail, independent of which compiler options you use. So why are they failing? Well, you are asking LAMMPS to sample from a distribution of volumes consistent with P=0, T=1500. The desired volume distribution is bimodal, with one peak at V0 and another near infinity. The case V ~ infinity, while physically reasonable, is numerically unstable. I suggest you do some more testing with a somewhat positive pressure.

actually, there seems to be a bug looming somewhere. possibly some conflict between pair coul/long, kspace and pair hybrid/overlay.
there are inconsistent forces for that even with fix nve after the first step.

please find attached a modified input file that can be run with the command line option “-var pot #” with # being a value between 0 and 9 which reduces the number of active pair styles in an attempt to narrow down the source of the issue.
typically 1 & 2, 3 & 4, 5 & 6, 7 & 8 should give results consistent with each other and across different accelerator choices. when running with KOKKOS enabled (using OpenMP but just 1 thread) the run with -var pot 5 unexpectedly gives different results from -var pot 6.
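The sweep described above could be scripted roughly as follows. This is a hedged sketch, not part of the attached input: the executable name `lmp`, the helper names, and the tolerance are assumptions, and extracting a number from each run's output (for example the potential energy after the first step) is deliberately left out.

```python
import math

# Pairs of "-var pot" values that are expected to agree with each other.
PAIRS = [(1, 2), (3, 4), (5, 6), (7, 8)]

def command_for(pot, exe="lmp", input_file="in.uo2-mod", kokkos=False):
    """Build the argv for one run; exe is a placeholder for the local binary."""
    argv = [exe, "-in", input_file, "-var", "pot", str(pot)]
    if kokkos:
        # Kokkos with the OpenMP backend and a single thread, as in the test above
        argv += ["-k", "on", "t", "1", "-sf", "kk"]
    return argv

def inconsistent_pairs(values, rel_tol=1e-8):
    """Return the pairs in PAIRS whose values differ by more than rel_tol.

    `values` maps a pot value to one number taken from that run's output.
    """
    return [
        (a, b)
        for a, b in PAIRS
        if not math.isclose(values[a], values[b], rel_tol=rel_tol)
    ]
```

Running the sweep once without and once with `kokkos=True`, then comparing the lists returned by `inconsistent_pairs`, shows which combination of pair styles first disagrees.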

Axel.

in.uo2-mod (5.26 KB)

Hi Axel,
Thank you for pointing to this.

Actually, I tried various runs with a positive pressure (e.g. 1 atm instead of 0), as suggested by Aidan, and nothing changed. All Kokkos runs (with OpenMP) exited with the error, while all non-Kokkos runs completed without any issue (for this particular problem).

In addition, I have a gut feeling that the OpenMP environment variables used to fix affinity are somehow not “operational” with Kokkos OpenMP. I could be wrong, but this is what I observed in my runs: for example, a benchmark run of the rhodopsin system gave almost the same walltime for USER-OMP and Kokkos when affinities were not fixed. In the next set of runs, I fixed the affinity (using OMP_PROC_BIND and OMP_PLACES). In this case, the walltime was significantly lower for the USER-OMP runs (showing that the setting is effective), while the timings for Kokkos (OpenMP) consistently remained unchanged.
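One way to check whether the affinity settings reach the OpenMP runtime at all is to launch the run with an explicitly constructed environment. The sketch below makes several assumptions: the values `spread` and `threads`, the thread count, and the helper name `affinity_env` are illustrative and are not the settings used in the runs above. `OMP_DISPLAY_ENV` is a standard OpenMP variable that asks the runtime to print its settings at startup.

```python
import os

def affinity_env(bind="spread", places="threads", threads=4, display=True, base=None):
    """Return a copy of the environment with OpenMP affinity variables set.

    bind, places and threads are example values (assumptions), not the
    settings from the original benchmark runs.
    """
    env = dict(os.environ if base is None else base)
    env["OMP_NUM_THREADS"] = str(threads)
    env["OMP_PROC_BIND"] = bind
    env["OMP_PLACES"] = places
    if display:
        # Ask the OpenMP runtime to report the settings it actually picked up.
        env["OMP_DISPLAY_ENV"] = "true"
    return env
```

The result can be passed as `env=affinity_env()` to `subprocess.run(...)` for both the USER-OMP and the Kokkos command line, and the startup reports of the two runs can then be compared.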

On the other hand, Kokkos (GPU) is quite effective in terms of speed. It gives ‘almost’ the same performance gain as the GPU package (when compared in double precision), even without CUDA-aware Kokkos.

I have not yet had a chance to test Kokkos on KNL (maybe I will study it in the future).

I also built LAMMPS using GNU compilers (version 8.2.0 + OpenMPI 3.2.1 + ScaLAPACK, with FFTW3 or KISS), but these runs are significantly slower than the Intel builds (with MKL).

Anyway, thanks again for all the inputs till now from the team.
Best regards,
Prithwish

I’ll take a look.

actually, there seems to be a bug looming somewhere. possibly some conflict between pair coul/long, kspace and pair hybrid/overlay.

I think I found the cause of this bug. I should have it fixed soon.

Stan