Dear all,
I was trying to adapt and apply the LAMMPS example found in the ELASTIC_T directory to a different system (LJ potential, PPPM electrostatics; one graphene sheet and nylon chains).
I may have encountered a bug in the GPU package (or maybe it is an effect of the lower precision of the GPU computations, I don’t know), but with the GPU package turned on, all simulations except one crashed, each time at a different step of the simulation.
I tried the ELASTIC_T procedure under three conditions (sketched after this list):
- adiabatic conditions
- fix NVE+Langevin
- fix NVT
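For context, the thermostatting in the three variants was set up roughly as follows (a minimal sketch with placeholder temperature, damping, and seed values, not the exact lines from my inputs):

  # adiabatic: plain time integration, no thermostat
  fix integrate all nve

  # NVE + Langevin thermostat
  fix integrate all nve
  fix thermostat all langevin 300.0 300.0 100.0 48279

  # Nose-Hoover NVT
  fix integrate all nvt temp 300.0 300.0 100.0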
Each of the three systems was run with (see the sketch after this list):
- plain CPU (MPI + pkg OMP with 1 thread),
- MPI + pkg GPU on a P100,
- MPI + pkg GPU on an RTX 2080 Super.
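In each case the accelerator package was enabled in the standard way, roughly as below (a sketch; the exact switches are in the attached input and log files):

  # CPU runs: OMP package with 1 thread per MPI rank
  package omp 1
  suffix omp

  # GPU runs: GPU package with 1 GPU per node
  package gpu 1
  suffix gpu

(or, equivalently, the -sf / -pk command-line switches).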
I’m attaching an archive with the results of every simulation and the data/input files needed to reproduce the issue.
Also, the file can be downloaded here ( https://www.dropbox.com/s/9y075qlhecoxv2z/not_working_gpu_ELASTIC_T.tar.xz?dl=0 ).
All simulations in the directory were done with OpenMPI 4.0.3, CUDA 10.2.89, and LAMMPS 18 Feb 2020. The OS was CentOS 7 for the CPU / P100 runs and Ubuntu 18.04 for the RTX 2080 Super runs.
I performed many more runs with different versions of OpenMPI or MPICH, but the results were the same.
All runs on the CPU finished fine and their results are included (the runs are too short to be scientifically meaningful, although the results are somewhat reproducible; I was still in the getting-familiar phase). The GPU runs either failed without a clear error message or aborted with a generic Segmentation Fault or Address not mapped (1). Only one GPU run completed (the NVT simulation on the RTX 2080 Super), maybe by chance, and it reported results similar to the CPU runs.
Maybe I missed something, maybe my system is unstable, or maybe I did something wrong, but I can’t understand why this is happening. I hope someone here can guide me.
Have a nice day,
Domenico
not_working_gpu_ELASTIC_T.tar.xz (884 KB)