ionut@ServerS:~$ mpirun -np 1 ./lammps-16Mar18/src/lmp_kokkos_cuda_mpi -in in.not_working_Si_cluster -k on g 8 -sf kk
LAMMPS (2 Aug 2018)
KOKKOS mode is enabled (../kokkos.cpp:45)
  using 8 GPU(s)
Created orthogonal box = (-100 -100 -100) to (100 100 100)
  1 by 1 by 1 MPI processor grid
Lattice spacing in x,y,z = 5.4305 5.4305 5.4305
Created 45215 atoms
  Time spent = 0.0307325 secs
Neighbor list info ...
  update every 10 steps, delay 0 steps, check no
  max neighbors/atom: 2000, page size: 100000
  master list distance cutoff = 12
  ghost atom cutoff = 12
  binsize = 6, bins = 34 34 34
  2 neighbor lists, perpetual/occasional/extra = 2 0 0
  (1) pair reax/c/kk, perpetual
      attributes: full, newton off, ghost, kokkos_device
      pair build: full/bin/ghost/kk/device
      stencil: full/ghost/bin/3d
      bin: kk/device
  (2) fix qeq/reax/kk, perpetual, copy from (1)
      attributes: full, newton off, ghost, kokkos_device
      pair build: copy/kk/device
      stencil: none
      bin: none
Setting up Verlet run ...
  Unit style    : real
  Current step  : 0
  Time step     : 0.25
WARNING: Fixes cannot yet send data in Kokkos communication, switching to classic communication (../comm_kokkos.cpp:463)
Per MPI rank memory allocation (min/avg/max) = 243.9 | 243.9 | 243.9 Mbytes
Step Temp E_pair E_mol TotEng Press
       0          100   -4583859.4            0   -4570381.9   -7340.0099
    1000    108.82171   -4590003.9            0   -4575337.5    4512.5174
Loop time of 44.0362 on 1 procs for 1000 steps with 45215 atoms

Performance: 0.491 ns/day, 48.929 hours/ns, 22.709 timesteps/s
73.2% CPU use with 1 MPI tasks x 1 OpenMP threads

MPI task timing breakdown:
Section |  min time  |  avg time  |  max time  |%varavg| %total
---------------------------------------------------------------
Pair    | 16.252     | 16.252     | 16.252     |   0.0 | 36.91
Neigh   | 0.83549    | 0.83549    | 0.83549    |   0.0 |  1.90
Comm    | 0.39132    | 0.39132    | 0.39132    |   0.0 |  0.89
Output  | 2.8946     | 2.8946     | 2.8946     |   0.0 |  6.57
Modify  | 23.583     | 23.583     | 23.583     |   0.0 | 53.55
Other   |            | 0.07994    |            |       |  0.18

Nlocal:    45215 ave 45215 max 45215 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Nghost:    0 ave 0 max 0 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Neighs:    0 ave 0 max 0 min
Histogram: 1 0 0 0 0 0 0 0 0 0
FullNghs:  1.63111e+07 ave 1.63111e+07 max 1.63111e+07 min
Histogram: 1 0 0 0 0 0 0 0 0 0

Total # of neighbors = 16311116
Ave neighs/atom = 360.746
Neighbor list builds = 100
Dangerous builds not checked

Please see the log.cite file for references relevant to this simulation

Total wall time: 0:00:50

ionut@ServerS:~$ mpirun -np 8 ./lammps-16Mar18/src/lmp_kokkos_cuda_mpi -in in.not_working_Si_cluster -k on g 8 -sf kk
LAMMPS (2 Aug 2018)
KOKKOS mode is enabled (../kokkos.cpp:45)
  using 8 GPU(s)
Created orthogonal box = (-100 -100 -100) to (100 100 100)
  2 by 2 by 2 MPI processor grid
Lattice spacing in x,y,z = 5.4305 5.4305 5.4305
Created 45215 atoms
  Time spent = 0.00804737 secs
Neighbor list info ...
  update every 10 steps, delay 0 steps, check no
  max neighbors/atom: 2000, page size: 100000
  master list distance cutoff = 12
  ghost atom cutoff = 12
  binsize = 6, bins = 34 34 34
  2 neighbor lists, perpetual/occasional/extra = 2 0 0
  (1) pair reax/c/kk, perpetual
      attributes: full, newton off, ghost, kokkos_device
      pair build: full/bin/ghost/kk/device
      stencil: full/ghost/bin/3d
      bin: kk/device
  (2) fix qeq/reax/kk, perpetual, copy from (1)
      attributes: full, newton off, ghost, kokkos_device
      pair build: copy/kk/device
      stencil: none
      bin: none
Setting up Verlet run ...
  Unit style    : real
  Current step  : 0
  Time step     : 0.25
WARNING: Fixes cannot yet send data in Kokkos communication, switching to classic communication (../comm_kokkos.cpp:463)
Per MPI rank memory allocation (min/avg/max) = 52.08 | 53.18 | 54.34 Mbytes
Step Temp E_pair E_mol TotEng Press
       0          100   -4583859.4            0   -4570381.9   -7340.0099
    1000    107.57772   -4590789.8            0     -4576291    4420.0977
Loop time of 19.2805 on 8 procs for 1000 steps with 45215 atoms

Performance: 1.120 ns/day, 21.423 hours/ns, 51.866 timesteps/s
75.2% CPU use with 8 MPI tasks x 1 OpenMP threads

MPI task timing breakdown:
Section |  min time  |  avg time  |  max time  |%varavg| %total
---------------------------------------------------------------
Pair    | 6.3076     | 6.4848     | 6.619      |   4.0 | 33.63
Neigh   | 0.39737    | 0.43875    | 0.47341    |   3.9 |  2.28
Comm    | 1.2712     | 1.4205     | 1.5865     |  10.4 |  7.37
Output  | 0.42371    | 0.48728    | 0.54513    |   5.9 |  2.53
Modify  | 10.316     | 10.41      | 10.52      |   2.4 | 53.99
Other   |            | 0.03875    |            |       |  0.20

Nlocal:    5651.88 ave 5714 max 5610 min
Histogram: 1 1 2 0 1 2 0 0 0 1
Nghost:    6965.88 ave 7017 max 6894 min
Histogram: 1 0 0 1 2 0 0 2 1 1
Neighs:    0 ave 0 max 0 min
Histogram: 8 0 0 0 0 0 0 0 0 0
FullNghs:  2.03714e+06 ave 2.06163e+06 max 2.0199e+06 min
Histogram: 1 1 2 0 1 1 1 0 0 1

Total # of neighbors = 16297132
Ave neighs/atom = 360.436
Neighbor list builds = 100
Dangerous builds not checked

Please see the log.cite file for references relevant to this simulation

Total wall time: 0:00:30
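
For comparing the two runs, here is a minimal Python sketch that turns the reported loop times into a speedup and a per-rank parallel efficiency. The numbers are copied from the "Loop time of ..." lines and the Pair/Modify rows of the timing breakdowns above. The function names and the `runs` dictionary are my own, purely illustrative, and are not part of LAMMPS.

```python
# Values copied by hand from the two LAMMPS logs above (seconds, avg time column).
# The dictionary layout and helper names are illustrative assumptions, not LAMMPS output.
runs = {
    1: {"loop_s": 44.0362, "pair_s": 16.252, "modify_s": 23.583},
    8: {"loop_s": 19.2805, "pair_s": 6.4848, "modify_s": 10.41},
}
STEPS = 1000  # both runs report 1000 steps


def speedup(base_s: float, new_s: float) -> float:
    """Ratio of the baseline time to the new time (larger is better)."""
    return base_s / new_s


def parallel_efficiency(base_s: float, new_s: float, ranks: int) -> float:
    """Speedup divided by the number of MPI ranks (1.0 would be ideal scaling)."""
    return speedup(base_s, new_s) / ranks


loop_speedup = speedup(runs[1]["loop_s"], runs[8]["loop_s"])
loop_efficiency = parallel_efficiency(runs[1]["loop_s"], runs[8]["loop_s"], 8)

# Share of the loop time spent in the Modify section (fixes) for each run.
modify_share = {n: r["modify_s"] / r["loop_s"] for n, r in runs.items()}

# Timesteps per second recomputed from the loop time, as a consistency check
# against the "Performance:" line of each log.
steps_per_s = {n: STEPS / r["loop_s"] for n, r in runs.items()}

print(f"speedup 1 -> 8 ranks: {loop_speedup:.2f}x")
print(f"parallel efficiency:  {loop_efficiency:.1%}")
for n in sorted(runs):
    print(f"{n} rank(s): Modify = {modify_share[n]:.1%} of loop, "
          f"{steps_per_s[n]:.3f} timesteps/s")
```

Going from 1 to 8 MPI ranks the loop runs roughly 2.3 times faster, and in both runs the Modify section accounts for a little over half of the loop time.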