I need the EXTRA-MOLECULE package to use fourier-style dihedrals (dihedral_style fourier). To get Intel acceleration, I also enabled PKG_USER-INTEL.
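For reference, with a current LAMMPS tree the build configuration would look roughly like this (older versions spell the Intel flag PKG_USER-INTEL instead of PKG_INTEL):

cmake -D PKG_EXTRA-MOLECULE=yes -D PKG_INTEL=yes ../cmake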
Now, whenever I try to run an Intel-accelerated simulation using the following command (lmp_user_intel is my LAMMPS executable):

lmp_user_intel -sf intel -in in.file
I get the following seg fault:
With the “cut long” settings for the pair style, you get exactly the same potential as with the lj/cut/coul/long pair style. The long-range dispersion of the “long long” settings also requires a different kspace style (pppm/disp instead of pppm). When I switch the pair style, your input runs on my machine. This doesn’t fix the bug, it only avoids it. The bug still needs to be fixed, but in the meantime you can run your calculation.
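In input-script terms, the switch amounts to something like this (the cutoff and kspace accuracy below are placeholders, not values taken from your input):

# before: long-range dispersion, needs pppm/disp
pair_style lj/long/coul/long long long 12.0
kspace_style pppm/disp 1.0e-4

# after: plain cutoff LJ plus long-range Coulomb
pair_style lj/cut/coul/long 12.0
kspace_style pppm 1.0e-4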
Actually, I might have spoken too soon. The simulation seems to run fine without -sf intel, but when I run it with -sf intel, it crashes at the NPT stage.
The following is the stdout and stderr output right after the NVT step:
Performance: 48.402 ns/day, 0.496 hours/ns, 560.205 timesteps/s
99.2% CPU use with 96 MPI tasks x 1 OpenMP threads
MPI task timing breakdown:
Section | min time | avg time | max time |%varavg| %total
---------------------------------------------------------------
Pair | 6.8871 | 7.4039 | 8.0429 | 9.6 | 41.48
Bond | 0.0049542 | 0.027534 | 0.19299 | 27.2 | 0.15
Kspace | 4.5158 | 5.3119 | 5.7596 | 11.5 | 29.76
Neigh | 0.41938 | 0.4221 | 0.42559 | 0.2 | 2.36
Comm | 1.8177 | 1.8761 | 1.9599 | 2.0 | 10.51
Output | 0.0049378 | 0.0050321 | 0.0050376 | 0.0 | 0.03
Modify | 2.5665 | 2.634 | 2.6744 | 1.1 | 14.76
Other | | 0.1701 | | | 0.95
Nlocal: 287.740 ave 308 max 270 min
Histogram: 3 5 6 22 16 25 9 9 0 1
Nghost: 6321.50 ave 6383 max 6272 min
Histogram: 2 8 13 25 15 11 12 5 2 3
Neighs: 133070.0 ave 149678 max 120789 min
Histogram: 9 6 8 24 12 19 9 5 2 2
Total # of neighbors = 12774687
Ave neighs/atom = 462.46559
Ave special neighs/atom = 2.1904210
Neighbor list builds = 520
Dangerous builds = 0
Ran NVT step!
Finding SHAKE clusters ...
0 = # of size 2 clusters
0 = # of size 3 clusters
0 = # of size 4 clusters
9017 = # of frozen angles
find clusters CPU = 0.028 seconds
About to kick off npt...
PPPM initialization ...
using 12-bit tables for long-range coulomb (src/kspace.cpp:340)
G vector (1/distance) = 0.3070986
grid = 40 40 40
stencil order = 7
estimated absolute RMS force accuracy = 0.0028115308
estimated relative force accuracy = 8.4668416e-06
using single precision MKL FFT
3d grid and FFT values/proc = 5415 800
----------------------------------------------------------
Using Intel Package without Coprocessor.
Precision: mixed
----------------------------------------------------------
Setting up Verlet run ...
Unit style : real
Current step : 10554
Time step : 1
ERROR: Non-numeric pressure - simulation unstable (src/fix_nh.cpp:1069)
Last command: run 10000
srun: error: stellar-i02n9: tasks 0-95: Exited with exit code 1
srun: launch/slurm: _step_signal: Terminating StepId=118399.0
The computation is being run on 96 cores.
I made the minimization more stringent and allowed for a longer NVT equilibration, but it does not seem to help: the system still breaks down when NPT is kickstarted. Again, ironically, without -sf intel things seem to go along fine.
Is there another piece of information that I am missing?
When using -sf intel you are using a different code path, which makes cached copies of data so that they are properly aligned for vectorization. These transformations are not always well debugged for the case of multi-step runs. There have been several bug reports about similar issues in the past, and, when suitably documented, those bugs will eventually be fixed. Please keep in mind that the INTEL package is contributed code; some of its maintainers are rather busy while others have moved on, so it can take time until bugfixes are made and fed back into the LAMMPS distribution.
My suggestion for a workaround is to write out data files (or restart files) and split the multi-step run into multiple runs with explicit input files, where each run reads its state from the file written by the previous one.
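A minimal sketch of that split (file names and the fix npt parameters here are placeholders, not taken from your input; the second input also needs the usual style and settings commands before read_data):

# in.stage1 -- minimization + NVT equilibration
# ... minimize, fix nvt, and run commands ...
write_data after_nvt.data

# in.stage2 -- NPT as a fresh run
read_data after_nvt.data
fix 1 all npt temp 300.0 300.0 100.0 iso 1.0 1.0 1000.0
run 10000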
Hello @akohlmey, sorry for not responding sooner! I followed your first suggestion and moved the NPT step into a separate run after the minimization and NVT equilibration. It worked, thanks a ton.
I will use the version above and let you know how it goes. Thank you again, sir!
I have also identified the reason for the segfault when using pair style lj/long/coul/long.
It turns out to be a wrapper class without any vectorization or other optimizations, so using lj/cut/coul/long should give better performance anyway. The segfault stems from the fact that the other INTEL package styles require the pair style to set up some special buffers in a rather indirect way, and this wrapper was not doing that.
@akohlmey, I ran some more tests. I was observing a lot of inconsistent behavior with -sf intel even after splitting the runs into energy minimization + NVT and NPT. Inconsistent in the sense that it would sometimes run and sometimes crash with NO changes made.
But your fix of suffix off ... suffix on resolved that issue as well. Thank you again.
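In case it helps anyone else reading this, that workaround looks roughly like the following; where exactly to place it depends on which commands misbehave with the accelerated styles:

suffix off
# commands here run with the plain, non-INTEL styles
suffix on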
The corresponding changes have been committed to the development branch yesterday.
So if you download the snapshot from https://github.com/lammps/lammps/archive/refs/heads/develop.tar.gz
and compile a new executable, it should work with the original input (but lj/cut/coul/long should be faster, as noted before).
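Fetching and building that snapshot looks roughly like this (package flags as above; adjust the compiler settings to your machine):

wget https://github.com/lammps/lammps/archive/refs/heads/develop.tar.gz
tar xzf develop.tar.gz
cd lammps-develop
mkdir build && cd build
cmake -D PKG_EXTRA-MOLECULE=yes -D PKG_INTEL=yes ../cmake
make -j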