GPU-accelerated LAMMPS runs for a while, then stops with Cuda driver error 4

Dear all,

I have the LAMMPS 23Jun2022 version installed with CUDA 12.2. My system has 2 RTX 4090 D GPUs, and I am now testing it. The job starts normally but stops after some time.

Here is what I got from the output on my screen.

This is the start of the test; it ran well and produced output normally:
LAMMPS (23 Jun 2022 - Update 4)
Reading data file …
orthogonal box = (0.23373493 0.46746987 0.46746987) to (112.26627 224.53253 224.53253)
1 by 2 by 4 MPI processor grid
reading atoms …
75000 atoms
reading velocities …
75000 velocities
scanning bonds …
2 = max bonds/atom
scanning angles …
3 = max angles/atom
reading bonds …
89900 bonds
reading angles …
119709 angles
Finding 1-2 1-3 1-4 neighbors …
special bond factors lj: 0 0 1
special bond factors coul: 0 0 0
3 = max # of 1-2 neighbors
6 = max # of 1-3 neighbors
16 = max # of 1-4 neighbors
299418 = # of 1-3 neighbors before angle trim
231888 = # of 1-3 neighbors after angle trim
11 = max # of special neighbors
special bonds CPU = 0.009 seconds
read_data CPU = 0.613 seconds
WARNING: 1 of 100001 force values in table Mie11 are inconsistent with -dE/dr.
WARNING: Should only be flagged at inflection points (…/pair_table.cpp:463)
WARNING: 1 of 100001 force values in table Mie12 are inconsistent with -dE/dr.
WARNING: Should only be flagged at inflection points (…/pair_table.cpp:463)
WARNING: 1 of 100001 force values in table Mie13 are inconsistent with -dE/dr.
WARNING: Should only be flagged at inflection points (…/pair_table.cpp:463)
WARNING: 1 of 100001 force values in table Mie14 are inconsistent with -dE/dr.
WARNING: Should only be flagged at inflection points (…/pair_table.cpp:463)
WARNING: 2 of 100001 force values in table Mie15 are inconsistent with -dE/dr.
WARNING: Should only be flagged at inflection points (…/pair_table.cpp:463)
WARNING: 2 of 100001 force values in table Mie22 are inconsistent with -dE/dr.
WARNING: Should only be flagged at inflection points (…/pair_table.cpp:463)
WARNING: 2 of 100001 force values in table Mie23 are inconsistent with -dE/dr.
WARNING: Should only be flagged at inflection points (…/pair_table.cpp:463)
WARNING: 2 of 100001 force values in table Mie24 are inconsistent with -dE/dr.
WARNING: Should only be flagged at inflection points (…/pair_table.cpp:463)
WARNING: 2 of 100001 force values in table Mie25 are inconsistent with -dE/dr.
WARNING: Should only be flagged at inflection points (…/pair_table.cpp:463)
WARNING: 2 of 100001 force values in table Mie33 are inconsistent with -dE/dr.
WARNING: Should only be flagged at inflection points (…/pair_table.cpp:463)
WARNING: 2 of 100001 force values in table Mie34 are inconsistent with -dE/dr.
WARNING: Should only be flagged at inflection points (…/pair_table.cpp:463)
WARNING: 2 of 100001 force values in table Mie35 are inconsistent with -dE/dr.
WARNING: Should only be flagged at inflection points (…/pair_table.cpp:463)
WARNING: 2 of 100001 force values in table Mie44 are inconsistent with -dE/dr.
WARNING: Should only be flagged at inflection points (…/pair_table.cpp:463)
WARNING: 2 of 100001 force values in table Mie45 are inconsistent with -dE/dr.
WARNING: Should only be flagged at inflection points (…/pair_table.cpp:463)
WARNING: 2 of 100001 force values in table Mie55 are inconsistent with -dE/dr.
WARNING: Should only be flagged at inflection points (…/pair_table.cpp:463)
WARNING: 1 of 1001 force values in table are inconsistent with -dE/dr.
WARNING: Should only be flagged at inflection points (…/bond_table.cpp:380)
WARNING: 2 of 1001 force values in table are inconsistent with -dE/dr.
WARNING: Should only be flagged at inflection points (…/bond_table.cpp:380)
WARNING: 2 of 1001 force values in table are inconsistent with -dE/dr.
WARNING: Should only be flagged at inflection points (…/bond_table.cpp:380)
WARNING: 4 of 1001 force values in table are inconsistent with -dE/dr.
WARNING: Should only be flagged at inflection points (…/bond_table.cpp:380)
WARNING: 2 of 1001 force values in table are inconsistent with -dE/dr.
WARNING: Should only be flagged at inflection points (…/bond_table.cpp:380)
WARNING: 5 of 1001 force values in table are inconsistent with -dE/dr.
WARNING: Should only be flagged at inflection points (…/bond_table.cpp:380)
Respa levels:
1 = bond angle dihedral improper
2 = pair kspace

CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE

Your simulation uses code contributions which should be cited:

  • GPU package (short-range, long-range and three-body potentials):
    The log file lists these citations in BibTeX format.

CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE


  • Using acceleration for table:
  • with 8 proc(s) per device.
  • Horizontal vector operations: ENABLED
  • Shared memory system: No

Device 0: NVIDIA GeForce RTX 4090 D, 114 CUs, 21/24 GB, 2.5 GHZ (Mixed Precision)

Initializing Device and compiling on process 0…Done.
Initializing Device 0 on core 0…Done.
Initializing Device 0 on core 1…Done.
Initializing Device 0 on core 2…Done.
Initializing Device 0 on core 3…Done.
Initializing Device 0 on core 4…Done.
Initializing Device 0 on core 5…Done.
Initializing Device 0 on core 6…Done.
Initializing Device 0 on core 7…Done.

Generated 0 of 10 mixed pair_coeff terms from geometric mixing rule
Neighbor list info …
update every 1 steps, delay 10 steps, check yes
max neighbors/atom: 10000, page size: 100000
master list distance cutoff = 27
ghost atom cutoff = 27
binsize = 13.5, bins = 9 17 17
0 neighbor lists, perpetual/occasional/extra = 0 0 0
Setting up r-RESPA run …
Unit style : real
Current step : 0
Time steps : 1:1 2:2
r-RESPA fixes :
Per MPI rank memory allocation (min/avg/max) = 27.95 | 27.97 | 27.98 Mbytes
Step S/CPU TotEng KinEng Temp PotEng Press Pxx Pyy Pzz Lx Ly Lz v_vol E_pair E_bond E_angle
0 0 103099.16 134134.66 600 -31035.504 69.186703 51.597323 86.724667 69.238119 112.03253 224.06506 224.06506 5624610.1 -173863.87 49799.22 93029.144
1000 98.869742 117910.03 134198.26 600.28448 -16288.233 30.598252 35.521746 6.0240824 50.248926 113.11225 226.2245 226.2245 5788804.8 -165780.81 53263.813 96228.765
2000 100.04439 120689.12 135212.94 604.82327 -14523.82 -40.476672 -17.910338 -97.37895 -6.1407291 113.4044 226.8088 226.8088 5833775.7 -164378.13 53295.415 96558.899
3000 100.45152 119367.34 133852.77 598.73904 -14485.423 -36.77761 65.347715 -19.481588 -156.19896 113.48877 226.97753 226.97753 5846805.2 -164235.61 53309.91 96440.275
4000 100.43875 119791.54 133837.11 598.66901 -14045.568 -15.333428 -17.679324 -18.881065 -9.4398963 113.5134 227.0268 227.0268 5850612.9 -163390.96 52743.473 96601.924
5000 100.70038 119534.69 134322.3 600.83933 -14787.608 -5.0721232 166.37413 -95.542675 -86.047821 113.47166 226.94332 226.94332 5844161.7 -163657.94 52860.162 96010.17
6000 100.75516 119991.76 134282.77 600.66249 -14291.01 15.408762 -102.48057 116.94916 31.75769 113.42441 226.84882 226.84882 5836864.2 -163939.2 53369.822 96278.367
7000 100.57404 120635.8 135014.48 603.93552 -14378.678 -1.0702177 58.024726 -77.728576 16.493197 113.38241 226.76481 226.76481 5830381.7 -164215.68 53298.449 96538.551
8000 99.876862 119314.36 133892.62 598.9173 -14578.253 -23.53561 -31.819799 -25.064684 -13.722346 113.34609 226.69219 226.69219 5824781.7 -164452.56 53212.756 96661.552
9000 99.95053 118831.28 133627.95 597.73341 -14796.666 97.543484 179.55166 -11.056771 124.13556 113.39106 226.78213 226.78213 5831717.4 -164016.97 52794.628 96425.679
10000 100.19048 119914.1 134349.86 600.96261 -14435.766 60.000153 60.100023 50.976382 68.924055 113.41545 226.8309 226.8309 5835480.9 -163975.37 52898.075 96641.53
11000 100.32136 120522.83 133919.1 599.03578 -13396.272 3.2576169 33.748348 5.8463559 -29.821854 113.48762 226.97523 226.97523 5846627.5 -163569.73 53172.408 97001.051
12000 100.83904 119899 133968.36 599.25612 -14069.362 -14.529335 -39.863776 -87.546096 83.821868 113.4731 226.94621 226.94621 5844384.8 -163807.87 53271.327 96467.176
13000 100.49985 120090.22 133714.6 598.12099 -13624.379 -64.003309 -132.42082 -110.41437 50.825258 113.55966 227.11932 227.11932 5857769.1 -163411.74 53135.2 96652.158
14000 100.65775 119974.24 133721.79 598.15318 -13747.552 -74.156506 -118.4284 -30.111687 -73.929436 113.59235 227.1847 227.1847 5862829.1 -163288.17 53228.938 96311.683
15000 100.32615 120984.38 134227.35 600.4146 -13242.968 28.943264 -33.244294 64.112616 55.96147 113.52621 227.05242 227.05242 5852594.3 -163382.41 53379.045 96760.396
16000 100.37395 120213.82 133778.01 598.40465 -13564.192 -45.64777 -69.986934 -51.252584 -15.703792 113.50689 227.01378 227.01378 5849606.3 -163589.29 53133.121 96891.98
17000 98.821782 120579.42 134049.78 599.6203 -13470.36 22.43909 49.718295 71.05652 -53.457547 113.53577 227.07154 227.07154 5854072.9 -163423.59 53083.525 96869.709
18000 98.40109 120576.79 134126.39 599.96297 -13549.598 -53.553762 -97.057168 28.161516 -91.765633 113.58958 227.17917 227.17917 5862401.1 -163385.38 53244.416 96591.369
19000 98.467337 120586.34 134156.33 600.09689 -13569.99 -45.687693 12.072926 -139.70248 -9.4335251 113.64575 227.29151 227.29151 5871101.9 -163250.78 53071.128 96609.659
20000 98.276619 120275.4 133849.19 598.72304 -13573.793 26.181265 120.25936 -37.40383 -4.3117344 113.48268 226.96535 226.96535 5845864.1 -163657.53 53345.882 96737.855
21000 98.532749 121096.36 134581.95 602.00077 -13485.592 -71.157306 -64.221545 -31.257043 -117.99333 113.65687 227.31374 227.31374 5872824.9 -162878.06 53019.581 96372.884
22000 98.855216 119441.97 133652.94 597.84518 -14210.969 12.479226 72.950185 -24.110458 -11.402049 113.44806 226.89611 226.89611 5840515.5 -163808 53020.928 96576.103
23000 98.099888 120204.32 134544.77 601.83445 -14340.451 -32.248942 -80.513661 -52.578273 36.345107 113.5434 227.0868 227.0868 5855253.1 -163482.51 52897.213 96244.843
24000 98.216608 119800.64 133607.38 597.64141 -13806.741 -25.937814 -56.494696 5.6028913 -26.921637 113.4893 226.97861 226.97861 5846888.2 -163628.88 53156.962 96665.177
25000 98.364895 120289.63 134460.25 601.45641 -14170.623 -18.669279 -31.735447 9.5580889 -33.83048 113.49623 226.99246 226.99246 5847958.7 -163793.07 53064.617 96557.826
ERROR: Non-numeric pressure - simulation unstable (…/fix_nh.cpp:1059)
Last command: run 100000
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 98.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 99.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 98.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 99.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 98.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 99.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 98.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 99.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 98.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 99.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 98.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 99.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 98.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 99.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 98.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 99.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 98.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 99.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 98.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 99.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 98.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 99.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 98.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 99.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 98.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 99.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 98.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 99.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 98.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 99.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 98.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 99.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 98.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 99.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 98.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 99.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 98.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 99.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 98.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 99.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 98.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 99.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 98.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 99.

Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.


mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

Process name: [[31755,1],2]
Exit code: 1

My system spec:

OS: Ubuntu22.04
NVIDIA DRIVER version: 535.171.04
CUDA version: 12.2

I ran the same simulation on 28 CPU cores only. It has been running for 3 days and everything is still in good shape.

But with the GPU, as in the case shown above, I get the same error whether I use 8 CPU cores and 1 GPU, 1 CPU core and 1 GPU, 28 CPU cores and 1 GPU, or 4 CPU cores and 2 GPUs.

Many thanks in advance for any guidance on this issue, Yunhan.

Hi @yunhan_zhang,

Please have a look at the format guidelines to make your post readable here.

As for your question, the error you are encountering,
ERROR: Non-numeric pressure - simulation unstable (…/fix_nh.cpp:1059)
has been discussed many times on the forum. It is likely that either your model or starting geometry (or both) are bad.

Hi Germain,

Thank you for your help! But if I run the simulation using only CPU cores, no error is reported, and I don’t know why.

Yunhan

Sorry I didn’t see the last lines where you said you did run only using CPUs.

As for other comments that can be made: you are using a very old version of LAMMPS, so you might benefit from updating to a newer version.

You should also check whether using different precision settings leads to similar results between the GPU and CPU simulations. It will be difficult to provide more meaningful help without more information on your input files.
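One way to make that comparison concrete is to pull the thermo columns out of a CPU log and a GPU log and compare averages. The following is only a minimal sketch: it assumes the thermo block starts with a header line beginning with `Step`, as in the output posted above, and the helper names `parse_thermo` and `column_mean` are made up for this example (they are not part of LAMMPS).

```python
def parse_thermo(log_text):
    """Return (columns, rows) for the first thermo block found in log_text."""
    columns, rows = None, []
    for line in log_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if columns is None:
            # Wait for the thermo header, e.g. "Step S/CPU TotEng ...".
            if fields[0] == "Step":
                columns = fields
            continue
        if len(fields) != len(columns):
            break  # end of the thermo block
        try:
            rows.append([float(x) for x in fields])
        except ValueError:
            break  # a non-numeric line such as "ERROR: ..." ends the block
    return columns, rows


def column_mean(columns, rows, name):
    """Average one thermo column (for example "Temp" or "PotEng") over all rows."""
    i = columns.index(name)
    return sum(row[i] for row in rows) / len(rows)


if __name__ == "__main__":
    # Two rows copied from the run above, trimmed to three columns for brevity.
    demo = "Step Temp PotEng\n0 600 -31035.504\n1000 600.28448 -16288.233\n"
    cols, rows = parse_thermo(demo)
    print(cols, len(rows), column_mean(cols, rows, "Temp"))
```

Since MD trajectories diverge chaotically, instantaneous values from the two runs will not match step by step after a short while; comparing averages over the same window, or only the first few steps, is the more meaningful check.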

npt-xu3.in (4.4 KB)
Thank you Germain. I tried the LAMMPS 7Feb2024 version; here is what I got from the output on my screen.
LAMMPS (7 Feb 2024 - Update 1)
Reading data file …
orthogonal box = (-0.52305534 -1.0461107 -1.0461107) to (113.02306 226.04611 226.04611)
1 by 2 by 4 MPI processor grid
reading atoms …
75000 atoms
reading velocities …
75000 velocities
scanning bonds …
2 = max bonds/atom
scanning angles …
3 = max angles/atom
reading bonds …
89900 bonds
reading angles …
119695 angles
Finding 1-2 1-3 1-4 neighbors …
special bond factors lj: 0 0 1
special bond factors coul: 0 0 0
3 = max # of 1-2 neighbors
6 = max # of 1-3 neighbors
16 = max # of 1-4 neighbors
299390 = # of 1-3 neighbors before angle trim
233740 = # of 1-3 neighbors after angle trim
11 = max # of special neighbors
special bonds CPU = 0.010 seconds
read_data CPU = 0.684 seconds
WARNING: 1 of 100001 force values in table Mie11 are inconsistent with -dE/dr.
WARNING: Should only be flagged at inflection points (…/pair_table.cpp:466)
WARNING: 1 of 100001 force values in table Mie12 are inconsistent with -dE/dr.
WARNING: Should only be flagged at inflection points (…/pair_table.cpp:466)
WARNING: 1 of 100001 force values in table Mie13 are inconsistent with -dE/dr.
WARNING: Should only be flagged at inflection points (…/pair_table.cpp:466)
WARNING: 1 of 100001 force values in table Mie14 are inconsistent with -dE/dr.
WARNING: Should only be flagged at inflection points (…/pair_table.cpp:466)
WARNING: 2 of 100001 force values in table Mie15 are inconsistent with -dE/dr.
WARNING: Should only be flagged at inflection points (…/pair_table.cpp:466)
WARNING: 2 of 100001 force values in table Mie22 are inconsistent with -dE/dr.
WARNING: Should only be flagged at inflection points (…/pair_table.cpp:466)
WARNING: 2 of 100001 force values in table Mie23 are inconsistent with -dE/dr.
WARNING: Should only be flagged at inflection points (…/pair_table.cpp:466)
WARNING: 2 of 100001 force values in table Mie24 are inconsistent with -dE/dr.
WARNING: Should only be flagged at inflection points (…/pair_table.cpp:466)
WARNING: 2 of 100001 force values in table Mie25 are inconsistent with -dE/dr.
WARNING: Should only be flagged at inflection points (…/pair_table.cpp:466)
WARNING: 2 of 100001 force values in table Mie33 are inconsistent with -dE/dr.
WARNING: Should only be flagged at inflection points (…/pair_table.cpp:466)
WARNING: 2 of 100001 force values in table Mie34 are inconsistent with -dE/dr.
WARNING: Should only be flagged at inflection points (…/pair_table.cpp:466)
WARNING: 2 of 100001 force values in table Mie35 are inconsistent with -dE/dr.
WARNING: Should only be flagged at inflection points (…/pair_table.cpp:466)
WARNING: 2 of 100001 force values in table Mie44 are inconsistent with -dE/dr.
WARNING: Should only be flagged at inflection points (…/pair_table.cpp:466)
WARNING: 2 of 100001 force values in table Mie45 are inconsistent with -dE/dr.
WARNING: Should only be flagged at inflection points (…/pair_table.cpp:466)
WARNING: 2 of 100001 force values in table Mie55 are inconsistent with -dE/dr.
WARNING: Should only be flagged at inflection points (…/pair_table.cpp:466)
WARNING: 1 of 1001 force values in table are inconsistent with -dE/dr.
WARNING: Should only be flagged at inflection points (…/bond_table.cpp:378)
WARNING: 2 of 1001 force values in table are inconsistent with -dE/dr.
WARNING: Should only be flagged at inflection points (…/bond_table.cpp:378)
WARNING: 2 of 1001 force values in table are inconsistent with -dE/dr.
WARNING: Should only be flagged at inflection points (…/bond_table.cpp:378)
WARNING: 4 of 1001 force values in table are inconsistent with -dE/dr.
WARNING: Should only be flagged at inflection points (…/bond_table.cpp:378)
WARNING: 2 of 1001 force values in table are inconsistent with -dE/dr.
WARNING: Should only be flagged at inflection points (…/bond_table.cpp:378)
WARNING: 5 of 1001 force values in table are inconsistent with -dE/dr.
WARNING: Should only be flagged at inflection points (…/bond_table.cpp:378)

CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE

Your simulation uses code contributions which should be cited:

  • GPU package (short-range, long-range and three-body potentials): doi:10.1016/j.cpc.2010.12.021, doi:10.1016/j.cpc.2011.10.012, doi:10.1016/j.cpc.2013.08.002, doi:10.1016/j.commatsci.2014.10.068, doi:10.1016/j.cpc.2016.10.020, doi:10.3233/APC200086
    The log file lists these citations in BibTeX format.

CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE

Switching to ‘neigh_modify every 1 delay 0 check yes’ setting during minimization


  • Using acceleration for table:
  • with 8 proc(s) per device.
  • Horizontal vector operations: ENABLED
  • Shared memory system: No

Device 0: NVIDIA GeForce RTX 4090 D, 114 CUs, 21/24 GB, 2.5 GHZ (Mixed Precision)

Initializing Device and compiling on process 0…Done.
Initializing Device 0 on core 0…Done.
Initializing Device 0 on core 1…Done.
Initializing Device 0 on core 2…Done.
Initializing Device 0 on core 3…Done.
Initializing Device 0 on core 4…Done.
Initializing Device 0 on core 5…Done.
Initializing Device 0 on core 6…Done.
Initializing Device 0 on core 7…Done.

Generated 0 of 10 mixed pair_coeff terms from geometric mixing rule
Setting up cg style minimization …
Unit style : real
Current step : 0
Per MPI rank memory allocation (min/avg/max) = 27.51 | 27.53 | 27.55 Mbytes
Step Temp E_pair E_mol TotEng Press
0 596.59493 -164669.88 149476 118179.55 2.9539451
621 596.59493 -202110.58 71967.035 3229.8864 -1156.8257
Loop time of 4.85948 on 8 procs for 621 steps with 75000 atoms

95.9% CPU use with 8 MPI tasks x no OpenMP threads

Minimization stats:
Stopping criterion = energy tolerance
Energy initial, next-to-last, final =
-15193.8816395319 -130143.439052459 -130143.548311836
Force two-norm initial, final = 7291.4047 169.50103
Force max component initial, final = 140.67467 86.086704
Final line search alpha, max atom move = 0.0013520998 0.11639781
Iterations, force evaluations = 621 980

MPI task timing breakdown:
Section | min time | avg time | max time |%varavg| %total

Pair | 0.43966 | 0.998 | 1.6073 | 34.2 | 20.54
Bond | 1.2799 | 1.3544 | 1.5449 | 7.1 | 27.87
Neigh | 0.0067281 | 0.0071818 | 0.0076007 | 0.4 | 0.15
Comm | 0.4729 | 0.66041 | 0.75044 | 10.5 | 13.59
Output | 0 | 0 | 0 | 0.0 | 0.00
Modify | 4.5412e-05 | 6.0047e-05 | 8.1381e-05 | 0.0 | 0.00
Other | | 1.839 | | | 37.85

Nlocal: 9375 ave 9438 max 9276 min
Histogram: 1 0 0 1 2 0 0 1 0 3
Nghost: 30405.6 ave 30608 max 30194 min
Histogram: 2 1 0 1 0 0 0 1 1 2
Neighs: 0 ave 0 max 0 min
Histogram: 8 0 0 0 0 0 0 0 0 0

Total # of neighbors = 0
Ave neighs/atom = 0
Ave special neighs/atom = 8.1988533
Neighbor list builds = 8
Dangerous builds = 0
Respa levels:
1 = bond angle dihedral improper
2 = pair kspace


  Device Time Info (average):

CPU Neighbor: 0.0045 s.
CPU Cast/Pack: 0.1116 s.
CPU Driver_Time: 0.0164 s.
CPU Idle_Time: 0.8485 s.
Average split: 1.0000.
Max Mem / Proc: 425.09 MB.
Prefetch mode: None.
Vector width: 32.
Lanes / atom: 4.
Pair block: 256.
Neigh block: 128.
Neigh mode: Hybrid (binning on host) with subgroup support


  • Using acceleration for table:
  • with 8 proc(s) per device.
  • Horizontal vector operations: ENABLED
  • Shared memory system: No

Device 0: NVIDIA GeForce RTX 4090 D, 114 CUs, 21/24 GB, 2.5 GHZ (Mixed Precision)

Initializing Device and compiling on process 0…Done.
Initializing Device 0 on core 0…Done.
Initializing Device 0 on core 1…Done.
Initializing Device 0 on core 2…Done.
Initializing Device 0 on core 3…Done.
Initializing Device 0 on core 4…Done.
Initializing Device 0 on core 5…Done.
Initializing Device 0 on core 6…Done.
Initializing Device 0 on core 7…Done.

Generated 0 of 10 mixed pair_coeff terms from geometric mixing rule
Setting up r-RESPA run …
Unit style : real
Current step : 621
Time steps : 1:1 2:2
r-RESPA fixes :
Per MPI rank memory allocation (min/avg/max) = 27.28 | 27.3 | 27.32 Mbytes
Step S/CPU TotEng KinEng Temp PotEng Press Pxx Pyy Pzz Lx Ly Lz v_vol E_pair E_bond E_angle
621 0 3991.1155 134134.66 600 -130143.55 -1150.8832 -1125.7267 -1166.7002 -1160.2227 113.54611 227.09222 227.09222 5855672.5 -202110.58 14424.919 57542.116
ERROR on proc 1: Bond length > table outer cutoff: type 6 length 124.26644 (…/bond_table.cpp:598)
Last command: run 60000
ERROR on proc 2: Bond length > table outer cutoff: type 4 length 81.710556 (…/bond_table.cpp:598)
Last command: run 60000
ERROR on proc 3: Bond length > table outer cutoff: type 5 length 14.36658 (…/bond_table.cpp:598)
Last command: run 60000
ERROR on proc 4: Bond length > table outer cutoff: type 6 length 12.123355 (…/bond_table.cpp:598)
Last command: run 60000
ERROR on proc 5: Bond length > table outer cutoff: type 2 length 131.84453 (…/bond_table.cpp:598)
Last command: run 60000
ERROR on proc 6: Bond length > table outer cutoff: type 8 length 153.1599 (…/bond_table.cpp:598)
Last command: run 60000
ERROR on proc 7: Bond length > table outer cutoff: type 5 length 14.527622 (…/bond_table.cpp:598)
Last command: run 60000
ERROR on proc 0: Bond length > table outer cutoff: type 6 length 141.64243 (…/bond_table.cpp:598)
Last command: run 60000

MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.

[node10:667003] 7 more processes have sent help message help-mpi-api.txt / mpi-abort
[node10:667003] Set MCA parameter “orte_base_help_aggregate” to 0 to see all help / error messages

It appears that the bonds have become extremely long.
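To see which bond types are affected, the error lines can be summarized with a few lines of Python. This is just a sketch that assumes the message format shown in the output above; the function names are invented for illustration. Several of the reported lengths exceed 100, which is comparable to the box edge (Lx is about 113.5 in the thermo output), so this is probably a blown-up configuration rather than a slightly overstretched bond.

```python
import re

# Matches lines such as:
# "ERROR on proc 1: Bond length > table outer cutoff: type 6 length 124.26644 (...)"
BOND_ERROR = re.compile(
    r"ERROR on proc (\d+): Bond length > table outer cutoff: "
    r"type (\d+) length ([0-9.eE+-]+)"
)


def overstretched_bonds(log_text):
    """Return a list of (proc, bond_type, length) tuples found in the log."""
    return [(int(p), int(t), float(length))
            for p, t, length in BOND_ERROR.findall(log_text)]


def longest_by_type(log_text):
    """Map each bond type to the largest length reported for it."""
    longest = {}
    for _proc, bond_type, length in overstretched_bonds(log_text):
        longest[bond_type] = max(length, longest.get(bond_type, 0.0))
    return longest


if __name__ == "__main__":
    import sys
    # Usage (hypothetical file name): python bond_errors.py < screen_output.txt
    for bond_type, length in sorted(longest_by_type(sys.stdin.read()).items()):
        print(f"bond type {bond_type}: longest reported length {length}")
```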

The input files are attached. They look a little messy. I had to upload the force files one by one because the files are large.
eq.zip (4.9 MB)
npt-xu3.in (5.3 KB)
PVB_bonded_potentials.zip (426.7 KB)
Mie_pot_PVA_S08124_PVB_S06874_11.table (6.5 MB)
Mie_pot_PVA_S08124_PVB_S06874_12.table (6.5 MB)
Mie_pot_PVA_S08124_PVB_S06874_13.table (6.5 MB)
Mie_pot_PVA_S08124_PVB_S06874_14.table (6.5 MB)
Mie_pot_PVA_S08124_PVB_S06874_15.table (6.5 MB)
Mie_pot_PVA_S08124_PVB_S06874_22.table (6.5 MB)
Mie_pot_PVA_S08124_PVB_S06874_23.table (6.5 MB)
Mie_pot_PVA_S08124_PVB_S06874_24.table (6.5 MB)
Mie_pot_PVA_S08124_PVB_S06874_25.table (6.5 MB)
Mie_pot_PVA_S08124_PVB_S06874_33.table (6.5 MB)
Mie_pot_PVA_S08124_PVB_S06874_34.table (6.5 MB)
Mie_pot_PVA_S08124_PVB_S06874_35.table (6.5 MB)
Mie_pot_PVA_S08124_PVB_S06874_44.table (6.5 MB)
Mie_pot_PVA_S08124_PVB_S06874_45.table (6.5 MB)
Mie_pot_PVA_S08124_PVB_S06874_55.table (6.5 MB)

I linked you a post that teaches you how to format your posts on the forum.

Please consider reading it and make your posts readable with proper formatting if you want meaningful insights. It is currently very hard to read.

Your GPU library has been compiled for mixed precision floating point math while the CPU code uses only double precision. Using mixed precision floating point on GPUs often results in a significant performance boost at the expense of some accuracy (but rather little, since the summing is still done in double precision and this is where the higher precision is needed the most). Yet, with some force computations done in single precision, it is possible to have overflows in high potential energy geometries.
It is thus recommended to either start from more realistic initial geometries or to do the initial relaxation on CPUs only, then write out a data file and start with GPUs from there.
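To illustrate the kind of overflow meant here, below is a small standalone sketch. It does not reproduce what the GPU package kernels actually do; it only shows that a value which is harmless in double precision can exceed the single precision range (about 3.4e38), and that arithmetic on the resulting infinity produces NaN. The repulsion exponent and the distance are made-up numbers for the demonstration.

```python
import math
import struct


def as_float32(x):
    """Round a Python float (a double) to single precision; overflow gives +/-inf."""
    try:
        return struct.unpack("f", struct.pack("f", x))[0]
    except OverflowError:
        # struct refuses to pack finite values outside the float32 range.
        return math.copysign(math.inf, x)


# Hypothetical steep repulsion ~ 1/r**13 at a badly overlapping distance.
r = 1.0e-3
force_double = 1.0 / r**13               # about 1e39, representable as a double
force_single = as_float32(force_double)  # outside the float32 range, becomes inf

print("double:", force_double)
print("single:", force_single)
# Once an infinity is in the data, differences of such terms are NaN.
print("inf - inf is NaN:", math.isnan(force_single - force_single))
```

A NaN produced this way propagates into the virial, which is consistent with the "Non-numeric pressure" message, although that alone does not prove it is what happened in this run.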

The neighbor list cutoff of 27 reported in your output seems to be very large for a molecular system. The fact that you increased the neigh_modify “one” setting supports that. Also, using the “delay 10” setting seems unrealistic, especially since you are also using r-RESPA. Please explain and justify these choices.
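For a sense of scale, here is a rough back-of-the-envelope estimate of how many neighbors each atom gets with such a cutoff. It is only a sketch: it assumes a uniform density and takes the atom count (75000), the box volume (about 5.85e6, from the v_vol column) and the list cutoff (27) from the posted output. The function name is made up for this example.

```python
import math


def expected_neighbors(n_atoms, volume, cutoff):
    """Estimate neighbors per atom in a full list, assuming uniform density."""
    density = n_atoms / volume
    sphere_volume = 4.0 / 3.0 * math.pi * cutoff**3
    return density * sphere_volume


# Values read off the log above (units "real", so lengths are in Angstrom).
n = expected_neighbors(n_atoms=75000, volume=5.85e6, cutoff=27.0)
print(f"roughly {n:.0f} neighbors per atom")
```

With these numbers the estimate comes out at roughly a thousand neighbors per atom, which gives an idea of how much pair work each force evaluation involves.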


This may be an issue similar to something I’ve encountered some time ago. See link below for the discussion and a solution: [BUG] Lost atoms when using Kokkos ReaxFF with RTX3090 · Issue #3348 · lammps/lammps · GitHub