Dear all,
I have LAMMPS (23 Jun 2022 version, Update 4) installed with CUDA 12.2 on a system with two RTX 4090 D GPUs. I am now testing it: the job starts normally but stops after some time.
Here is the output I got on my screen.
The test starts well and produces normal output:
LAMMPS (23 Jun 2022 - Update 4)
Reading data file …
orthogonal box = (0.23373493 0.46746987 0.46746987) to (112.26627 224.53253 224.53253)
1 by 2 by 4 MPI processor grid
reading atoms …
75000 atoms
reading velocities …
75000 velocities
scanning bonds …
2 = max bonds/atom
scanning angles …
3 = max angles/atom
reading bonds …
89900 bonds
reading angles …
119709 angles
Finding 1-2 1-3 1-4 neighbors …
special bond factors lj: 0 0 1
special bond factors coul: 0 0 0
3 = max # of 1-2 neighbors
6 = max # of 1-3 neighbors
16 = max # of 1-4 neighbors
299418 = # of 1-3 neighbors before angle trim
231888 = # of 1-3 neighbors after angle trim
11 = max # of special neighbors
special bonds CPU = 0.009 seconds
read_data CPU = 0.613 seconds
WARNING: 1 of 100001 force values in table Mie11 are inconsistent with -dE/dr.
WARNING: Should only be flagged at inflection points (…/pair_table.cpp:463)
WARNING: 1 of 100001 force values in table Mie12 are inconsistent with -dE/dr.
WARNING: Should only be flagged at inflection points (…/pair_table.cpp:463)
WARNING: 1 of 100001 force values in table Mie13 are inconsistent with -dE/dr.
WARNING: Should only be flagged at inflection points (…/pair_table.cpp:463)
WARNING: 1 of 100001 force values in table Mie14 are inconsistent with -dE/dr.
WARNING: Should only be flagged at inflection points (…/pair_table.cpp:463)
WARNING: 2 of 100001 force values in table Mie15 are inconsistent with -dE/dr.
WARNING: Should only be flagged at inflection points (…/pair_table.cpp:463)
WARNING: 2 of 100001 force values in table Mie22 are inconsistent with -dE/dr.
WARNING: Should only be flagged at inflection points (…/pair_table.cpp:463)
WARNING: 2 of 100001 force values in table Mie23 are inconsistent with -dE/dr.
WARNING: Should only be flagged at inflection points (…/pair_table.cpp:463)
WARNING: 2 of 100001 force values in table Mie24 are inconsistent with -dE/dr.
WARNING: Should only be flagged at inflection points (…/pair_table.cpp:463)
WARNING: 2 of 100001 force values in table Mie25 are inconsistent with -dE/dr.
WARNING: Should only be flagged at inflection points (…/pair_table.cpp:463)
WARNING: 2 of 100001 force values in table Mie33 are inconsistent with -dE/dr.
WARNING: Should only be flagged at inflection points (…/pair_table.cpp:463)
WARNING: 2 of 100001 force values in table Mie34 are inconsistent with -dE/dr.
WARNING: Should only be flagged at inflection points (…/pair_table.cpp:463)
WARNING: 2 of 100001 force values in table Mie35 are inconsistent with -dE/dr.
WARNING: Should only be flagged at inflection points (…/pair_table.cpp:463)
WARNING: 2 of 100001 force values in table Mie44 are inconsistent with -dE/dr.
WARNING: Should only be flagged at inflection points (…/pair_table.cpp:463)
WARNING: 2 of 100001 force values in table Mie45 are inconsistent with -dE/dr.
WARNING: Should only be flagged at inflection points (…/pair_table.cpp:463)
WARNING: 2 of 100001 force values in table Mie55 are inconsistent with -dE/dr.
WARNING: Should only be flagged at inflection points (…/pair_table.cpp:463)
WARNING: 1 of 1001 force values in table are inconsistent with -dE/dr.
WARNING: Should only be flagged at inflection points (…/bond_table.cpp:380)
WARNING: 2 of 1001 force values in table are inconsistent with -dE/dr.
WARNING: Should only be flagged at inflection points (…/bond_table.cpp:380)
WARNING: 2 of 1001 force values in table are inconsistent with -dE/dr.
WARNING: Should only be flagged at inflection points (…/bond_table.cpp:380)
WARNING: 4 of 1001 force values in table are inconsistent with -dE/dr.
WARNING: Should only be flagged at inflection points (…/bond_table.cpp:380)
WARNING: 2 of 1001 force values in table are inconsistent with -dE/dr.
WARNING: Should only be flagged at inflection points (…/bond_table.cpp:380)
WARNING: 5 of 1001 force values in table are inconsistent with -dE/dr.
WARNING: Should only be flagged at inflection points (…/bond_table.cpp:380)
Respa levels:
1 = bond angle dihedral improper
2 = pair kspace
CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE
Your simulation uses code contributions which should be cited:
- GPU package (short-range, long-range and three-body potentials):
The log file lists these citations in BibTeX format.
CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE
- Using acceleration for table:
- with 8 proc(s) per device.
- Horizontal vector operations: ENABLED
- Shared memory system: No
Device 0: NVIDIA GeForce RTX 4090 D, 114 CUs, 21/24 GB, 2.5 GHZ (Mixed Precision)
Initializing Device and compiling on process 0…Done.
Initializing Device 0 on core 0…Done.
Initializing Device 0 on core 1…Done.
Initializing Device 0 on core 2…Done.
Initializing Device 0 on core 3…Done.
Initializing Device 0 on core 4…Done.
Initializing Device 0 on core 5…Done.
Initializing Device 0 on core 6…Done.
Initializing Device 0 on core 7…Done.
Generated 0 of 10 mixed pair_coeff terms from geometric mixing rule
Neighbor list info …
update every 1 steps, delay 10 steps, check yes
max neighbors/atom: 10000, page size: 100000
master list distance cutoff = 27
ghost atom cutoff = 27
binsize = 13.5, bins = 9 17 17
0 neighbor lists, perpetual/occasional/extra = 0 0 0
Setting up r-RESPA run …
Unit style : real
Current step : 0
Time steps : 1:1 2:2
r-RESPA fixes :
Per MPI rank memory allocation (min/avg/max) = 27.95 | 27.97 | 27.98 Mbytes
Step S/CPU TotEng KinEng Temp PotEng Press Pxx Pyy Pzz Lx Ly Lz v_vol E_pair E_bond E_angle
0 0 103099.16 134134.66 600 -31035.504 69.186703 51.597323 86.724667 69.238119 112.03253 224.06506 224.06506 5624610.1 -173863.87 49799.22 93029.144
1000 98.869742 117910.03 134198.26 600.28448 -16288.233 30.598252 35.521746 6.0240824 50.248926 113.11225 226.2245 226.2245 5788804.8 -165780.81 53263.813 96228.765
2000 100.04439 120689.12 135212.94 604.82327 -14523.82 -40.476672 -17.910338 -97.37895 -6.1407291 113.4044 226.8088 226.8088 5833775.7 -164378.13 53295.415 96558.899
3000 100.45152 119367.34 133852.77 598.73904 -14485.423 -36.77761 65.347715 -19.481588 -156.19896 113.48877 226.97753 226.97753 5846805.2 -164235.61 53309.91 96440.275
4000 100.43875 119791.54 133837.11 598.66901 -14045.568 -15.333428 -17.679324 -18.881065 -9.4398963 113.5134 227.0268 227.0268 5850612.9 -163390.96 52743.473 96601.924
5000 100.70038 119534.69 134322.3 600.83933 -14787.608 -5.0721232 166.37413 -95.542675 -86.047821 113.47166 226.94332 226.94332 5844161.7 -163657.94 52860.162 96010.17
6000 100.75516 119991.76 134282.77 600.66249 -14291.01 15.408762 -102.48057 116.94916 31.75769 113.42441 226.84882 226.84882 5836864.2 -163939.2 53369.822 96278.367
7000 100.57404 120635.8 135014.48 603.93552 -14378.678 -1.0702177 58.024726 -77.728576 16.493197 113.38241 226.76481 226.76481 5830381.7 -164215.68 53298.449 96538.551
8000 99.876862 119314.36 133892.62 598.9173 -14578.253 -23.53561 -31.819799 -25.064684 -13.722346 113.34609 226.69219 226.69219 5824781.7 -164452.56 53212.756 96661.552
9000 99.95053 118831.28 133627.95 597.73341 -14796.666 97.543484 179.55166 -11.056771 124.13556 113.39106 226.78213 226.78213 5831717.4 -164016.97 52794.628 96425.679
10000 100.19048 119914.1 134349.86 600.96261 -14435.766 60.000153 60.100023 50.976382 68.924055 113.41545 226.8309 226.8309 5835480.9 -163975.37 52898.075 96641.53
11000 100.32136 120522.83 133919.1 599.03578 -13396.272 3.2576169 33.748348 5.8463559 -29.821854 113.48762 226.97523 226.97523 5846627.5 -163569.73 53172.408 97001.051
12000 100.83904 119899 133968.36 599.25612 -14069.362 -14.529335 -39.863776 -87.546096 83.821868 113.4731 226.94621 226.94621 5844384.8 -163807.87 53271.327 96467.176
13000 100.49985 120090.22 133714.6 598.12099 -13624.379 -64.003309 -132.42082 -110.41437 50.825258 113.55966 227.11932 227.11932 5857769.1 -163411.74 53135.2 96652.158
14000 100.65775 119974.24 133721.79 598.15318 -13747.552 -74.156506 -118.4284 -30.111687 -73.929436 113.59235 227.1847 227.1847 5862829.1 -163288.17 53228.938 96311.683
15000 100.32615 120984.38 134227.35 600.4146 -13242.968 28.943264 -33.244294 64.112616 55.96147 113.52621 227.05242 227.05242 5852594.3 -163382.41 53379.045 96760.396
16000 100.37395 120213.82 133778.01 598.40465 -13564.192 -45.64777 -69.986934 -51.252584 -15.703792 113.50689 227.01378 227.01378 5849606.3 -163589.29 53133.121 96891.98
17000 98.821782 120579.42 134049.78 599.6203 -13470.36 22.43909 49.718295 71.05652 -53.457547 113.53577 227.07154 227.07154 5854072.9 -163423.59 53083.525 96869.709
18000 98.40109 120576.79 134126.39 599.96297 -13549.598 -53.553762 -97.057168 28.161516 -91.765633 113.58958 227.17917 227.17917 5862401.1 -163385.38 53244.416 96591.369
19000 98.467337 120586.34 134156.33 600.09689 -13569.99 -45.687693 12.072926 -139.70248 -9.4335251 113.64575 227.29151 227.29151 5871101.9 -163250.78 53071.128 96609.659
20000 98.276619 120275.4 133849.19 598.72304 -13573.793 26.181265 120.25936 -37.40383 -4.3117344 113.48268 226.96535 226.96535 5845864.1 -163657.53 53345.882 96737.855
21000 98.532749 121096.36 134581.95 602.00077 -13485.592 -71.157306 -64.221545 -31.257043 -117.99333 113.65687 227.31374 227.31374 5872824.9 -162878.06 53019.581 96372.884
22000 98.855216 119441.97 133652.94 597.84518 -14210.969 12.479226 72.950185 -24.110458 -11.402049 113.44806 226.89611 226.89611 5840515.5 -163808 53020.928 96576.103
23000 98.099888 120204.32 134544.77 601.83445 -14340.451 -32.248942 -80.513661 -52.578273 36.345107 113.5434 227.0868 227.0868 5855253.1 -163482.51 52897.213 96244.843
24000 98.216608 119800.64 133607.38 597.64141 -13806.741 -25.937814 -56.494696 5.6028913 -26.921637 113.4893 226.97861 226.97861 5846888.2 -163628.88 53156.962 96665.177
25000 98.364895 120289.63 134460.25 601.45641 -14170.623 -18.669279 -31.735447 9.5580889 -33.83048 113.49623 226.99246 226.99246 5847958.7 -163793.07 53064.617 96557.826
ERROR: Non-numeric pressure - simulation unstable (…/fix_nh.cpp:1059)
Last command: run 100000
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 98.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 99.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 98.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 99.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 98.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 99.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 98.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 99.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 98.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 99.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 98.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 99.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 98.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 99.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 98.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 99.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 98.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 99.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 98.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 99.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 98.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 99.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 98.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 99.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 98.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 99.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 98.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 99.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 98.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 99.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 98.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 99.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 98.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 99.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 98.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 99.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 98.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 99.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 98.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 99.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 98.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 99.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 98.
Cuda driver error 4 in call at file ‘geryon/nvd_timer.h’ in line 99.
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[31755,1],2]
Exit code: 1
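The thermo block above can be checked for non-finite values or an inconsistent box volume with a short script. This is only a minimal sketch and was not part of the original run: the function names are hypothetical, and the two sample rows are copied from steps 0 and 25000 of the output above.

```python
import math

# Column names copied from the thermo header in the output above.
HEADER = ("Step S/CPU TotEng KinEng Temp PotEng Press Pxx Pyy Pzz "
          "Lx Ly Lz v_vol E_pair E_bond E_angle")

# Two rows copied verbatim from the output above (steps 0 and 25000).
SAMPLE = """\
0 0 103099.16 134134.66 600 -31035.504 69.186703 51.597323 86.724667 69.238119 112.03253 224.06506 224.06506 5624610.1 -173863.87 49799.22 93029.144
25000 98.364895 120289.63 134460.25 601.45641 -14170.623 -18.669279 -31.735447 9.5580889 -33.83048 113.49623 226.99246 226.99246 5847958.7 -163793.07 53064.617 96557.826
"""


def parse_thermo(text, header=HEADER):
    """Return one dict per line that has exactly one numeric field per column."""
    names = header.split()
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != len(names):
            continue  # not a thermo row
        try:
            values = [float(f) for f in fields]
        except ValueError:
            continue  # header line or other non-numeric text
        rows.append(dict(zip(names, values)))
    return rows


def non_finite_columns(row):
    """Names of the columns whose value is NaN or infinite."""
    return [name for name, value in row.items() if not math.isfinite(value)]


def volume_mismatch(row):
    """Relative difference between v_vol and the product Lx*Ly*Lz."""
    box = row["Lx"] * row["Ly"] * row["Lz"]
    return abs(row["v_vol"] - box) / box


if __name__ == "__main__":
    for row in parse_thermo(SAMPLE):
        print(int(row["Step"]), non_finite_columns(row), volume_mismatch(row))
```

Run on the full log, a check like this shows whether any printed row already contains NaN or infinite values before the error is raised. With thermo output every 1000 steps, the failure itself falls between two printed rows, so printing more often would be needed to see the last steps before it.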
My system spec:
OS: Ubuntu 22.04
NVIDIA driver version: 535.171.04
CUDA version: 12.2
I ran the same simulation on 28 CPU cores only. It has been running for 3 days and everything is still in good shape.
With the GPU, however, I get the same error regardless of the configuration: 8 CPU cores and 1 GPU (the case shown above), 1 CPU core and 1 GPU, 28 CPU cores and 1 GPU, or 4 CPU cores and 2 GPUs.
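For anyone who wants to reproduce these four layouts, the launch lines can be generated as sketched here. This assumes the GPU package is switched on from the command line with the LAMMPS `-sf gpu` and `-pk gpu N` switches; the original runs may instead have used a `package gpu` command inside the input script. The binary name `lmp` and the input file name `in.test` are placeholders, not the names actually used.

```python
# (MPI ranks, GPUs) for the four GPU layouts listed above.
CONFIGS = [(8, 1), (1, 1), (28, 1), (4, 2)]


def launch_command(n_ranks, n_gpus, binary="lmp", input_file="in.test"):
    """Build an mpirun command line that enables the GPU package.

    `binary` and `input_file` are placeholder names (assumptions).
    """
    return [
        "mpirun", "-np", str(n_ranks),
        binary,
        "-sf", "gpu",               # apply the gpu suffix to supported styles
        "-pk", "gpu", str(n_gpus),  # number of GPUs to use per node
        "-in", input_file,
    ]


if __name__ == "__main__":
    for ranks, gpus in CONFIGS:
        print(" ".join(launch_command(ranks, gpus)))
```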
Many thanks in advance for any guidance on this issue.
Yunhan