Dear all,
I have LAMMPS (29 Aug 2024 - Update 1) installed, built with CUDA 12.2, on a machine with one RTX A6000 GPU. I am now testing it; the job starts normally but stops after some time.
I am testing the wetting of water droplets on a metal surface. When the simulation box is small (Lx × Ly × Lz = 30d × 30d × 60d, d = 3.61 Å), the program runs normally. But when I enlarge the box (Lx × Ly × Lz = 50d × 50d × 300d, d = 3.61 Å), the program starts normally but terminates abnormally after some time, and instead of printing a normal LAMMPS error message it reports the following: lmp_mpi: geryon/ucl_d_vec.h:350: int ucl_cudadr::UCL_D_Vec<numtyp>::resize(int) [with numtyp = int]: Assertion `_kind!=UCL_VIEW' failed.
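To make the geometry concrete, the large box corresponds roughly to the following region definition (a simplified sketch only, assuming an fcc lattice with constant d = 3.61 Å and lattice units for the region; my actual input files are in the attached archives):

```
# Sketch of the large-system box: 50d x 50d x 300d with d = 3.61 Angstrom
units        metal
lattice      fcc 3.61
region       simbox block 0 50 0 50 0 300
create_box   2 simbox
```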
A strange phenomenon is that when I change the number of MPI processes, the time at which the error occurs also changes. By chance I found that when I switch from 20 MPI processes to just 1 (see the two commands below), the program runs fine. But with a single process it runs far too slowly, which is obviously unacceptable for a larger system.
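For reference, these are the two run commands:

```
# fails after some time with the assertion above
mpirun -np 20 lmp_mpi -sf gpu -pk gpu 1 -in in.runshi

# runs fine, but far too slowly for the large system
mpirun -np 1 lmp_mpi -sf gpu -pk gpu 1 -in in.runshi
```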
I carefully searched the forum for related problems and solutions, but did not find this error message anywhere. I am not sure whether this is a bug in the software or a problem with my simulation setup. I am therefore attaching two archives containing the input files, log files, and error messages for the two system sizes, and I would appreciate it if you could take a look.
Since this is the first time I have asked a question here, I am not sure whether the information I have provided is detailed enough. If you need more information for troubleshooting, please let me know!
Many thanks in advance for any guidance on this issue.
Deyang
Large_system.rar (17.7 KB)
Small_system.rar (17.5 KB)