I am currently running a simulation based on the streitz example, keeping the original potential settings unchanged while changing the ensemble to NVT. I previously conducted a bulk Al2O3 simulation with these settings, and it appeared to work without issues.
For my new simulation, I changed the initial configuration to a sphere placed between two slabs. However, I have run into a problem: output to the log file appears much more slowly than in the bulk simulation, in some cases it stops entirely after a certain step, and in other cases the log file stays empty even after a very long run.
I have tried running the simulation on two different machines, one using the 24Dec2020 version and the other using the 29Aug2024 version of LAMMPS. CPU utilization appears to be at 100% for the requested CPUs.
Could you please provide any suggestions on what might be causing this issue? I have uploaded the relevant files for your reference.
The system in the example input deck is much smaller than yours (2160 atoms versus 91200 atoms, and 18351 Angstrom^3 versus 1705376 Angstrom^3). The computational cost per step must therefore be much larger, especially since neither Ewald summation nor charge equilibration scales linearly with system size. That alone is one reason for a significant slowdown.
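As a rough back-of-the-envelope illustration (assuming conventional Ewald summation scales roughly as N^(3/2); the actual factor depends on accuracy settings and cutoffs):

(91200 / 2160)^(3/2) ≈ 42^1.5 ≈ 270

so the k-space part alone can easily be a couple of hundred times more expensive per step than in the example, before the QEq iterations are even counted.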
When the simulation appears to be stuck, you can attach a debugger to the running process, find out where it is stuck, and collect some stack traces.
Thanks for pointing out that my system is much larger and the output is therefore much slower. I tried a new bulk simulation with 90000 atoms, and its output looks fine: it is written at a steady pace, just much more slowly.
It seems my previous simulation got stuck at some point because of the initial configuration. For the attached run, the log file has been stuck at the same step for more than a day. Could you please give me some more guidance on how to attach a debugger? Should I follow the descriptions on this doc page? 11.4. Debugging crashes — LAMMPS documentation
This does not (yet) cover the case of attaching a debugger to a running program.
This question is far beyond the scope of this forum (it is about LAMMPS and not debuggers).
The details depend on which platform you are running on. Have you tried a web search?
Thanks very much for the reply! I searched for LAMMPS and debugger, but I guess due to my lack of experience, I haven’t yet been able to get a clear understanding of how to proceed.
However, I found that on one of the machines, if I change the command line option from -sf hybrid intel omp to -sf omp, the output frequency returns to normal. After 4400 steps, there is a warning message at the bottom of the log file:
WARNING: H matrix size has been exceeded: m_fill=5095 H.m=5000
(../fix_qeq_slater.cpp:214)
Output to the log file then stopped, although the LAMMPS job is still running. I also searched for this warning message, and, based on a previous post, I suspect the problem is that the neighbors of some atoms change significantly during the simulation because of my initial configuration.
By the way, the option -sf hybrid intel omp works for a simple bulk simulation, but for my new system no output is written to the log file at all.
I guess I would either need to test the settings of fix qeq/slater or start with a potential that does not require charge equilibration. Could you please give me some further advice if it is not too much trouble?
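For reference, my charge equilibration settings follow the streitz example, so the relevant line should look roughly like the following (a sketch based on the stock example input; the arguments are Nevery, cutoff, tolerance, maxiter, and the parameter source, and my actual values may differ):

fix 1 all qeq/slater 1 12.0 1.0e-6 100 coul/streitz

I assume the Nevery, tolerance, and maxiter values are the settings I would have to experiment with.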
How to use a debugger is completely independent of LAMMPS. You won't find much information when searching for both keywords together, since few of the LAMMPS developers use a debugger: using one effectively requires some skill and practice, and most rather add printf statements for debugging, which doesn't help much in your case...
I have added a short paragraph to the LAMMPS manual; it is currently only available in the development branch, but it will be included in the next feature release (end of this month or beginning of next month). 11.4. Debugging crashes — LAMMPS documentation
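Until that release is out, here is a minimal sketch of what attaching to a running process looks like on a typical Linux machine (this assumes the executable is named lmp, gdb is installed, and ptrace attaching is allowed; for an MPI run you would attach to one rank at a time):

pgrep -a lmp                 # find the process id of the (stuck) LAMMPS process
gdb -p <PID>                 # attach gdb to that process id; this pauses the process
(gdb) thread apply all bt    # print a stack trace for every thread
(gdb) detach                 # let the process continue
(gdb) quit

The stack traces show in which function each thread is currently waiting or looping, which usually narrows down where the run is stuck.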
I looked through the source code for this message and dug around a little bit.
This warning happens when you have too many atoms migrating from one sub-domain to another. Currently, this number may not grow by more than 20% (that is what the SAFEZONE factor of 1.2 controls). There are two ways to work around this:

1. Break your run down into multiple run statements (see the sketch after this list), e.g. replace
run 5000
with
run 2500 post no
run 2500
2. Change the SAFEZONE constant in fix_qeq.h from 1.2 to a larger value, e.g. 1.5, and recompile LAMMPS.
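If you need many chunks, the first workaround can also be written as a loop; a minimal sketch (assuming a total of 5000 steps split into 10 pieces of 500 steps):

variable i loop 10
label runloop
run 500 post no     # each run statement goes through setup again, which is the point of workaround 1
next i
jump SELF runloop

The loop exits on its own once the variable i is exhausted, because the jump following the final next is skipped.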
Thanks. You are using a 4-year-old LAMMPS version with a 7-year-old version of the Intel compiler. That may not be the immediate reason for your troubles, but it will cause trouble down the road.
There have been multiple improvements and bugfixes to the INTEL package, some of which definitely require a more recent Intel compiler to compile LAMMPS correctly.
There also have been bugfixes and improvements for all of LAMMPS.
The best way to move forward would be to install a more recent GCC compiler (version 9 or later) and to skip multi-threading and the intel/omp suffixes until you have an input that produces proper results matching the publications describing the potential parameters you are using. Then you can re-try compiling with the Intel compiler (preferably a more recent version; the latest versions are now free of charge), compare the results to the GCC-based reference, and evaluate whether there is a significant performance gain. I suspect there is not much to gain: when I ran your input deck, over 60% of the time went to QEq, which is not multi-threaded, so Amdahl's law stipulates that the speedup from parallelization or vectorization is (much) less than a factor of 2.
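For a concrete number: with a serial fraction s ≈ 0.6 (the QEq part), Amdahl's law limits the overall speedup from parallelizing or vectorizing the remaining 40% to

S(N) = 1 / (s + (1 - s)/N)  <=  1/s ≈ 1.67

no matter how many threads N you throw at it.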
Which potential is suitable is determined by the science of your problem; normally, that decision comes before setting up simulations. If you are not doing bulk systems, you will most likely need some variant of charge equilibration, as for most oxides. For example, you could look into ReaxFF, but it uses the exact same QEq algorithm (in fact, you can substitute fix qeq/shielded for fix qeq/reaxff and get the same results) and thus will run into the same problems.
Thank you very much for the detailed instructions!
I tried running the same simulation on another machine using LAMMPS version 29Aug2024 with the -sf omp option on the command line. Unfortunately, it stopped writing to the log file at step 3100 without any warning messages. Since that log file did not provide any clues, the log file I shared earlier was from a run with an older version of LAMMPS.
It seems the problem still exists even with the newer version. For reference, I’ve attached the -h output from the 29Aug2024 version: lammps29Aug2024_help (27.1 KB)
I understand it is a better practice to use the latest version due to bug fixes and improvements. I will follow your detailed advice and work on troubleshooting this issue.