LAMMPS run error

Cute · June 30, 2023, 2:05pm

Dear LAMMPS developers,

I am trying to run my LAMMPS model in parallel on the university’s mainframe computer, but I cannot get it to start.
The model has 700,000 particles in total. The same model with 3000 particles runs successfully, without any errors, on a single core of my laptop.
I don’t know how to fix this error.
I’ll be grateful for any suggestions.

My code and the file with the error are attached below.

The version of LAMMPS is 3 Mar 2020 Python 3.7.4 kokkos

Sincerely
Cute
tt.lmp (6.9 KB)
tt_lammps-job_1637443.err (6.9 KB)

srtee · June 30, 2023, 9:01pm

There are several errors about failing to set up OpenFabrics. These often occur if LAMMPS (or another program) was not correctly compiled against the MPI library that has been set up to work with the cluster’s particular configuration.

Please check that you have compiled LAMMPS successfully, and check if it runs any script (such as the ones in the examples folder). Get advice from your cluster admins about how to compile applications against their setup.

Cute · July 6, 2023, 4:29pm

Thank you for your reply!
I talked to the cluster administrator, who thinks there is something wrong with my code.
After checking the code, I ran the 3000-particle model using a single thread and it worked fine. But when I run the same model with multiple threads, this error appears after 12 seconds: “ERROR on proc 0: Non-numeric atom coords - simulation unstable (src/OPENMP/domain_omp.cpp:58)”.
Since the single-thread run succeeds, does that mean LAMMPS was compiled correctly? Or does the script need something specific for multi-threaded runs?
The script and error message are attached.
test2_lammps-job_1678070.out (7.2 KB)
test.lmp (6.7 KB)

akohlmey · July 6, 2023, 6:39pm

You are confusing threads with MPI processes here.

Your boundary conditions of “f f f” make little sense. “p p f” or “p p m” would make more sense.

For computational efficiency, you should add: processors * * 1 before creating the box.
This way you won’t have subdomains without atoms and thus wasted CPU time.

The error you get usually happens when you have overflow or underflow. Since we had multiple bugfixes to the GRANULAR package, you probably should be using the latest feature release (15 Jun 2023) of LAMMPS instead of the stable version. The error can also be triggered by overly aggressive compiler optimization, e.g. certain versions of the Intel compilers are known to miscompile parts of LAMMPS.
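To make the overflow point concrete, here is a minimal Python sketch of how an overflowing value turns into the kind of non-finite number that the “Non-numeric atom coords” message complains about. This is not LAMMPS code: the helper name has_non_numeric_coords and the sample values are made up for illustration, and it assumes the check boils down to testing each coordinate for NaN or infinity.

```python
import math

def has_non_numeric_coords(coords):
    """Return True if any coordinate component is NaN or infinite.

    Hypothetical helper for illustration only, not a LAMMPS function.
    """
    return any(not math.isfinite(c) for xyz in coords for c in xyz)

# A value near the top of the double-precision range (illustrative).
big = 1.0e308
overflowed = big * 10.0              # exceeds the double range, becomes inf
nan_value = overflowed - overflowed  # inf - inf is undefined, becomes nan

good = [(0.0, 1.0, 2.0)]
bad = [(0.0, 1.0, nan_value)]

print(has_non_numeric_coords(good))
print(has_non_numeric_coords(bad))
```

Once a single force or velocity overflows, later arithmetic on it tends to spread inf or NaN into the positions, which is why the error only shows up some time into the run.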

If whether this error happens depends on the number of MPI processes, it is most likely happening on a processor that has no (more) atoms. The suggested processors command should reduce the probability of that.
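As a rough illustration of why “processors * * 1” helps, the following Python sketch counts how many subdomains end up without atoms for two processor grids. It is a toy model under stated assumptions, not LAMMPS code: the box is split into equal-size bricks, the particles are assumed to have settled into the bottom fifth of a tall box, and the box size, particle count, and grids are all made-up values.

```python
import random

def atoms_per_subdomain(coords, box, grid):
    """Count atoms in each brick of a regular px x py x pz decomposition."""
    px, py, pz = grid
    counts = [0] * (px * py * pz)
    for x, y, z in coords:
        # Map each coordinate to a brick index, clamping at the upper edge.
        ix = min(int(x / box[0] * px), px - 1)
        iy = min(int(y / box[1] * py), py - 1)
        iz = min(int(z / box[2] * pz), pz - 1)
        counts[(ix * py + iy) * pz + iz] += 1
    return counts

random.seed(0)
box = (10.0, 10.0, 40.0)  # hypothetical box, tall in z

# Assumption: particles have settled into the bottom fifth of the box.
coords = [
    (random.uniform(0, 10), random.uniform(0, 10), random.uniform(0, 8))
    for _ in range(3000)
]

# Same number of processes (16), split along z versus not split along z.
for grid in [(2, 2, 4), (4, 4, 1)]:
    counts = atoms_per_subdomain(coords, box, grid)
    empty = sum(1 for c in counts if c == 0)
    print(grid, "empty subdomains:", empty, "of", len(counts))
```

With the 2 x 2 x 4 grid, the upper three layers of bricks (12 of 16) hold no atoms at all, while the 4 x 4 x 1 grid gives every process a share of the particles.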

Cute · July 7, 2023, 6:36pm

Thank you so much, it’s been solved!