[lammps-users] MPI-related execution error for lammps-29Oct20 on MacOSX using macports

Posted to the lammps-users mailing list, 28 Jun 2021

I am encountering an execution error when I try to run lmp_mpi. When I build and run in serial mode only, the code runs fine.
More details below. Please advise.

Unfortunately, there is little that can be done remotely without either the input deck to reproduce the calculation or a stack trace indicating where the failure happens.
Please see https://docs.lammps.org/Errors_debug.html for some suggestions on the latter.
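As a minimal sketch of the core-file route described there (assuming a Linux-like environment where core dumps can be enabled; on macOS you would use lldb instead of gdb, and core files need extra setup):

ulimit -c unlimited                  # allow core files to be written
mpirun -np 4 ./lmp -in in.lattice    # reproduce the crash
gdb ./lmp core                       # load the core file, then type "where" for the stack trace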

There are three possible causes of the error:
1) some MPI setup/configuration problem (e.g. insufficient access to buffer memory or incorrectly configured communication devices),
2) some bug in LAMMPS that is only exposed by running with multiple MPI processes, or
3) some system setup issue causing (too) large changes to the domain decomposition, e.g. initial box dimensions with shrink-wrap boundary conditions that are too far away from the shrink-wrapped state.

It works for me with the current development version.

Overall, this is code that has been in use in LAMMPS for a long time, has seen very few changes, and has been covered by unit testing since last summer.
This makes it more likely that what you are seeing is an issue with your MPI library. Have you tried compiling and running some MPI demo or tutorial examples?
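For example, a minimal MPI sanity test (a generic hello-world sketch, not part of LAMMPS) would look like this:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, len;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);               /* initialize the MPI environment */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* which process am I? */
    MPI_Comm_size(MPI_COMM_WORLD, &size); /* how many processes in total? */
    MPI_Get_processor_name(name, &len);   /* which host does this rank run on? */

    printf("Hello from rank %d of %d on %s\n", rank, size, name);

    MPI_Finalize();                       /* shut down MPI cleanly */
    return 0;
}

If compiling and running it with "mpicc hello.c -o hello && mpirun -np 4 ./hello" does not print four lines, the problem is in the MPI installation rather than in LAMMPS.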

axel.

FYI,

I also compiled the stable version and ran the same input deck with valgrind on my Linux box, and it comes out clean as well (this was with 4 MPI tasks, so valgrind runs 4 times, once per MPI process).
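The invocation was essentially (a sketch; adjust paths as needed):

mpirun -np 4 valgrind ./lmp -in in.lattice

This wraps each of the four MPI ranks in its own valgrind instance, which is why the banner below appears four times with different process IDs.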

Axel.

==108347== Memcheck, a memory error detector
==108347== Copyright (C) 2002-2017, and GNU GPL’d, by Julian Seward et al.
==108347== Using Valgrind-3.17.0 and LibVEX; rerun with -h for copyright info
==108347== Command: ./lmp -in in.lattice
==108347==
==108349== Memcheck, a memory error detector
==108349== Copyright (C) 2002-2017, and GNU GPL’d, by Julian Seward et al.
==108349== Using Valgrind-3.17.0 and LibVEX; rerun with -h for copyright info
==108349== Command: ./lmp -in in.lattice
==108349==
==108348== Memcheck, a memory error detector
==108348== Copyright (C) 2002-2017, and GNU GPL’d, by Julian Seward et al.
==108348== Using Valgrind-3.17.0 and LibVEX; rerun with -h for copyright info
==108348== Command: ./lmp -in in.lattice
==108348==
==108346== Memcheck, a memory error detector
==108346== Copyright (C) 2002-2017, and GNU GPL’d, by Julian Seward et al.
==108346== Using Valgrind-3.17.0 and LibVEX; rerun with -h for copyright info
==108346== Command: ./lmp -in in.lattice
==108346==
LAMMPS (29 Oct 2020)
OMP_NUM_THREADS environment is not set. Defaulting to 1 thread. (src/comm.cpp:94)
using 1 OpenMP thread(s) per MPI task

***** base parameters *****
a0 = 3.54
Nx = 12
Ny = 5
Nz = 8

Thank you, again. I suspect there is something wrong with my installation of MPI (from MacPorts).
I tried both gcc9 and gcc10 (with MPI built against each). I understand that MacPorts had some problems with MPI
when the latest macOS came out, but I thought they had resolved those.

Would you suggest an alternative to using MacPorts? Building MPICH or OpenMPI myself? Using the Apple-provided gcc? Etc.

I just tried the indent case from the examples directory. Got this error:

[[email protected]] HYDU_getcwd (utils/args/args.c:234): allocated space is too small for absolute path

which again makes me suspect a problem with MPI.

Thank you for the quick response and for the pointer to the debugging documentation.
I can’t believe I left off the input deck. Here it is; perhaps something in it will be evident to you.

variable a0 equal 3.54    # Ni (U3)
variable Nx equal 12      # number of periods along tilt axis
variable Ny equal 5       # number of periods along y (perp to GBs)
variable Nz equal 8       # number of periods along z (parallel to GBs)

# print base parameters

print " "
print " ***** base parameters ***** "
print " a0 = ${a0} "
print " Nx = ${Nx} "
print " Ny = ${Ny} "
print " Nz = ${Nz} "
print " *************************** "

# derived

variable rnn equal ${a0}/sqrt(2.)   # nn distance at this lattice constant
variable Px equal ${a0}             # periodicity along x (tilt axis)
variable Lx equal ${Nx}*${Px}       # box length along x (periodic)
variable Py equal sqrt(5.)*${a0}    # periodicity along y (perp to GB)
variable Ly equal ${Ny}*${Py}       # box length along y (periodic), includes two grains
variable Pz equal sqrt(5.)*${a0}    # periodicity along z (parallel to GB)
variable Lz equal ${Nz}*${Pz}       # box length along z (periodic)

# print box size

print " "
print " Lx,Ly,Lz = ${Lx}, ${Ly}, ${Lz} "
print " "

# regions (left and right w.r.t. y-axis, which is perp to GB)
# left region:  (-Lx/2,+Lx/2) (-Ly/2,0)   (-Lz/2,+Lz/2)
# right region: (-Lx/2,+Lx/2) (0,+Ly/2)   (-Lz/2,+Lz/2)

Yes, this indicates two things:
a) you are using MPICH
b) some path name exceeds a compiled-in default (it could be the path to the installation or to the input file of the example you want to run)

You could check on b) by copying the executable and the input to /tmp and seeing whether the issue repeats when launching from there.
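Concretely, that check would be something along these lines (a sketch; /path/to is a placeholder for wherever your build lives):

cp /path/to/lmp /path/to/in.lattice /tmp
cd /tmp
mpirun -np 4 ./lmp -in in.lattice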
That said, a too-long path is not something that should cause a bus error, so this may be a different issue.

But you could see whether a) can be avoided by installing OpenMPI instead.
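With MacPorts, that would look something like the following (a sketch; the exact port and select-group names are assumptions based on common MacPorts conventions):

sudo port install openmpi-gcc10
sudo port select --set mpi openmpi-gcc10-fortran   # make this MPI the default mpicc/mpirun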

I don’t use macOS machines much (they confuse me, as Apple engineers continue to make the OS more idiosyncratic and more different from its BSD-like base), but the one machine I use for testing LAMMPS has OpenMPI 3.1.1 installed, and that has not given me grief (just the usual OpenMPI valgrind issues, where the OpenMPI code is too smart for valgrind).

axel.

Using MacPorts I installed OpenMPI (the gcc10 version), and now I can build and run lmp_mpi fine.

Thanks for your help.