[lammps-users] Errors on running ewald simulations on Ubuntu

Hi,

I have compiled LAMMPS on Ubuntu, and it seems to run fine in parallel for calculations that do not involve “kspace_style ewald”.

When using ewald kspace_style, I get the following error:
shandle is 7fffe3d2b1ac
shandle cookie is e0a1beaf
shandle at 80fff8
cookie = e0a1beaf
is_complete = 0
start = 7fa6d8a99010
bytes_as_contig = 283096
[0] MPI internal Aborting program Bad address in Rendezvous send (irecv-self)
[0] Bad address in Rendezvous send (irecv-self)
p0_21309: p4_error: : 1

I have the FFTW library installed and linked. The makefile looks like this:
SHELL = /bin/sh
#.IGNORE:

# System-specific settings

CC = /usr/bin/mpicxx
CCFLAGS = -g -O3 -DFFT_FFTW -DGZIP -static-libcxa
DEPFLAGS = -M -D
LINK = /usr/bin/mpicxx /usr/lib/libfftw.a
LINKFLAGS = -g -O3 -static-libcxa -lstdc++
USRLIB = -lfftw -lmpich
SYSLIB =
SIZE = size

# Link rule

$(EXE): $(OBJ)
	$(LINK) $(LINKFLAGS) $(OBJ) $(USRLIB) $(SYSLIB) -o $(EXE)
	$(SIZE) $(EXE)

# Compilation rules

%.o:%.cpp
	$(CC) $(CCFLAGS) -c $<

#%.d:%.cpp
#	$(CC) $(CCFLAGS) $(DEPFLAGS) $< > $@

# Individual dependencies

DEPENDS = $(OBJ:.o=.d)
#include $(DEPENDS)

Has anyone come across this issue before? It seems like the problem is in the compilation.

Thanks in advance.

Best regards,
Ravi

PS: If there is already a thread on this issue, please refer me to it. Thanks!!

You don't need FFTW to run Ewald. Please post
an input script and data file that show the crash, for
as simple and small a problem as possible. Also indicate
how many procs you are running on.

Steve

Hi Steve,

After a few more trials, I am realizing that the problem might be related to memory.

I tried running two models: (i) box size 80 x 80 x ~25.75; (ii) box size 80 x 80 x ~51.5. These are nanowire models, with model (ii) twice as long as model (i) in the z-direction. In the x and y directions there is vacuum, which allows the free surfaces to be modeled.

Model (i) runs fine on a single processor as well as on multiple (8) processors. However, with model (ii), which is twice as big as (i), I get the following log and error files. The error is the same on a single processor and on multiple processors. A sample input file is also copied at the end.

Let me know if any other information is needed to resolve this issue. For further context, model (ii) and even longer models (in the z-direction) ran successfully on a different cluster. We are in the process of transitioning to a new cluster and are facing these issues there.

I need your data file to run this test. Email it to me privately
(sjplimp at sandia.gov)
You really should be using PPPM for a large problem, not Ewald.
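To make the size dependence concrete, here is a rough C++ sketch (not LAMMPS's own Ewald code) that counts the reciprocal-space vectors an Ewald sum would include for a given box. It assumes vectors k = 2π(kx/Lx, ky/Ly, kz/Lz) with integer indices, a cutoff |k|² ≤ gsqmx, lengths in Å, and that only one of each ±k pair is kept; `count_kvectors` and the value of `gsqmx` are hypothetical. Because the vacuum in x and y is part of the box, the count grows with box volume rather than with the number of atoms, and doubling Lz roughly doubles it.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical helper (not from LAMMPS): counts reciprocal vectors
// k = 2*pi*(kx/Lx, ky/Ly, kz/Lz), with integer (kx,ky,kz), such that
// 0 < |k|^2 <= gsqmx, keeping only one vector of each +k/-k pair.
std::size_t count_kvectors(double Lx, double Ly, double Lz, double gsqmx) {
    const double pi = std::acos(-1.0);
    const double kcut = std::sqrt(gsqmx);
    // Largest integer index that can still lie inside the cutoff sphere on each axis.
    const int kxmax = static_cast<int>(kcut * Lx / (2.0 * pi));
    const int kymax = static_cast<int>(kcut * Ly / (2.0 * pi));
    const int kzmax = static_cast<int>(kcut * Lz / (2.0 * pi));

    std::size_t count = 0;
    for (int kx = -kxmax; kx <= kxmax; ++kx) {
        for (int ky = -kymax; ky <= kymax; ++ky) {
            for (int kz = -kzmax; kz <= kzmax; ++kz) {
                // Keep exactly one representative of each +k/-k pair; this also skips k = 0.
                const bool positive_half =
                    kx > 0 || (kx == 0 && (ky > 0 || (ky == 0 && kz > 0)));
                if (!positive_half) continue;
                const double gx = 2.0 * pi * kx / Lx;
                const double gy = 2.0 * pi * ky / Ly;
                const double gz = 2.0 * pi * kz / Lz;
                if (gx * gx + gy * gy + gz * gz <= gsqmx) ++count;
            }
        }
    }
    return count;
}
```

For example, with a hypothetical gsqmx = 4.0 Å⁻², `count_kvectors(80, 80, 51.5, 4.0)` comes out close to twice `count_kvectors(80, 80, 25.75, 4.0)`. In a standard Ewald implementation both the per-step work and the structure-factor arrays summed across processors scale with this count, which is one reason PPPM is preferred for large boxes.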

Steve

I have no problem running either of your problems (small or large)
on my box on one or many procs. They are slow however, which
is due to using Ewald instead of PPPM.
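Steve's point about speed can be made quantitative with a standard back-of-the-envelope estimate. This sketch assumes the Ewald splitting parameter and the reciprocal cutoff k_max stay fixed as the box grows, as they do when the real-space cutoff is fixed. The number of reciprocal vectors inside the cutoff sphere (keeping one of each ±k pair) and the resulting k-space cost per step are then:

```latex
N_k \approx \frac{1}{2}\,\frac{\tfrac{4}{3}\pi k_{\max}^{3}}{(2\pi)^{3}/V}
      = \frac{k_{\max}^{3}\,V}{12\pi^{2}},
\qquad
T_{k} \propto N\,N_k \propto N\,V
```

At constant density (V ∝ N) this gives T_k ∝ N², and for a nanowire surrounded by vacuum V grows even when N does not. An optimally tuned Ewald sum scales as O(N^{3/2}), whereas PPPM scales as O(N log N), which is why PPPM is the usual choice for large systems.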

Your input script is outdated, which indicates you are using
an old version of LAMMPS. The first thing I would do
is download the current version and see if your problem
goes away.

Steve

I tried the latest version (7th Jul, 09) as well, but I get the same error. The smaller model runs fine, but the bigger one gives the error.

I ran the most current fully-patched version, 1Sep09. If you still
have problems, you're going to have to instrument the code
and find where the bug/crash is occurring. It doesn't happen for me.

Steve

Hi Steve,

I noticed that the crashing of LAMMPS is specific to "kspace_style
ewald". I get the error message that I mentioned in my earlier
mails.

With pppm, the job runs fine.

Also, I noticed that during compilation I get the following warning
message: "warning: deprecated conversion from string constant to
‘char*’". But I am not sure that is the cause of the problem, as it
would affect the pppm run as well.
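For reference, the g++ warning quoted above is emitted when a string literal is bound to a non-const `char*`, a conversion that C++03 deprecated and C++11 forbids. It is a const-correctness diagnostic and does not by itself change what the compiled code does (writing through such a pointer would be undefined, but reading it is fine). A minimal C++ illustration, with hypothetical names `kspace_name` and `name_length`:

```cpp
#include <cstddef>
#include <cstring>

// This line is what triggers "deprecated conversion from string constant
// to 'char*'" in older g++ (and is rejected outright under C++11):
//   char *style = "ewald";
// The fix is to point at const char instead:
const char *kspace_name = "ewald";

// Hypothetical helper: functions that only read a string should take const char*.
std::size_t name_length(const char *name) {
    return std::strlen(name);
}
```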

Any suggestion on how to resolve the problem with ewald?

Thanks,
Ravi

No, the string constant warning is not an issue. I think you'll have
to debug this yourself, since I can't reproduce the problem, i.e.
put print statements in the code to see where it croaks.
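One way to follow that advice is a small tracing macro that tags each message with the MPI rank, source file and line, and flushes it at once so the last message before the crash is not lost in a buffer. The sketch below is hypothetical (`trace_line` and `TRACE` are not part of LAMMPS); in the real code the rank would come from `MPI_Comm_rank`, and here it is simply passed in.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: formats "[rank R] file:line message".
inline std::string trace_line(int rank, const char *file, int line, const char *msg) {
    char buf[512];
    std::snprintf(buf, sizeof(buf), "[rank %d] %s:%d %s", rank, file, line, msg);
    return std::string(buf);
}

// Prints the trace line to stderr and flushes explicitly, so output
// reaches the terminal or log even if the program aborts right afterwards.
#define TRACE(rank, msg)                                                       \
    do {                                                                       \
        std::fprintf(stderr, "%s\n",                                           \
                     trace_line((rank), __FILE__, __LINE__, (msg)).c_str());   \
        std::fflush(stderr);                                                   \
    } while (0)
```

Sprinkling calls such as `TRACE(me, "before structure-factor sum");` (with `me` a hypothetical variable holding the rank) through the Ewald k-space routine, ewald.cpp in the LAMMPS source, and rerunning the large model would show the last point each rank reached before the abort.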
Steve