I have compiled LAMMPS on Ubuntu, and it seems to run fine in parallel for calculations that do not involve "kspace_style ewald".
When using the ewald kspace_style, I get the following error:
shandle is 7fffe3d2b1ac shandle cookie is e0a1beaf shandle at 80fff8 cookie = e0a1beaf is_complete = 0 start = 7fa6d8a99010 bytes_as_contig = 283096
[0] MPI internal Aborting program Bad address in Rendezvous send (irecv-self)
[0] Bad address in Rendezvous send (irecv-self)
p0_21309: p4_error: : 1
I have the FFTW library installed and linked. The makefile looks like:
SHELL = /bin/sh
#.IGNORE:
You don't need FFTW to run Ewald. Please post
an input script and data file that show the crash, for
as simple and small a problem as possible. Also indicate
how many procs you are running on.
After a few more trials, I am realizing that the problem might be related to memory.
I tried running two models: (i) box size 80 x 80 x ~25.75; (ii) box size 80 x 80 x ~51.5. These are nanowire models, with model (ii) twice as long as model (i) in the z-direction. In the x and y directions there is vacuum, which allows free surfaces to be modeled.
Model (i) runs fine on a single processor as well as on multiple (8) processors. However, with model (ii), which is twice as big as (i), I get the following log and error files. The error is the same on a single processor and on multiple processors. A sample input file is also copied at the end.
Let me know if any other information is needed to resolve this issue. For further insight: model (ii), and even bigger models in the z-direction, ran successfully earlier on a different cluster system. We are in the process of transitioning to a new cluster and are facing these issues.
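As a rough illustration of why the longer box could need more memory with Ewald: for a fixed reciprocal-space cutoff, the number of k-vectors in the sum grows with the box volume, so doubling the z length roughly doubles it. The sketch below counts the vectors for an orthogonal box. The function name `count_kvectors` and the cutoff parameter `kcut` are hypothetical, chosen for illustration. This is not how LAMMPS itself sets up its Ewald sum, and it does not establish that memory is the cause of the crash reported here.

```cpp
#include <cassert>
#include <cmath>

// Count reciprocal-lattice vectors k = 2*pi*(nx/Lx, ny/Ly, nz/Lz) with
// 0 < |k| <= kcut for an orthogonal box of size Lx x Ly x Lz.
// Hypothetical helper for illustration only; not taken from LAMMPS.
long count_kvectors(double Lx, double Ly, double Lz, double kcut)
{
  assert(Lx > 0.0 && Ly > 0.0 && Lz > 0.0 && kcut > 0.0);

  const double two_pi = 2.0 * std::acos(-1.0);

  // Largest integer index along each axis that can still lie inside the cutoff.
  const int nxmax = static_cast<int>(kcut * Lx / two_pi);
  const int nymax = static_cast<int>(kcut * Ly / two_pi);
  const int nzmax = static_cast<int>(kcut * Lz / two_pi);

  long count = 0;
  for (int nx = -nxmax; nx <= nxmax; ++nx) {
    for (int ny = -nymax; ny <= nymax; ++ny) {
      for (int nz = -nzmax; nz <= nzmax; ++nz) {
        if (nx == 0 && ny == 0 && nz == 0) continue;  // skip k = 0
        const double kx = two_pi * nx / Lx;
        const double ky = two_pi * ny / Ly;
        const double kz = two_pi * nz / Lz;
        if (kx * kx + ky * ky + kz * kz <= kcut * kcut) ++count;
      }
    }
  }
  return count;
}
```

Calling `count_kvectors(80.0, 80.0, 25.75, kcut)` and `count_kvectors(80.0, 80.0, 51.5, kcut)` with the same assumed `kcut` shows the count for model (ii) coming out at about twice that for model (i).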
I need your data file to run this test. Email it to me privately
(sjplimp at sandia.gov)
You really should be using PPPM for a large problem, not Ewald.
I have no problem running either of your problems (small or large)
on my box on one or many procs. They are slow however, which
is due to using Ewald instead of PPPM.
Your input script is outdated, which indicates you are using
an old version of LAMMPS. The first thing I would do
is download the current version and see if your problem
goes away.
I ran the most current fully-patched version, 1Sep09. If you still
have problems, you're going to have to instrument the code
and find where the bug/crash is occurring. It doesn't happen for me.
I noticed that the LAMMPS crash is specific to "kspace_style
ewald". I get the error message that I mentioned in my earlier
mails.
With pppm, the job runs fine.
Also, I noted that during compilation I get the following warning
message: "warning: deprecated conversion from string constant to
'char*'". But I am not sure that is the cause of the problem, as it
should affect the pppm run as well.
Any suggestion on how to resolve the problem with ewald?
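On the compiler warning quoted above: g++ emits it when a string literal is passed where a non-const `char*` is expected. It is harmless as long as the callee only reads the string. The sketch below shows the pattern and two ways to avoid the warning. All function names are hypothetical and are not taken from the LAMMPS source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical legacy-style function: takes char* but only reads the string.
// Calling legacy_length("ewald") directly is the kind of call that produces
// "deprecated conversion from string constant to 'char*'" with g++.
std::size_t legacy_length(char *s) { return std::strlen(s); }

// Const-correct variant: accepts string literals without any warning.
std::size_t const_length(const char *s) { return std::strlen(s); }

// If the legacy signature cannot be changed, copy the literal into a
// writable buffer first and pass that instead.
std::size_t call_legacy_safely(const char *text)
{
  assert(text != nullptr);
  char buffer[64];
  // snprintf always null-terminates and truncates if text is too long.
  std::snprintf(buffer, sizeof(buffer), "%s", text);
  return legacy_length(buffer);
}
```

Either variant leaves the behavior the same. The warning only becomes a real problem if the called function writes into the string it was given.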
No, the string constant warning is not an issue. I think you'll have
to debug this yourself, since I can't reproduce the problem, i.e.
put print statements in the code to see where it croaks.
Steve
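A minimal sketch of the print-statement approach suggested above: a checkpoint macro that writes the file name and line number to stderr and flushes immediately, so the last checkpoint printed before the abort brackets the failing code. The names `trace_message` and `TRACE` are hypothetical and not part of LAMMPS. In an MPI run one would typically also include the rank obtained from `MPI_Comm_rank` in the message, which is left out here to keep the example free of MPI.

```cpp
#include <cassert>
#include <cstdio>
#include <sstream>
#include <string>

// Build a one-line checkpoint message.
// Hypothetical helper for illustration; not part of LAMMPS.
std::string trace_message(const char *file, int line, const std::string &what)
{
  assert(file != nullptr);
  std::ostringstream out;
  out << "[trace] " << file << ":" << line << ": " << what;
  return out.str();
}

// Print a checkpoint to stderr and flush at once, so the message is not
// lost in an output buffer if the program aborts shortly afterwards.
#define TRACE(what)                                                   \
  do {                                                                \
    std::fprintf(stderr, "%s\n",                                      \
                 trace_message(__FILE__, __LINE__, (what)).c_str());  \
    std::fflush(stderr);                                              \
  } while (0)
```

Placing calls such as `TRACE("before k-space setup")` and `TRACE("after k-space setup")` around suspect sections, then halving the interval between the last message seen and the first one missing, narrows down where the crash occurs.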