compilation problem

Hello,

I am trying to compile LAMMPS on an XE6 machine and am running into a problem at the final linking step:

reax_SYSLIB = -lifcore -lsvml -lompstub -limf
reax_SYSPATH = -L/opt/intel/lib

I have the above two in the Makefile. If I keep them, then the error message is:

/usr/bin/ld: cannot find -lifcore
make[1]: *** [../lmp_chugach] Error 2
make[1]: Leaving directory `/work/small/home/u2/wes/vranjan/src/lammps-24Jun11/src/Obj_chugach'
make: *** [chugach] Error 2

If I remove them then I get the following complaint:

fix_move.o: In function `LAMMPS_NS::FixMove::initial_integrate( (int))':
./fix_move.cpp:565: undefined reference to `smalloc__Q2_9LAMMPS_NS6MemoryFLPCc'
./fix_move.cpp:565: undefined reference to `smalloc__Q2_9LAMMPS_NS6MemoryFLPCc'
./fix_move.cpp:569: undefined reference to `smalloc__Q2_9LAMMPS_NS6MemoryFLPCc'
./fix_move.cpp:569: undefined reference to `smalloc__Q2_9LAMMPS_NS6MemoryFLPCc'
./fix_move.cpp:608: undefined reference to `addstep_compute__Q2_9LAMMPS_NS6ModifyFL'

and similar errors for several other routines. I based my Makefile on Makefile.jaguar.

Thanks,

Vivek
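The first error, `cannot find -lifcore`, means the linker found neither `libifcore.so` nor `libifcore.a` in any directory it searched. libifcore, libsvml and libimf are Intel compiler runtime libraries, so they are not expected to be present under a PGI or GNU programming environment unless an Intel installation really exists at /opt/intel/lib. A minimal shell sketch for checking a `-L`/`-l` pair from the Makefile before running make; `check_libs` is a hypothetical helper, and it only looks in the listed `-L` directories, not in the linker's built-in default search path.

```shell
# check_libs: report which -l libraries are absent from the given -L
# directories. It does NOT consult ld's default search path (/usr/lib,
# LIBRARY_PATH, ...), so "missing" means "not in the listed -L dirs".
# Arguments: a string of -L options, then a string of -l options.
check_libs() {
  paths=$1
  libs=$2
  missing=0
  for lib in $libs; do            # word splitting on purpose
    name=${lib#-l}                # -lifcore -> ifcore
    found=no
    for p in $paths; do
      dir=${p#-L}                 # -L/opt/intel/lib -> /opt/intel/lib
      # ld accepts either the shared or the static variant
      if [ -e "$dir/lib$name.so" ] || [ -e "$dir/lib$name.a" ]; then
        found=yes
        break
      fi
    done
    if [ "$found" = no ]; then
      echo "missing: lib$name (.so or .a)"
      missing=$((missing + 1))
    fi
  done
  return "$missing"               # number of libraries not found
}
```

For example, `check_libs "-L/opt/intel/lib" "-lifcore -lsvml -lompstub -limf"` prints one `missing:` line per library that is not in /opt/intel/lib and returns the count.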


this looks like you compiled the code with inconsistent compilers.

try: make clean-all
and then compile again.

axel.
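Axel's diagnosis can be checked from the symbol names themselves. An undefined name like `smalloc__Q2_9LAMMPS_NS6MemoryFLPCc` uses an older Cfront-style mangling (the `__Q2_` marks a name qualified by two scopes, here `LAMMPS_NS::Memory`), while GCC follows the Itanium C++ ABI, whose mangled names start with `_Z` (for illustration, something like `_ZN9LAMMPS_NS6Memory7smallocElPKc`). If the object that references `smalloc` and the object that defines it were mangled under different schemes, the linker cannot match them, which is consistent with objects left over from different compilers. The sketch below is a rough heuristic, not a demangler; `mangling_scheme` is a hypothetical name.

```shell
# mangling_scheme: heuristic guess at the C++ name-mangling scheme of a
# symbol. Itanium C++ ABI names (GCC and compatible compilers) start with
# _Z; the older Cfront/EDG style appends "__" plus an encoded class
# qualifier, e.g. __Q2_ for a two-level name or __3Foo for one level.
mangling_scheme() {
  case $1 in
    _Z*)         echo itanium ;;
    *__Q[0-9]_*) echo cfront ;;
    *__[0-9]*)   echo cfront ;;
    *)           echo unknown ;;   # plain C names, or anything else
  esac
}
```

Inside Obj_chugach, running `nm -u fix_move.o | awk '{print $2}' | while read -r s; do echo "$(mangling_scheme "$s") $s"; done` and comparing the result with `nm memory.o | grep smalloc` shows whether the referencing and defining sides disagree; if they do, `make clean-all` followed by a rebuild with a single compiler is the remedy suggested above.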

Actually, I realized that the compilation error has nothing to do with reax_SYSLIB and reax_SYSPATH.

angle_charmm.o: In function `LAMMPS_NS::AngleCharmm::coeff( (int, char **))':
/src/Obj_chugach/./angle_charmm.cpp:197: undefined reference to `smalloc__Q2_9LAMMPS_NS6MemoryFLPCc'
/src/Obj_chugach/./angle_charmm.cpp:197: undefined reference to `smalloc__Q2_9LAMMPS_NS6MemoryFLPCc'
/src/Obj_chugach/./angle_charmm.cpp:197: undefined reference to `smalloc__Q2_9LAMMPS_NS6MemoryFLPCc'
/src/Obj_chugach/./angle_charmm.cpp:197: undefined reference to `smalloc__Q2_9LAMMPS_NS6MemoryFLPCc'
/src/Obj_chugach/./angle_charmm.cpp:197: undefined reference to `smalloc__Q2_9LAMMPS_NS6MemoryFLPCc'
angle_charmm.o:/work/small/home/u2/wes/vranjan/src/lammps-24Jun11/src/Obj_chugach/./angle_charmm.cpp:247: more undefined references to `smalloc__Q2_9LAMMPS_NS6MemoryFLPCc' follow

I provided the correct pathname and still the above error is produced.

I am being consistent with compilation. I did use clean-all and

(i) successfully created reax library with ftn (PGI compiler)
(ii) make yes-reax
(iii) make Make.chugach

Below is my Makefile for chugach (the XE6 system I am trying to compile on):

Actually, I realized that the compilation error has nothing to do
with reax_SYSLIB and reax_SYSPATH.

that is what i assumed.

I am being consistent with compilation. I did use clean-all and
(i) successfully created reax library with ftn (PGI compiler)

can you try a different compiler environment?

i am attaching a (better?) makefile template from kraken (works on jaguar, too)
that i have been using successfully with gcc (with faster execution than PGI).

the PGI compilers are pretty bad in terms of c++ standard compliance.
are you sure there were no serious warnings when compiling the rest?
what version of PGI are you using anyway?

cheers,
    axel.

Makefile.kraken (3.12 KB)

I get the same problem with Makefile.kraken. I changed the following:

CCFLAGS = -O3 -march=native -mpc64 -fno-exceptions -fno-rtti -ffast-math -fstrict-aliasing -fomit-frame-pointer -finline-functions -Wall -Wno-unused

to:

CCFLAGS = -O3

because the compiler complained that it could not recognize the other switches. I also changed the FFT flags to:

FFT_INC = -DFFT_NONE
FFT_PATH =
FFT_LIB =

because with the earlier setting, “FFT_INC = -DFFT_FFTW -I$(FFTW_INC)”, it complained that it expected something after -I.

The compiler version is:

chugach1 % CC --version
/opt/cray/xt-asyncpe/4.6/bin/CC: INFO: Compiling with XTPE_COMPILE_TARGET=native.

pgCC 10.9-0 64-bit target on x86-64 Linux -tp shanghai-64
Copyright 1989-2000, The Portland Group, Inc. All Rights Reserved.
Copyright 2000-2010, STMicroelectronics, Inc. All Rights Reserved.

Thanks,

Vivek

i asked you to switch the programming environment from PGI to GCC.
the makefile header shows how to do that and also how to load fftw.

if you don't follow my suggestions, i cannot help you.

axel.
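The FFT_INC complaint above fits this advice: if `FFTW_INC` is not set in the environment, make expands `-I$(FFTW_INC)` to a bare `-I`, and the compiler then complains about what follows it. Loading the fftw module, as the makefile header describes, is what is expected to define it (an assumption about this particular Cray setup). A small guard that fails early instead; `require_env` is a hypothetical name.

```shell
# require_env: return non-zero if any named environment variable is unset
# or empty. An empty FFTW_INC makes "-I$(FFTW_INC)" expand to a bare "-I".
# Variable names are assumed to be trusted (they go through eval).
require_env() {
  rc=0
  for var in "$@"; do
    eval "val=\${$var-}"          # value of the variable named by $var
    if [ -z "$val" ]; then
      echo "error: $var is not set (module not loaded?)" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

Running `require_env FFTW_INC && make kraken` then stops before compiling when the module has not been loaded.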

Thanks Axel ! I did not pay attention earlier. I have been able to compile now with your help.

Thank you,

Vivek

Hi !

I successfully compiled the code, but when I run the job I get the following error:

_pmii_daemon(SIGCHLD): PE 20 exit signal Segmentation fault
The same message appears for all PEs 0-31 (cores).

I searched the internet and put “setenv PMI_QUIET 1” in the job script before calling lmp_kraken. It does not help.

I have a very small system with only 464 atoms. I am running the job on 2 nodes with 64 GB of memory.

Thanks,

Vivek

Hi !
I successfully compiled the code. But when I run the job, I get the
following error:
_pmii_daemon(SIGCHLD): PE 20 exit signal Segmentation fault
the same message for all the PE:0-31. (cores).
I did some search on the internet, and put "setenv PMI_QUIET 1" in the job
script before calling lmp_kraken. It does not help.

i am not surprised, you need to figure out what is causing the
segmentation fault that is leading up to this message.

I have a very small system with only 464 atoms. I am running the job on 2
nodes with 64 GB of memory.

what a waste...

since i still didn't get my crystal ball back from the
repair shop, there is little that i (or others) can do about this.

axel.

i am not surprised, you need to figure out what is causing the
segmentation fault that is leading up to this message.

Why are you not surprised?

because the message that you are trying to suppress
is not an error itself, but rather tells you that
there had been an error (a segmentation fault)
in its child process (apparently lammps).

all you are doing is to silence the message,
not remove the error.

this is like telling somebody to take tylenol
when he has an axe stuck in his head. :wink:

axel.
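The point above is that the PMI line only reports how a child process ended. When a process is started from a shell, the same information is in its exit status: in bash and most POSIX shells, a process killed by signal N yields status 128+N, so 139 means SIGSEGV (11) and 137 means SIGKILL (9), the "Killed" case. A minimal sketch to decode it; `describe_status` is a hypothetical name, and whether a launcher such as aprun passes the application's status through unchanged is system dependent.

```shell
# describe_status: turn a shell exit status into a readable description.
# Assumes the common convention that death by signal N gives 128+N,
# e.g. 139 = 128+11 (SIGSEGV), 137 = 128+9 (SIGKILL).
describe_status() {
  if [ "$1" -gt 128 ]; then
    echo "killed by signal $(kill -l $(( $1 - 128 )))"
  else
    echo "exit code $1"
  fi
}
```

Used as `aprun -n 1 ./lmp_kraken < in.reax.rdx; describe_status $?`, it reports a signal name rather than a bare number, provided the status reaches the shell intact.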


Well, my bad… the actual original message was:

_pmii_daemon (SIGCHLD): PE 0 exit signal Killed

Then I did some searching and found a presentation by the Cray MPI people that said…

“To quiet the PMI daemon, use: export PMI_QUIET=1”

This resulted in the segmentation fault error message.

The document is:

http://www.nccs.gov/wp-content/training/2009_crayxt_workshop/apr14/KimMcMahon.pdf

and it is mentioned on page 15.

"To quiet the PMI daemon, use: export PMI_QUIET=1"
This resulted in the segmentation fault error message.

this makes no sense. you have some other problem.

but since you refuse to address this systematically,
i have no interest in responding any further.

have a nice day.

axel.

Hi Axel,

I understand your reaction. How do you suggest I address it systematically? Can you please give me some hints?

Thanks,

Vivek


use some common sense! it is always puzzling to me that as
soon as computers and software get involved, even some of the
most analytically thinking people suddenly revert to knee-jerk
reactions, panic and chaos. you are dealing with a machine and
software and those don't have a mind of their own, even if it
sometimes may appear so. on the contrary, you have to consider
them as extremely dumb as they only do what they got programmed
to do, whether it makes sense or not.

that being said, there was just an example posted to
the list giving a lowdown on how to systematically narrow
down a problem, where you don't exactly know where the
origin is (your input or the machine or the compilation, that is).
i'll spell it out to you in more detail again.

first, find out whether it is your input or something else that fails.
that can be done by running the example or benchmark inputs
shipped with lammps. there are plenty of them and they are
known to work unless they use (optional) features that you have
not compiled in, or have not been updated for a recent change
in the code (should not happen, but does happen).

second, you have to capture the _real_ error.
what you showed is a secondary message from
the facility that is "babysitting" the lammps processes.
this doesn't show the original problem. how to do
that is very system dependent. ask your local user support.

third, you have to validate that the input you use does work well
elsewhere using the same version of the lammps code and
the same number of processors. and then either correct your
input or find out which part of lammps is failing and why.

the amount of information that you have reported is
too little to provide any help, let alone reproduce the
failure that you are seeing. if you say that your input
has X atoms, this is as useful as saying that you run
on a computer that has a green case unless you know
for a fact that the number of atoms has an impact
(or the color of the case).

you _are_ a scientist, right? so address this in a systematic
way just as you would have to address a problem in your research.
if you get an unexpected result there, you don't just go around
and try random things and ask other people to speculate about
why you get the results you see, without explaining what it is
that you did, right?

sorry for the hard words, but you asked for it. :wink:

cheers,
     axel.
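The first step above, running the inputs shipped with LAMMPS to separate input problems from build problems, is easy to script. A minimal sketch, assuming each example directory holds its `in.*` files next to the data and potential files they read by relative path; `check_inputs` is a hypothetical helper, and the runner command is whatever launches LAMMPS on your system.

```shell
# check_inputs DIR RUNNER...: feed every DIR/in.* input on stdin to the
# runner command (as in "lmp < in.file"), from inside DIR so relative
# data/potential file names resolve. Output goes to DIR/log.<input>.
# Prints PASS/FAIL per input and returns the number of failures.
check_inputs() {
  dir=$1
  shift
  failures=0
  for input in "$dir"/in.*; do
    [ -e "$input" ] || continue            # glob matched nothing
    name=$(basename "$input")
    if (cd "$dir" && "$@" < "$name" > "log.$name" 2>&1); then
      echo "PASS $name"
    else
      echo "FAIL $name (status $?)"        # status of the failed run
      failures=$((failures + 1))
    fi
  done
  return "$failures"
}
```

For example, `check_inputs ../examples/reax aprun -n 1 /full/path/to/lmp_kraken`, run from within a batch job, tries each reax input on one processor. Because the function changes into the example directory, the executable needs an absolute path; `/full/path/to` is a placeholder.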

Thanks for your advice, Axel! I am doing some systematic analysis now…

Vivek

Hello,

I ran some of the examples provided with LAMMPS on the XE6 machine.

(i) example directory: comb; input files: data.Cu2O, ffield.comb, in.comb.Cu2O.elastic;
single-processor job run with the command “aprun -n 1 ./lmp_kraken < in.comb.Cu2O.elastic”.

(ii) example directory: reax; input files: data.rdx, ffield.reax, in.reax.rdx;
single-processor job run with the command “aprun -n 1 ./lmp_kraken < in.reax.rdx”.

The “comb” run completes successfully without errors.
In the case of “reax”, “log.lammps” is empty. The file “standard.error” contains a single line, “_pmii_daemon(SIGCHLD): PE 0 exit signal Segmentation fault”. The contents of the file “standard.out” are the following:

Sounds like the ReaxFF lib did not get built/linked correctly
into the LAMMPS executable. I would try to run interactively
to see if you can get a more helpful error message.

Steve

Thank you for your suggestion. This is a Cray XE6 machine and I use the following method to compile:

(i) module swap PrgEnv-pgi PrgEnv-gnu
(ii) module load fftw/2.1.5.2 (fftw-2.1.5 is not installed on this machine)
(iii) in src directory “ln -s /opt/fftw/2.1.5.2/include/dfftw.h fftw.h”.
(iv) make yes-reax (in src directory)
(v) make -f Makefile.ftn (in lib/reax directory… I have attached Makefile.ftn with this email).
(vi) make xe6 (in src directory… I have attached Makefile.xe6 with this email; it differs only slightly from the Makefile.kraken Axel sent me).

The program compiles and works for example “comb” but fails for example “reax”.

So far, I have been able to compile LAMMPS (with reax) on a local desktop (running Ubuntu) and a local cluster (running Fedora).

While compiling on the Fedora cluster, I realized that I need to compile with mpicc but link with mpif90:

CC = mpicc
LINK = mpif90

The program then compiles and runs with reax.

On the XE6, however, when I compile with CC and link with CC, the program compiles but does not work for reax. And I am unable to compile with CC and link with ftn: if I try to link with ftn, it gives several errors.

Thanks,

Vivek

Makefile.ftn (1.65 KB)

Makefile.xe6 (3.63 KB)
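The observation that linking with mpif90 works on the Fedora cluster is consistent with the reax library being Fortran code: a Fortran compiler driver adds its own runtime library to the link automatically, while a C or C++ driver such as mpicc or CC does not, so with a C/C++ link line the Fortran runtime has to be named explicitly (the role of reax_SYSLIB; the Intel names in the original reax_SYSLIB do not apply under PrgEnv-gnu). This does not by itself explain the segfault, but it shows which runtime the library expects. A heuristic sketch reading `nm -u` output; `fortran_runtime_hint` is a hypothetical name, and only gfortran (`_gfortran_*` symbols) and Intel Fortran (`for_*` symbols) are recognized.

```shell
# fortran_runtime_hint: read "nm -u" output on stdin and suggest which
# Fortran runtime library a C/C++ link line still needs. Heuristic:
# gfortran runtime entry points start with _gfortran_, Intel Fortran's
# with for_ (e.g. for_write_seq_lis). Other compilers are not covered.
fortran_runtime_hint() {
  awk '
    $1 == "U" && $2 ~ /^_gfortran_/ { gnu = 1 }
    $1 == "U" && $2 ~ /^for_/       { intel = 1 }
    END {
      if (gnu)   print "-lgfortran"
      if (intel) print "-lifcore"
      if (!gnu && !intel) print "none detected"
    }'
}
```

For example, `nm -u ../lib/reax/libreax.a | fortran_runtime_hint`, assuming the reax library was built as libreax.a in lib/reax. A library built with PrgEnv-gnu would be expected to report `-lgfortran`.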