help to fix the gpu problem

Hongyi_Liu · September 28, 2011, 8:29pm

Dear Lammps Users:

I downloaded the latest version of the gpu package and compiled it with
"Makefile.fermi" to get the .a file.

But when I run ./nvc_get_devices, I get the following error message:

sjplimp · September 29, 2011, 2:17pm

That's the kind of error you get if you haven't launched
the MPI daemon, e.g. mpd with MPICH. But I don't know
why nvc_get_devices would require you to do that.

Maybe Mike Brown has a comment on this.

Steve

Hongyi_Liu · September 29, 2011, 2:39pm

Thanks a lot. I used mpich2-1.2.1p1 and here are the commands to compile mpi:
./configure --prefix=/usr/local/mpich2/1.2.1p1 --enable-f77
--enable-f90 --enable-cxx --enable-threads --with-PACKAGE=yes
--with-pm=mpd CC=icc CXX=icpc F77=ifort F90=ifort 2>&1 | tee c.txt
make 2>&1 | tee m.txt
make install 2>&1 | tee mi.txt
I added the following lines to .bashrc:
export PATH=/usr/local/mpich2/1.2.1p1/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/mpich2/1.2.1p1/lib:$PATH

My Intel compiler is version 11.1/072. My graphics cards are three
GTX 480s. I compiled the gpu package with Makefile.fermi, in which I
set arch=20 and deleted the option -ffast-math, which icpc does not
recognize.

So I am fairly sure that mpich2 is installed correctly, and lmp_linux
works fine without the gpu package.

Thanks again.

Hongyi

_Brown_W_Michael · September 29, 2011, 2:59pm

You can ignore the MPI error - the underlying GPU library, Geryon, will use either exit() or MPI_Abort() depending on the compile flags; these should be changed for nvc_get_devices to use exit() since it is serial.

The "unknown error" you are getting is unusual and probably hardware- or OS-related. Have you tried to run any other CUDA code on the GPUs? I have seen this error, for example, when the power is not connected correctly to the GPUs.

- Mike

Hongyi_Liu · September 29, 2011, 4:38pm

Thank you very much. I added both lib and lib64 to my environment
variables (I had not added the 32-bit lib before, which may be a useful
tip), and now ./nvc_get_devices and the gpu package work very well. I
can now run the gpu examples.
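
For anyone reproducing this fix, here is a minimal sketch of adding both library directories to LD_LIBRARY_PATH in .bashrc. The post does not say which lib and lib64 directories were meant. This sketch assumes they belong to the CUDA toolkit and uses /usr/local/cuda as a hypothetical install prefix. The helper name prepend_ld_path is also made up for illustration.

```shell
# Assumption: the CUDA toolkit lives under this prefix; adjust as needed.
CUDA_HOME="${CUDA_HOME:-/usr/local/cuda}"

# Hypothetical helper: prepend a directory to LD_LIBRARY_PATH
# unless it is already listed there.
prepend_ld_path() {
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$1:"*) ;;  # already present, nothing to do
        *) LD_LIBRARY_PATH="$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
    esac
}

prepend_ld_path "$CUDA_HOME/lib"     # 32-bit libraries
prepend_ld_path "$CUDA_HOME/lib64"   # 64-bit libraries
export LD_LIBRARY_PATH
```

After sourcing .bashrc, `echo $LD_LIBRARY_PATH` should list both directories, and the existing entries are kept after them.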

I still get three regular warnings and one negligible warning, listed
here in case they are of any use:
icpc: command line warning #10006: ignoring unknown option '-ffast-math'
cudpp_mini/cudpp_maximal_launch.cpp(83): warning #68: integer
conversion resulted in a change of sign
return -1;
^

cudpp_mini/cudpp_maximal_launch.cpp(88): warning #68: integer
conversion resulted in a change of sign
return -1;
^

cudpp_mini/cudpp_maximal_launch.cpp(93): warning #68: integer
conversion resulted in a change of sign
return -1;
^

Thanks again.

-HL

Hongyi_Liu · September 29, 2011, 4:44pm

Although I can run the gpu example, I get four warnings when I
compile lammps with the gpu package:
icc -O -DLAMMPS_GZIP -I../../lib/atc -I../../lib/reax
-I../../lib/poems -I../../lib/meam -DMPICH_SKIP_MPICXX
-I/usr/local/mpich2/1.2.1p1/include -DFFT_FFTW2
-I/usr/local/fftw/2.1.5/include -c force.cpp
pair_cg_cmm_coul_msm.h(42): warning #1125: function
"LAMMPS_NS::Pair::extract(char *, int &)" is hidden by
"LAMMPS_NS::PairCGCMMCoulMSM::extract" -- virtual function override
intended?
void *extract(char *str);
^

icc -O -DLAMMPS_GZIP -I../../lib/atc -I../../lib/reax
-I../../lib/poems -I../../lib/meam -DMPICH_SKIP_MPICXX
-I/usr/local/mpich2/1.2.1p1/include -DFFT_FFTW2
-I/usr/local/fftw/2.1.5/include -c lammps.cpp
pair_cg_cmm_coul_msm.h(42): warning #1125: function
"LAMMPS_NS::Pair::extract(char *, int &)" is hidden by
"LAMMPS_NS::PairCGCMMCoulMSM::extract" -- virtual function override
intended?
void *extract(char *str);
^

icc -O -DLAMMPS_GZIP -I../../lib/atc -I../../lib/reax
-I../../lib/poems -I../../lib/meam -DMPICH_SKIP_MPICXX
-I/usr/local/mpich2/1.2.1p1/include -DFFT_FFTW2
-I/usr/local/fftw/2.1.5/include -c modify.cpp
icc -O -DLAMMPS_GZIP -I../../lib/atc -I../../lib/reax
-I../../lib/poems -I../../lib/meam -DMPICH_SKIP_MPICXX
-I/usr/local/mpich2/1.2.1p1/include -DFFT_FFTW2
-I/usr/local/fftw/2.1.5/include -c pair_cg_cmm_coul_long_gpu.cpp
icc -O -DLAMMPS_GZIP -I../../lib/atc -I../../lib/reax
-I../../lib/poems -I../../lib/meam -DMPICH_SKIP_MPICXX
-I/usr/local/mpich2/1.2.1p1/include -DFFT_FFTW2
-I/usr/local/fftw/2.1.5/include -c pair_cg_cmm_coul_msm.cpp
pair_cg_cmm_coul_msm.h(42): warning #1125: function
"LAMMPS_NS::Pair::extract(char *, int &)" is hidden by
"LAMMPS_NS::PairCGCMMCoulMSM::extract" -- virtual function override
intended?
void *extract(char *str);
^

icc -O -DLAMMPS_GZIP -I../../lib/atc -I../../lib/reax
-I../../lib/poems -I../../lib/meam -DMPICH_SKIP_MPICXX
-I/usr/local/mpich2/1.2.1p1/include -DFFT_FFTW2
-I/usr/local/fftw/2.1.5/include -c pair_cg_cmm_coul_msm_gpu.cpp
pair_cg_cmm_coul_msm.h(42): warning #1125: function
"LAMMPS_NS::Pair::extract(char *, int &)" is hidden by
"LAMMPS_NS::PairCGCMMCoulMSM::extract" -- virtual function override
intended?
void *extract(char *str);
^

Thanks again.

-HL

akohlmey · September 29, 2011, 4:47pm

you can ignore anything with CoulMSM in it. it is not used.

axel.

Hongyi_Liu · September 29, 2011, 4:50pm

Thanks a lot. Good to know this.