[lammps-users] Why does my LAMMPS run slowly?

Dear lammps,

I compiled one Linux executable (lmp1) on an Intel Xeon server and another (lmp2) on a MacBook Pro (Intel Core 2 Duo). Running the benchmark problems, I found that my Linux build runs about twice as slow as the official benchmark results.

I am writing to ask for advice on making LAMMPS run faster on the Xeon Linux server, since that is where I can run parallel jobs. Either build could be misconfigured, but I suspect my Makefile settings are not suitable and are making LAMMPS run very slowly on the Linux Xeon server.
I really need to figure out this problem, so I have tried to describe it clearly below. Please ask me for more information if anything is unclear.

Thanks all members.
Best wishes,
Yangpeng Ou

I used the lammps-svn version and built two executables:
lmp_mac (MacBook Pro, Core 2 Duo, 2.53 GHz, 4 GB RAM)
lmp_linux (Linux server, Intel® Xeon™ 3.2 GHz, 4 GB RAM)
I ran the benchmark problems on both a single processor and multiple processors.
Makefile for lmp_mac

# svn = Macbook Pro, mpic++, gfortran, fink LAM/MPI, FFTW 2.1.5

SHELL = /bin/sh

# ---------------------------------------------------------------------
# compiler/linker settings
# generally no need to edit this section
# unless additional compiler/linker flags or libraries needed for your machine

CC =		mpic++
CCFLAGS =	-O -MMD -MG
DEPFLAGS =	-M
LINK =		mpic++
LINKFLAGS =	-O
LIB =
ARCHIVE =	ar
ARFLAGS =	-rc
SIZE =		size

# ---------------------------------------------------------------------
# LAMMPS-specific settings
# edit as needed for your machine

# LAMMPS ifdef options, see doc/Section_start.html

LMP_INC =	-DLAMMPS_GZIP

# MPI library, can be src/STUBS dummy lib
# INC = path for mpi.h, MPI compiler settings
# PATH = path for MPI library
# LIB = name of MPI library

MPI_INC =	-DOMPI_SKIP_MPICXX
MPI_PATH =
MPI_LIB =

# FFT library, can be -DFFT_NONE if not using PPPM from kspace package
# INC = -DFFT_FFTW, -DFFT_INTEL, -DFFT_NONE, etc, FFT compiler settings
# PATH = path for FFT library
# LIB = name of FFT library

FFTW =		/Users/Solow/local
FFT_INC =	-DFFT_FFTW -I${FFTW}/include
FFT_PATH =	-L${FFTW}/lib
FFT_LIB =	-lfftw

# additional system libraries needed by LAMMPS package libraries
# these settings are IGNORED if the corresponding LAMMPS package
# (e.g. gpu, meam) is NOT included in the LAMMPS build
# SYSLIB = names of libraries
# SYSPATH = paths of libraries

gpu_SYSLIB =		-lcudart
meam_SYSLIB =		-lgfortran
reax_SYSLIB =		-lgfortran
user-atc_SYSLIB =	-lblas -llapack

gpu_SYSPATH =		-L/usr/local/cuda/lib64
meam_SYSPATH =		-L/usr/lib
reax_SYSPATH =		-L/usr/lib
user-atc_SYSPATH =

# ---------------------------------------------------------------------
# build rules and dependencies
# no need to edit this section

include	Makefile.package

EXTRA_INC = $(LMP_INC) $(PKG_INC) $(MPI_INC) $(FFT_INC)
EXTRA_PATH = $(PKG_PATH) $(MPI_PATH) $(FFT_PATH) $(PKG_SYSPATH)
EXTRA_LIB = $(PKG_LIB) $(MPI_LIB) $(FFT_LIB) $(PKG_SYSLIB)

# Link target

$(EXE):	$(OBJ)
	$(LINK) $(LINKFLAGS) $(EXTRA_PATH) $(OBJ) $(EXTRA_LIB) $(LIB) -o $(EXE)
	$(SIZE) $(EXE)

# Library target

lib:	$(OBJ)
	$(ARCHIVE) $(ARFLAGS) $(EXE) $(OBJ)

# Compilation rules

%.o:%.cpp
	$(CC) $(CCFLAGS) $(EXTRA_INC) -c $<

%.d:%.cpp
	$(CC) $(CCFLAGS) $(EXTRA_INC) $(DEPFLAGS) $< > $@

# Individual dependencies

DEPENDS = $(OBJ:.o=.d)
include $(DEPENDS)

"/Users/Solow/local" is the location of my installed FFTW 2.
The Makefile for the Linux server is:

# svn = svn, mpic++, OpenMPI-1.1, FFTW2

SHELL = /bin/sh

# ---------------------------------------------------------------------
# compiler/linker settings
# specify flags and libraries needed for your compiler

CC =		mpic++
CCFLAGS =	-O2 \
		-funroll-loops -fstrict-aliasing -Wall -W -Wno-uninitialized
DEPFLAGS =	-M
LINK =		mpic++
LINKFLAGS =	-O
LIB =		-lstdc++
ARCHIVE =	ar
ARFLAGS =	-rcsv
SIZE =		size

# ---------------------------------------------------------------------
# LAMMPS-specific settings
# specify settings for LAMMPS features you will use

# LAMMPS ifdef options, see doc/Section_start.html

LMP_INC =	-DLAMMPS_GZIP

# MPI library, can be src/STUBS dummy lib
# INC = path for mpi.h, MPI compiler settings
# PATH = path for MPI library
# LIB = name of MPI library

MPI_INC =
MPI_PATH =
MPI_LIB =

# FFT library, can be -DFFT_NONE if not using PPPM from KSPACE package
# INC = -DFFT_FFTW, -DFFT_INTEL, -DFFT_NONE, etc, FFT compiler settings
# PATH = path for FFT library
# LIB = name of FFT library

FFTW =		/home/you/Mlib/fftw2
FFT_INC =	-DFFT_FFTW -I${FFTW}/include
FFT_PATH =
FFT_LIB =	-L${FFTW}/lib -lfftw

# additional system libraries needed by LAMMPS package libraries
# these settings are IGNORED if the corresponding LAMMPS package
# (e.g. gpu, meam) is NOT included in the LAMMPS build
# SYSLIB = names of libraries
# SYSPATH = paths of libraries

gpu_SYSLIB =		-lcudart
meam_SYSLIB =		-lifcore -lsvml -lompstub -limf
reax_SYSLIB =		-lifcore -lsvml -lompstub -limf
user-atc_SYSLIB =	-lblas -llapack

gpu_SYSPATH =		-L/usr/local/cuda/lib64
meam_SYSPATH =		-L/opt/intel/fce/10.0.023/lib
reax_SYSPATH =		-L/opt/intel/fce/10.0.023/lib
user-atc_SYSPATH =

# ---------------------------------------------------------------------
# build rules and dependencies
# no need to edit this section

include	Makefile.package

EXTRA_INC = $(LMP_INC) $(PKG_INC) $(MPI_INC) $(FFT_INC)
EXTRA_PATH = $(PKG_PATH) $(MPI_PATH) $(FFT_PATH) $(PKG_SYSPATH)
EXTRA_LIB = $(PKG_LIB) $(MPI_LIB) $(FFT_LIB) $(PKG_SYSLIB)

# Link target

$(EXE):	$(OBJ)
	$(LINK) $(LINKFLAGS) $(EXTRA_PATH) $(OBJ) $(EXTRA_LIB) $(LIB) -o $(EXE)
	$(SIZE) $(EXE)

# Library target

lib:	$(OBJ)
	$(ARCHIVE) $(ARFLAGS) $(EXE) $(OBJ)

# Compilation rules

%.o:%.cpp
	$(CC) $(CCFLAGS) $(EXTRA_INC) -c $<

%.d:%.cpp
	$(CC) $(CCFLAGS) $(EXTRA_INC) $(DEPFLAGS) $< > $@

# Individual dependencies

DEPENDS = $(OBJ:.o=.d)
include $(DEPENDS)

"/home/you/Mlib/fftw2" is the location of my installed FFTW on the Linux server.

For my compilation, I only used the packages "kspace, manybody, molecule, opt, user-ewaldn". I installed FFTW 2 by downloading it, running ./configure, and running make install. I used the default settings for the FFTW 2 build except for changing the install directory.
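For reference, the FFTW 2.1.5 build boils down to the commands below (the prefix is my install directory from above; the CFLAGS line is a hypothetical optimized variant, not what I actually used, since I kept the defaults):

```shell
# build FFTW 2.1.5 into the prefix referenced by FFTW in the Makefile above;
# the CFLAGS shown are a hypothetical optimization tweak, not my actual setting
CFLAGS="-O3 -funroll-loops" ./configure --prefix=$HOME/Mlib/fftw2
make
make install
```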
Below are my benchmark results. Both builds give the same results as the bench folder, but with different run times.
For in.chain without FFT calculation
Lammps Official Result (linux, 1 processor)
Loop time of 1.88908 on 1 procs for 100 steps with 32000 atoms
Pair  time (%) = 0.444468 (23.5283)
Bond  time (%) = 0.293589 (15.5413)
Neigh time (%) = 0.685171 (36.27)
Comm  time (%) = 0.0576162 (3.04996)
Outpt time (%) = 0.000181913 (0.00962972)
Other time (%) = 0.408057 (21.6008)
Mac Version (1 processor)
Loop time of 1.89473 on 1 procs for 100 steps with 32000 atoms
Pair  time (%) = 0.395487 (20.873)
Bond  time (%) = 0.244591 (12.909)
Neigh time (%) = 0.78156 (41.2492)
Comm  time (%) = 0.0508327 (2.68285)
Outpt time (%) = 0.000189066 (0.00997852)
Other time (%) = 0.422068 (22.2759)
Linux Version (1 processor)
Loop time of 3.3551 on 1 procs for 100 steps with 32000 atoms
Pair  time (%) = 0.927651 (27.649)
Bond  time (%) = 0.337159 (10.0491)
Neigh time (%) = 1.22131 (36.4017)
Comm  time (%) = 0.0865264 (2.57895)
Outpt time (%) = 0.0006392 (0.0190516)
Other time (%) = 0.781814 (23.3022)
For in.rhodo problems
Lammps Official Result (linux, 1 processor)
Loop time of 64.5302 on 1 procs for 100 steps with 32000 atoms
Pair  time (%) = 46.3691 (71.8564)
Bond  time (%) = 2.88541 (4.4714)
Kspce time (%) = 6.09222 (9.44088)
Neigh time (%) = 6.59142 (10.2145)
Comm  time (%) = 0.18323 (0.283944)
Outpt time (%) = 0.000371933 (0.000576371)
Other time (%) = 2.40847 (3.73231)
Mac Version (1 processor)
Loop time of 74.514 on 1 procs for 100 steps with 32000 atoms
Pair  time (%) = 55.8841 (74.9981)
Bond  time (%) = 2.5608 (3.43668)
Kspce time (%) = 5.63009 (7.55575)
Neigh time (%) = 8.19876 (11.003)
Comm  time (%) = 0.144003 (0.193256)
Outpt time (%) = 0.000427961 (0.000574337)
Other time (%) = 2.09583 (2.81267)
FFT time (% of Kspce) = 0.449319 (7.98067)
FFT Gflps 3d (1d only) = 1.15663 1.96157
Linux Version (1 processor)
Loop time of 128.756 on 1 procs for 100 steps with 32000 atoms
Pair  time (%) = 99.0928 (76.9617)
Bond  time (%) = 3.3348 (2.59002)
Kspce time (%) = 6.68045 (5.18846)
Neigh time (%) = 15.1253 (11.7473)
Comm  time (%) = 0.282435 (0.219357)
Outpt time (%) = 0.00125003 (0.000970851)
Other time (%) = 4.2389 (3.2922)
FFT time (% of Kspce) = 0.613222 (9.17936)
FFT Gflps 3d (1d only) = 0.847487 1.53449
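As a quick sanity check (my own calculation, not LAMMPS output), the single-processor slowdown of my Linux build relative to the official result can be computed from the loop times:

```shell
# slowdown = my loop time / official loop time (numbers taken from the runs above)
ratio() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}
ratio 3.3551 1.88908     # in.chain, 1 proc -> 1.78
ratio 128.756 64.5302    # in.rhodo, 1 proc -> 2.00
```

So the Linux build is roughly 1.8x to 2x slower than the official numbers in serial.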
4 processor parallel runs

For in.chain without FFT calculation
Lammps Official Result (linux, 4 processors)

Loop time of 0.437537 on 4 procs for 100 steps with 32000 atoms
Pair  time (%) = 0.0835259 (19.09)
Bond  time (%) = 0.0587637 (13.4306)
Neigh time (%) = 0.158699 (36.2711)
Comm  time (%) = 0.0484382 (11.0707)
Outpt time (%) = 0.000112474 (0.0257062)
Other time (%) = 0.0879969 (20.1119)
Linux Version (4 processors)
Loop time of 1.4103 on 4 procs for 100 steps with 32000 atoms
Pair  time (%) = 0.153455 (10.8811)
Bond  time (%) = 0.085247 (6.04462)
Neigh time (%) = 0.272226 (19.3028)
Comm  time (%) = 0.687567 (48.7534)
Outpt time (%) = 0.00132656 (0.0940627)
Other time (%) = 0.210473 (14.924)

For in.rhodo problems
Lammps Official Result (linux, 4 processors)
Loop time of 16.771 on 4 procs for 100 steps with 32000 atoms
Pair  time (%) = 11.2592 (67.1347)
Bond  time (%) = 0.685832 (4.08939)
Kspce time (%) = 2.03664 (12.1438)
Neigh time (%) = 1.60447 (9.56693)
Comm  time (%) = 0.334371 (1.99374)
Outpt time (%) = 0.00025332 (0.00151046)
Other time (%) = 0.850284 (5.06997)
FFT time (% of Kspce) = 0.194679 (9.55886)
FFT Gflps 3d (1d only) = 2.6695 6.75898
Linux Version(4 processors)
Loop time of 35.4643 on 4 procs for 100 steps with 32000 atoms
Pair  time (%) = 21.8471 (61.603)
Bond  time (%) = 0.841134 (2.37178)
Kspce time (%) = 4.49888 (12.6857)
Neigh time (%) = 3.05328 (8.60943)
Comm  time (%) = 2.51904 (7.10302)
Outpt time (%) = 0.00240356 (0.0067774)
Other time (%) = 2.70251 (7.62036)
FFT time (% of Kspce) = 1.04074 (23.1334)
FFT Gflps 3d (1d only) = 0.499352 5.681
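The 4-processor runs also show a scaling problem on my server: the Comm time is nearly half the in.chain loop time. A rough parallel-efficiency estimate, T1 / (N * TN), using my own loop times from above (this calculation is mine, not LAMMPS output):

```shell
# parallel efficiency = serial loop time / (nprocs * parallel loop time)
eff() {
  awk -v t1="$1" -v tn="$2" -v n="$3" 'BEGIN { printf "%.2f\n", t1 / (n * tn) }'
}
eff 1.88908 0.437537 4   # official in.chain -> 1.08 (superlinear)
eff 3.3551  1.4103   4   # my Linux build    -> 0.59
```

The official run scales almost perfectly, while my build loses about 40% to communication and other overhead.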

Thanks all,
I have already figured out the problem.
It looks like the Intel® Xeon™ 3.2 GHz in the server is based on the older NetBurst (Pentium 4) architecture, which performs worse per clock than the Core 2 architecture. So although it is a compute server, it actually provides less computational power than a Core 2 Duo CPU.
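One way to confirm this on the server is to look at the "cpu family" field in /proc/cpuinfo; as far as I know, NetBurst-era Intel x86 CPUs report family 15, while Core/Core 2 CPUs report family 6. A small sketch of that check (the helper function is my own, for illustration):

```shell
# classify an Intel x86 CPU family number as NetBurst or not;
# NetBurst (Pentium 4 era) CPUs report "cpu family : 15" in /proc/cpuinfo,
# Core / Core 2 CPUs report family 6
is_netburst() {
  if [ "$1" -eq 15 ]; then echo yes; else echo no; fi
}
is_netburst 15   # prints "yes"
is_netburst 6    # prints "no"
```

On the real machine one would feed it the actual value, e.g. `is_netburst "$(awk -F: '/cpu family/ {gsub(/ /,"",$2); print $2; exit}' /proc/cpuinfo)"`.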

Best wishes,
Yangpeng Ou

On Jan 22, 2011, at 2:07 PM, Yangpeng Ou wrote: