GPU input file problem

When I run LAMMPS using the following input file:

# 3d Lennard-Jones melt
newton off

variable x index 1
variable y index 1
variable z index 1

variable xx equal 20*$x
variable yy equal 20*$y
variable zz equal 20*$z

units lj
atom_style atomic

lattice fcc 0.8442
region box block 0 ${xx} 0 ${yy} 0 ${zz}
create_box 1 box
create_atoms 1 box
mass 1 1.0

velocity all create 1.44 87287 loop geom

pair_style lj/cut/gpu one/node 0 2.5
pair_coeff 1 1 1.0 1.0 2.5

neighbor 0.3 bin
neigh_modify delay 0 every 20 check no

fix 1 all nve

run 100

I get the following on my single node/GPU system:

panderse@...:~/lammps-2Jun12/src$ mpirun -np 1 ./lmp_linux < in.lj
LAMMPS (2 Jun 2012)
Lattice spacing in x,y,z = 1.6796 1.6796 1.6796
Created orthogonal box = (0 0 0) to (33.5919 33.5919 33.5919)
  1 by 1 by 1 MPI processor grid
Created 32000 atoms
ERROR: The package gpu command is required for gpu styles (gpu_extra.h:60)

Even when I try the sample gpu input file provided in the LAMMPS distribution, I get the same error.

I have

panderse@...:~/lammps-2Jun12/lib/gpu$ ./nvc_get_devices
Found 1 platform(s).
Using platform: NVIDIA Corporation NVIDIA CUDA Driver
CUDA Driver Version: 4.20

Device 0: "GeForce 9500 GT"
  Type of device: GPU
  Compute capability: 1.1
  Double precision support: No
  Total amount of global memory: 0.499695 GB
  Number of compute units/multiprocessors: 4
  Number of cores: 32
  Total amount of constant memory: 65536 bytes
  Total amount of local/shared memory per block: 16384 bytes
  Total number of registers available per block: 8192
  Warp size: 32
  Maximum number of threads per block: 512
  Maximum group size (# of threads per block) 512 x 512 x 64
  Maximum item sizes (# threads for each dim) 65535 x 65535 x 1
  Maximum memory pitch: 2147483647 bytes
  Texture alignment: 256 bytes
  Clock rate: 1.375 GHz
  Run time limit on kernels: Yes
  Integrated: No
  Support host page-locked memory mapping: Yes
  Compute mode: Default
  Concurrent kernel execution: No
  Device has ECC support enabled: No

I assume I need to add the package gpu command somewhere, but I can't determine where from the documentation.

Per Andersen

See the doc page for the package command.
It explains how to either add it to your script
or use the -sf command-line option.

Steve
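For the 2 Jun 2012 version, that means either putting a package gpu line near the top of the script, before the pair_style, or adding the switch on the command line, which inserts an equivalent package command for you. A minimal sketch, assuming the force/neigh mode and a single GPU (check the package doc page for the exact arguments of your version):

package gpu force/neigh 0 0 1
pair_style lj/cut/gpu 2.5

or

mpirun -np 1 ./lmp_linux -sf gpu < in.lj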

I tried -sf gpu and got

panderse@...:~/lammps-2Jun12/src$ mpirun -np 1 ./lmp_linux -sf gpu < in.lj_gpu
LAMMPS (2 Jun 2012)
ERROR: GPU library not compiled for this accelerator (gpu_extra.h:40)

I followed the build instructions in lib/gpu/README and changed the makefile to use the older architecture, since the card's compute capability is 1.1:

CUDA_ARCH = -arch=sm_10 -DCUDA_PRE_THREE
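Since nvc_get_devices reports compute capability 1.1, -arch=sm_11 should also match this card exactly; sm_10 is just the lowest common denominator. A hedged alternative, assuming the standard nvcc -arch flags:

CUDA_ARCH = -arch=sm_11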

cd ~/lammps/lib/gpu
emacs Makefile.linux
make -f Makefile.linux
./nvc_get_devices
cd ../../src
emacs ./MAKE/Makefile.linux
make yes-asphere
make yes-kspace
make yes-gpu
make linux
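If lib/gpu was previously built for a different CUDA_ARCH, stale object and .ptx files can leave mixed architectures in the library, so it is worth forcing a full clean before rebuilding. A minimal sketch, assuming the stock Makefile.linux layout (the exact clean target and generated file names may differ by version):

cd ~/lammps/lib/gpu
make -f Makefile.linux clean      # if your makefile provides a clean target
rm -f *.o *.ptx libgpu.a          # otherwise remove objects, ptx files, and the library by hand
make -f Makefile.linux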

I redid the build

panderse@...:~/lammps-2Jun12/src$ make yes-asphere
Installing package asphere
panderse@...:~/lammps-2Jun12/src$ make yes-kspace
Installing package kspace
panderse@...:~/lammps-2Jun12/src$ make yes-gpu
Installing package gpu

I checked to make sure I linked against libgpu.a, and I can see that I did (sorry about the long listing):

gcc -O -L../../lib/gpu -L/usr/local/cuda/lib64 angle_charmm.o angle_cosine.o angle_cosine_delta.o angle_cosine_periodic.o angle_cosine_squared.o angle.o angle_harmonic.o angle_hybrid.o angle_table.o atom.o atom_vec_angle.o atom_vec_atomic.o atom_vec_bond.o atom_vec_charge.o atom_vec.o atom_vec_ellipsoid.o atom_vec_full.o atom_vec_hybrid.o atom_vec_line.o atom_vec_molecular.o atom_vec_sphere.o atom_vec_tri.o balance.o bond.o bond_fene.o bond_fene_expand.o bond_harmonic.o bond_hybrid.o bond_morse.o bond_nonlinear.o bond_quartic.o bond_table.o change_box.o comm.o compute_angle_local.o compute_atom_molecule.o compute_bond_local.o compute_centro_atom.o compute_cluster_atom.o compute_cna_atom.o compute_com.o compute_com_molecule.o compute_coord_atom.o compute.o compute_dihedral_local.o compute_displace_atom.o compute_erotate_asphere.o compute_erotate_sphere.o compute_group_group.o compute_gyration.o compute_gyration_molecule.o compute_heat_flux.o compute_improper_local.o compute_ke_atom.o compute_ke.o compute_msd.o compute_msd_molecule.o compute_pair.o compute_pair_local.o compute_pe_atom.o compute_pe.o compute_pressure.o compute_property_atom.o compute_property_local.o compute_property_molecule.o compute_rdf.o compute_reduce.o compute_reduce_region.o compute_slice.o compute_stress_atom.o compute_temp_asphere.o compute_temp_com.o compute_temp.o compute_temp_deform.o compute_temp_partial.o compute_temp_profile.o compute_temp_ramp.o compute_temp_region.o compute_temp_sphere.o compute_ti.o create_atoms.o create_box.o delete_atoms.o delete_bonds.o dihedral_charmm.o dihedral.o dihedral_harmonic.o dihedral_helix.o dihedral_hybrid.o dihedral_multi_harmonic.o dihedral_opls.o displace_atoms.o domain.o dump_atom.o dump_cfg.o dump.o dump_custom.o dump_dcd.o dump_image.o dump_local.o dump_xyz.o error.o ewald.o fft3d.o fft3d_wrap.o finish.o fix_adapt.o fix_addforce.o fix_ave_atom.o fix_ave_correlate.o fix_aveforce.o fix_ave_histo.o fix_ave_spatial.o fix_ave_time.o fix_box_relax.o fix.o fix_deform.o fix_deposit.o fix_drag.o fix_dt_reset.o fix_efield.o fix_enforce2d.o fix_evaporate.o fix_external.o fix_gpu.o fix_gravity.o fix_heat.o fix_indent.o fix_langevin.o fix_lineforce.o fix_minimize.o fix_momentum.o fix_move.o fix_nh_asphere.o fix_nh.o fix_nh_sphere.o fix_nph_asphere.o fix_nph.o fix_nph_sphere.o fix_npt_asphere.o fix_npt.o fix_npt_sphere.o fix_nve_asphere.o fix_nve_asphere_noforce.o fix_nve.o fix_nve_limit.o fix_nve_line.o fix_nve_noforce.o fix_nve_sphere.o fix_nve_tri.o fix_nvt_asphere.o fix_nvt.o fix_nvt_sllod.o fix_nvt_sphere.o fix_orient_fcc.o fix_planeforce.o fix_press_berendsen.o fix_print.o fix_qeq_comb.o fix_read_restart.o fix_recenter.o fix_respa.o fix_restrain.o fix_rigid.o fix_rigid_nve.o fix_rigid_nvt.o fix_setforce.o fix_shake.o fix_shear_history.o fix_spring.o fix_spring_rg.o fix_spring_self.o fix_store_force.o fix_store_state.o fix_temp_berendsen.o fix_temp_rescale.o fix_thermal_conductivity.o fix_tmd.o fix_ttm.o fix_viscosity.o fix_viscous.o fix_wall.o fix_wall_harmonic.o fix_wall_lj126.o fix_wall_lj93.o fix_wall_reflect.o fix_wall_region.o force.o group.o image.o improper.o improper_cvff.o improper_harmonic.o improper_hybrid.o improper_umbrella.o input.o integrate.o irregular.o kspace.o lammps.o lattice.o library.o main.o math_extra.o memory.o min_cg.o min.o min_fire.o min_hftn.o minimize.o min_linesearch.o min_quickmin.o min_sd.o modify.o neigh_bond.o neighbor.o neigh_derive.o neigh_full.o neigh_gran.o neigh_half_bin.o neigh_half_multi.o neigh_half_nsq.o neigh_list.o neigh_request.o 
neigh_respa.o neigh_stencil.o output.o pair_adp.o pair_airebo.o pair_beck.o pair_born_coul_long.o pair_born_coul_wolf.o pair_born.o pair_buck_coul_cut.o pair_buck_coul_cut_gpu.o pair_buck_coul_long.o pair_buck_coul_long_gpu.o pair_buck.o pair_buck_gpu.o pair_comb.o pair_coul_cut.o pair_coul_debye.o pair_coul_long.o pair_coul_long_gpu.o pair_coul_wolf.o pair.o pair_dpd.o pair_dpd_tstat.o pair_eam_alloy.o pair_eam_alloy_gpu.o pair_eam.o pair_eam_fs.o pair_eam_fs_gpu.o pair_eam_gpu.o pair_eim.o pair_gauss.o pair_gayberne.o pair_gayberne_gpu.o pair_hbond_dreiding_lj.o pair_hbond_dreiding_morse.o pair_hybrid.o pair_hybrid_overlay.o pair_lcbop.o pair_line_lj.o pair_lj96_cut.o pair_lj96_cut_gpu.o pair_lj_charmm_coul_charmm.o pair_lj_charmm_coul_charmm_implicit.o pair_lj_charmm_coul_long.o pair_lj_charmm_coul_long_gpu.o pair_lj_cubic.o pair_lj_cut_coul_cut.o pair_lj_cut_coul_cut_gpu.o pair_lj_cut_coul_debye.o pair_lj_cut_coul_long.o pair_lj_cut_coul_long_gpu.o pair_lj_cut_coul_long_tip4p.o pair_lj_cut.o pair_lj_cut_gpu.o pair_lj_expand.o pair_lj_expand_gpu.o pair_lj_gromacs_coul_gromacs.o pair_lj_gromacs.o pair_lj_smooth.o pair_lj_smooth_linear.o pair_morse.o pair_morse_gpu.o pair_rebo.o pair_resquared.o pair_resquared_gpu.o pair_soft.o pair_sw.o pair_table.o pair_table_gpu.o pair_tersoff.o pair_tersoff_zbl.o pair_tri_lj.o pair_yukawa.o pair_yukawa_gpu.o pppm_cg.o pppm.o pppm_gpu.o pppm_tip4p.o procmap.o random_mars.o random_park.o read_data.o read_dump.o read_dump_native.o read_restart.o region_block.o region_cone.o region.o region_cylinder.o region_intersect.o region_plane.o region_prism.o region_sphere.o region_union.o remap.o remap_wrap.o replicate.o respa.o run.o set.o special.o thermo.o timer.o universe.o update.o variable.o velocity.o verlet.o write_restart.o -lgpu -lmpich -lmpl -lpthread -lfftw -ljpeg -lcudart -lcuda -lstdc++ -o ../lmp_linux
text data bss dec hex filename
24479348 9072 25344 24513764 1760ce4 ../lmp_linux
make[1]: Leaving directory `/home/panderse/lammps-2Jun12/src/Obj_linux'

I can also see that lmp_linux is linked against the CUDA shared libraries:

panderse@...:~/lammps-2Jun12/src$ ldd lmp_linux
  linux-vdso.so.1 => (0x00007ffffddff000)
  libmpich.so.2 => /usr/lib/libmpich.so.2 (0x00007f24d533e000)
  libmpl.so.1 => /usr/lib/libmpl.so.1 (0x00007f24d5139000)
  libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f24d4f1a000)
  libfftw.so.2 => /usr/lib/libfftw.so.2 (0x00007f24d4ce2000)
  libjpeg.so.62 => /usr/lib/x86_64-linux-gnu/libjpeg.so.62 (0x00007f24d4abe000)
  libcudart.so.4 => /usr/local/cuda/lib64/libcudart.so.4 (0x00007f24d485f000)
  libcuda.so.1 => /usr/lib/libcuda.so.1 (0x00007f24d3e03000)
  libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f24d3afd000)
  libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f24d38e6000)
  libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f24d3552000)
  libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f24d32cd000)
  librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f24d30c4000)
  libcr.so.0 => /usr/lib/libcr.so.0 (0x00007f24d2eba000)
  /lib64/ld-linux-x86-64.so.2 (0x00007f24d5725000)
  libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f24d2cb6000)
  libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f24d2a9d000)

I must be missing a build step or an environment variable somewhere, if the code thinks the gpu package is not available?

Per

FYI:

I got lammps/gpu to work. The problem was that I originally built the gpu library with the makefile set for Tesla; when I discovered my mistake I cleaned out all the object files but not the .ptx files, so when I recompiled for my older CUDA architecture I was mixing architectures in the library. I also discovered that I can't compile with -DCUDA_PRE_THREE; the nvcc compile statements fail on an invalid declaration of double4. Also, the tutorial I was using (http://lammps.sandia.gov/workshops/Feb10/Mike_Brown/gpu_tut.pdf) has the statement

pair_style lj/cut/gpu one/node 0 2.5

This failed with a syntax error, so I ended up using

pair_style lj/cut/gpu 2.5

and my output was

panderse@...:~/lammps-2Jun12/src$ mpirun -np 1 ./lmp_linux -sf gpu < in.lj
LAMMPS (2 Jun 2012)
Lattice spacing in x,y,z = 1.6796 1.6796 1.6796
Created orthogonal box = (0 0 0) to (33.5919 33.5919 33.5919)
  1 by 1 by 1 MPI processor grid
Created 32000 atoms
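For reference: the one/node arguments in that 2010 tutorial predate this version's input syntax; in the 2 Jun 2012 release the device settings appear to be handled by the package gpu command (or the equivalent -sf gpu switch), so pair_style lj/cut/gpu takes only the cutoff. A sketch of the working GPU portion of the script under that assumption (verify against the package and pair_style doc pages for your version):

package gpu force/neigh 0 0 1
pair_style lj/cut/gpu 2.5
pair_coeff 1 1 1.0 1.0 2.5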