replica exchange

Luke_Czapla · October 9, 2013, 3:27pm

Sorry for the late reply. My Yahoo email was messed up by their change of format. As Axel asked, I will describe the details of the replica exchange problem directly to the list:

The number of partitions is four (and there are four temperatures), but when I
run it (mpirun -n 4 ~/lmp_openmpi -p 4 -cuda off -in in.run), it just says
it’s running on one partition, does nothing, and exits. With the GPU
it does the same thing, but it also throws a CUDA error like “Cuda driver error 4
in call at file ‘geryon/nvd_device.h’ in line 116.” each time I try it,
sometimes before and sometimes after it prints the information about 1
partition.

Everything’s below and I would greatly appreciate any help you could
provide. I will take your advice and switch to lj/sdk.

Thanks,
Luke Czapla

The restart2data output looked like this (I noticed that even when the
restart2data is from the same source tree as my binary, the version numbers
are different, but this restart file is from running with an earlier build):

luke@…4543…:~/random/q0.5_replica$ ./restart2data run4.rest DATA.FILE in5.run
Reading restart file …
WARNING Restart file version does not match restart2data version
restart2data version = 23 July 2013
Restart file version = 31 May 2013
Ntimestep = 25000000
Nprocs = 4
Natoms = 40000
Nbonds = 3732
Nangles = 3599
Unit style = real
Atom style = full
Pair style = cg/cmm/coul/long
Bond style = harmonic
Angle style = cg/cmm
Xlo xhi = -76.2836 76.2836
Ylo yhi = -76.2836 76.2836
Zlo zhi = -76.2836 76.2836
Periodicity = 1 1 1
Boundary = 0 0, 0 0, 0 0
Writing data file …
Writing input file …
ERROR: Cannot write pair_style cg/cmm/coul/long to input file

The input file looks like this:

package gpu force/neigh 0 1 -1

units real
dimension 3
atom_style full
read_restart run2.rest
#read_data DATA.FILE
#include PARM.FILE

kspace_style pppm 1.0e-5
kspace_modify mesh 32 32 32 order 3

reset_timestep 0

neighbor 2.0 bin
#original neigh_modify delay 5
#neigh_modify exclude molecule sphere
neigh_modify delay 0 every 2 check yes

timestep 20.0
#run_style verlet
run_style respa 2 2 bond 1 angle 1 pair 2 kspace 2

#velocity all create 303.0 87285 dist gaussian

#fix 1 all npt temp 303.0 303.0 1000.0 iso 1.0 1.0 1000.0
#fix 2 all momentum 1 linear 1 1 1 angular

dump 1 all dcd 1000 run3.dcd

thermo 1000
thermo_style multi

variable t world 310.0 330.0 350.0 370.0
fix myfix all nvt $t $t 100.0

reset_timestep 0
temper 20000000 500 $t tempfix 37 69

write_restart run3.rest

sjplimp · October 11, 2013, 12:34pm

First -p 4 only specifies a single partition, using all 4 procs.
I think you want -p 4x1 if you want 4 partitions with a single
proc each. See the doc page on command-line args.
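
To make the difference between the two forms concrete, here is a minimal Python sketch (not part of LAMMPS) that expands the arguments of -p the way the command-line doc page describes them: a bare number P is one partition of P procs, and MxN is M partitions of N procs each. The function name partition_sizes is hypothetical, and the temperature list is simply copied from the "variable t world" line in the posted input, on the assumption that a world-style variable needs one value per partition.

```python
def partition_sizes(args):
    """Expand -partition/-p arguments into a list of per-partition proc counts.

    Assumes only the two documented forms: "P" (one partition of P procs)
    and "MxN" (M partitions of N procs each). Several may be combined.
    """
    sizes = []
    for arg in args:
        if "x" in arg:
            count, procs = (int(part) for part in arg.split("x"))
            sizes.extend([procs] * count)
        else:
            sizes.append(int(arg))
    return sizes


# Values copied from "variable t world 310.0 330.0 350.0 370.0" in the input.
temps = [310.0, 330.0, 350.0, 370.0]

for spec in (["4"], ["4x1"]):
    sizes = partition_sizes(spec)
    matches = len(sizes) == len(temps)
    print(f"-p {' '.join(spec)}: {len(sizes)} partition(s), "
          f"procs per partition {sizes}, "
          f"one temperature per partition: {matches}")
```

With "-p 4" the sketch reports a single partition of 4 procs, so only one replica exists; with "-p 4x1" it reports four partitions of 1 proc each, matching the four temperatures.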

Second - the USER-CUDA package only allows you to
use 1 MPI task per GPU. So I don’t think you will be able
to run all 4 partitions using the same GPU. Probably not
with the GPU package either, since it has special logic
to allow multiple MPI tasks to use one GPU, but I think
that will only work within a single partition.

What you should be able to do (though I don’t know if anyone
has done this) is to use 1 GPU per partition. E.g. on a 4
node system with 4 GPUs, use -p 4x1 and ensure that
one partition runs on each node (with its GPU).

Steve

Luke_Czapla · October 11, 2013, 2:04pm

Thanks Steve,

Thanks for the suggestion. It doesn’t work even if I turn the GPU off. I’m not using USER-CUDA because I put “-cuda off” on the command line. I had actually already fixed this with “-partition 4x1”, because I found an example in the source “example” folder in LAMMPS the other day, but now it just says “running on 4 partitions” and does nothing.

Thanks
Luke

akohlmey · October 11, 2013, 2:12pm

are you certain that it does "nothing"? with multi-partition runs, you
don't get output on the screen. i suggest you try the following
little experiment using the input from examples/melt.

[akohlmey@...4263... out] mpirun -np 4 lmp_g++ -p 4x1 -in in.melt
LAMMPS (30 Sep 2013-ICMS)
Running on 4 partitions of processors
[akohlmey@...4263... out] ls
in.melt log.lammps log.lammps.0 log.lammps.1 log.lammps.2
log.lammps.3 screen.0 screen.1 screen.2 screen.3

as you can see, i do not get any output to the screen, but 4 log files
and 4 screen captures.
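
Since the per-partition files are where any error message ends up, a small script can collect them in one pass. The following is a minimal Python sketch, assuming the file names shown in the listing above (log.lammps.N and screen.N) and assuming that problem lines start with ERROR or WARNING; the function name partition_errors is hypothetical.

```python
import glob
import os


def partition_errors(directory="."):
    """Return {path: [lines]} for per-partition files that contain
    lines starting with ERROR or WARNING (assumed message prefixes)."""
    hits = {}
    for pattern in ("log.lammps.*", "screen.*"):
        for path in sorted(glob.glob(os.path.join(directory, pattern))):
            with open(path, errors="replace") as handle:
                lines = [
                    line.rstrip("\n")
                    for line in handle
                    if line.lstrip().startswith(("ERROR", "WARNING"))
                ]
            if lines:
                hits[path] = lines
    return hits


if __name__ == "__main__":
    # Scan the current directory and print each hit with its file name.
    for path, lines in partition_errors().items():
        for line in lines:
            print(f"{path}: {line}")
```

Run from the directory where the job was started, this prints nothing if no partition reported a problem; otherwise it shows which partition's file holds the message.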

axel.

Luke_Czapla · October 11, 2013, 2:32pm

Thanks so much Steve and Axel. I found those partition log files and saw what I did wrong. It's up and running with the GPU!

Luke