Hi Mike,
You are right. I got it to work by doing the following steps:
1. Updated to the latest NVIDIA driver
2. Updated CUDA to 3.2
3. Deleted the lammps directory that I had untarred from the tarball
4. Untarred the lammps tarball again
5. Built the STUBS MPI library
6. Built lib/gpu using the STUBS MPI
7. Built lammps using STUBS, FFTW and GPU
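For anyone following along, the steps above roughly correspond to a command sequence like this. This is only a sketch: the tarball name, makefile targets, and directory layout are assumed from a typical LAMMPS source tree of that era, so adjust them to your own machine and makefiles.

```shell
# Rebuild sequence (sketch; names/paths are assumptions, not verbatim)
tar -xzf lammps.tar.gz              # step 4: unpack a fresh source tree
cd lammps/src/STUBS && make         # step 5: build the STUBS MPI library
cd ../../lib/gpu && make            # step 6: build the GPU library against STUBS MPI
cd ../../src && make yes-gpu        # enable the GPU package in the source
make serial                         # step 7: build LAMMPS with STUBS, FFTW and GPU
```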
And here are the timings for in.melt (Loop time), using force/neigh mode:

CPU, 31Mar11 version: 1.63 s
GPU, 31Mar11 version: 0.27 s
GPU, 5Sep10 version (which no longer runs after I did the above steps): 0.45 s

Setting the fix gpu mode to force gives the same timings as the 5Sep10 version.
I also ran my data and input files that would not run in the 5Sep10 version of lammps because of a cell list error. Here is the URL of that thread: http://lammps.sandia.gov/threads/msg11440.html. And I am happy to report that this system ran in the 31Mar11 version, with a speedup of 5.6 (113 s CPU / 20 s GPU) when I set the fix gpu mode to force/neigh.
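As a quick sanity check on the numbers above (all timings taken from this email), the speedups work out like this:

```shell
# Arithmetic check of the reported CPU-vs-GPU speedups
awk 'BEGIN {
  printf "melt CPU/GPU speedup: %.1f\n", 1.63/0.27   # 31Mar11 CPU vs GPU on in.melt
  printf "charged system speedup: %.2f\n", 113/20    # 113 s CPU / 20 s GPU
}'
```

So the in.melt case is actually about a 6x speedup, and the cell-list system rounds to the 5.6 quoted above.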
Now I will be testing my input files that are charged and use pppm, with lj/cut/coul/long/gpu. What speedups/benchmarks did you get for these kinds of systems? And how is pppm implemented? Are the calculations done on the GPU?
Thanks again.
Jan-Michael