Using TotalView to debug a LAMMPS program

Dear all,

I want to use TotalView to debug a LAMMPS program. I compiled LAMMPS with the -g option and submitted the job with a batch script that uses the following run command:

srun --mpi=openmpi /home/huy16104/lammps-30Jul16/src/lmp_test < ./in.test > log.lammps

I want to use TotalView's attach mode, but I cannot find the running program in the session. I'm not familiar with this. Does anyone have experience using TotalView to debug LAMMPS?

Thanks,

Huilin

PhD student at the University of Connecticut

you should talk to your local HPC people. this is all very specific to
the machine you are running on.
i would also suggest to first practice using totalview with a
smaller/simpler program to get experience.

in my personal experience, debugger software is useless without a
sufficient amount of practice and experience. debuggers don't
automatically find the real problem, but are rather a tool to help
prove where the problem is *after* you have developed some
hypothesis.

in my 20+ years of parallel programming, i have so far never come
across a debugging problem that i could not solve without using a
parallel debugger. typically, a few well placed print statements and
using a hack like "mpirun -np 4 xterm -e gdb --args program_to_debug
-and -flags" on a local machine does the trick for me.
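
For the executable in this thread, that hack might be adapted as in the sketch below. It is an illustration only and assumes a local machine with mpirun, xterm, gdb and a working X display. The rank count, the paths and the RUN switch are assumptions, not details from the thread. The input script is passed with the LAMMPS -in flag rather than with "<", since a redirect on the mpirun command line would not reach a program started inside xterm and gdb.

```shell
#!/bin/sh
# Sketch: start one gdb per MPI rank, each in its own xterm window.
# Assumes a local machine with mpirun, xterm, gdb and a working X display.
NP=${NP:-4}               # number of MPI ranks (assumed value)
EXE=${EXE:-./lmp_test}    # LAMMPS binary compiled with -g (path assumed)
INPUT=${INPUT:-in.test}   # input script, passed with the LAMMPS -in flag

CMD="mpirun -np $NP xterm -e gdb --args $EXE -in $INPUT"
echo "$CMD"               # show the command before doing anything

# Launch only when explicitly asked to (RUN=1 is a hypothetical switch).
if [ "${RUN:-0}" = "1" ]; then
    $CMD
fi
```

Each xterm then shows a gdb prompt for one rank; typing "run" in every window starts the calculation.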

parallel debuggers are extremely useful when bugs only show up with a
very large number of processors, but i have never come across any of
those so far.

axel.

Hi Axel,

Thanks for your suggestions and for sharing your experience. My program runs for a long time and then stops with no error message in the log file. I have checked many possible causes, and they all seem to be related to the long running time. So I think I should use a debugger. In any case, I will ask the local HPC people for help.

Thanks,

Huilin

i disagree about having to use the debugger now. there are plenty of
things to look into beforehand.

if running a long time is an issue, how about the size of the problem?
can you reduce the number of atoms in your system with the problem
still showing up? how about setting up an input with just 10 atoms or
less? these can run *much* faster, and thus can be easily
investigated locally. one of the most common mistakes people make is
not trying to reduce the size of the problem before starting to
debug/investigate a potential problem. they spend endless amounts of
time waiting for something that may or may not happen in time using
tools that may or may not be helpful.
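
A reduced input of that kind could be generated as in the sketch below. It is a generic Lennard-Jones stand-in, not the input from this thread: the file name in.small, the box size, the seeds and the pair style are all assumptions, and a real reduced test would have to keep whichever styles and fixes trigger the problem.

```shell
#!/bin/sh
# Sketch: write a tiny 10-atom LAMMPS input for fast local testing.
# All values below are placeholders chosen for illustration.
cat > in.small <<'EOF'
units           lj
atom_style      atomic
boundary        p p p

# 10 x 10 x 10 box in box units, so no lattice has to be defined
region          box block 0 10 0 10 0 10 units box
create_box      1 box
create_atoms    1 random 10 12345 NULL units box
mass            1 1.0

pair_style      lj/cut 2.5
pair_coeff      1 1 1.0 1.0

# relax first, since randomly placed atoms may overlap
minimize        1.0e-4 1.0e-6 100 1000

velocity        all create 1.0 87287
fix             1 all nve
thermo          100
run             1000
EOF
echo "wrote in.small"
```

Such an input should finish within seconds on a single core, which makes repeated runs under a debugger or a memory checker practical.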

have you ever checked with valgrind's memory checker?
have you ever tried to set up an input where one of the MPI ranks has
no (local) atoms?
in my personal experience, while preparing such reduced and extreme
inputs, i often notice more drastic manifestations of bugs, which are
then much easier to track down. the same is even more true for cases
where the code runs, but the result is off. just by coming up with
those inputs for corner cases, i often learn so much about what could
go wrong that i can spot bugs by just looking at the source code.
people often put far too much trust into their own code. a good dose
of paranoia is often helpful.
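
A memcheck run of the kind asked about above might look like the following sketch. It assumes valgrind and mpirun are installed on a local machine. The rank count, the paths, the input name in.small and the RUN switch are hypothetical. The valgrind options shown (--leak-check, --track-origins, --log-file with %p for the process ID) are standard memcheck options.

```shell
#!/bin/sh
# Sketch: run a small LAMMPS job under valgrind's memcheck, one log per process.
NP=${NP:-2}                 # few ranks: valgrind slows the run down considerably
EXE=${EXE:-./lmp_test}      # LAMMPS binary compiled with -g (path assumed)
INPUT=${INPUT:-in.small}    # hypothetical reduced input script

# %p expands to the process ID, so each rank writes its own log file
CMD="mpirun -np $NP valgrind --leak-check=full --track-origins=yes --log-file=vg.%p.log $EXE -in $INPUT"
echo "$CMD"                 # show the command before doing anything

# Launch only when explicitly asked to (RUN=1 is a hypothetical switch).
if [ "${RUN:-0}" = "1" ]; then
    $CMD
fi
```

Reports of invalid reads or writes in the vg.*.log files usually point to the offending source line when the binary was built with -g.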

axel.

Hi Axel,

Thanks for your debugging suggestions. I have solved the problem, which was caused by a memory error. Your experience is very valuable to me.

Thanks,

Huilin