Flipping bond and dihedral types.

I am trying to flip bonds and dihedrals off/on temporarily by changing their types to negative values, as proposed by Axel Kohlmeyer here a few days ago.
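For readers following along, the idea of switching an interaction off by negating its type can be sketched as below. This is only an illustration under stated assumptions: `TypeTable`, `switch_off` and `switch_on` are hypothetical names, a plain nested `std::vector` stands in for the per-atom type arrays, and none of it is taken from the attached fix or from the LAMMPS source.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a per-atom type table: types[i][m] is the type of
// the m-th bond (or dihedral) stored with atom i. Not a LAMMPS data structure.
using TypeTable = std::vector<std::vector<int>>;

// True if (i, m) refers to an existing entry of the table.
bool in_range(const TypeTable &types, std::size_t i, std::size_t m) {
  return i < types.size() && m < types[i].size();
}

// Switch entry (i, m) off by making its type negative.
// Returns false, and changes nothing, if the indices are out of range.
bool switch_off(TypeTable &types, std::size_t i, std::size_t m) {
  if (!in_range(types, i, m)) return false;
  if (types[i][m] > 0) types[i][m] = -types[i][m];  // already-off entries stay off
  return true;
}

// Switch entry (i, m) back on by restoring the positive type.
bool switch_on(TypeTable &types, std::size_t i, std::size_t m) {
  if (!in_range(types, i, m)) return false;
  if (types[i][m] < 0) types[i][m] = -types[i][m];  // already-on entries stay on
  return true;
}
```

The explicit range check is a deliberate choice in this sketch: an index that runs past the end of such a table is one of the ways a sign flip can turn into a bad memory access.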

I have written a simple fix, only about 200 lines, which appears to work by itself but produces an error that I have never seen before. It looks like this:

[jilet:22607] *** Process received signal ***
[jilet:22607] Signal: Segmentation fault (11)
[jilet:22607] Signal code: Address not mapped (1)
[jilet:22607] Failing at address: 0x8

This happens even when I run on a single core without MPI. It happens sporadically across systems of different size (all the same DNA model with varying length), and it does not even appear to happen in some of them. As the fix seems too simple to fail in itself, I am guessing that I am breaking something else that is related to the rest of LAMMPS.

I am attaching the bare-bones version of my fix that reproduces this problem. I’d appreciate any insight on how to begin debugging this. Or comments on if this is – once again – a fundamentally incorrect implementation.

Thanks

Murat

fix_bond_update_simple.cpp (5.29 KB)

I am trying to flip bonds and dihedrals off/on temporarily by changing their
types to negative values, as proposed by Axel Kohlmeyer here a few days ago.

I have written this simple fix which is only like 200 lines, and appears to
work by itself, but it results in a fancy error that I have never seen
before. Looks like this :

you have never seen a segmentation fault before??? wow!

this is probably the most common error that people see when
programming in C or C++ or other languages with explicit memory
management via pointer variables.

[jilet:22607] *** Process received signal ***
[jilet:22607] Signal: Segmentation fault (11)
[jilet:22607] Signal code: Address not mapped (1)
[jilet:22607] Failing at address: 0x8

this looks like you are trying to do an illegal dereference.

This happens even when i run on a single core without MPI. It happens
sporadically between systems of different size (all the same dna model with
varying length) and it even does not appear to happen in some of them. As
the fix seems too simple to fail in itself, I am guessing that I am breaking
something else that is related to rest of LAMMPS.

unlikely, but easy to determine by running under a debugger.

I am attaching the bare-bones version of my fix that reproduces this
problem. I'd appreciate any insight on how to begin debugging this. Or
comments on if this is -- once again -- a fundamentally incorrect
implementation.

typical tools for debugging bad memory references are valgrind's
memcheck and gdb.

there should be some tutorials on how to track this down. it is easier
to show than to describe, tho.

I have seen segmentation faults before, but ‘Address not mapped’ is new to me. Although I suppose this is not the place to learn about those, I thought maybe there would be some fundamental difference that I should know about.

My previous segfaults came from bad mallocs, non-existent array indices, and similarly simple things. I have honestly never worked on anything big enough to require a proper debugger. I guess it's about time I learned about those too.

What bothers me is that the fix can run successfully a number of times without raising any immediate errors, and it even flips the bonds on/off. At some point it just stops with this error. This is why I looked beyond C++ and into the LAMMPS structure.

Segmentation fault, I have seen, 'Memory not mapped' is new to me. Although

it is the same thing. OpenMPI traps SIGSEGV and then does a closer
inspection. a pointer address of 0x8 is not possible on linux
machines, since the first page is not given out (in order to make
segfaults trap more easily). this usually would happen if a value is
assigned to a pointer without dereferencing it or a pointer is
dereferenced where it should not be.
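One common way to end up at a small address such as 0x8 is to read a member through a null object pointer: the address touched is then just the member's offset inside the object. The sketch below illustrates that arithmetic without actually crashing. `Example`, `address_touched_via_null` and `report` are hypothetical names made up for this illustration, and the offset of 8 assumes a typical 64-bit Linux build with 8-byte pointers; nothing here is taken from the attached fix.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iostream>

// Hypothetical stand-in for a class with a pointer member followed by more data.
struct Example {
  double *data;  // offset 0
  int count;     // offset sizeof(double *), i.e. 8 with 8-byte pointers
};

// Address that an access to a member at `member_offset` would touch if the
// object pointer itself were null (base address 0). Pure arithmetic, no access.
std::uintptr_t address_touched_via_null(std::size_t member_offset) {
  const std::uintptr_t null_base = 0;
  return null_base + member_offset;
}

// Print the offset of `count` and the address a null Example* would lead to.
void report() {
  const std::size_t off = offsetof(Example, count);
  assert(off == sizeof(double *));  // `count` directly follows the pointer
  std::cout << "offset of count: " << off << "\n"
            << "address touched through a null Example*: 0x" << std::hex
            << address_touched_via_null(off) << std::dec << "\n";
}
```

Calling `report()` from a `main` on such a build shows why the failing address is not 0 itself: the pointer was null (or never set), and the code reached for something a few bytes into the object it was supposed to point to.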

I suppose this is not the place to learn about those, I thought maybe there
would be some fundamental difference that I should know about.

My previous segfaults consisted of bad mallocs or non-existent array indices
and such simple things. I honestly never worked on anything that is big
enough to require a proper debugger. I guess its about time I learn about
those too.

i suggest you compile a version of LAMMPS with MPI stubs (serial) and
compile with -g and without optimization. that makes debugging things
easier. then you can run with either valgrind or under gdb and obtain
a stack trace to see the location of the problem and then you need
to find how this got triggered.

upon cursory inspection of your code, i am pretty certain that your
code is at fault.

Thanks, that is good to know.

I infer that you have no objection to the overall approach, as far as the LAMMPS way of doing things goes?

I am basically asking whether I should iterate on this or whether there is a good reason to start over. I don’t mind starting over.

PS : Sorry about the personal mail. My bad.

Thanks, that is good to know.

I am inferring that you don’t have anything against the nature of the implementation, the LAMMPS-way-wise?
It looks “too simple”, particularly the part flipping the dihedral type. I think you need to read some more LAMMPS source code.