lj/charmm/coul/long/opt broken in latest release?

Michael_Mitchell · May 9, 2011, 4:00am

Dear List:
I’ve observed that the lj/charmm/coul/long/opt functionality is broken in the latest release (May 4th). The attached input file runs without issue if I use the unoptimized lj/charmm/coul/long pair style. However, when it is changed to lj/charmm/coul/long/opt, I get a segmentation fault (see attached log file). Has anyone else noticed this, or does anyone have a fix?

Best,
Michael

data.tip3p (644 KB)

log.noopt (1.7 KB)

in.tip3p_noopt (435 Bytes)

in.tip3p (439 Bytes)

log.opt (2.21 KB)

akohlmey · May 9, 2011, 2:12pm

dear michael,

the /opt pair styles contain several code constructs that don't
fully comply with the c++ standard. so, in fact, they were broken
from the get-go. however, they did work with most compilers most
of the time, and the big question now is: did you move to a compiler
that is stricter about standard compliance, or is there some
oversight in code that was added since it was last tested/used?

can you try to confirm which older version of lammps does work
and where exactly it is breaking?

cheers,
axel.

Michael_Mitchell · May 11, 2011, 5:36am

Dear Axel:
Thanks for the response.

I've compiled with both the intel icpc (version 11.1) and g++ (4.1.2) but get segmentation faults in both cases.
I've also tried compiling at a lower level of optimization (-O1, -O2 etc) using the intel compiler to the same effect.

can you try to confirm which older version of lammps does work
and where exactly it is breaking?

I can confirm that the optimized functions are working using the Mar 15th version.
Running the serial version through gdb gives:
...
PPPM initialization ...
G vector = 0.281342
grid = 24 24 24
stencil order = 5
RMS precision = 4.47383e-05
brick FFT buffer size/proc = 29791 13824 11532
Setting up run ...
Program received signal SIGSEGV, Segmentation fault.
0x0000000000617445 in LAMMPS_NS::PairLJCharmmCoulLongOpt::eval<1, 1, 1> (this=0x4150680) at pair_lj_charmm_coul_long_opt.h:137
137 pair_lj_charmm_coul_long_opt.h: No such file or directory.
in pair_lj_charmm_coul_long_opt.h
Current language: auto; currently c++
(gdb) bt
#0 0x0000000000617445 in LAMMPS_NS::PairLJCharmmCoulLongOpt::eval<1, 1, 1> (this=0x4150680) at pair_lj_charmm_coul_long_opt.h:137
#1 0x0000000000612c8b in LAMMPS_NS::PairLJCharmmCoulLongOpt::compute (this=0x4150680, eflag=1, vflag=<value optimized out>)
at pair_lj_charmm_coul_long_opt.cpp:39
#2 0x00000000006b363e in LAMMPS_NS::Verlet::setup (this=0x414ffe0) at verlet.cpp:112
#3 0x0000000000691b39 in LAMMPS_NS::Run::command (this=0x7fff88b923c0, narg=1, arg=0x413f740) at run.cpp:173
#4 0x0000000000551ead in LAMMPS_NS::Input::execute_command (this=0x413d7c0) at run.h:16
#5 0x00000000005529a6 in LAMMPS_NS::Input::file (this=0x413d7c0) at input.cpp:195
#6 0x000000000055ac17 in main (argc=3, argv=0x7fff88b92f68) at main.cpp:29

Thanks,
Michael

akohlmey · May 11, 2011, 2:35pm

dear michael,

[...]

I've compiled with both the intel icpc (version 11.1) and g++ (4.1.2) but get segmentation faults in both cases.
I've also tried compiling at a lower level of optimization (-O1, -O2 etc) using the intel compiler to the same effect.

ok.

can you try to confirm which older version of lammps does work
and where exactly it is breaking?

I can confirm that the optimized functions are working using the Mar 15th version.

good. that would indeed point to a recent change.

the code around line 137 is:

if (j <= NEIGHMASK) {
    double delx = xtmp - xx[j].x;
    double dely = ytmp - xx[j].y;
    double delz = ztmp - xx[j].z;
    rsq = delx*delx + dely*dely + delz*delz;

this looks a lot like it got broken when
steve was changing the neighbor list flags.
in fact, the code cannot work.

please change the if statement to:

if (j < NEIGHMASK) {
    double delx = xtmp - xx[j].x;

and see if that would work.

...and let us know.

thanks,
axel.

Michael_Mitchell · May 11, 2011, 2:59pm

Dear Axel:
Thanks for the suggestion, but that did not solve the problem; I'm still getting segmentation faults.
I inserted this line of code:

printf("i: %i j: %i NEIGHMASK: %i xtmp: %lf\n",i,j,NEIGHMASK,xtmp);

before line 137, and this is the result:

...
i: 0 j: 1895 NEIGHMASK: 1073741823 xtmp: 2.517980
i: 0 j: 1901 NEIGHMASK: 1073741823 xtmp: 2.517980
i: 1 j: -2147483646 NEIGHMASK: 1073741823 xtmp: 1.629650

As you can see, the problem arises on the second pass through the i-loop, where j comes back as a large negative number. Somehow the jlist array no longer seems to point to the correct memory location...

Thanks,
Mike

sjplimp · May 19, 2011, 5:35pm

I just released a 19May11 patch that should fix this.
Please try it out.

Thanks,
Steve