fix external callback bug and proposed solution

Hi LAMMPS Community,

I found a bug in the fix external feature of LAMMPS. The trouble is with lines 113 and 119 of fix_external.cpp, which are respectively:

if (mode == PF_CALLBACK && ntimestep & ncall == 0)

and

if (ntimestep & napply == 0) {

The problem is with the bitwise operations. I think they should actually be modulus operations, but the bitwise & is not equivalent to modulus. My proposed solution is to change these lines to:

if (mode == PF_CALLBACK && ntimestep % ncall == 0)

and

if (ntimestep % napply == 0) {

Here's my reasoning:

1.
The documentation at http://lammps.sandia.gov/doc/fix_external.html suggests that if I use the following command in my LAMMPS input script:

fix 1 all external pf/callback 1 1

then my external function should be called once every time step. However, I found that this was not happening. The logical test mode == PF_CALLBACK is always true when I include the above fix command in the LAMMPS input script. But, the bitwise logical tests are not true when they should be. So, I looked into it more.

2.
I don't think we should be making bitwise comparisons between variables of different bit size. ntimestep is a bigint (64-bit integer) and ncall and napply are int (32-bit integer). To test, I put in the lines

  printf("sizeof(ntimestep) = %d\n", sizeof(ntimestep));
  printf("sizeof(ncall) = %d\n", sizeof(ncall));

which yield

sizeof(ntimestep) = 8
sizeof(ncall) = 4

Does it really make sense to do a bitwise comparison between bigint and int? What happens with the 32 bits that don't match up? I'm not sure. Anyways, I made ncall and napply a bigint, but that still doesn't fix the bug. (Note: I know that bigint can be 32-bit depending on machine. I'm working on a 64-bit machine).

3.
With ncall = 1, ntimestep & ncall will only equal zero every other step, not every step, because it will only have a '1' in the 2^0 place on every odd ntimestep. So, the logical test would only be true for even ntimesteps. But weirdly, for the following print statement

  printf("ntimestep : %ld ncall : %ld ntimestep & ncall : %ld (ntimestep & ncall == 0) : %ld \n",
          ntimestep, ncall, ntimestep & ncall, ntimestep & ncall == 0);

I get this output

ntimestep : 2 ncall : 1 ntimestep & ncall : 0 (ntimestep & ncall == 0) : 0

I don't understand why the logical test isn't true here (i.e. evaluate to 1). However, when I cast the binary comparison as type int, it comes out as expected:

  printf("ntimestep : %ld ncall : %ld ntimestep & ncall : %ld (ntimestep & ncall == 0) : %d \n",
          ntimestep, ncall, ntimestep & ncall, (int)(ntimestep & ncall) == 0);

Output:
ntimestep : 2 ncall : 1 ntimestep & ncall : 0 (ntimestep & ncall == 0) : 1

4.
Well, all of this is kind of superfluous because I'm pretty sure that it's the modulus operator we want. One solution is that we can do the modulus with bitwise by doing (ntimestep & (ncall-1)) and then cast it as an int. That actually works! But, is it even worth doing all that here? Seems overcomplicated. Why not just go with the old % operator? I don't think LAMMPS will take a performance hit if we omit the bitwse operations in fix_external.cpp. So, that's why I proposed the above solution.

Best luck,
Adrian

agreed - that's a bug - should be using the % operator

I think what happened is we added some options
to allow for the forces to be applied at a different step
frequency and/or by pf/apply, and introduced the bug

Will be fixed in next patch.

Thanks,
Steve