Running pair style dsf on GPU blowing up

Hi Luis,

I can reproduce the temperature blowup with dsf/gpu, and also with coul/long/gpu + pppm. When I used pair hybrid with coul/dsf and table/gpu, the run seemed fine in my test. When I switched to fix nvt instead of fix npt, the GPU runs no longer showed the temperature blowup, at least in my short runs. Can you also test with fix nvt to confirm that this is the case?
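
(For concreteness, the swap I mean is along these lines; the fix ID and the thermostat/barostat parameters below are placeholders, not your actual settings:)

# constant-pressure run that shows the blowup on the GPU (example values only)
# fix 1 all npt temp 300.0 300.0 100.0 iso 1.0 1.0 1000.0
# constant-volume test run
fix 1 all nvt temp 300.0 300.0 100.0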

This might have something to do with some GPU styles on your particular system, which has a triclinic box and runs under npt. I am still narrowing the issue down; hopefully someone on the list has better ideas in the meantime.

Best,

-Trung

Hi Trung,

I’d like to help you debug this problem. It may be related to a previous issue with GPU + NPT that another user and I came across (see thread http://lammps.sandia.gov/threads/msg42046.html).

I set up a (somewhat) simple example script that leads to similar problems using the rhodo example distributed with LAMMPS. I’m attaching the input script. It produces different results on the CPU and the GPU. The correct results can be obtained by setting the thermo output to every timestep for the GPU simulation. I think the issue I pointed out in the earlier email still stands: changing the frequency of the thermodynamic output should not change the final answer.
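
(For concreteness, the comparison boils down to something like the lines below; the run length and comments are illustrative, not a verbatim excerpt of the attached script:)

# two otherwise identical GPU runs of the rhodo-based script,
# differing only in the thermo output interval
thermo 1      # GPU result at the final step matches the CPU reference
#thermo 10    # GPU result at the final step drifts from the CPU reference
run 40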

Cheers,
Mike

in.gpu.rhodo-ex (825 Bytes)

Thank you all for your kind attention to this matter.

Trung, switching to NVT did work for me as well. There are some conditions
in my simulations for which it will be useful to use the GPU at fixed volume.
Once again, thanks!

Best,
Luis

Hi Luis and Mike,

thanks for reporting the issues and input files.

Can you rebuild the GPU package with the attached lal_answer.cpp file? (i.e. save a copy of the current lal_answer.cpp, put the attached file into lib/gpu/, then run make clean and make -f Makefile.your_machine to rebuild libgpu.a). After that, remove the current LAMMPS binary in src/ and rebuild LAMMPS.

I think I fixed a bug in accumulating the virial in the GPU package when energy is not accumulated (eflag == 0 and vflag != 0) for the GPU pair styles that require charges. I would like both of you to check with your systems to make sure that the bug fix works. I compared GPU double-precision runs against CPU runs using your input files, and they all match in my short runs.

Let me know your results.

Best,

-Trung

diff …/lib/gpu/lal_answer.cpp:
238a239,240
> vstart=iend;
> iend+=_inum;
240,241d241
< vstart=iend;
< iend+=_inum;

lal_answer.cpp (7.38 KB)

Hi Trung,

  I still can't match the GPU double-precision and CPU results for pair dsf and
ewald. The minimization run matches 100% with dsf. T is not blowing up
anymore, but it is still very high and tilts the box too far.

  Let me know if you need more information, ok?

  Thanks!
  Luis

Hi Trung,

Thanks for looking into this. I also don’t find that the new lal_answer fixes the discrepancies. I think the key (for me) is that if the thermo output frequency is set to 1 (thermo 1), the correct pressures, box sizes, etc. are produced with the GPU code, but if the thermo output is less frequent then discrepancies arise. This is true both before and after the modification you made. BTW, are your tests using thermo output at every timestep? If so, you may not see the problem.

Luis:
If you set the thermo output frequency to every timestep do your problems resolve?

Output from the final timestep of in.gpu.rhodo-ex using CPU, GPU with thermo 1, and GPU with thermo 10 (all with the new lal_answer.cpp, double precision):

log.lammps-cpu-new: 40 547.02702 -21459.606 6148.4154 6436.2229 6636.2306 5372.7928 55.098347 77.137686 72.661893
log.lammps-gpu-new-1: 40 547.02697 -21459.576 6148.4206 6436.2284 6636.2357 5372.7978 55.098347 77.137686 72.661893
log.lammps-gpu-new-10: 40 548.1988 -21386.459 6989.7806 7214.6177 7429.7264 6324.9976 55.022917 77.032083 72.562418

You can see that the box sizes are correct with thermo 1 but incorrect for less frequent thermo output.

Mike

Hi Mike,

  Yes, setting 'thermo 1' works for me too. I find it really strange,
though, that this should be so. Other ways of computing the thermodynamic
variables every timestep, like fix ave/time, don't do the same trick.
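
(What I tried was along these lines; the fix ID and output file name are placeholders for my actual script:)

# sample pressure and temperature every timestep and write them out every timestep;
# unlike 'thermo 1', this did not change the GPU results
fix avt all ave/time 1 1 1 c_thermo_temp c_thermo_press file ave.out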

  Luis

Yes, I’ve also found that fix ave/time doesn’t work properly with some quantities in a GPU-enabled run. Actually, I suspect it might if the thermo output is set to one (although I haven’t tested that).

Mike

I tested with thermo 10. Let me look through the changes again; I probably changed something else in addition to lal_answer.cpp. I will get back to you later.

-Trung

Sorry for my mistake in the previous attempt (I mixed up the working version of the source file and did not do clean rebuilds). Please build a clean libgpu.a with the attached lal_answer.cpp and build a new LAMMPS binary. I am also attaching the log files I got for your input files from the CPU and GPU double-precision runs as references.

Mike, I made some changes to the input script(s) to reduce the divergence between the CPU and GPU runs (such as using gpu force 0 0 1 and pair_modify table 0 for coul/long).

Luis, I generated a binary restart file at the end of a CPU run and read it back in when comparing the CPU and GPU runs (coul/dsf and coul/long + pppm). You may want to generate a restart file using the LAMMPS version available on your system to ensure compatibility.
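
(Roughly, the relevant lines in the modified scripts are of this form; the ordering and device numbers are a sketch rather than a verbatim excerpt of the attachments:)

package gpu force 0 0 1       # GPU settings mentioned above: GPU 0 only, split = 1
read_restart equil.restart    # CPU and GPU runs both start from the same CPU-generated restart
pair_modify table 0           # turn off the tabulated coulombics for coul/long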

Let me know how it works. Thanks for your cooperation,

-Trung

in.gpu.rhodo-ex (864 Bytes)

in.cpu.rhodo-ex (861 Bytes)

log.rhodo.gpu (3.97 KB)

log.rhodo.cpu (3.84 KB)

in.restart.cpu (1.4 KB)

in.restart.gpu (1.41 KB)

log.restart.cpu (3.21 KB)

log.restart.gpu (3.38 KB)

equil.restart (510 KB)

lal_answer.cpp (7.43 KB)

Hi Trung,

Initial tests indicate that the changes you made do indeed fix the problems I was seeing. I really appreciate your work to resolve the issue.

Regards,
Mike

Hi Trung,

  I think this correction is a big step forward, but the bug is not entirely
fixed yet. I performed a fairly long simulation with ewald + GPU + triclinic,
and here is the end of the output:

  809000 333.34225 -36973.605 39966.265 451.09151 27.926868 27.909314 28.456647 89.938411 90.009327 89.990579 22179.659 1654.8759
  809050 335.43519 -36973.563 40020.151 431.75694 27.956012 27.892087 28.460572 89.951396 90.021506 89.984654 22192.164 -1078.5968
  809100 333.65817 -36973.551 40014.308 428.13852 27.953797 27.891616 28.449777 89.975527 90.01415 89.994431 22181.622 282.64878
  809150 332.04234 -36973.562 39986.98 437.45456 27.913168 27.90627 28.454137 89.976269 89.993 90.006612 22164.416 2141.9848
  809200 330.68978 -36973.448 40027.701 429.41843 27.897038 27.929299 28.489762 89.961707 89.985068 89.996356 22197.642 -1593.6
  809250 337.13228 -36973.534 40001.501 433.63789 27.886182 27.919411 28.4823 89.949552 89.98204 89.97561 22175.332 995.77101
  809300 328.2713 -36973.58 40001.985 439.51607 27.895868 27.923335 28.482187 89.953323 89.976894 89.965121 22186.062 -211.65456
  809350 332.46844 -36973.393 40000.957 442.90965 27.898332 27.931231 28.495909 89.974173 89.970192 89.966917 22204.994 -1499.8597
  809400 338.78303 -36973.472 39976.932 441.29619 27.892474 27.91515 28.484019 89.98606 89.962828 89.985024 22178.295 1951.7042
  809450 255305.48 79583.856 40105.249 3548.095 27.914937 27.913511 28.83884 89.978172 89.822202 90.002129 22471.227 -3317222.5
ERROR: Fix npt/nph has tilted box too far in one step - periodic cell is too far from equilibrium state (../fix_nh.cpp:1144)
mpiexec_ipe02: mpd_uncaught_except_tb handling:
  <type 'exceptions.IOError'>: [Errno 5] Input/output error
    /opt/mpich2-intel/bin/mpiexec 1073 handle_cli_stderr_input
        sys.stderr.write(msg)
    /opt/mpich2-intel/bin/mpdlib.py 780 handle_active_streams
        handler(stream,*args)
    /opt/mpich2-intel/bin/mpiexec 530 mpiexec
        rv = streamHandler.handle_active_streams(timeout=1.0)
    /opt/mpich2-intel/bin/mpiexec 1446 <module>
        mpiexec()

  Most of the GPU-accelerated scripts end like that. I don't know if it
helps, but this error always comes up near the last phase transition
(tetragonal to cubic). Perhaps strong fluctuations of the box are
leading to this kind of error.

  Let me know if you need anything else, ok?

  Best,
  Luis

Hi Luis,

Did the CPU runs also fail when they get close to the phase transition (not necessarily at the exact same timesteps)? Can you restart the simulation from a restart file close to the point where it crashed (e.g. at t = 800000 for the run you were referring to) and run with and without the GPU, to see if the problem occurs only in the GPU run?
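
(For example, with a hypothetical restart frequency and file name:)

# in the long run, write periodic restarts so a file exists near the crash point
restart 50000 tmp.restart
# then start both the CPU and the GPU comparison runs from the same file
read_restart tmp.restart.800000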

Also, when you mentioned running with ewald+gpu+triclinic, you are using coul/long/gpu + kspace ewald, yes?

Cheers,

-Trung

Hi Trung,

  No, the CPU version runs ok at all times. I will try to restart with
both methods close to the phase transition.

  Exactly, I used coul/long/gpu with kspace ewald. With my card, the GPU
speedup is close to 3x. Not bad at all, in my opinion.

  Cheers,
  Luis

Hi Trung,

  I restarted one of the crashed GPU runs without the GPU, and the
simulation is going fine with pair_style dsf. The restart began at 950000,
and it is running well past the crash point.

  As I said before, CPU runs fine during the phase transitions.

  What do you think might be happening?

  Cheers and a happy new year!
  Luis

Hi Luis,

thanks for trying the restarted run without the GPU. Have you also tried restarting with the GPU? If the simulation keeps crashing, then the reasons are not obvious to me, at least for now. You may try more defensive settings for the neighbor list rebuilds (for example, every 1 delay 10 check yes) when it gets closer to the phase transitions, to see if that helps.
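
(In input-script terms, something like the following, using the values I mentioned above; you may want to tune them for your system:)

# rebuild the neighbor lists more conservatively near the phase transition
neigh_modify every 1 delay 10 check yes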

Best,

-Trung