I have been testing the self-consistent phonons in Hiphive and found a puzzling behaviour. I’m not sure if this is a science problem or a code problem!

A script and Dockerfile to reproduce the behaviour are here, along with a couple of the output plots in the README: GitHub - ajjackson/hiphive-sandpit: Container for testing some phonon methods with hiphive
If you have all the dependencies installed (including tqdm for progress bars and ASAP3 for fast EMT) it should be fine to run the script outside the container; it doesn’t rely on anything strange. I tried to make the output presentable and the key parameters adjustable, so do have a play with it

For EMT simulations of aluminium, I get opposite thermal softening/hardening behaviour depending on whether the samples are drawn from MD or from phonon rattling. I wondered if this could be due to under-convergence of the self-consistent phonons, so tried running a single-shot phonon rattle with the force constants from MD. Surprisingly, this brings the results quite close to the self-consistent phonons - yet the average and maximum displacement values are very similar to the MD. So it seems unlikely that the self-consistent phonon convergence is driving the difference.

The MD seems well thermalised but I suppose it is possible that this is under-sampled? I think for this high-symmetry system with only a few parameters, 50 samples of 4x4x4 cubic supercells should be a pretty generous amount of data to work with? I have tried running 10x longer MD simulations with and without drawing 10x more samples and there doesn’t seem to be much impact.

With gold, the thermal effects trend in the same direction across sampling methods but there is about a factor 2x in the frequency shift from 0K between methods.

Given that temperature-dependent effective potentials from MD and self-consistent harmonic potentials are being used somewhat interchangably (notably, the developers of the TDEP code seem to be moving their interest to the latter) it is surprising to see such a difference in the results. Has anyone seen similar behaviour? What could be driving it?

I’ll try to find time to take a gander but spontaneously I don’t find it super surprising since tdep style tends to underestimate frequencies and scp to overestimate frequencies. This can be found in the literature for example in https://www.nature.com/articles/s42005-023-01297-8
but other authors have looked at this effect in more detail also.

I can add that this is because scp is based on a free energy minimization procedure while tdep (fitting to MD) is the opposite.

Hi Adam,
I think your results are in good agreement with the expected behaviour, that SCP overestimates frequencies and harmonic models from MD underestimates frequencies. As Fredrik said, this trend is something we’ve seen consistently in quite a lot of examples.

In the paper linked above by Fredrik the effective harmonic force constants from MD seem to work better for the soft tilt modes in perovskites compared to SCP. But for other modes in this system the opposite is true.
I have no insights for what type of modes or systems one method would work better than the other (other than QM effects being straight forward to include in SCP).

…with the general trend being SCAILD predicting an increase in the
phonon frequencies and higher free energy than the exact value, as opposed to
TDEP, which predicts decreasing frequencies and lower free energy.

Let me just add to this: Methods relying on free energy minimization do exactly that, i.e., they approach the free energy from above and tend to overestimate frequencies (SCAILD/SSCHA/SCP/sTDEP).

Methods relying on force fitting on MD sampling can be shown to maximize the trial free energy, i.e., they approach the free energy from below and therefore tend to underestimate the frequencies (TDEP).

Generally, the former methods are self-consistent, while the latter are not.

sTDEP is the self-consistent version of (MD-based) TDEP, which should be very similar to SCP/SSCHA, (q)SCAILD besides some technical subtleties.

Thanks all, I will do some reading! Some further thoughts that may well be explained in those papers:

Of course we expect some differences as sampling the full anharmonic PES and sampling a harmonic-approximated PES are not quite the same thing. And there seems to be some alignment with the “approach from above” expectation in the “1-shot rattle from MD” coming in above the self-consistent rattling.

I am troubled that in this case the methods disagree so strongly on the frequency-vs-temperature trend: it seems to make predictive work impossible. Are the methods telling us anything about anharmonic frequency shifts, or are they just making increasingly poor estimates of the zero-temperature behaviour in an arbitrary direction?

While you’re here, Florian, maybe you can clarify something about TDEP

It occurs to me that my test runs are taking regular samples from the NVT MD; so in theory they are a canonical ensemble but maybe not the best one. If I recall correctly, “old-school TDEP” would carefully choose a canonical ensemble of samples from the MD input. Are these approaches equivalent (in practical and/or converged limits) or is there a further detail I’m glossing over?

Here is the temperature evolution plot series so people don’t have to run the code:

Note that the “phonon frequencies” you are plotting are not physical phonon frequencies. It would be interesting to compare the peaks of the spectral function (including 3rd order shifts) and how those shift with temperature.

Re canonical ensemble: Well, you can still use Samples from MD - TDEP but I doubt it makes a big difference if you have a properly sampled MD in the first place.

Yes. As far as I am aware all the commonly-used names of this method

the direct method

the supercell method

finite displacements

[insert name of implementing code here]

have some scope for ambiguity/confusion

I think “direct” makes the clearest distinction from sampling-based methods, but is still perhaps a bit misleading if one goes on to apply a lot of symmetry corrections, sum rules etc.

Note that the “phonon frequencies” you are plotting are not physical phonon frequencies. It would be interesting to compare the peaks of the spectral function (including 3rd order shifts) and how those shift with temperature.

Indeed, this is really the observable I am interested in. But the concept of phonon bands is a useful one, and it would be nice to have an effective harmonic model that puts the peaks in the right place - even if that means compromising on the other statistics.

I think the problem here might be that there is no definition of “right place”, is it the mode, mean or second moment of the spectral function the most correct? For a simple, classical case it is possible to show that tdep (from MD) is exactly the second moment of the spectral function.

Right! Comparing spectral functions is hard work especially when they have realistic amounts of noise in them. When comparing with experiment the easiest thing to look at is the mode, and none of these methods promise to produce that.

But I would be surprised if the difference between these averages was as large as the spread between those plot lines. I think the next step is to get the spectral function with TDEP and we can see if that is the case

It will also be interesting to look more closely at the actual displacements being sampled. There is no difference in how the potential is being fitted, and superficially the displacement sets seem very similar. Yet they must be the source of the difference in results.

Here is an example of the dynamical structure factor obtained from MD compared with effective harmonic models dynasor_frequency_comparison_PBE.pdf (86.4 KB)
where “TDEP” refers to fitting harmonic models using MD data.

Yes the answer lies in the displacements
Eventhough the displacement magnitude is the same, the displacement correlation between atoms probably differs, and probably you can see a slight differences in the interatomic distances (e.g. the RDF).

We did some testing a long time ago, here, I dont know how well converged this is, but possibly the distribution of forces in the phonon-rattle snapshots have a slightly longer/heavier tail compared to MD (so maybe something interesting going on with atoms in this tail?)

It looks like my results are quite similar those of Korotaev et al. (2018) Redirecting, who also looked at Al when comparing TDEP, SCAILD and MD correlation functions using the EAM potential in LAMMPS. So the issue is known for this exact system! Thanks for linking this paper @erikfransson

I think that confirms there is nothing “wrong” in this implementation/application of hiphive, and the differing behaviour is a consequence of the various modelling approaches. Clearly, the renormalised harmonic frequencies from these schemes should be used with a bit of caution (and the more sophisticated analysis methods have a purpose!)