Minimization for MSD calculation Suggested by AI

Given that we are now in the age of AI and the potential it presents, I took the plunge and asked an AI about possible errors and corrections in my MSD code for high-entropy alloys; the results are as attached.

I am trying to reuse the code for practice and potentially apply it later.
Questions:

  1. Regarding minimization, how can we test/verify that the minimization is giving the required result?
  2. What are the options for the fix OPTIMIZE code? What other fixes or lines could be used in lieu of it?
  3. Is it okay to remove the velocity line entirely and replace it with a thermostat for equilibration?
  4. What can the output of the minimization be used for, and how does this compare with results from ab initio calculations with VASP or Quantum ESPRESSO?

There is no fix optimize in LAMMPS. The list of all available fixes can be found here
https://docs.lammps.org/fix.html

Similarly there is fix npt but not fix fixnpt.
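For what it is worth, relaxation in LAMMPS is done with the minimize command, optionally combined with fix box/relax to relax the cell at the same time. A minimal sketch of what a valid relaxation stage could look like (the fix ID, tolerances, and target pressure are placeholders to tune for your system):

```
# Hypothetical relaxation stage replacing the nonexistent "fix OPTIMIZE".
# Assumes units, box, and pair_style/pair_coeff are already set up.

# Optional: also relax the box toward zero target pressure during minimization
fix             relax all box/relax iso 0.0 vmax 0.001

min_style       cg                              # conjugate gradient (the default)
minimize        1.0e-8 1.0e-10 10000 100000
#               etol   ftol    maxiter maxeval

unfix           relax
```

The thermo output printed during minimization reports the energy and force norm at each iteration, which is the first place to check whether the minimizer actually converged to the requested tolerances.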

It is better to read a standard textbook like Allen & Tildesley or Frenkel & Smit than to rely on AI. You can also find examples for computing diffusion in the LAMMPS distribution (lammps/examples/DIFFUSE at develop · lammps/lammps · GitHub).

HTH, Evangelos

3 Likes

Thank you for your reply. This sure helps. I still need clarification on question 4.
I was going to say that OPTIMIZE was just an ID name, but it was used with other arguments signifying that the intended use was for relaxation.
If you have time, please help with questions 3 and 1.
How do we differentiate minimisation from convergence? Are they the same?

And sure, the advice is well heeded, as I am being guided by textbooks and humans too. I appreciate the response from the forum.

This is a paper and a LAMMPS-compatible tool you should add to your list: https://pubs.acs.org/doi/10.1021/acs.jcim.9b00066
You can get a lot from the theoretical discussion there.

The questions Q1 and Q4 are too broad to be answered in a forum.

Regarding Q3: if you are using fix nvt or fix npt in your script, as the AI suggests, then you are already using a thermostat. You should make sure that at least there is no net momentum translating the entire simulation box. Thus the zero linear option of the velocity command is very useful.
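A minimal sketch of how that could look in an input script (the group, temperature, seed, and fix ID are placeholders):

```
# Assign initial velocities at 300 K, then remove any net linear momentum
velocity        all create 300.0 4928459 dist gaussian
velocity        all zero linear

# Optionally, also remove drift periodically during the run
fix             drift all momentum 100 linear 1 1 1
```

Zeroing the linear momentum matters for MSD calculations in particular, because a drifting box adds a spurious ballistic contribution to the measured displacements.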

1 Like

In the midst of your esteemed engagements, you found time to answer my questions; this shows a great depth of commitment to good science, to which I owe a lot. I really appreciate the time and effort you put into answering my questions. I will pay it forward.

There is a big difference between using machine learning (or AI) based potentials and using an AI tool (usually based on some large language model (LLM) parameterization) to create a LAMMPS input file. The problem with the latter step is that the AI tools do not know physics, do not understand side effects and semantics of MD simulations, and have not been told which information they use is good and which is bad (e.g. when processing data from the LAMMPS forum or the mailing list archive, there are lots of examples with errors that people are asking about). As a consequence, you get a lot of so-called “hallucinations”, i.e. the AI will generate statements that look reasonable but are complete and utter nonsense. In the example output you are quoting, you get a lot of trivial and self-evident recommendations and obvious cases where the AI doesn’t even understand the syntax of the commands it is referring to (e.g. fix box/relax in point 4 or fix npt in point 8).

The bottom line is that - unless there is some major improvement of the training and algorithms for these LLM-based AI bots - their use to create and provide information about LAMMPS input files will be more distracting and incorrect than it will be helpful. Now, since AI tools provide their answers with confidence and often use rather simple language, their information is more accessible, and it is very tempting to use that instead of learning things the hard way and reading the usually more complex documentation. Yet, my experience from 30 years of MD simulations is that there are no shortcuts. This has to do with the fact that doing simulations is a lot more like a craft (e.g. carpentry) than pure science, and thus the practical knowledge and experience you can get from working with a competent practitioner is just as necessary: you cannot learn carpentry from a textbook alone, and you cannot start building advanced or complex objects without first getting your hands dirty, learning the various techniques, and understanding the principles of carpentry by building simple objects.

2 Likes

Your expertise, contribution and patience have been invaluable to the LAMMPS community. Thank you for being such a reliable and knowledgeable presence. We are truly fortunate to have you as a part of our forum. I look forward to your continued support.

From a computer science point of view, it is indeed quite remarkable to be able to programmatically produce arrangements of words which are grammatically sensible and approximately resemble a domain expert’s advice.

But, to explicitly comment on each of these suggestions:

Point 4 essentially says “ensure that you actually want your box size to change during minimisation”. This is a sensible suggestion but note that the LLM hasn’t actually told you whether you should use fix box/relax.

This is a classic piece of “Barnum advice” where, if I give you an exhaustive list of alternatives and remind you that you need to choose one of them, you will think I have given you valuable advice when in fact I have simply stated a tautology. “To be, or not to be?” sounds profound, but it’s not like there’s a third option available.

Point 5 is not particularly useful – if your system doesn’t converge with the standard conjugate gradient it’s even less likely to converge under steepest descent, which is generally known to converge more poorly and slowly (although maybe I simply haven’t done much minimisation before).

Point 6 is also not useful – velocity create by itself is almost never sufficient for equilibration. “Make sure you set the correct temperature” is almost insultingly trivial. “And choose the correct random seed” is nonsense!

Point 7 is again insultingly trivial. If you did not specify the correct group for computing MSD then what would you have specified correctly?

Point 8 is actively harmful – in a typical molecular dynamics simulation, you will be studying small systems over short time scales whose pressure and temperature will severely fluctuate. Strong damping that “smooths” the pressure and temperature variation would likely distort the equations of motion so much as to invalidate any diffusivity calculations.
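To make the damping point concrete, here is a sketch of what a gently thermostatted MSD production stage could look like. The temperature, fix IDs, output intervals, and run length are placeholders; the Tdamp of 100 timesteps follows the rule of thumb given in the fix nvt documentation:

```
# Gentle Nose-Hoover thermostat: Tdamp = 0.1 ps = 100 timesteps at a 1 fs step
timestep        0.001                            # metal units: ps
fix             prod all nvt temp 1000.0 1000.0 0.1

# MSD of all atoms; for a high-entropy alloy, use per-element groups instead
compute         mymsd all msd
fix             msdout all ave/time 100 1 100 c_mymsd[4] file msd.dat

run             1000000
```

The diffusion coefficient then follows from the long-time slope of the MSD via the Einstein relation, D = MSD/(6t) in three dimensions, as discussed in the DIFFUSE example mentioned earlier in the thread.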

2 Likes

Thank you so much @srtee. Thanks to the age of RAG AI/LLMs, I am still overwhelmed by what it is capable of doing, and I am confident, without loss of generality, that our community stands to benefit somewhat from it.

Sure, this is the point of digesting the documentation properly in the course of the carpentry (in @akohlmey’s words).

From experience and literature searches, I am informed that the thermostat does the same thing.

Isn’t that the purpose of the thermostat: to keep the system at a thermodynamic level to achieve sufficient mixing at a given temperature and other fixed constants (thermostat arguments) in the ensemble?
Perhaps we need to expose the LLM to more of the best practices for it to learn and give better results; that is the purpose of RAG 🤷‍♂️

I strongly disagree with that statement for multiple reasons:

  • there is too much low-quality data available, e.g. publications that are not reviewed by domain experts, or blogs, websites, and videos that have no requirement for review at all. One also has to factor in that people who know the craft of MD simulations well do not post many questions online, so the discussion topics in forums are often riddled with misconceptions or errors, and it is often difficult to make people understand where they are missing fundamental understanding.
  • the available data - like all data on the web - has a tendency to become outdated rather quickly, and other information just vanishes. For example, every time I update my home page some information gets lost, and when I don’t update it, information becomes outdated. When you move to a new job there is a risk of data being lost. Very few places keep archives and legacy data around, but that data is, of course, never updated, and if somebody switches to a different business, even very useful things may get abandoned.
  • there is a lot of “residual knowledge” in experienced research groups that is passed around inside these groups, but rarely written down. This often has to do with experience and that there are often no “do this, not that” kind of rules, but good choices can depend strongly on (many) circumstantial or system specific properties.
  • the basic principle of an AI/LLM is to synthesize a probable response, but there is no enforcement of the basic principles of physics. The classic example for LAMMPS is the mixing of units settings. Many simulations can be done with very different units settings, which then require different parameters, but AI bots have no concept of that, and even for a human it is not easy to spot inconsistencies without being a domain expert.
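To illustrate the units pitfall from the last point with a hypothetical snippet (the Lennard-Jones values shown are for argon and are used here purely as an example):

```
# Hypothetical illustration of mixed-up units settings
units           real                 # expects energies in kcal/mol, distances in Angstrom
pair_style      lj/cut 10.0
pair_coeff      1 1 0.0103 3.405     # WRONG: 0.0103 is epsilon in eV (metal units)
# pair_coeff    1 1 0.238  3.405    # correct: the same epsilon converted to kcal/mol
```

Both lines parse without error, and only a domain expert (or a careful check against reference data) would notice that the first one is physically off by a factor of roughly twenty.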

If you summarize the answers you posted, they all basically say: you have to know what you are doing. It doesn’t need an AI to know and understand that. But for some strange reason people are far more susceptible to accept this kind of statement when it comes from an AI bot than when a domain expert says this, let alone when it is written in a manual.

No.

This is an oversimplification, but it requires a significant understanding of statistical mechanics to know and explain the difference. Here in the forum (and in many publications, too) we are often confronted with people who do not understand what an ensemble is in the fundamental statistical mechanical sense. You can still produce meaningful results, even if your description is flawed or your understanding is limited. However, it can then be quite difficult to be certain. Experienced MD simulation users know their limitations and will not engage in projects outside their comfort zone unless they recruit competent collaborators.

For that you first need people to actually make these best practices data available. However, that is non-trivial:

  • since it is not data of the “do this, not that” kind, but rather of the “it depends” kind, there are contradictions and it is extremely difficult to make statements that are globally true (except for very trivial ones)
  • there is no incentive to make this information available. in fact, any research group that treats the best practices in their specific research domain like a trade secret has an advantage over competitors
  • in fact, making detailed data available includes the risk of being corrected. given the competitive nature of research there is always a temptation to cut some corners, usually without impacting the correctness of the results, but who can tell up front? and there is a lot of reputation damage (and because of that financial damage when competing for funding) at stake.
  • writing good and accessible documentation takes time. usually, this is left to the developers, but they are not the best people for the job. rather it is the users that collect the practical experience. however, I have seen extremely few during my 30+ years career that are willing to spend the extra time. usually, they claim it is not their job (but that of the developers) and they rather spend the time on the next research project.
  • I personally view free and open source software as something that comes with a “social contract”, i.e. for being able to use open source without having to pay with an obligation to contribute to the project rather than paying for a license or user support. as a project grows and becomes more popular and has more demand on accessible information about how to correctly learn and use it, the fraction of people that feel an obligation to this “social contract” is shrinking.