On the current state of molecular simulation software and teaching

Germain · September 13, 2024, 9:54am

Dear all,

This topic has been bugging me for some time now, and I would like to get some feedback from the small community that this forum is. As this is just me ranting and asking for other people’s opinions, feel free to ignore it if you’re not interested.

To give some context, I’ve been trained in molecular simulation (MS) “the old way” with a general statistical physics background. By “old way”, I mean I worked on the MD and MC code developed in-house by my labs during my PhD, both of them written in Fortran, just like (most of) our post-processing codes. We even plotted data using xmgrace, and I was the one who doomed some of our traditional workflows by introducing Python scripting in the lab. But I knew the Nosé-Hoover and MC equations nearly by heart and some of my favorite statistical physics books were the ones from Hoover.

As weird as it might sound, I discovered a lot of the advanced state-of-the-art software and techniques after completing my PhD, like Metadynamics, collective variables, advanced coarse-graining protocols etc., some of this nice work having been developed by the same people who are here on the forum. Modern machine learning is baffling compared to what I used to do. Some of these methods rely on heavy thermodynamics or statistical physics analysis, so I am not surprised that I sometimes have a hard time wrapping my head around them; some of them I still do not fully understand.

Now I have the feeling that I am becoming an old fart, in the sense that there is a very wide variety of software around. I used VMD for a long time and switched to OVITO, and I am now a proficient LAMMPS user (or so I think), but I can’t help but wonder whether the current state of this software is good for people trying to grasp MS. Don’t get me wrong, I love the work that is put into these tools, and I am helping here for a reason, but I sometimes feel that even I rely too much on them and that this blunts my creativity with regard to analyses.

For people arriving in the field, in the same way that it is very hard to teach quantum dynamics without some classical mechanics basis (and even then, it is still hard to teach quantum mechanics), I feel as if some of the comfy software masks some of the required basic understanding of what MD is and does. I’ve been quite surprised to see people in materials science doing only “0K energy minimization” in MD and comparing the computed values with real-life materials. In the same way, I saw some papers published with only nice “illustrative figures” from OVITO but not much more, as if quantitative analysis was not useful, necessary or even obtainable from simulations. Sometimes the computational details are given at the bare minimum, and other times information such as the force field is barely present at all. This makes me actually go “WTF?” sometimes. Would anyone find it acceptable to publish an experimental paper with only an illustrative picture of their sample? I know that the publishing system has quite some issues with peer review, greedy publishers and the astronomical number of papers published (most of them ending up being NOT read), but I can’t help but feel that there are also some issues in the way of doing/presenting MS.

I have a feeling that some people are left on their own to do some MS without proper training, just taking the software, sometimes barely reading the doc, and “fiddling around” as best as they can. There is, as I said, a lot to know and master to do proper MS nowadays. So are there still dinosaurs teaching the basics to newcomers? Are they still relevant? Is there any way to hop in directly using new tools and techniques without missing some relevant parts?

Now I know that there is, in science training, a lot of “learning by doing” and getting proper guidance from teachers whose workflow/thinking process you copy. Besides @akohlmey saying it (rightfully) countless times, I agree with Michael Polanyi^[1] on this topic. But I wonder about other people’s opinions here on the current state of MS understanding given the current state of software development, and whether my feeling is shared or not.

As a side note, I know that people posting on this forum represent a biased sample with a distribution having, I suppose, high density at the two ends of the “beginner to proficient user” spectrum. And this is why I am suspending my own judgment on this question.

^[1] Who just said essentially the same thing but in BIG BOOKS with a lot of pages and footnotes.

ceciliaalvares · September 13, 2024, 5:58pm

Hey Germain,

So, first, I largely share your opinion about how people go and do stuff without knowing the fundamentals - not only in molecular dynamics, but in many other things. I think that many factors contribute to this being the case. So as not to get too blabla (oops, too late) and to avoid making too much girl drama, I will focus on what I think are the main ones:

(1) the way the educational system is structured:
Most professors are part-time researchers. They have 843902849032 things to do, students to supervise, etc., other than being there teaching. And usually people don’t really like teaching, because it is not exactly the status of the year. It is much “cooler” to be the researcher and get the approval of your peers about how excellent you are. There are people who actually put PhD students in charge of helping master’s interns instead of going there themselves to help the person improve. They don’t care if the intern is learning; they just have them compute the thermal conductivity of a sphere of a Cu-Fe alloy in 20 dimensions using equations xxx and xxx, and that’s it.

And this is usually much more “caring” than the attention the students in a classroom get. For teaching, I feel like people just take slides from some conference where they presented their work, grab some images, and put them into slides without thinking it through very much or paying much regard to the order of the reasoning, and that’s it. If it is a very fundamental course where they cannot do this (like inorganic chemistry, let’s say), they either (a) find a way to throw their research into the middle of an inorganic chemistry course (which becomes some sort of “application course” instead of one for learning fundamentals) or (b) make a minimal effort to prepare the material, just enough for the person to know the basics, so that they don’t spend their precious time preparing stuff. And a lot of the time they put a bunch of equations and photos in the background as if the student were already “a grown-up researcher in the topic” and could totally grasp that equation with 100 characters that came out of absolutely nowhere. But learning how to derive and explain it well takes time away from research, no? So unless you are very sharp in your head about that equation, you may just say “it’s not important” or “look in the book and if you have questions you can ask me”. Not to mention that, in my humble opinion, PowerPoint < black/white board, always.

I also hear that professors often don’t pass an “examination” to lecture on topic X (at least here in France), but rather pass an examination to teach a class of topics, and they initially “get to teach a course on whatever is left”. This is horrible. You need professors who are the best able to teach a given course, and there should be a yearly competition between permanent staff to see who teaches it (so there should be a chance of being “fired” from teaching a course, or from being a professor for an academic year, when someone better pops up). Imagine if I got “organic chemistry” simply because I am a chemical engineer and no one else wanted it. I have 0 competence to teach organic chemistry. It has been 489032840392 years since I last saw it, and it was never a topic I particularly cared to develop myself in. Ofc I could “prepare some slides” and get it done, but would it be done beautifully?

(2) people confuse the functionality of memory
People confuse “learning how to execute a task” with “learning what you are doing”. A lot of people know how to differentiate and integrate a function of any possible form, and maybe they train themselves to differentiate 100 different functions in 5 minutes. Sounds very impressive, right? But they have no idea what it means.

Sure, you can always memorize empty phrases to explain some things. You can go around repeating things such as “integrating a function over an interval is equal to the area underneath its plot”, but this is a completely different thing from understanding why that is. Memorizing “the explanation” is different from understanding the explanation. It takes time to learn the fundamentals, and you need to find that time somewhere to devote yourself to understanding things. But then if you have 10 different courses in a semester and you need to pass an exam, you cannot marvelously make sense of everything unless you have a magical IQ. As this is not my case, I always sacrificed stuff during my undergraduate studies and decided which topics I was going to truly learn and be very good at versus which topics I was just going to “pass the exam” in (such as organic chemistry). That doesn’t mean I am useless at organic chemistry, but I am definitely not competent enough to teach a course or to explain things deeply. Not to mention that I have already forgotten how to synthesize pretty much anything organic by now…

(3) We live in a society that values instantaneous results

I don’t know if you heard the latest news, but the current trend is “don’t stress so much and enjoy life!” Work is always a burden. Liking to work is definitely not “cool”. Also, no one is going to see you studying in your room at 11 PM and clap for you, so you don’t get “attention”. It is much cooler to make a quick short on TikTok, which will probably be the source from which students learn quantum chemistry in the future, in a flash 30 s video. There may not be a lot of content, but you get many likes. Even in relationships… people abandon each other without fighting and just classify the other one as “toxic” or “narcissistic” because they saw a life coach on Instagram give instant advice on feelings. No one spends time on things. Guys don’t even ask for girls’ numbers anymore: they ask if they have Instagram or Snapchat.

My advice: live by your morals and do the absolute best you can. Always focus on quality instead of quantity. If you cannot have 7 projects, 3 students under your supervision, go to 5 conferences per year, host a workshop and teach 2 courses during an academic year, don’t do it. Prioritize. I think that if everyone does this and truly puts time and passion into what they are doing, things will change.

ceciliaalvares · September 13, 2024, 6:05pm

And actually, to be honest with you, I am a 0 “molecular dynamics expert”. I only learned that molecular dynamics existed in the second year of my master’s. But by then I had all the fundamentals of differential and integral calculus + classical thermodynamics + statistical mechanics + Newtonian mechanics to understand a lot of things. I never “learnt molecular dynamics” by “going to do it”. Ofc I have a less fancy background than you: I do not know by heart the Nosé-Hoover equations of motion as originally developed to sample any of the ensembles (NVT or NPT). And I am not going to sit here and pretend that I will understand that paper if I give it a read (because I have actually given it more than one read). But since I know statistical mechanics, I can understand when someone plots a distribution of instantaneous values of a property, gets a Gaussian and says that this hints that they are sampling an ensemble, as is the case in many papers that discuss different equations of motion. Also, I can understand why, physically speaking, you would need to add terms to the original Newtonian equations of motion in order to sample the NVT or NPT ensemble (although I can’t understand why one specific form is chosen over another). And many other things, thanks to the fact that I had a background in something else before.
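
For readers who have not seen them, the "added terms" mentioned above can be illustrated with the Nosé-Hoover equations for the NVT ensemble in their commonly quoted form. This is only a reference sketch: the symbols are assumptions of this note, not taken from the thread, with \xi a friction-like thermostat variable, Q its fictitious "mass", g the number of degrees of freedom and T the target temperature.

```latex
\dot{\mathbf{r}}_i = \frac{\mathbf{p}_i}{m_i}, \qquad
\dot{\mathbf{p}}_i = \mathbf{F}_i - \xi\,\mathbf{p}_i, \qquad
\dot{\xi} = \frac{1}{Q}\left(\sum_i \frac{\mathbf{p}_i^{2}}{m_i} - g\,k_B T\right)
```

The extra term -\xi p_i slows the particles down when the instantaneous kinetic energy is above its target and speeds them up when it is below, which is the physical reason for modifying Newton's equations.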

I am far from knowing everything, though… I don’t understand long-range solvers, for example. But for now I can get away with treating some parts of the thing as an instruction, “use coulombic interactions at this cutoff, with this solver and this precision”, and that’s it. With time I will learn it (a bit as you said: “doing it and learning it”), but I think this doesn’t qualify as something as drastic as naively going to “play with fix nvt and npt and see what I get”.

hothello · September 14, 2024, 5:43pm

I only indulged once in deriving known equations from the fundamental ones, and that was for my master’s thesis, in 2002. I quickly learned that to advance the knowledge about a certain topic, at some point you have to stop mastering the known and start tackling the unknown.
The problem that Germain is reporting is a progressive lowering of teaching standards and a massification of higher education. The latter has good and bad aspects, but I tend to welcome educating more people, rather than less.

I think that MS is an advanced craftsmanship, which requires mastering physics, calculus, and programming (sprinkled with nerdiness to taste). The availability of advanced simulation and visualisation software on the one hand allows unqualified users to get some results, even without fully understanding the underlying physics. On the other hand, it allows more educated (and careful) users to access advanced methods (for me, that would be the collective variables module in LAMMPS) without the burden of understanding how it is implemented: I trust the review process, and I enjoy doing original science with my new toy.

In this scenario, the best I can do is to share input files, force fields, etc., and this is already an exception, given that most authors choose not to do so. But sharing input files is not enough per se: you can still reproduce bogus simulations if some hideous parameter choice has been made and hidden somewhere in a complex workflow.

I am working on an EU project where we document materials modelling workflows and datasets with semantic technologies, that is, ontologies and triplestores. The idea is to add meaning (aka semantics) to processes and data, and to expose every parameter to scrutiny ^[1], ensuring validation of the choices made.

I hope that this or similar projects will one day enable the creation of a repository to share modelling workflows along with proper documentation of the physics behind it, the approximations made, and the parameters used.

Cheers,
Otello

^[1] Ideally, using software agents or some kind of machine learning.

Umme_Salma · September 20, 2024, 10:33am

Dear Germain,

Thank you for sharing your reflections. As someone still learning LAMMPS and with a basic understanding of simulations, I can relate to your concerns. I still find many fundamental aspects difficult to grasp, and I would consider myself a beginner or learner in this field.

One challenge I’ve encountered is the lack of supervision and teaching for the foundational aspects of simulation software. A new piece of software is introduced, and the eye-catching visuals and results can often lead junior researchers, like myself, to prioritize producing those figures over truly understanding the underlying physics and chemistry.

I completely agree that what’s missing are structured, basic courses—from the theory behind the physics and chemistry of the problem, to how to use the simulation options and conduct proper analysis. It would be incredibly helpful to have a series of video lectures that cover everything from the fundamentals to producing those fancy results we see in papers. However, the reality seems to be that few have the time for such initiatives, and often the focus shifts more toward productivity and results.

Despite the challenges, I am very keen to learn the deeper principles of MS. Unfortunately, much of my learning has been self-guided due to the lack of structured support or guidance.

Salma

Germain · September 21, 2024, 11:56am

Hi @Umme_Salma, thanks for considering my point of view. I think @ceciliaalvares and @hothello also provided insights to consider on the situation we’re concerned with.

However, I would like to nuance your point about video lectures:

Those videos exist already. There are tons of them, with different degrees of detail. If you look for something, I’m quite sure that some workshop on “how to do” some obscure thing exists and has good-quality videos on the internet, but I don’t think they help.

I personally have learned nothing in the long run from video courses on technical stuff. All my knowledge of physics, chemistry, computer use (coding and tools), writing etc. comes from 3 sources:

direct teaching: someone showed me something I needed and I could ask them to explain in more detail this or that that I didn’t get
fiddling around (and finding out): related to the first point but with a practical aspect to it, that is, trying to do something but, more importantly, understanding why a solution works when my previous ~~failures~~ attempts did not.
active reading: that is, actually doing the exercises and putting to use what I’ve learned in books, books I could easily look into for information and annotate, which is not so straightforward to do with videos.

It is also very hard (at least for me) to maintain momentum and motivation in going through a full video course from start to finish or resuming it when paused. I have a far better memory of stuff I’ve read compared to stuff I’ve listened to or seen.

So overall, I still think that direct teaching is the best way to go, though I acknowledge that this requires resources and is not accessible in many places.

akohlmey · September 21, 2024, 2:59pm

Dear @Germain,

Thanks for sharing your thoughts and worries. I don’t really have a proper answer to the question you are asking, but perhaps some of my observations from the last 30-ish years in the business can help. I’ll add them here in an unordered list as they come to my mind.

1. Looking back, I would say that the MS community was quite different when I started out. There were far fewer people and they were more “self-selected” in a way to be reasonably proficient in programming. It was necessary, because people mostly used in-house codes and for those the “source code was the documentation”. It was quite common to start your PhD work with implementing a minimal, trivial MD code or some analysis like g(r) or similar, just to get a sense of the matter. It also taught you how to read code and extract knowledge from that in addition to the documentation.
2. The decline of writing/using in-house codes and the move toward large software packages somewhat correlates with the growing complexity of the problems under investigation. At some point, you cannot maintain and implement all the features needed in-house, and then the logical consequence of that is to modify a package code, import the baseline from there, and focus on your research. This puts beginners at a disadvantage, since they are forced to deal with complexity without having proper training in the basics. PIs often don’t give them the time or spend the effort on training. The “business of science” has become so brutally competitive that there is pressure to put more focus on immediate results (even if low quality) over training, and as a consequence it leads to either micro-management or just relaying pressure without guidance.
3. Some of the trouble with people lacking understanding comes from the maturity and accessibility of package code. What helps power users to focus on their specific research topic also makes it easier to get started and just follow and adapt existing tutorials or template inputs. You can get pretty far without really knowing what you are doing by just mimicking what you find somewhere and getting lucky. Of course, that just makes folks crash all the harder once they come across a problem where this kind of “template” is no longer available and they have to build a simulation from scratch.
4. At the same time, there is so much more pressure to become productive, and graduate students today in particular are rarely given the time and liberties needed to get a better understanding. I am personally appalled by PIs behaving like that and treating their students as a cheap and exploitable workforce instead of training them to become good researchers in their field. What is even worse is that PIs without much experience in the area often act as if it is the job of the software developers to train people to run the software, while they would never let anybody use their labs or equipment without proper in-house training.
5. Point 4 also has a negative impact on the quality of tutoring that younger PIs can provide to their students: if you are not given the time to learn yourself, how can you know that this is helpful when you are in charge? If all you have experienced is being bullied, how can you not turn into a bully when you have to guide somebody?
6. I keep wondering what has happened to collaborations. When I was a grad student and postdoc, there was a frequent exchange between groups so that one could learn from each other and further the progress of science as a whole. When I look around now I see many “lone wolf” researchers who don’t ask around or collaborate (unless told to), in some cases not even when the expertise is in the same group or on the same campus. What I see and experience now are mostly collaborations between fields, where a complex task is addressed that no single group can do on its own, but there is no “junior” collaborator learning from a “senior”.
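
As an illustration of the kind of starter exercise mentioned in the first point above (a small analysis such as g(r)), here is a minimal sketch in plain Python. It is not taken from any package or from the thread: the function names, the cubic periodic box and the ideal-gas sanity check are all assumptions made for this example.

```python
import math
import random


def radial_distribution(positions, box, n_bins=50):
    """Return bin centres and g(r) for points in a cubic periodic box.

    positions: list of (x, y, z) tuples; box: edge length of the cube.
    Distances use the minimum-image convention, so r is limited to box/2.
    """
    n = len(positions)
    r_max = box / 2.0
    dr = r_max / n_bins
    counts = [0] * n_bins

    for i in range(n - 1):
        xi, yi, zi = positions[i]
        for j in range(i + 1, n):
            xj, yj, zj = positions[j]
            # Minimum-image separation along each axis.
            dx = xj - xi
            dx -= box * round(dx / box)
            dy = yj - yi
            dy -= box * round(dy / box)
            dz = zj - zi
            dz -= box * round(dz / box)
            r = math.sqrt(dx * dx + dy * dy + dz * dz)
            if r < r_max:
                k = min(int(r / dr), n_bins - 1)
                counts[k] += 2  # the pair is a neighbour of both i and j

    density = n / box ** 3
    centres, g = [], []
    for k in range(n_bins):
        r_lo, r_hi = k * dr, (k + 1) * dr
        shell_volume = 4.0 / 3.0 * math.pi * (r_hi ** 3 - r_lo ** 3)
        # Normalise by the count expected for an ideal gas at this density.
        g.append(counts[k] / (n * density * shell_volume))
        centres.append(0.5 * (r_lo + r_hi))
    return centres, g


def random_configuration(n, box, seed=0):
    """Uncorrelated random points (an ideal gas), used here as a sanity check."""
    rng = random.Random(seed)
    return [(rng.uniform(0, box), rng.uniform(0, box), rng.uniform(0, box))
            for _ in range(n)]
```

Feeding it the uncorrelated points from `random_configuration` should give g(r) fluctuating around 1 at all distances, which is a quick way to check the normalisation before trusting the result on a real trajectory.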

akohlmey · September 21, 2024, 3:12pm

I can second this. Despite their popularity, videos, especially those on YouTube and similar sites, have very little impact on learning something compared to written text and specifically practical exercises.

Over the last decade I have given many lectures on topics peripheral to writing MS software and learned that my lectures work better when I can accelerate or slow down to be at just the right speed, so that the audience can just about keep up without being too bored, too distracted or too overloaded. This requires that I can see them and their body language. But for the most part, the lectures were only the smaller contribution to the learning effect compared to the hands-on exercises, and particularly exercises where you had to piece together and figure out the details.

I was forced to teach some of the same lectures over Zoom and it was not quite the same. More importantly, I had to reduce the amount of complexity of corresponding exercises so that the students were able to complete them at all.

In summary, videos are good at giving people a sense of having some knowledge, but they stop people from taking notes and really questioning what was presented, something that does happen in real lectures. So what feels like knowledge is in my eyes just “familiarity” and not understanding. For proper understanding one has to work. You need to make mistakes and then have those “a-ha!” moments. You won’t get that from videos.

ceciliaalvares · September 23, 2024, 6:11pm

I haven’t had the time to sit down and carefully read all the updates on the discussion yet (doctors are very busy (haha)), but if I am to advocate in favor of “videos”, I would recommend MIT OpenCourseWare. There are a few courses I have happened to watch and they are excellent. Not to mention that Allan Adams is a (no-longer-on-duty :/) god.

There are some benefits, like being able to pause and ruminate… so you may find it a useful source of information if you like online stuff, @Umme_Salma

No clue if they have stuff about molecular simulations though. I remember watching isolated courses on some related topics, but I remember not liking them very much (there was still valuable information, though).

srtee · November 7, 2024, 5:43am

I started muddling through in molecular dynamics by writing a C++ code that calculated the Langevin dynamics of a particle in a dragged harmonic well, as an undergraduate, and by the end of the year my incredible, amazing script was capable of simulating four particles at once in two dimensions!

One day a nice gentleman knocked on the door and asked me for my student number, and once I supplied it, asked what exactly I was doing that was writing several megabits per second across the institute Ethernet and eating half the network throughput. I explained. He explained that I was attempting to write to a network drive and why that was an excruciatingly terrible idea. But, he said, at least it wasn’t pornography.

More importantly, at the same time that I wrote the code, I was also tasked with analysing the ensembles of trajectories I was generating – specifically the probability distributions of dissipated work, a toy version of (what I’d later learn was) the committor across a potential energy surface’s reactive saddle point. The only reason I learned anything that year was because my terrible C++ code and my terrible theoretical analysis proceeded in tandem.
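
To make the setup described above concrete, here is a minimal Python sketch of a particle in a dragged harmonic well, with the work accumulated along each trajectory. It is not the original C++ code: it assumes overdamped (Brownian) dynamics in one dimension, a simple Euler-Maruyama integrator, reduced units, and hypothetical names and default parameters chosen only for this example.

```python
import math
import random


def dragged_well_work(k=1.0, gamma=1.0, kT=1.0, v=1.0,
                      dt=0.01, n_steps=500, rng=random):
    """Work done on one overdamped particle while the trap centre moves at speed v.

    Potential: U(x, t) = 0.5 * k * (x - v * t)**2.
    Integrator: Euler-Maruyama for gamma * dx/dt = -dU/dx + thermal noise.
    """
    # Start from equilibrium in the trap at rest.
    x = rng.gauss(0.0, math.sqrt(kT / k))
    noise_amp = math.sqrt(2.0 * kT * dt / gamma)
    work = 0.0
    for step in range(n_steps):
        centre = v * step * dt
        # Work increment: (dU/dt at fixed x) * dt = -k * v * (x - centre) * dt
        work += -k * v * (x - centre) * dt
        # Position update: drift towards the trap centre plus a random kick.
        x += -(k / gamma) * (x - centre) * dt + noise_amp * rng.gauss(0.0, 1.0)
    return work


def mean_work_theory(k=1.0, gamma=1.0, v=1.0, tau=5.0):
    """Closed-form mean work for this overdamped model, starting from equilibrium."""
    t_relax = gamma / k
    return gamma * v * v * (tau - t_relax * (1.0 - math.exp(-tau / t_relax)))


def work_ensemble(n_traj=1000, seed=0, **kwargs):
    """Repeat the protocol n_traj times and return the list of work values."""
    rng = random.Random(seed)
    return [dragged_well_work(rng=rng, **kwargs) for _ in range(n_traj)]
```

Because the well is only translated, its free energy does not change and all of the average work is dissipated. Comparing the mean of `work_ensemble()` with `mean_work_theory()` (with `tau = n_steps * dt`) is one small instance of checking a simulation method against theory.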

That relationship has been fundamental to all my work since, rather than one technique or another – the relationship between a simulation method (however well or poorly coded) and the results obtained. Which sounds a bit trivial, but the questions here that make me saddest are “how do I simulate [cool property] of [trendy material] in MD?” without first asking “well, should you? Does your material actually care whether a sulfur atom has moved 1.01 angstrom instead of 0.92? Can you even run lab experiments of any relevance, and if you did, would you learn anything from a snapshot of a few nanometres lasting a few nanoseconds?”

So I think – if I did teach molecular dynamics “from scratch”, I would spend a world of time on those central notions of an ensemble (as a probability distribution of configurations) and a trajectory (as a solution of dynamical equations of motion), and not move until my students understand the sheer intellectual bravado of constructing the ensemble expected value and hoping, on the one hand, to connect it to a trajectory average, and on the other to any kind of real world experimental measurable. Each step is an utter feat and it is a daily miracle that molecular dynamics pulls off each of these steps in the first place.
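
Written out, the chain of steps described above might be sketched as follows. The notation is an assumption of this note and does not come from the post: \Gamma is a point in phase space, \rho(\Gamma) the ensemble probability density, A an observable and t_n the times of the saved frames.

```latex
\langle A \rangle_{\text{ensemble}}
  = \int A(\Gamma)\,\rho(\Gamma)\,\mathrm{d}\Gamma
  \;\overset{?}{=}\;
  \lim_{\tau\to\infty}\frac{1}{\tau}\int_0^{\tau} A\big(\Gamma(t)\big)\,\mathrm{d}t
  \;\approx\;
  \frac{1}{N}\sum_{n=1}^{N} A\big(\Gamma(t_n)\big)
```

The question mark is the ergodic assumption, the final approximation is what a finite simulation actually computes, and connecting the left-hand side to an experimental measurement is yet another step.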