I am new to implementing new codes in LAMMPS and need help understanding some basic things. My questions might seem naive, but I need to ask them to be sure that my algorithm is correct or to make necessary adaptations.
It’s still not clear to me whether, when a new fix allocates per-atom storage of size natoms, the index starts from 0 or 1, and whether it should be based on the processor’s local ID or on the global tag of the atom.
How can I access the fix value of an atom outside of a subdomain? I checked that the fix value I need in my code is updated at each time step using the local ID in another file, so I need to access it by its processor-local ID. However, for some atoms, atom->map() returns -1, which means those atoms are outside the processor’s subdomain.
I would greatly appreciate it if you could answer my questions or guide me in the right direction to find my answers.
Arrays of dimension natoms have to be avoided at all costs. The communication necessary to synchronize them across all processors and the increased memory consumption massively degrade parallel performance and total memory use. LAMMPS has a distributed data model. For more details on that, please study the recent LAMMPS paper and the Programmer’s Guide section of the LAMMPS manual. Per-atom data in LAMMPS is stored per sub-domain in arrays that are either of dimension nlocal, nall = nlocal + nghost, or nmax (i.e. the maximum of nall across all sub-domains). Inside those arrays the order is arbitrary (and atoms are regularly reordered for spatial proximity to improve performance). The atom ID is a per-atom property and is stored in the Atom::tag array.
You cannot access it directly; you will have to communicate it. If the maximum distance is not too large and can be predicted, then you can increase the communication cutoff accordingly. And then you have to do a so-called forward communication to update the information for the ghost atoms of the surrounding sub-domains.
To learn more about LAMMPS parallelization and the available communication patterns, you may want to watch the talk that Steve Plimpton gave earlier this year at a workshop at Temple. It was live-streamed on YouTube and you can find the recording here: https://www.youtube.com/@templelammps/streams at this specific link: https://www.youtube.com/watch?v=fhNhZ6ilTOU
Thank you for your quick and detailed response. I will look into these. However, I have a follow-up question that I need to ask.
I have a two-dimensional vector in the code that calculates a defined quantity for each atom pair (i, j) based on their positions and physical properties. I declare this 2D vector and assign values to it based on global atom IDs (tags). Then I use MPI_Allreduce to collect the data from all processors into the global 2D vector and do further calculations with it.
Until now, the code worked perfectly on a single core. But now that I need to scale up and use parallel computing, as you said, vectors of size natoms are not the correct approach, and I am using two-dimensional vectors of size natoms × natoms, which is far more inefficient.
So my question is: is there a more efficient way to handle this kind of two-dimensional, per-pair data in LAMMPS? I need pairwise values for my calculations, stored in a way that lets me look up the value between atoms i and j at each time step.
Thank you in advance for your help and time.
Kind regards,
Hamed
Please re-read the last paragraph of my previous post. That has the link to the source of the most relevant information. More details on this are in the Programmer’s Guide part of the LAMMPS manual. I don’t know enough about your research and algorithm (it seems like a hard problem, or your approach is making it look like a hard problem) to give any specific advice and - besides - I don’t have the time and interest to learn it and solve it for you.