For this discussion, the mp-id can be thought of as an arbitrary identifier. As each distinct calculation is performed, it is assigned an mp-id sequentially.
Materials Project then uses the crystal structure itself to group together calculations that refer to the same crystal. We do this using the
StructureMatcher class in pymatgen, which determines whether two crystal structures are equivalent within a set of tolerances.
In this way, the oldest (smallest) mp-id for a given crystal structure becomes the canonical identifier in our database, and the other calculations for that structure are grouped under it.
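The grouping idea can be sketched in plain Python. This is a minimal illustration, not MP's actual build code. The function names, the toy `(formula, lattice parameter)` "structures", and the ids are all hypothetical. `toy_equivalent` stands in for a real structural comparison such as pymatgen's `StructureMatcher().fit(s1, s2)`. Ids are visited in ascending order, so the first (smallest) id in each group becomes its canonical id.

```python
from typing import Callable, Dict, List, TypeVar

S = TypeVar("S")


def mp_number(mp_id: str) -> int:
    """Numeric part of an id like 'mp-42' (assumes a 'prefix-N' form)."""
    return int(mp_id.split("-", 1)[1])


def group_by_structure(
    calcs: Dict[str, S],
    equivalent: Callable[[S, S], bool],
) -> Dict[str, List[str]]:
    """Map each canonical (smallest) id to all ids with an equivalent structure."""
    groups: List[List[str]] = []
    # Ascending id order means each group's first member is its oldest calculation.
    for calc_id in sorted(calcs, key=mp_number):
        for group in groups:
            # Compare only against the group's representative (its first member).
            if equivalent(calcs[group[0]], calcs[calc_id]):
                group.append(calc_id)
                break
        else:
            # No existing group matched, so this id starts a new one.
            groups.append([calc_id])
    return {group[0]: group for group in groups}


def toy_equivalent(a, b) -> bool:
    # Hypothetical stand-in for StructureMatcher().fit: same formula and
    # lattice parameter within 2%.
    return a[0] == b[0] and abs(a[1] - b[1]) <= 0.02 * max(a[1], b[1])


# Hypothetical calculations: id -> (formula, cubic lattice parameter in Å).
calcs = {
    "mp-42": ("NaCl", 5.69),
    "mp-7": ("NaCl", 5.64),
    "mp-13": ("Si", 5.47),
    "mp-99": ("Si", 5.45),
}

print(group_by_structure(calcs, toy_equivalent))
# {'mp-7': ['mp-7', 'mp-42'], 'mp-13': ['mp-13', 'mp-99']}
```

Comparing each new calculation only against a group's representative keeps the sketch simple. pymatgen also provides `StructureMatcher.group_structures` if you want to group a list of real `Structure` objects directly.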
There have been attempts to create identifiers that are assigned deterministically (e.g. from the space group, Wyckoff positions, or similar), but these identifiers tend to be very long, and there is usually some edge case that is not handled well (e.g. a general Wyckoff position requires its x, y and z coordinates to be specified).
The mp-id, being arbitrary, is not a perfect system, but it works reasonably well as a community standard because the MP database is open access and historical calculations remain available. Compare this with, for example, the ICSD ID: multiple ICSD IDs can refer to the same crystal structure, and the ICSD itself is not open.
(Also, hi @blokhin, and welcome to the forum!)