Most efficient way of getting atom IDs in Python?

I want to get the IDs of the atoms in a group. Something like this works:

    def get_ids_in_group(self, lmp: lammps, group: str) -> np.ndarray:
        """Returns the IDs of the atoms in a group.

        Parameters
        ----------
        lmp : lammps
            The LAMMPS object.
        group : str
            The name of the group.

        Returns
        -------
        np.ndarray
            The IDs of the atoms in the group.
        """
        lmp.command(f"compute idlist {group} property/atom id")
        lmp.command("run 0 pre no post no")
        raw_ids = lmp.numpy.extract_compute("idlist", 1, 1).astype(int)
        lmp.command("uncompute idlist")
        group_ids = raw_ids[raw_ids != 0]
        return group_ids

My concern is efficiency. raw_ids has the same length as the number of atoms in the simulation (or on the local processor), so it can be huge and lead to performance issues. Is there a better way of doing this?

The problem with your code is that it creates and deletes a compute and issues a run statement. That wastes both CPU time and memory. You can just use

    tags = lmp.numpy.extract_atom('id')
    masks = lmp.numpy.extract_atom('mask')

Here "tags" is a NumPy array with the atom IDs (aka tags) of all local atoms, and "masks" is an array with the group bitmasks. Both use the same physical memory as the C++ code, so nothing is wasted on copying. You then need to loop over both arrays at the same time to build the list of atom IDs for a group. If your groups don't change, this needs to be done only once.
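As a minimal sketch of that loop, here is the bit test written out with plain Python lists standing in for the two arrays. The helper name `select_group_tags` and the sample values are made up for illustration, and the sketch assumes the group of interest occupies bit `group_index` of the mask (group "all" is bit 0):

```python
def select_group_tags(tags, masks, group_index):
    """Return the tags whose mask has bit `group_index` set (hypothetical helper)."""
    group_bit = 1 << group_index
    # Walk both sequences in lockstep and keep the tags whose mask contains the bit.
    return [tag for tag, mask in zip(tags, masks) if mask & group_bit]


# Made-up sample data standing in for extract_atom('id') and extract_atom('mask').
tags = [1, 2, 3, 4, 5, 6]
masks = [0b01, 0b11, 0b01, 0b11, 0b11, 0b01]  # bit 0: "all", bit 1: a second group

print(select_group_tags(tags, masks, 1))  # -> [2, 4, 5]
```

With the actual NumPy arrays the same test can be written without a Python-level loop as `tags[(masks & group_bit) != 0]`, which is usually much faster for large systems.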


Thanks for your useful and complete answer. I was using a compute because that is what I used in a plain LAMMPS script (no Python), which is probably not efficient either.

Here is the new code with your suggestions:

    def get_ids_in_group(self, lmp: lammps, group: str) -> np.ndarray:
        """Returns the IDs of the atoms in a group.

        Parameters
        ----------
        lmp : lammps
            The LAMMPS object.
        group : str
            The name of the group.

        Returns
        -------
        np.ndarray
            The global IDs of the atoms in the group.
        """
        ids = lmp.numpy.extract_atom("id")
        masks = lmp.numpy.extract_atom("mask")
        group_list = lmp.available_ids("group")
        try:
            idx = group_list.index(group)
        except ValueError as e:
            raise NameError(f"group '{group}' is not defined") from e
        group_bit = 1 << idx
        return ids[(masks & group_bit) != 0]

Hi again. Here is a small generalization that returns any per-atom property for the atoms in a group, in case someone needs it:

    def get_properties_from_group(
        self, lmp: lammps, group: str, properties: list[str]
    ) -> dict[str, np.ndarray]:
        """Returns the requested properties of a group of atoms.

        Parameters
        ----------
        lmp : lammps
            The LAMMPS object.
        group : str
            The name of the group.
        properties : list[str]
            The properties to extract.

        Returns
        -------
        dict[str, np.ndarray]
            The properties of the atoms in the group.
        """
        masks = lmp.numpy.extract_atom("mask")
        group_list = lmp.available_ids("group")
        try:
            idx = group_list.index(group)
        except ValueError as e:
            raise NameError(f"group '{group}' is not defined") from e
        group_bit = 1 << idx
        membership = (masks & group_bit) != 0
        result = {}
        for prop in properties:
            try:
                result[prop] = lmp.numpy.extract_atom(prop)[membership]
            except TypeError as e:  # extract_atom returned None (property not defined), so indexing failed
                raise NameError(f"Property '{prop}' is not defined") from e
        return result