Incoherent pair results using neighbor lists

opittet · May 9, 2024, 9:55pm

Dear all,

I have been trying to compute the Short Range Order (SRO) parameter for a equibinary Mo-Nb system using the neighbor lists provided by LAMMPS. The results outputted are incoherent: the number of NbMo pairs should be strictly equal to the number of MoNb pairs, as we are looping over every atom in the system.

Here is the 2000 atom BCC system, equibinary Mo Nb
NbMo_binary.dat (152.6 KB)

initial_datafile= NbMo_binary.dat
pair_count_dic_list_n1=[]
pair_count_dic_list_n2=[]
NVT_steps_between_SRO= 50 #between each sampling 

in_file=f"""
clear
units           metal
dimension       3
atom_style      atomic
read_data       {initial_datafile}
pair_style      adp 
pair_coeff      * * W_Mo_Nb.adp.alloy Nb Mo 
neigh_modify    every {NVT_steps_between_SRO} check no 
fix             1 all nvt temp 700 700 $(dt*100) 

"""
lmp.commands_string(in_file)



lmp.command(f'run {NVT_steps_between_SRO} pre no post no')

## build lists of dictionnaries containins the pair counters ## 
pair_count_dic_n1=get_pair_count(initial_datafile,lmp,'adp')
pair_count_dic_list_n1.append(pair_count_dic_n1)
pair_count_dic_n2=get_pair_count(initial_datafile,lmp,'adp',n_shell=2)
pair_count_dic_list_n2.append(pair_count_dic_n2)

Using the function get_pair_count:

def get_pair_count(datafile,lmp, pair_style,n_shell=1):
   ## get atom type in string format to have all the possible pairs##   
    mass_list=extract_masses(datafile)
    all_atom_types= [mass_to_symbol(mass) for mass in mass_list]
    all_pair_types=[]
    for i,atom_type in enumerate(all_atom_types):
        for j,neighbor_type in enumerate(all_atom_types):
            #if atom_type!= neighbor_type:
                pair = atom_type+neighbor_type
                all_pair_types.append(pair)
    all_pair_types_dic={}
    for i in range(len(all_pair_types)):
        all_pair_types_dic[all_pair_types[i]] =0 


    idx = lmp.find_pair_neighlist(pair_style)
    get_atom_type = lmp.extract_atom('type')

        # example using NumPy arrays
    nplist = lmp.numpy.get_neighlist(idx)
    nlocal = lmp.extract_global("nlocal", LAMMPS_INT)
    nghost = lmp.extract_global("nghost", LAMMPS_INT)
    ntag = lmp.numpy.extract_atom_iarray("id", nlocal+nghost, dim=1)

## get the atom types of the center atom and its neighbors, then count all pair occurences ##
    for i in range(0,nplist.size):
        atom_i, neighbors_i = nplist[i]
        center_atom_id= ntag[atom_i]
        center_atom_type=all_atom_types[get_atom_type[center_atom_id]-1]
        if nplist.size > 0:
            if n_shell==1:
                for n in np.nditer(neighbors_i[:8]):
                    neighbor_global_id= ntag[n]
                    neighbor_type=all_atom_types[get_atom_type[neighbor_global_id]-1]
                    print(center_atom_type+neighbor_type)
                    all_pair_types_dic[center_atom_type+neighbor_type]+= 1 
            if n_shell==2: 
                for n in np.nditer(neighbors_i[8:14]):
                    neighbor_global_id= ntag[n]
                    neighbor_type=all_atom_types[get_atom_type[neighbor_global_id]-1]
                    print(center_atom_type+neighbor_type)
                    all_pair_types_dic[center_atom_type+neighbor_type]+= 1 
    return all_pair_types_dic

For example one iteration over the system in NVT yields this pair list counter:
{'NbNb': 3929, 'NbMo': 4063, 'MoNb': 4039, 'MoMo': 3969}

This result does not make sense as ‘NbMo’ is not strictly equal to ‘MoNb’.

I could come up with 2 hypotheses that could explain this:

Half-list: we might be missing terms due to how the lists are constructed, but I would then assume we would not always get a total number of pairs equal to 16000, which is 8*2000 (the first coordination number for a BCC cell times the number of atoms, which seems to be the case in every attempt.)
Behavior of the neighbor lists: I copied the code from a post from akohlmey: Using neighbor list from LAMMPS in Python , but i could not find more documentation on the retrieval of the global ID from the “summation” of the ghost and local IDs.

I hope I have been able to follow the forum guidelines and I thank you in advance for your time and help.

Best,
Olivier

srtee · May 9, 2024, 11:50pm

I think your results are sensible and expectable. I think in your simulation LAMMPS uses a half-neighbour list where each neighbour pair interaction is only included once, not twice, in the neighbour list, since the default newton on behaviour uses Newton’s Third Law to apply a calculated force simultaneously to both particles in a neighbour interaction.

So consider a simulation with only two particles of types 1 and 2 respectively. (Always useful to consider a minimal example.) I would expect the neighbour list to contain only one pair. It will be either a 12 pair or a 21 pair, depending on which atom is sorted first by local index, but not both, and therefore the number of 12 and 21 pairs would in general not be equal.

akohlmey · May 10, 2024, 2:50am

Yes, you have. Thank you for that. May many more follow your example!

Your hypothesis 1 is half-correct. You can confirm the type of neighbor list requested from the output. Here is the equivalent output for a simple test input with the Ni.adp potential from the LAMMPS distribution.

Neighbor list info ...
  update: every = 1 steps, delay = 0 steps, check = yes
  max neighbors/atom: 2000, page size: 100000
  master list distance cutoff = 7.168
  ghost atom cutoff = 7.168
  binsize = 3.584, bins = 4 4 4
  1 neighbor lists, perpetual/occasional/extra = 1 0 0
  (1) pair adp, perpetual
      attributes: half, newton on
      pair build: half/bin/atomonly/newton
      stencil: half/bin/3d
      bin: standard

However, the number of pairs cannot be easily inferred from the lattice, since the neighbor list code knows nothing about it. Instead it uses the cutoff embedded in the potential file (line 5, fifth column, 5.168 \AA here) plus the neighbor list skin (2.0 by default) and thus constructs a list of neighbors up to 7.168 \AA distance.

The neighbor list entries are using the “local” indexing that applies to all per-atom properties. The global atom ID is in the “ntag” array.

This is incorrect. You must use the local index (i.e. atom_i) and not the global atom ID ntag[atom_i] for accessing any per-atom properties.

Same issue here.

This may give expected results by chance for a serial calculation, but as soon as you run in parallel, you will get crashes due to segmentation faults since the global index will exceed the size of the local per-atom array due to using domain decomposition.

opittet · May 17, 2024, 11:09am

Thank you two for your help, in order to further debug my pair builder to perform the SROs (two of our major concern is that the list of neighbors is not properly ordered by increasing distance, or that we do not extract the values from the right atoms), I attempted to recreate the RDF function from LAMMPS using the lmp.extract_atom() command.

I ran into 3 more issues:

The numpy unwrapper method for the lammps object seems to be bugged.
The RDF output was wrong.
I was not able to output of the correct distances for a perfect FCC crystal, using periodic boundary conditions.

1. Numpy unwrapper methods

For this I followed the example from 2.3.10. Neighbor list access — LAMMPS documentation, adding the position using the local ID as suggested:

The numpy unwrapper seems to be bugged for anything that is not the global ID, as the lmp.numpy.extract._atom() tries to access the index of the local ID value on an array of the same size as the number of atoms in our system.
The issue can be reproduced in this file:
numpy_unwrapper_bugged.py (1.5 KB)

from lammps import lammps
import numpy as np 
from lammps import lammps
lmp = lammps()

lmp.commands_string("""
region box block -2 2 -2 2 -2 2
lattice fcc 1.0
create_box 1 box
create_atoms 1 box
mass 1 1.0
pair_style lj/cut 2.5
pair_coeff 1 1 1.0 1.0
run 0 post no
""")


#working with the C++ version with an unwrapper  

nlidx = lmp.find_pair_neighlist('lj/cut')
nl = lmp.numpy.get_neighlist(nlidx)
tags = lmp.extract_atom('id')
coords= lmp.extract_atom('x')
print("half neighbor list with {} entries".format(nl.size))
# print neighbor list contents
for i in range(0,nl.size):
    idx, neighbors_i  = nl.get(i)
    print("\natom {} with ID {} and position {} has {} neighbors:".format(idx,tags[idx],np.ctypeslib.as_array(coords[idx],shape=[3]),neighbors_i.size))
    if neighbors_i.size > 0:
        for n in np.nditer(neighbors_i):
            print("  atom {} with ID {}, position{}".format(n,tags[n],np.ctypeslib.as_array(coords[n],shape=[3])))



# bugged numpy with local ID 

nlidx = lmp.find_pair_neighlist('lj/cut')
nl = lmp.numpy.get_neighlist(nlidx)
tags = lmp.extract_atom('id')
coords= lmp.numpy.extract_atom('x')
print("half neighbor list with {} entries".format(nl.size))
# print neighbor list contents
for i in range(0,nl.size):
    idx, neighbors_i  = nl.get(i)
    print("\natom {} with ID {} and position {} has {} neighbors:".format(idx,tags[idx],coords[idx],neighbors_i.size))
    if neighbors_i.size > 0:
        for n in np.nditer(neighbors_i):
            print("  atom {} with ID {}, position{}".format(n,tags[n],coords[n]))

2. RDF method incorrect output

To obtain the RDF, I changed the lammps object following the small example from the documentation https://docs.lammps.org/compute_rdf.html:

lmp.commands_string("""
region box block -2 2 -2 2 -2 2
lattice fcc 1.0
create_box 1 box
create_atoms 1 box
mass 1 1.0
pair_style lj/cut 2.5
pair_coeff 1 1 1.0 1.0
compute myRDF all rdf 50
fix 1 all ave/time 100 1 100 c_myRDF[*] file tmp.rdf mode vector
run 0 post no
write_data debug_data.dat       
""")

and obtain the following plot.

I realised this is not the expected output, as we should obtain peaks like this (at least for the first few neighbors), why is that so?

3. Getting the right distances for an FCC crystal

Anyhow, I wanted to reproduce this second picture using the lmp.extract._atom(‘x’) command. In order to do so, i have used the following code:

def distance_pbc(box, x1, x2):
## method to reproduce the periodic boundary conditions## 
    dx = np.zeros_like(box)  # create empty 3,1 vector 
    print(x1,x2)
    for m in range(3):  # iterate over x, y, z dimension
        dx[m]= x1[m] - x2[m]  # Take actual distance between the particles in the primary box
        while (dx[m] > box[m]/2):  # If distance is larger than half box width there is a closer particle in another box
            print(dx[m],f'dx[{m}] was too big')
            dx[m] -= box[m]
            print(dx[m],'new dx[m] ')
        while (dx[m] <= -box[m]/2): # If distance is smaller than negative half box width there is a closer particle in another box
            dx[m] += box[m]
    return dx  # return 3D distance

def get_distance_histogram(lmp, pair_style):
## method to get the distances between a central atom and all of their neighbors ##
    box=[4.,4.,4.] #define box size for PBC
    idx = lmp.find_pair_neighlist(pair_style)
    # example using NumPy arrays
    nplist = lmp.numpy.get_neighlist(idx)
    coords = lmp.extract_atom('x')
    distance_to_center_list_list=[]

    # print neighbor list contents
    for i in range(nplist.size):
        atom_i, neighbors_i  = nplist[i]
        print(neighbors_i)
        distance_to_center_list= []
        
        center_atom_pos= np.ctypeslib.as_array(coords[atom_i],shape=[3]) 
        print(center_atom_pos)
        if neighbors_i.size > 0:
            
            for neighbor_id in neighbors_i:
                print(neighbor_id)
                neighbor_pos=  np.ctypeslib.as_array(coords[neighbor_id],shape=[3]) 
                distance_to_center=np.linalg.norm(center_atom_pos-neighbor_pos)
                dx=distance_pbc(box=box,x1=center_atom_pos,x2=neighbor_pos)
                print('non-pbc dist=',center_atom_pos-neighbor_pos,'pbc dis=',dx, np.linalg.norm(dx))
                distance_to_center=np.linalg.norm(dx)
                distance_to_center_list.append(distance_to_center)
            distance_to_center_list_list.append(distance_to_center_list)
          
    df=pd.DataFrame(distance_to_center_list_list)

    return df

and finally:

df= get_distance_histogram(lmp=lmp,pair_style='lj/cut')
display(df)
mean_dists=[]
for series_name, series in df.items():
    print(series_name)
    mean_dist= np.mean(series)
    mean_dists.append(mean_dist)

num_bins = 50  # Adjust as needed

plt.hist(mean_dists, bins=num_bins, edgecolor='black')

plt.xlabel('distance [A]')
plt.ylabel('frequence')
plt.title('pseudo g(r)')

plt.show()

Outputting the distances of the neighbors, this is not coherent as we should firstly have an ordered list and then have 6 first neighbors at the same distance and then 12,24 etc.
It is also not coherent with the wrong RDF output from above.

and the average for each n-th neighbor:

Thank you all for your help again,

Best,
Olivier

akohlmey · May 18, 2024, 2:20am

No. This is documented behavior. By default the returned NumPy array is limited to nlocal (since that is always valid for per-atom data) and 1d data. You need to use the following to access ghost atom positions:

nlocal = lmp.extract_setting('nlocal')
nghost = lmp.extract_setting('nghost')
coords = lmp.numpy.extract_atom('x',nelem=nlocal+nghost,dim=3)

Your quoted example with ctypes access is also not 100%, it should have:

    print("\natom {} with ID {} and position {} has {} neighbors:".format(idx,tags[idx],coords[idx][0:3],neighbors_i.size))

I suspect you are disregarding that the “lattice” parameter in reduced units has a different meaning and thus your system isn’t quite what you are expecting. If you swap around the “lattice” and the “region” command you are getting a system that is a clean periodic continuation of the FCC lattice for a system with 256 atoms (your input produces only 63 atoms). In reduced units lattice fcc 1.0 creates a lattice spacing of 1.5874011.

As for the number of neighbors. You cannot directly plot this from the RDF data. If you look at columns 2 and 4 you get the first set of neighbors (using the 256 atom system) of 12. Since this column is the integral the following number are incremental. The next value is 18 which corresponds to the expected 6 nnn entries (18-12). The next is then 42 and thus gets the expected 24 = 42-18.

So everything seems to be in order.