The normalization of the RDF in matminer/featurizers/structure/rdf.py is confusing to me.
(1) With the same cutoff, the RDF peak values for different supercells of the same crystal (e.g. silicon)
differ by a multiplicative factor.
This can be fixed with the following replacement:
rdf → rdf / s.num_sites
(2) The RDF peak values also differ by a multiplicative factor when the bin size changes.
This can be fixed with the following replacement:
rdf → rdf * self.bin_size
I am not familiar with how the RDF should be normalized, so could someone verify the two issues above? Thanks!
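To make issue (1) easy to reproduce without matminer, here is a minimal pure-Python sketch. It does not call matminer or pymatgen; it only mimics a histogram of neighbor distances collected over all sites. The lattice constant of 5.431 Å and the helper names (`diamond_supercell`, `dist_hist`, `peaks`) are assumptions made for illustration, not part of rdf.py.

```python
import itertools
import math

A = 5.431  # assumed Si lattice constant in angstroms

# Fractional coordinates of the 8 atoms in the conventional diamond-cubic cell.
BASIS = [
    (0.0, 0.0, 0.0), (0.0, 0.5, 0.5), (0.5, 0.0, 0.5), (0.5, 0.5, 0.0),
    (0.25, 0.25, 0.25), (0.25, 0.75, 0.75), (0.75, 0.25, 0.75), (0.75, 0.75, 0.25),
]


def diamond_supercell(n):
    """Cartesian coordinates and box length of an n x n x n Si supercell."""
    coords = [
        ((i + x) * A, (j + y) * A, (k + z) * A)
        for i, j, k in itertools.product(range(n), repeat=3)
        for x, y, z in BASIS
    ]
    return coords, n * A


def dist_hist(coords, box, cutoff, bin_size):
    """Histogram of all site-to-neighbor distances below cutoff (periodic box)."""
    hist = [0] * math.ceil(cutoff / bin_size)
    n_img = math.ceil(cutoff / box)  # periodic images needed per direction
    shifts = [
        (sx * box, sy * box, sz * box)
        for sx, sy, sz in itertools.product(range(-n_img, n_img + 1), repeat=3)
    ]
    for p in coords:
        for q in coords:
            for s in shifts:
                d = math.dist(p, (q[0] + s[0], q[1] + s[1], q[2] + s[2]))
                if 1e-8 < d < cutoff:  # skip the site itself
                    hist[int(d / bin_size)] += 1
    return hist


def peaks(coords, box, cutoff=4.0, bin_size=0.1):
    """Nonzero raw counts and the same counts divided by the number of sites."""
    raw = [c for c in dist_hist(coords, box, cutoff, bin_size) if c]
    return raw, [c / len(coords) for c in raw]


for n in (1, 2):
    for bin_size in (0.1, 0.05):
        raw, per_site = peaks(*diamond_supercell(n), bin_size=bin_size)
        print(n, bin_size, raw, per_site)
```

In this sketch the raw counts are [32, 96] for the 8-atom cell and [256, 768] for the 2 x 2 x 2 supercell, so they scale with the number of sites, while the per-site counts are [4.0, 12.0] in every case.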
In my experience, when we use the expression
rdf = dist_hist / s.num_sites
we get the “raw” RDF, in which the first peak of crystalline silicon is exactly 4 and the second is exactly 12,
regardless of the bin size. For the normalized RDF, i.e. the “density” of the distribution at a given radius,
I do not understand why the “density” changes with the cell size and the bin size.
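For reference, here is a sketch of the conventional textbook normalization. Whether rdf.py follows exactly this form has not been checked here. The symbols are assumptions for illustration: n_k is the number of neighbor distances in bin k summed over all sites (dist_hist), N is the number of sites (s.num_sites), V is the cell volume, and Δr is the bin size.

```latex
g(r_k) = \frac{n_k}{N \, \rho \, V^{\mathrm{shell}}_k},
\qquad
\rho = \frac{N}{V},
\qquad
V^{\mathrm{shell}}_k = \frac{4\pi}{3}\left[(r_k + \Delta r)^3 - r_k^3\right]
\approx 4\pi r_k^2 \,\Delta r \quad (\Delta r \ll r_k)

% Ideal Si crystal: the bin containing the first shell has n_k / N = 4, so
g(r_1) \approx \frac{4}{\rho \cdot 4\pi r_1^2 \,\Delta r}
       = \frac{1}{\rho \, \pi r_1^2 \,\Delta r}
```

Under this definition, n_k / N and ρ do not change with the supercell, so the peak height should not depend on the cell size. In a perfect crystal, however, all neighbors in a shell sit at a single distance, so n_k / N stays fixed while the shell volume shrinks with Δr, and the peak height scales roughly as 1/Δr.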