Firstly, I would like to thank the automatminer team for building such a useful library.
After running AutoFeaturizer(preset="heavy"), many features were generated based on the structure of my dataset. However, I could not find descriptions of how these features are generated or what they represent. Is there any documentation that can help me make sense of the generated and selected features?
For an overview of each of the featurizers applied, see the matminer table of featurizers.
For more detail, see the docstrings in the matminer source code, for example for the CoulombMatrix featurizer:
class CoulombMatrix(BaseFeaturizer):
    """
    The Coulomb matrix, a representation of nuclear coulombic interaction.

    Generate the Coulomb matrix, M, of the input structure (or molecule). The
    Coulomb matrix was put forward by Rupp et al. (Phys. Rev. Lett. 108, 058301,
    2012) and is defined by off-diagonal elements M_ij = Z_i*Z_j/|R_i-R_j| and
    diagonal elements 0.5*Z_i^2.4, where Z_i and R_i denote the nuclear charge
    and the position of atom i, respectively.

    Coulomb Matrix features are flattened (for ML-readiness) by default. Use
    fit before featurizing to use flattened features. To return the matrix form,
    set flatten=False.

    Args:
        diag_elems (bool): flag indication whether (True, default) to use
            the original definition of the diagonal elements; if set to False,
            the diagonal elements are set to 0
        flatten (bool): If True, returns a flattened vector based on eigenvalues
            of the matrix form. Otherwise, returns a matrix object (single
            feature), which will likely need to be processed further.
    """
For even more info, you can look at the citations for each featurizer that was applied; these point to peer-reviewed publications that give as many details as you could want.
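
If you want to pull the docstring and citations together programmatically, something like the sketch below may help. The helper `describe_featurizer` is hypothetical; it only assumes the object it receives has a `citations()` method, which matminer featurizers provide through `BaseFeaturizer`.

```python
import inspect


def describe_featurizer(featurizer):
    """Collect the class docstring and citations of one featurizer object.

    Hypothetical helper; relies only on the object's citations() method.
    """
    return {
        "name": type(featurizer).__name__,
        "doc": inspect.getdoc(type(featurizer)) or "",
        "citations": list(featurizer.citations()),
    }


# Usage, assuming matminer is installed:
# from matminer.featurizers.structure import CoulombMatrix
# info = describe_featurizer(CoulombMatrix())
# print(info["doc"])
# print("\n".join(info["citations"]))
```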
On a separate note, I would not recommend the heavy featurization preset unless you have already experimented with the express preset and found it inadequate for your purposes.