Documentation on AutoFeaturizer Automatminer

Chryston_Boo · August 25, 2021, 9:51am

Firstly, I would like to thank the automatminer team for building such a useful library.

After running AutoFeaturizer(preset='heavy'), many features were generated based on the structure of my dataset. However, I could not find descriptions of how these features were generated or what they represent. Is there any documentation that can help me make sense of the generated and selected features?

Thanks!

ardunn · August 31, 2021, 8:28pm

Hey @Chryston_Boo!

Getting comprehensive info about the generated features can be done on a few different levels:

You can inspect the featurizer sets in this source file: automatminer/sets.py (commit c68ea8d966b3163cc44ba0e811951de97bbf1a23) in the hackingmaterials/automatminer repository on GitHub.

For an overview of each featurizer that is applied, see the matminer table of featurizers.

For more detail on a specific featurizer, read its docstring in the matminer source code. For example, here is the docstring of the CoulombMatrix featurizer:

class CoulombMatrix(BaseFeaturizer):
    """
    The Coulomb matrix, a representation of nuclear coulombic interaction.

    Generate the Coulomb matrix, M, of the input structure (or molecule). The
    Coulomb matrix was put forward by Rupp et al. (Phys. Rev. Lett. 108, 058301,
    2012) and is defined by off-diagonal elements M_ij = Z_i*Z_j/|R_i-R_j| and
    diagonal elements 0.5*Z_i^2.4, where Z_i and R_i denote the nuclear charge
    and the position of atom i, respectively.

    Coulomb Matrix features are flattened (for ML-readiness) by default. Use
    fit before featurizing to use flattened features. To return the matrix form,
    set flatten=False.

    Args:
        diag_elems (bool): flag indicating whether (True, default) to use
            the original definition of the diagonal elements; if set to False,
            the diagonal elements are set to 0
        flatten (bool): If True, returns a flattened vector based on eigenvalues
            of the matrix form. Otherwise, returns a matrix object (single
            feature), which will likely need to be processed further.
    """
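To make the definition in that docstring concrete, here is a minimal pure-Python sketch of the matrix form. It is not matminer's implementation: the function name coulomb_matrix and its arguments are made up for illustration. It only applies the two formulas quoted above (off-diagonal Z_i*Z_j/|R_i-R_j|, diagonal 0.5*Z_i^2.4) to a list of nuclear charges and Cartesian positions, and it assumes the positions are already in whatever length unit you want the result expressed in.

```python
import math


def coulomb_matrix(charges, positions, diag_elems=True):
    """Build the Coulomb matrix as a nested list (illustrative helper only).

    charges:   nuclear charges Z_i, one per atom
    positions: Cartesian coordinates R_i, one (x, y, z) tuple per atom
    diag_elems: if False, the diagonal is set to 0 (mirrors the docstring's flag)
    """
    n = len(charges)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                # Diagonal term from the docstring: 0.5 * Z_i^2.4
                matrix[i][j] = 0.5 * charges[i] ** 2.4 if diag_elems else 0.0
            else:
                # Off-diagonal term: Z_i * Z_j / |R_i - R_j|
                distance = math.dist(positions[i], positions[j])
                matrix[i][j] = charges[i] * charges[j] / distance
    return matrix


# Two atoms with Z = 1, separated by 2 length units along z.
charges = [1, 1]
positions = [(0.0, 0.0, 0.0), (0.0, 0.0, 2.0)]
m = coulomb_matrix(charges, positions)
print(m)  # [[0.5, 0.5], [0.5, 0.5]]
```

With flatten=True the real featurizer does not return this matrix directly. According to the docstring, it returns a vector based on the matrix eigenvalues, which this sketch leaves out.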

For even more info, you can look at the citations for each featurizer that is applied. These point to peer-reviewed publications with as much detail as you could want.
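If you would rather collect the docstring summary and citations programmatically than open each source file, something like the sketch below may help. The helper summarize_featurizer and the stand-in class DemoFeaturizer are hypothetical and exist only for this example. The sketch assumes the object exposes a citations() method, as matminer featurizers do, and otherwise falls back to an empty list.

```python
import inspect


def summarize_featurizer(featurizer):
    """Collect the class name, first docstring line, and citations (illustrative)."""
    doc = inspect.getdoc(type(featurizer)) or ""
    summary = doc.splitlines()[0] if doc else ""
    # Assumption: the object offers a citations() method returning a list of strings.
    citations_fn = getattr(featurizer, "citations", None)
    citations = list(citations_fn()) if callable(citations_fn) else []
    return {
        "name": type(featurizer).__name__,
        "summary": summary,
        "citations": citations,
    }


class DemoFeaturizer:
    """Stand-in featurizer used only for illustration."""

    def citations(self):
        # Placeholder entry, not a real reference.
        return ["@article{demo2021, title={Placeholder}}"]


info = summarize_featurizer(DemoFeaturizer())
print(info["name"], "-", info["summary"])
```

To try it on a real featurizer, you would pass an instance such as CoulombMatrix() from matminer in place of the stand-in class.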

ardunn · August 31, 2021, 8:29pm

On a separate note, I would not recommend the heavy featurization preset unless you have already experimented with the express preset and found it inadequate for your purposes.