FunctionFeaturizer (x0, x1, substitutions?)

dptru10 · November 17, 2021, 9:05pm

Hello! I am using the FunctionFeaturizer to generate combinations of functionalized descriptors as follows:

    import numpy as np
    from matminer.featurizers.function import FunctionFeaturizer

    # df_x, selected_feature_list, and n_cpus are defined elsewhere
    function_featurizer = FunctionFeaturizer(multi_feature_depth=2, combo_function=np.prod)
    function_featurizer.set_n_jobs(n_cpus)
    function_featurizer = function_featurizer.fit(df_x[selected_feature_list])
    df_combined = function_featurizer.featurize_dataframe(df_x[selected_feature_list], selected_feature_list)

The features I get include functions of ‘x0’, ‘x1’, etc. (e.g. x0**3/x1**2), names that were not in the original dataset. I assume these are placeholders substituted for features or combinations of features, but I’m not positive. I am reviewing the source to figure it out, but some explanation would be greatly appreciated.

Thanks!

dptru10 · November 18, 2021, 4:36pm

My thinking is that the expressions used as my column headers aren’t being converted from ‘x0’ etc. back to the original expressions. For reference, here is the generate_expressions_combinations function used by FunctionFeaturizer:

```python
import itertools

import numpy as np
import sympy as sp
from sympy.parsing.sympy_parser import parse_expr


def generate_expressions_combinations(expressions, combo_depth=2, combo_function=np.prod):
    """
    This function takes a list of strings representing functions
    of x, converts them to sympy expressions, and combines
    them according to the combo_depth parameter.  Also filters
    resultant expressions for any redundant ones determined
    by sympy expression equivalence.
    Args:
        expressions (strings): all of the sympy-parseable strings
            to be converted to expressions and combined, e.g.
            ["1 / x", "x ** 2"], must be functions of x
        combo_depth (int): the number of independent variables to consider
        combo_function (method): the function which combines
            the respective expressions provided, defaults to np.prod,
            i.e. the cumulative product of the expressions
    Returns:
        list of unique non-trivial expressions for featurization
            of inputs
    """
    # Convert to array for simpler substitution
    exp_array = sp.Array([parse_expr(exp) for exp in expressions])

    # Generate all of the combinations
    combo_exps = []
    # One copy of the expressions per variable, with x renamed to x0, x1, ...
    all_arrays = [exp_array.subs({"x": "x{}".format(n)}) for n in range(combo_depth)]
    # Get all sets of expressions
    for exp_set in itertools.product(*all_arrays):
        # Get all permutations of each set
        for exp_perm in itertools.permutations(exp_set):
            combo_exps.append(combo_function(exp_perm))

    # Filter for unique combinations, also remove identity
    unique_exps = list(set(combo_exps) - {parse_expr("x0")})
    # Sort to keep ordering
    unique_exps = sorted(unique_exps, key=lambda x: combo_exps.index(x))
    return unique_exps
```
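
To see where names like x0 and x1 come from, the substitution-and-product step of the function above can be mimicked with plain strings. This is only a rough sketch: `placeholder_combinations` is a hypothetical helper, it uses string replacement instead of sympy, and it skips the permutation, deduplication, and identity-removal steps.

```python
import itertools
import re


def placeholder_combinations(expressions, combo_depth=2):
    """String-only stand-in for the combination step (hypothetical helper)."""
    # One copy of the expression list per slot, with "x" renamed to x0, x1, ...
    per_slot = [
        [re.sub(r"\bx\b", f"x{n}", exp) for exp in expressions]
        for n in range(combo_depth)
    ]
    # Pair every expression from slot 0 with every expression from slot 1, etc.
    return [
        " * ".join(f"({exp})" for exp in combo)
        for combo in itertools.product(*per_slot)
    ]


combos = placeholder_combinations(["1 / x", "x ** 2"])
for combo in combos:
    print(combo)
```

Every combined expression is written in terms of the placeholders, so unless they are mapped back to column names afterwards, x0 and x1 are what would end up in the headers.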

ardunn · November 20, 2021, 4:14am

Thanks for bringing this to our attention. This has been fixed in pull request #725 on hackingmaterials/matminer (GitHub), "FunctionFeaturizer: allow true feature names to be subsituted in for abstract sympy expressions".

Reply in this thread if you have any more problems.

ardunn · November 20, 2021, 4:15am

Note that you’ll need to use the latest commit on GitHub to get this update.
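
For anyone who cannot move to the latest commit, the headers can in principle be relabeled by hand. The following is a minimal sketch, not the change made in the pull request: `relabel` is a hypothetical helper, `density` and `band_gap` are made-up column names, and it assumes you already know which input column each placeholder stands for.

```python
import re


def relabel(expression_label, feature_names):
    """Replace placeholders x0, x1, ... with feature_names[0], feature_names[1], ...

    Hypothetical helper; the placeholder-to-column order is an assumption
    that must be checked against how the features were actually generated.
    """
    def swap(match):
        # The digits after "x" give the position in feature_names
        return feature_names[int(match.group(1))]

    return re.sub(r"\bx(\d+)\b", swap, expression_label)


label = relabel("x0**3/x1**2", ["density", "band_gap"])
print(label)  # density**3/band_gap**2
```

Matching on word boundaries keeps the substitution from touching other names that merely contain an "x", such as `exp`.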