FunctionFeaturizer (x0, x1, substitutions?)

Hello! I am using the FunctionFeaturizer to generate combinations of functionalized descriptors with the following code:

    import numpy as np
    from matminer.featurizers.function import FunctionFeaturizer

    function_featurizer = FunctionFeaturizer(multi_feature_depth=2, combo_function=np.prod)
    function_featurizer.set_n_jobs(n_cpus)
    function_featurizer = function_featurizer.fit(df_x[selected_feature_list])
    df_combined = function_featurizer.featurize_dataframe(df_x[selected_feature_list], selected_feature_list)

The features I get include functions of ‘x0’, ‘x1’, etc., which were not in the original dataset (e.g. x0**3/x1**2). I am assuming these are substitutions for features or combinations of features, but I’m not positive. I am reviewing the source to try to figure it out, but some explanation would be greatly appreciated.

Thanks!

My guess is that the expressions used as my column headers aren’t being converted from ‘x0’ etc. back to the original feature names. I’m referring to the generate_expressions_combinations function used by the FunctionFeaturizer class:

```python
import itertools

import numpy as np
import sympy as sp
from sympy.parsing.sympy_parser import parse_expr


def generate_expressions_combinations(expressions, combo_depth=2, combo_function=np.prod):
    """
    This function takes a list of strings representing functions
    of x, converts them to sympy expressions, and combines
    them according to the combo_depth parameter.  Also filters
    resultant expressions for any redundant ones determined
    by sympy expression equivalence.
    Args:
        expressions (strings): all of the sympy-parseable strings
            to be converted to expressions and combined, e. g.
            ["1 / x", "x ** 2"], must be functions of x
        combo_depth (int): the number of independent variables to consider
        combo_function (method): the function which combines
            the respective expressions provided, defaults to np.prod,
            i. e. the product of the expressions
    Returns:
        list of unique non-trivial expressions for featurization
            of inputs
    """
    # Convert to array for simpler substitution
    exp_array = sp.Array([parse_expr(exp) for exp in expressions])

    # Generate all of the combinations: one copy of the expressions per
    # variable slot, with x renamed to x0, x1, ..., x{combo_depth - 1}
    combo_exps = []
    all_arrays = [exp_array.subs({"x": "x{}".format(n)}) for n in range(combo_depth)]
    # Get all sets of expressions
    for exp_set in itertools.product(*all_arrays):
        # Get all permutations of each set
        for exp_perm in itertools.permutations(exp_set):
            combo_exps.append(combo_function(exp_perm))

    # Filter for unique combinations, also remove identity
    unique_exps = list(set(combo_exps) - {parse_expr("x0")})
    # Sort to keep ordering
    unique_exps = sorted(unique_exps, key=lambda x: combo_exps.index(x))
    return unique_exps
```
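To see where a name like x0**3/x1**2 comes from, here is a minimal sketch of the renaming-and-combining step, assuming a small hypothetical list of expressions in x (not the featurizer's actual default list). Each slot in a depth-2 combination gets its own copy of the expressions with x renamed to x0 or x1, and np.prod multiplies one expression from each slot. This suggests x0 and x1 are abstract placeholders for the first and second features of each combination rather than new data.

```python
import itertools

import numpy as np
import sympy as sp
from sympy.parsing.sympy_parser import parse_expr

# Hypothetical single-variable expressions, chosen for illustration only
# (an assumption, not FunctionFeaturizer's default expression list).
expressions = ["x", "x**3", "1/x**2"]
base = [parse_expr(e) for e in expressions]

# Rename the placeholder x to x0 and x1, one copy per slot in a
# depth-2 combination (mirrors the .subs step in the function above).
x = sp.Symbol("x")
all_lists = [
    [e.subs(x, sp.Symbol("x{}".format(n))) for e in base]
    for n in range(2)
]

# Combine one expression from each slot with np.prod, as the default
# combo_function does; duplicates collapse in the set.
combos = {np.prod(pair) for pair in itertools.product(*all_lists)}

x0, x1 = sp.symbols("x0 x1")
print(x0**3 / x1**2 in combos)  # True
```

The permutation loop and the identity filter from the original function are left out to keep the sketch short; they affect which duplicates survive, not the x0/x1 naming.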

Thanks for bringing this to our attention. This has been fixed as of hackingmaterials/matminer Pull Request #725 on GitHub, “FunctionFeaturizer: allow true feature names to be subsituted in for abstract sympy expressions” by ardunn.
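On a version without that fix, the same idea can be applied by hand: substitute the real column names into the abstract expression. The sketch below is not the code from the pull request. It uses a hypothetical helper, label_expression, and hypothetical column names band_gap and density, and it assumes x0 corresponds to the first feature of a combination and x1 to the second.

```python
import sympy as sp


def label_expression(expr, feature_names):
    """Hypothetical helper: replace placeholders x0, x1, ... in a sympy
    expression with symbols named after the real features, in order."""
    mapping = {
        sp.Symbol("x{}".format(i)): sp.Symbol(name)
        for i, name in enumerate(feature_names)
    }
    # simultaneous=True stops a feature literally named "x1" from being
    # substituted a second time by the x1 -> ... rule.
    return expr.subs(mapping, simultaneous=True)


x0, x1 = sp.symbols("x0 x1")
abstract = x0**3 / x1**2

# Hypothetical column names for the pair of features being combined.
labeled = label_expression(abstract, ["band_gap", "density"])
print(labeled)  # band_gap**3/density**2
```

Building symbols with sp.Symbol(name) rather than parsing the names means column names containing spaces or punctuation are still accepted as plain symbol names.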

Reply in this thread if you have any more problems.

Note that you’ll need to install the latest commit from GitHub to get this update.