The features I get include functions of ‘x0’, ‘x1’, etc. that were not in the original dataset (e.g. x0**3/x1**2). I assume these are substitutions for features or combinations of features, but I’m not positive. I am reviewing the source to try to figure it out, but some explanation would be greatly appreciated.
My guess is that the expressions I’m using as my column headers aren’t being converted from ‘x0’ etc. back to the original expressions. I’m referring to the generate_expressions_combinations function in the FunctionFeaturizer source:
```python
# Imports the excerpt relies on
import itertools

import numpy as np
import sympy as sp
from sympy.parsing.sympy_parser import parse_expr


def generate_expressions_combinations(expressions, combo_depth=2, combo_function=np.prod):
    """
    This function takes a list of strings representing functions
    of x, converts them to sympy expressions, and combines
    them according to the combo_depth parameter. Also filters
    resultant expressions for any redundant ones determined
    by sympy expression equivalence.

    Args:
        expressions (strings): all of the sympy-parseable strings
            to be converted to expressions and combined, e. g.
            ["1 / x", "x ** 2"], must be functions of x
        combo_depth (int): the number of independent variables to consider
        combo_function (method): the function which combines
            the respective expressions provided, defaults to np.prod,
            i. e. the cumulative product of the expressions

    Returns:
        list of unique non-trivial expressions for featurization
            of inputs
    """
    # Convert to array for simpler substitution
    exp_array = sp.Array([parse_expr(exp) for exp in expressions])

    # Generate all of the combinations
    combo_exps = []
    all_arrays = [exp_array.subs({"x": "x{}".format(n)}) for n in range(combo_depth)]
    # Get all sets of expressions
    for exp_set in itertools.product(*all_arrays):
        # Get all permutations of each set
        for exp_perm in itertools.permutations(exp_set):
            combo_exps.append(combo_function(exp_perm))

    # Filter for unique combinations, also remove identity
    unique_exps = list(set(combo_exps) - {parse_expr("x0")})
    # Sort to keep ordering
    unique_exps = sorted(unique_exps, key=lambda x: combo_exps.index(x))
    return unique_exps
```
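To make the substitution step easier to follow, here is a rough, standard-library-only imitation of what the function above does with its placeholders. It is not matminer code: the helper names (`substitute_placeholder`, `combine_as_strings`, `relabel`) and the column names `density` and `volume` are made up for illustration, and it skips the permutation step, the sympy simplification, and the identity filter. It also assumes that `x0` and `x1` stand for the first and second column of a given pair, which should be checked against the featurizer itself.

```python
import itertools
import re


def substitute_placeholder(expression, index):
    """Rename the bare variable x to x<index>, e.g. x -> x0."""
    return re.sub(r"\bx\b", "x{}".format(index), expression)


def combine_as_strings(expressions, combo_depth=2):
    """String-only imitation of the product step (no sympy simplification)."""
    # One copy of the expression list per placeholder: x0, x1, ...
    per_variable = [
        [substitute_placeholder(exp, n) for exp in expressions]
        for n in range(combo_depth)
    ]
    combos = []
    for exp_set in itertools.product(*per_variable):
        label = " * ".join("({})".format(exp) for exp in exp_set)
        if label not in combos:  # keep first-seen order, drop repeats
            combos.append(label)
    return combos


def relabel(label, columns):
    """Swap each placeholder x<n> for the n-th name in columns (hypothetical)."""
    return re.sub(r"\bx(\d+)\b", lambda m: columns[int(m.group(1))], label)


labels = combine_as_strings(["x ** 3", "1 / x ** 2"])
print(labels)
print([relabel(label, ["density", "volume"]) for label in labels])
```

The second label produced here, `(x0 ** 3) * (1 / x1 ** 2)`, is the unsimplified form of the `x0**3/x1**2` feature mentioned above, which suggests the unfamiliar features are products of the input expressions applied to two different columns rather than new variables.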