How to call all the functions in a Python file on an object

NOTE: This is a migration of an old post from my previous blog.

Recently, I’ve been playing around with some competitions on Kaggle. Since feature engineering is an inescapable part of Machine Learning, I keep finding myself having to call a dozen or more functions that add synthetic features, infer missing values, etc., on the same pandas DataFrame.

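The following code snippet calls every function defined in its .py file, except itself, on the object. It passes the result of each call on to the next one via a tail-recursive nested helper function, recCall. Note that CPython does not eliminate tail calls, so each function adds a stack frame; with a dozen or so functions this stays far below the default recursion limit of 1000.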

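import inspect
import sys

import pandas as pd

def addAllFeatures(data: pd.DataFrame) -> pd.DataFrame:
    # Name of this function, so that it can exclude itself
    currFunc = inspect.getframeinfo(inspect.currentframe()).function
    # All functions defined in this module (imported functions are skipped),
    # except this one; inspect.getmembers() returns them sorted by name
    functions = [obj for name, obj in
                 inspect.getmembers(sys.modules[__name__])
                 if (inspect.isfunction(obj)
                     and obj.__module__ == __name__
                     and name != currFunc)]

    def recCall(modifiedData, remFuncs):
        # Base case: no functions left to apply
        if len(remFuncs) == 0:
            return modifiedData

        # Apply the first function, then recurse on the remaining ones
        return recCall(remFuncs[0](modifiedData), remFuncs[1:])

    return recCall(data, functions)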
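The recursion in recCall amounts to a left fold over the list of functions. As a sketch of a swapped-in alternative (not the approach above), the same fold can be written iteratively with the standard library's functools.reduce, which avoids the stack growth altogether. The helper name applyAll and the feature functions are hypothetical; since the fold does not care about the type of the data, a plain dict of lists stands in for a DataFrame so the example runs without pandas.

```python
from functools import reduce
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def applyAll(data: T, functions: Sequence[Callable[[T], T]]) -> T:
    # Left fold: the output of each function becomes the input of the next,
    # as in recCall, but iteratively, so the call stack does not grow
    return reduce(lambda acc, func: func(acc), functions, data)


# Hypothetical feature functions on a dict of columns,
# standing in for DataFrame-based ones
def addDoubled(data: dict) -> dict:
    data["a2"] = [x * 2 for x in data["a"]]
    return data


def addSum(data: dict) -> dict:
    data["a_plus_b"] = [x + y for x, y in zip(data["a"], data["b"])]
    return data


table = {"a": [1, 2], "b": [10, 20]}
result = applyAll(table, [addDoubled, addSum])
print(result)
# {'a': [1, 2], 'b': [10, 20], 'a2': [2, 4], 'a_plus_b': [11, 22]}
```

With pandas, calling applyAll(data, functions) with the functions list built in addAllFeatures should give the same result as recCall(data, functions).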

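For this specific example, each feature-adding function takes the DataFrame, modifies it, and returns it. Since recCall passes only the DataFrame, any *args and **kwargs will be empty. The general form is: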

def addFeature(data: pd.DataFrame, *args, **kwargs):
     
    # add the feature ...
     
    return data
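For illustration, here are two hypothetical functions in that form, one adding a synthetic feature and one inferring missing values. The column names age, siblings and parents are assumptions, not taken from any particular dataset.

```python
import pandas as pd


def addFamilySize(data: pd.DataFrame, *args, **kwargs) -> pd.DataFrame:
    # Synthetic feature built from two hypothetical columns
    data["familySize"] = data["siblings"] + data["parents"] + 1
    return data


def fillMissingAge(data: pd.DataFrame, *args, **kwargs) -> pd.DataFrame:
    # Infer missing values: replace NaN ages with the median of the known ages
    data["age"] = data["age"].fillna(data["age"].median())
    return data


df = pd.DataFrame({"age": [20.0, None, 40.0],
                   "siblings": [0, 1, 2],
                   "parents": [0, 0, 1]})
df = fillMissingAge(addFamilySize(df))
print(df)
```

Both functions modify the DataFrame in place and also return it, which is what recCall relies on when it feeds each result into the next call.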

Also note that the data is modified between the recursive calls: if your features depend on each other, the functions list needs to be in the correct dependency order. Since inspect.getmembers() returns members sorted by name, the functions are applied in alphabetical order, which is not necessarily a valid one. Determining a correct order can, however, quickly become non-trivial.
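One way to get a valid order, assuming each feature function declares which other functions it needs, is a topological sort with the standard library's graphlib.TopologicalSorter (Python 3.9+). The DEPENDS_ON mapping and the function names in it are a hypothetical convention for this sketch, not something inspect provides.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency declarations: function name -> names of the
# functions whose output it needs (empty tuple = no dependencies)
DEPENDS_ON = {
    "addFamilySize": (),
    "addIsAlone": ("addFamilySize",),
    "fillMissingAge": (),
    "addAgeGroup": ("fillMissingAge",),
    "addAgeTimesClass": ("addAgeGroup",),
}

# The order inspect.getmembers() would give: alphabetical by name,
# which here runs addAgeGroup before fillMissingAge
alphabetical = sorted(DEPENDS_ON)

# static_order() yields every name after all of its dependencies;
# it raises graphlib.CycleError if the dependencies are circular
order = list(TopologicalSorter(DEPENDS_ON).static_order())
print(alphabetical)
print(order)
```

The sorted names could then be mapped back to the function objects (e.g. with getattr on the module) before being passed to recCall.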

Andrej Leban
Ph.D. Student