python - Creating reusable and composable filters for Pandas DataFrames - Stack Overflow

admin2025-04-25  2

I am working with multiple Pandas DataFrames with a similar structure and would like to create reusable filters that I can define once and then apply or combine as needed.

The only working solution I came up with so far feels clunky to me and makes it hard to combine filters with OR:

import pandas as pd
df = pd.DataFrame({"A":[1,1,2],"B":[1,2,3]})

def filter_A(df):
    return df.loc[df["A"]==1]

def filter_B(df):
    return df.loc[df["B"]==2]

print(filter_A(filter_B(df)).head())

I am hoping for something along the lines of

filter_A = (df["A"]==1)
filter_B = (df["B"]==2)

print(df.loc[(filter_A) & (filter_B)])

but reusable after changing the df and also applicable to other DataFrames with the same columns. Is there any cleaner or more readable way to do this?
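The hoped-for `filter_A & filter_B` syntax can be approximated by deferring evaluation until a DataFrame is supplied. This is a minimal sketch, assuming a small hypothetical wrapper class `Filter` (not part of pandas). The class overloads `&`, `|` and `~`, and it relies on `.loc` accepting a callable that returns a boolean mask:

```python
import pandas as pd


class Filter:
    """Hypothetical wrapper around a function that maps a DataFrame to a boolean mask."""

    def __init__(self, func):
        self.func = func  # callable: DataFrame -> boolean Series

    def __call__(self, df):
        # The mask is only computed when a concrete DataFrame is passed in
        return self.func(df)

    def __and__(self, other):
        # Element-wise AND of both masks, still deferred
        return Filter(lambda df: self(df) & other(df))

    def __or__(self, other):
        # Element-wise OR of both masks, still deferred
        return Filter(lambda df: self(df) | other(df))

    def __invert__(self):
        # Element-wise NOT of the mask
        return Filter(lambda df: ~self(df))


# Filters are defined once, independent of any particular DataFrame
filter_A = Filter(lambda df: df["A"] == 1)
filter_B = Filter(lambda df: df["B"] == 2)

df = pd.DataFrame({"A": [1, 1, 2], "B": [1, 2, 3]})

# .loc calls the Filter with df and uses the returned boolean Series
print(df.loc[filter_A & filter_B])
print(df.loc[filter_A | filter_B])
```

Because the masks are only built inside `__call__`, the same `filter_A` and `filter_B` objects can be reused on a modified `df` or on any other DataFrame that has columns `A` and `B`.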

asked Jan 16 at 11:59 by NN314
  • Note that you're using AND here, not OR. – mozway, commented Jan 16 at 12:06

3 Answers


You can use the .eval() method, which evaluates a string expression over the DataFrame's columns (for a comparison such as 'A == 1' it returns a boolean Series):

  1. Define each filter as a string expression and evaluate it on the DataFrame df.

  2. Combine the resulting boolean Series with the bitwise AND operator (&), which performs an element-wise logical AND (use | for an element-wise OR).

  3. Use the .loc accessor to select the rows where the combined condition is True.

filter_A = 'A == 1'
filter_B = 'B == 2'
df.loc[df.eval(filter_A) & df.eval(filter_B)]

Output:

   A  B
1  1  2
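The same idea scales to any number of string filters and to OR. This is a minimal sketch, assuming the filters are kept in a plain Python list. It folds the boolean Series returned by `df.eval()` with `functools.reduce` and the standard `operator.and_` / `operator.or_` functions:

```python
import functools
import operator

import pandas as pd

df = pd.DataFrame({"A": [1, 1, 2], "B": [1, 2, 3]})

filters = ["A == 1", "B == 2"]

# Evaluate each string filter to a boolean Series aligned with df
masks = [df.eval(f) for f in filters]

# Fold the masks together with element-wise AND / OR
all_match = functools.reduce(operator.and_, masks)
any_match = functools.reduce(operator.or_, masks)

print(df.loc[all_match])
print(df.loc[any_match])
```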

DataFrame.query() could be a good solution, since it accepts the same kind of string expression:

filter_A = 'A == 1'
filter_B = 'B == 2'

print(df.query(f'({filter_A}) & ({filter_B})'))

Alternatively, to combine an arbitrary list of filters:

filters = [filter_A, filter_B]
df.query('&'.join(f'({f})' for f in filters))

# or
df.query('&'.join(map('({})'.format, filters)))

Output:

   A  B
1  1  2
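Switching the joining operator from `&` to `|` gives OR. This is a minimal sketch, assuming the filters live in a list as above. Each filter is wrapped in parentheses so it is combined as a single unit: in pandas expressions `&` binds more tightly than `|`, so an unparenthesized filter that itself contains `|` could otherwise be split apart:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 2], "B": [1, 2, 3]})
filters = ["A == 1", "B == 2"]

# Parenthesize each filter, then join with the desired operator
and_expr = " & ".join(f"({f})" for f in filters)
or_expr = " | ".join(f"({f})" for f in filters)

print(and_expr)  # (A == 1) & (B == 2)
print(df.query(and_expr))
print(df.query(or_expr))
```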

If you only ever combine filters with logical AND, you can use .pipe() with your existing functions, or .loc with callables (i.e. functions) that return boolean masks.

Otherwise, for example when you need OR, write mask functions and combine them in a single function that you pass to .loc.

Note: Here I've changed the name of the dataframe to df_example to avoid confusion with the function parameter df.

.pipe()

>>> df_example.pipe(filter_A).pipe(filter_B)
   A  B
1  1  2
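`.pipe()` also forwards extra positional and keyword arguments to the function, so one parametrized filter can replace several hard-coded ones. This is a minimal sketch, assuming a hypothetical helper `filter_eq` that is not part of the question's code:

```python
import pandas as pd


def filter_eq(df, column, value):
    """Hypothetical generic filter: keep rows where df[column] == value."""
    return df.loc[df[column] == value]


df_example = pd.DataFrame({"A": [1, 1, 2], "B": [1, 2, 3]})

# DataFrame.pipe(func, *args, **kwargs) calls func(df_example, *args, **kwargs)
result = df_example.pipe(filter_eq, "A", 1).pipe(filter_eq, column="B", value=2)
print(result)
```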

.loc with mask functions

def mask_A(df):
    return df["A"] == 1

def mask_B(df):
    return df["B"] == 2

>>> df_example.loc[mask_A].loc[mask_B]
   A  B
1  1  2
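Instead of writing one function per condition, a small factory can build mask functions from parameters. This is a minimal sketch, assuming a hypothetical helper `column_equals` and a second, made-up DataFrame `df_other` that shows the masks are reusable across DataFrames with the same columns:

```python
import pandas as pd


def column_equals(column, value):
    """Hypothetical factory returning a mask function usable with .loc."""
    def mask(df):
        # Evaluated against whatever DataFrame .loc passes in
        return df[column] == value
    return mask


mask_A = column_equals("A", 1)
mask_B = column_equals("B", 2)

df_example = pd.DataFrame({"A": [1, 1, 2], "B": [1, 2, 3]})
df_other = pd.DataFrame({"A": [2, 1], "B": [2, 2]})  # hypothetical example data

# The same mask functions work on any DataFrame with columns "A" and "B"
print(df_example.loc[mask_A].loc[mask_B])
print(df_other.loc[mask_A].loc[mask_B])
```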

.loc with combined function

>>> df_example.loc[lambda df: mask_A(df) & mask_B(df)]
   A  B
1  1  2
>>> # now OR
>>> df_example.loc[lambda df: mask_A(df) | mask_B(df)]
   A  B
0  1  1
1  1  2
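When more than two masks are involved, the lambdas can be generalized into small combinators. This is a minimal sketch, assuming hypothetical helpers `all_of`, `any_of` and `negate` (none of them are pandas functions). Each helper returns a new mask function, so combinations nest and can still be passed straight to `.loc`:

```python
import functools
import operator

import pandas as pd


def mask_A(df):
    return df["A"] == 1


def mask_B(df):
    return df["B"] == 2


def all_of(*masks):
    """Hypothetical helper: combine mask functions with element-wise AND."""
    return lambda df: functools.reduce(operator.and_, (m(df) for m in masks))


def any_of(*masks):
    """Hypothetical helper: combine mask functions with element-wise OR."""
    return lambda df: functools.reduce(operator.or_, (m(df) for m in masks))


def negate(mask):
    """Hypothetical helper: invert a mask function."""
    return lambda df: ~mask(df)


df_example = pd.DataFrame({"A": [1, 1, 2], "B": [1, 2, 3]})

# Combinators nest, so compound conditions stay readable
print(df_example.loc[all_of(mask_A, mask_B)])
print(df_example.loc[any_of(mask_A, mask_B)])
print(df_example.loc[all_of(mask_A, negate(mask_B))])
```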