I am working with multiple Pandas DataFrames with a similar structure and would like to create reusable filters that I can define once and then apply or combine as needed.
The only working solution I came up with so far feels clunky to me and makes it hard to combine filters with OR:
import pandas as pd
df = pd.DataFrame({"A":[1,1,2],"B":[1,2,3]})
def filter_A(df):
    return df.loc[df["A"]==1]
def filter_B(df):
    return df.loc[df["B"]==2]
print(filter_A(filter_B(df)).head())
I am hoping for something along the lines of
filter_A = (df["A"]==1)
filter_B = (df["B"]==2)
print(df.loc[(filter_A) & (filter_B)])
but reusable after changing the df and also applicable to other DataFrames with the same columns. Is there any cleaner or more readable way to do this?
You can use the .eval() method, which evaluates a string describing operations on dataframe columns. With each filter defined as a string (filter_A and filter_B below):
1. Evaluate the string expressions on the dataframe df.
2. Combine the results of these evaluations using the bitwise AND operator (&), which performs an element-wise logical AND operation.
3. Use the .loc accessor to filter the dataframe based on the combined condition.
filter_A = 'A == 1'
filter_B = 'B == 2'
df.loc[df.eval(filter_A) & df.eval(filter_B)]
Output:
   A  B
1  1  2
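Because the filters are plain strings, they are not tied to any one DataFrame, so the same approach should also cover the OR case and other frames with the same columns. A minimal sketch, assuming a second frame named df_other (a hypothetical name with made-up data, used only for illustration):

```python
import pandas as pd

# Filters are plain strings, so they are not bound to a particular DataFrame.
filter_A = "A == 1"
filter_B = "B == 2"

df = pd.DataFrame({"A": [1, 1, 2], "B": [1, 2, 3]})
# Hypothetical second frame with the same columns (illustrative data only).
df_other = pd.DataFrame({"A": [2, 1, 1], "B": [2, 2, 5]})

# OR: keep rows that match either filter.
either = df.loc[df.eval(filter_A) | df.eval(filter_B)]

# AND: the same filter strings reused on a different DataFrame.
both_other = df_other.loc[df_other.eval(filter_A) & df_other.eval(filter_B)]

print(either)
print(both_other)
```

Each call to .eval() is made against the frame being filtered, so the mask always reflects that frame's current contents.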
query could be a good solution:
filter_A = 'A == 1'
filter_B = 'B == 2'
print(df.query(f'({filter_A}) & ({filter_B})'))
Alternatively:
filters = [filter_A, filter_B]
df.query('&'.join(f'({f})' for f in filters))
# or
df.query('&'.join(map('({})'.format, filters)))
Output:
   A  B
1  1  2
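The joining pattern above can be wrapped in a small helper so that the same list of filter strings can be combined with either AND or OR. This is only a sketch: combine is a hypothetical helper name, not part of pandas.

```python
import pandas as pd


def combine(filters, op="&"):
    """Join query strings with op ('&' or '|'), parenthesizing each one.

    Hypothetical helper for illustration; not a pandas function.
    """
    if op not in ("&", "|"):
        raise ValueError("op must be '&' or '|'")
    return f" {op} ".join(f"({f})" for f in filters)


filter_A = "A == 1"
filter_B = "B == 2"
df = pd.DataFrame({"A": [1, 1, 2], "B": [1, 2, 3]})

# AND is the default; pass op="|" for OR.
all_match = df.query(combine([filter_A, filter_B]))
any_match = df.query(combine([filter_A, filter_B], op="|"))

print(all_match)
print(any_match)
```

The parentheses around each filter keep operator precedence predictable when a filter string itself contains & or |.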
If you're only ever combining based on logical AND, you can use .pipe() with your existing functions, or .loc with callables (i.e. functions) that produce boolean masks.
Otherwise, you can use mask functions and combine them in a function you pass to .loc.
Note: Here I've changed the name of the dataframe to df_example to avoid confusion with the function parameter df.
.pipe()
>>> df_example.pipe(filter_A).pipe(filter_B)
   A  B
1  1  2
.loc with mask functions
def mask_A(df):
    return df["A"] == 1
def mask_B(df):
    return df["B"] == 2
>>> df_example.loc[mask_A].loc[mask_B]
   A  B
1  1  2
.loc with combined function
>>> df_example.loc[lambda df: mask_A(df) & mask_B(df)]
   A  B
1  1  2
>>> # now OR
>>> df_example.loc[lambda df: mask_A(df) | mask_B(df)]
   A  B
0  1  1
1  1  2
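If the lambdas start to repeat, the combining step itself can be made reusable. The sketch below assumes two hypothetical helpers, all_of and any_of (not pandas functions), which take mask functions and return a new callable that .loc accepts:

```python
from functools import reduce
import operator

import pandas as pd


def mask_A(df):
    return df["A"] == 1


def mask_B(df):
    return df["B"] == 2


def all_of(*masks):
    """Return a callable that ANDs the given mask functions (hypothetical helper)."""
    return lambda df: reduce(operator.and_, (m(df) for m in masks))


def any_of(*masks):
    """Return a callable that ORs the given mask functions (hypothetical helper)."""
    return lambda df: reduce(operator.or_, (m(df) for m in masks))


df_example = pd.DataFrame({"A": [1, 1, 2], "B": [1, 2, 3]})

# .loc calls the combined function with the DataFrame being indexed.
both = df_example.loc[all_of(mask_A, mask_B)]
either = df_example.loc[any_of(mask_A, mask_B)]

print(both)
print(either)
```

Since the helpers return ordinary callables, they can be nested, for example any_of(all_of(mask_A, mask_B), mask_B), and applied to any DataFrame that has the same columns.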