How can I check if all values of a polars DataFrame, containing only boolean columns, are True?
Example df:
import polars as pl

df = pl.DataFrame({"a": [True, True, None],
                   "b": [True, True, True],
                   })
The reason for my question is that sometimes I want to check if all values of a df fulfill a condition, like in the following:
df = pl.DataFrame({"a": [1, 2, None],
                   "b": [4, 5, 6],
                   }).select(pl.all() >= 1)
By the way, I didn't expect that .select(pl.all() >= 1) keeps the null (None) in the last row of column "a"; maybe that's worth noting.
As of the date of this edit, I found the following code to be the most idiomatic for polars (also in terms of performance):
df.fold(lambda s1, s2: s1 & s2).all(ignore_nulls=False)
Note that this code can return True, False or None. None (which the REPL prints as nothing) is returned when the df contains no False values but at least one null value.
Example with the df from the question:
>>> df = pl.DataFrame({"a": [True, True, None],
...                    "b": [True, True, True],
...                    })
>>> df.fold(lambda s1, s2: s1 & s2).all(ignore_nulls=False)  # Nothing is printed: the result is None because of the `None` in the df.
>>> df = pl.DataFrame({"a": [True, True, True],
...                    "b": [True, True, True],
...                    })
>>> df.fold(lambda s1, s2: s1 & s2).all(ignore_nulls=False)  # True is returned.
True
If no null values exist in df, one could omit ignore_nulls=False.
The second-best option is:
df.mean_horizontal(ignore_nulls=False).eq_missing(1).all()
Its advantage, however, is that it can only return True or False (never None).
It works because the mean of a row with only True values is always 1. With ignore_nulls=False, a row containing a null has a null mean, which eq_missing(1) turns into False.
A more explicit approach could look as follows.
is_all_true = pl.all_horizontal(pl.all().all())
df.select(is_all_true).item()
True
Explanation. If df is of shape (n, c), then:
- pl.all().all() will give a boolean dataframe of shape (1, c) indicating for each column whether it only contains true values;
- pl.all_horizontal(pl.all().all()) will give a boolean dataframe of shape (1, 1) indicating whether all values in df are True;
- .item() is used to pick the literal value from the dataframe of shape (1, 1).
Handling null values. By default, pl.Expr.all ignores null values. To treat them as False instead, pl.Expr.fill_null can be used to explicitly set null values to False before performing the logic above.
is_all_true = pl.all_horizontal(pl.all().fill_null(False).all())
df.select(is_all_true).item()
False
See this answer for more details in the context of checking for null values.