I have a long script in Python, predominantly pandas but shifting to Polars, and I am reviewing the memory used by its objects.
To find the 10 largest objects currently in use with `locals().items()` and `sys.getsizeof()`, I run:
```python
import sys

def sizeof_fmt(num, suffix='B'):
    ''' by Fred Cirera, https://stackoverflow.com/a/1094933/1870254, modified'''
    # Divide by 1024 until the value fits the current binary unit.
    for unit in ['', 'Ki', 'Mi', 'Gi', 'Ti', 'Pi', 'Ei', 'Zi']:
        if abs(num) < 1024.0:
            return "%3.1f %s%s" % (num, unit, suffix)
        num /= 1024.0
    return "%.1f %s%s" % (num, 'Yi', suffix)

# Rank every name in the current (module-level) namespace by sys.getsizeof().
for name, size in sorted(((name, sys.getsizeof(value)) for name, value in list(
        locals().items())), key=lambda x: -x[1])[:10]:
    print("{:>30}: {:>8}".format(name, sizeof_fmt(size)))
```
But I know I have some Polars objects: when I run `pl.DataFrame.estimated_size(data)` I get a value of 400 MB, which would make it the largest object in my current script. However, the Polars DataFrames do not show up in the results of my `locals()` call.

Is there a way to determine all Polars objects currently in use?
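`sys.getsizeof()` is shallow: it reports only what the object's own `__sizeof__()` returns and never follows references to other objects. A Polars DataFrame keeps its column data in Arrow-format buffers managed by the Rust library rather than in Python objects, so, depending on whether the class reports that memory through `__sizeof__()`, `sys.getsizeof()` on a DataFrame can come out far below `estimated_size()`. The sketch below reproduces the same shallow effect with only the standard library, using a list that holds a 1 MB `bytes` object; the helper name `shallow_vs_payload` is hypothetical.

```python
import sys

def shallow_vs_payload(payload_bytes: int) -> tuple[int, int]:
    """Hypothetical helper: compare the shallow size of a container with
    the size of the object it references."""
    payload = bytes(payload_bytes)  # the large object
    container = [payload]           # stores only a pointer to it
    # getsizeof(container) counts the list header and one pointer slot,
    # not the bytes object that slot points at.
    return sys.getsizeof(container), sys.getsizeof(payload)

if __name__ == "__main__":
    shallow, deep = shallow_vs_payload(1_000_000)
    print(f"list: {shallow} B, referenced bytes: {deep} B")
```

For the DataFrames themselves, `df.estimated_size()` is the figure to rank by, since it sums the sizes of the DataFrame's buffers.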
```python
import sys
import polars as pl

def sizeof_fmt(num, suffix='B'):
    ''' by Fred Cirera, https://stackoverflow.com/a/1094933/1870254, modified'''
    for unit in ['', 'Ki', 'Mi', 'Gi', 'Ti', 'Pi', 'Ei', 'Zi']:
        if abs(num) < 1024.0:
            return "%3.1f %s%s" % (num, unit, suffix)
        num /= 1024.0
    return "%.1f %s%s" % (num, 'Yi', suffix)

def find_polars_dataframes():
    polars_dfs = []
    # Inside a function, locals() is only this function's namespace;
    # globals() is the module namespace where the DataFrames are bound.
    for name, value in list(globals().items()):
        if isinstance(value, pl.DataFrame):
            # estimated_size() measures the DataFrame's column buffers.
            polars_dfs.append((name, value.estimated_size()))
    return polars_dfs

polars_dataframes = find_polars_dataframes()
# Measure everything else with sys.getsizeof(), skipping objects that
# expose estimated_size().
others = [(name, sys.getsizeof(value))
          for name, value in list(locals().items())
          if not hasattr(value, 'estimated_size')]
for name, size in sorted(polars_dataframes + others, key=lambda x: -x[1])[:10]:
    print("{:>30}: {:>8}".format(name, sizeof_fmt(size)))
```
Output:

```
sizeof_fmt: 152.0 B
find_polars_dataframes: 152.0 B
__file__: 89.0 B
__builtins__: 72.0 B
sys: 72.0 B
pl: 72.0 B
__annotations__: 64.0 B
__name__: 57.0 B
__loader__: 56.0 B
polars_dataframes: 56.0 B
```
(though I suspect I'm missing something...)
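One detail worth checking in `find_polars_dataframes()`: when `locals()` is called inside a function body it returns that function's own namespace (its parameters and local variables), not the module namespace where top-level DataFrames are bound; `globals()` returns the module's. A minimal stdlib-only sketch, with hypothetical names:

```python
x_module_level = "bound at module level"

def names_seen_by_locals() -> set[str]:
    inner = 1  # the only name bound in this function's scope
    return set(locals())

def names_seen_by_globals() -> set[str]:
    # globals() is the namespace of the module that defines this function
    return set(globals())

if __name__ == "__main__":
    print(sorted(names_seen_by_locals()))                # ['inner']
    print("x_module_level" in names_seen_by_globals())   # True
```

Separately, the snippet above never creates a `pl.DataFrame`, so even a correct module-level scan would find none in that run.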
`data` when you run `pl.DataFrame.estimated_size(data)`? – Hericks Commented Jan 3 at 10:48
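If the goal is every live Polars object rather than only names bound at module level, one option (a swapped-in technique, not something the question uses) is to walk `gc.get_objects()`, which lists the objects tracked by the cyclic garbage collector no matter which namespace, function, or container references them. The sketch below assumes `pl.DataFrame` instances are GC-tracked, as instances of ordinary Python classes with an instance `__dict__` are; that is worth verifying on your Polars version. The names `live_instances`, `total_size`, and the stand-in class `Frame` are hypothetical.

```python
import gc
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")

def live_instances(cls: type[T]) -> list[T]:
    """Return every GC-tracked object that is an instance of `cls`,
    regardless of which namespace or container references it."""
    return [obj for obj in gc.get_objects() if isinstance(obj, cls)]

def total_size(objs: Iterable[T], size_of: Callable[[T], int]) -> int:
    """Sum a per-object size function over `objs`."""
    return sum(size_of(obj) for obj in objs)

if __name__ == "__main__":
    class Frame:  # stand-in for pl.DataFrame in this self-contained demo
        def __init__(self, nbytes: int) -> None:
            self.nbytes = nbytes

    a, b = Frame(100), Frame(250)
    found = live_instances(Frame)
    print(len(found), total_size(found, lambda f: f.nbytes))
```

With Polars this would be called as `live_instances(pl.DataFrame)` with `size_of=lambda df: df.estimated_size()`. Because different DataFrames can share the same underlying buffers, summing their estimates can overstate the memory actually in use.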