How do I find all Polars DataFrames in Python? - Stack Overflow

I have a long Python script, mostly pandas but shifting to Polars.

I am reviewing the memory usage of its objects.

To find the 10 largest objects currently in use, with locals().items() and sys.getsizeof(), I run:

import sys
def sizeof_fmt(num, suffix='B'):
    ''' by Fred Cirera,  https://stackoverflow.com/a/1094933/1870254, modified'''
    for unit in ['','Ki','Mi','Gi','Ti','Pi','Ei','Zi']:
        if abs(num) < 1024.0:
            return "%3.1f %s%s" % (num, unit, suffix)
        num /= 1024.0
    return "%.1f %s%s" % (num, 'Yi', suffix)

# Rank every name in the current namespace by sys.getsizeof() and print the top 10.
for name, size in sorted(((name, sys.getsizeof(value)) for name, value in list(
                          locals().items())), key=lambda x: -x[1])[:10]:
    print("{:>30}: {:>8}".format(name, sizeof_fmt(size)))
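One likely reason a loop like this misses large objects: sys.getsizeof() is shallow. It reports the memory of the object itself, not of the objects it refers to. A minimal pure-Python sketch of the effect, using a list of strings as a stand-in (payload, shallow and deep are illustrative names, not part of the original script):

```python
import sys

# About one million characters of payload, held in 100 separate strings.
payload = ["x" * 10_000 for _ in range(100)]

# getsizeof(list) counts only the list object and its array of pointers,
# not the strings those pointers refer to.
shallow = sys.getsizeof(payload)

# Adding the elements explicitly gives a much larger total.
deep = shallow + sum(sys.getsizeof(s) for s in payload)

print(f"shallow: {shallow} B, with elements: {deep} B")
```

The exact numbers depend on the Python version, but the shallow figure stays around a kilobyte while the payload is roughly a megabyte.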

However, I know I have some Polars objects. When I run:

pl.DataFrame.estimated_size(data)

I get a value of 400 MB, which would make it the largest object in my current script.

Yet the Polars DataFrames do not appear in the results of my locals() call.
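A plausible explanation is that the DataFrames are in locals() but ranked near the bottom: sys.getsizeof(df) appears to measure only the small Python wrapper object, while the column data lives in memory managed by Polars' Rust core. A sketch that prefers an object's own estimated_size() method (Polars DataFrame and Series provide one) and falls back to sys.getsizeof() otherwise; object_size and largest_objects are hypothetical helper names, not part of any library:

```python
import sys

def object_size(value):
    """Best-effort size in bytes for one object.

    Uses value.estimated_size() when the object provides it (Polars
    DataFrame and Series do); otherwise falls back to the shallow
    sys.getsizeof().
    """
    estimate = getattr(value, "estimated_size", None)
    # Skip classes such as pl.DataFrame itself: their estimated_size is a
    # plain function that needs an instance to be called on.
    if callable(estimate) and not isinstance(value, type):
        return estimate()
    return sys.getsizeof(value)

def largest_objects(namespace, n=10):
    """Return the n largest (name, size) pairs from a namespace dict."""
    sizes = [
        (name, object_size(value))
        for name, value in list(namespace.items())
        if not name.startswith("__")  # skip module dunders like __builtins__
    ]
    return sorted(sizes, key=lambda item: item[1], reverse=True)[:n]
```

With the question's sizeof_fmt() in scope, the ranking could be printed by looping over largest_objects(globals()) and formatting each size with sizeof_fmt(size). Lists, dicts and other containers are still measured shallowly by the fallback.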

Is there a way to find all Polars objects currently in use?
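If the goal is every live DataFrame, including ones not bound to a top-level name (for example, stored inside a dict or list), CPython's gc.get_objects() can be scanned for instances of a class. A rough sketch, assuming CPython and that the objects are tracked by the garbage collector (instances of ordinary Python classes are); find_instances is a hypothetical helper, and scanning every object can be slow in a large process:

```python
import gc

def find_instances(cls):
    """Return every GC-tracked live object whose type is cls or a subclass."""
    # issubclass(type(obj), ...) avoids looking up __class__ on arbitrary
    # objects, which isinstance() may do for non-matching types.
    return [obj for obj in gc.get_objects() if issubclass(type(obj), cls)]
```

For Polars this would be something like frames = find_instances(pl.DataFrame), then sum(df.estimated_size() for df in frames) for a total. The returned list keeps the objects alive, so delete it when done. Unlike a namespace scan, this gives objects rather than variable names.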

Asked Jan 2 at 13:57 by frank; edited Jan 2 at 13:59 by jqurious.
  • What is data when you run pl.DataFrame.estimated_size(data)? – Hericks, Jan 3 at 10:48

1 Answer

import sys
import polars as pl

def sizeof_fmt(num, suffix='B'):
    ''' by Fred Cirera,  https://stackoverflow.com/a/1094933/1870254, modified'''
    for unit in ['','Ki','Mi','Gi','Ti','Pi','Ei','Zi']:
        if abs(num) < 1024.0:
            return "%3.1f %s%s" % (num, unit, suffix)
        num /= 1024.0
    return "%.1f %s%s" % (num, 'Yi', suffix)

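def find_polars_dataframes():
    polars_dfs = []
    # globals(), not locals(): inside a function, locals() would only hold
    # this function's own variables, not the script's DataFrames.
    for name, value in list(globals().items()):
        if isinstance(value, pl.DataFrame):
            # estimated_size() reports the memory held by the columns,
            # which sys.getsizeof() does not see.
            polars_dfs.append((name, value.estimated_size()))
    return polars_dfs

polars_dataframes = find_polars_dataframes()

# Everything without an estimated_size() method is measured with sys.getsizeof().
other_objects = [(name, sys.getsizeof(value))
                 for name, value in list(locals().items())
                 if not hasattr(value, 'estimated_size')]

for name, size in sorted(polars_dataframes + other_objects, key=lambda x: -x[1])[:10]:
    print("{:>30}: {:>8}".format(name, sizeof_fmt(size)))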
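One thing to check in a helper like find_polars_dataframes: inside a function body, locals() returns that function's own local variables, not the module's names, so a scan written that way sees nothing from the script. A small pure-Python sketch of the difference, where module_level, scan_locals and scan_globals are illustrative names:

```python
module_level = [1, 2, 3]  # a name defined at module scope

def scan_locals():
    # Inside a function, locals() is this function's own (here empty) namespace.
    return "module_level" in locals()

def scan_globals():
    # globals() is the namespace of the module that defined this function.
    return "module_level" in globals()

print(scan_locals(), scan_globals())  # False True
```

Passing globals() into the helper as an argument, or calling globals() inside it, lets the scan see the script's top-level variables; neither approach sees variables that are local to other functions.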

Output:

            sizeof_fmt:  152.0 B
find_polars_dataframes:  152.0 B
              __file__:   89.0 B
          __builtins__:   72.0 B
                   sys:   72.0 B
                    pl:   72.0 B
       __annotations__:   64.0 B
              __name__:   57.0 B
            __loader__:   56.0 B
     polars_dataframes:   56.0 B

(though I suspect I'm missing something...)

转载请注明原文地址:http://anycun.com/QandA/1746116265a91895.html