python - How can I work with larger than memory Snowflake datasets in polars? - Stack Overflow


I'm trying to work with data stored in a Snowflake database using polars in python. I see I can access the data with pl.read_database_uri with the adbc engine. I was wondering how I can do this efficiently for larger-than-memory datasets.

  • Is it possible to stream the results using polars' lazy API, or any other method?
  • Is it possible to batch the results as pl.read_database can? Or is it possible to partition the results, as the docs say is possible with connectorx?
  • Are there any other ways I might use polars to help work with larger-than-memory datasets in this instance? Or do I need to do my processing in SQL so that the data comes into python in a manageable size?

Thanks!


asked Jan 31 at 19:25 by user2966505

1 Answer


As of polars==1.25.2, there's no easy way to do this.

One way I've approached this problem is to use the Snowflake Connector for Python to iteratively retrieve batches of a query result and process those batches with polars.
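
Here's a minimal sketch of that batched approach. The connection parameters and my_large_table are placeholders; the key piece is the connector's fetch_arrow_batches(), which yields pyarrow Tables that polars can wrap cheaply:

    import polars as pl
    import snowflake.connector

    # Placeholder connection parameters -- substitute your own.
    conn = snowflake.connector.connect(
        account="my_account",
        user="my_user",
        password="my_password",
        warehouse="my_warehouse",
        database="my_database",
    )

    with conn.cursor() as cur:
        cur.execute("SELECT * FROM my_large_table")  # hypothetical table
        # Each batch is a pyarrow.Table small enough to fit in memory.
        for arrow_batch in cur.fetch_arrow_batches():
            df = pl.from_arrow(arrow_batch)
            # ... process each batch with polars here ...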

But I encountered some surprising Snowflake Connector behavior when doing this:

  • Queries that produce zero rows return None rather than an empty table. You can work around this with fetch_arrow_all(..., force_return_table=True) (docs).

  • When using fetch_arrow_batches(), column datatypes can vary among batches (see the sketch after this list).
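
Here's a hedged sketch of both workarounds, reusing the cur cursor from above. The SCHEMA mapping and column names are hypothetical and would need to match your query; casting each batch to one fixed schema is just one way to keep the dtypes consistent:

    import polars as pl

    # Workaround 1: zero-row queries -- force an (empty) Arrow table
    # instead of None.
    cur.execute("SELECT * FROM my_large_table WHERE 1 = 0")
    table = cur.fetch_arrow_all(force_return_table=True)

    # Workaround 2: varying dtypes -- cast each batch to a fixed
    # (hypothetical) schema so all batches line up.
    SCHEMA = {"id": pl.Int64, "amount": pl.Float64}
    cur.execute("SELECT id, amount FROM my_large_table")
    frames = [pl.from_arrow(b).cast(SCHEMA) for b in cur.fetch_arrow_batches()]
    df = pl.concat(frames)  # only if the combined result fits in memory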
