PySpark foreach() is not implemented


I have PySpark in my local environment:

pyspark[sql]==3.5.0

And SPARK_REMOTE pointing to Databricks:

SPARK_REMOTE=sc://{dbx_workspace}:443/;x-databricks-cluster-id={dbx_cluster_id};token={pat}
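For context, this is plain PySpark over Spark Connect; a minimal sketch of how the session gets created (assuming the Spark Connect client dependencies are installed; the explicit .remote(...) form with placeholder host, cluster id and token is equivalent to setting SPARK_REMOTE):

from pyspark.sql import SparkSession

# With SPARK_REMOTE set in the environment, getOrCreate() returns a
# Spark Connect session against the remote cluster.
spark = SparkSession.builder.getOrCreate()

# Explicit equivalent (placeholders, same connection string as above):
# spark = SparkSession.builder.remote(
#     "sc://<dbx_workspace>:443/;x-databricks-cluster-id=<dbx_cluster_id>;token=<pat>"
# ).getOrCreate()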

If I call df.foreach(my_function) I get:

raise PySparkNotImplementedError(
pyspark.errors.exceptions.base.PySparkNotImplementedError: [NOT_IMPLEMENTED] foreach() is not implemented.
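For reference, a minimal reproduction looks roughly like this (df and my_function are placeholders; any per-row callable triggers the same error):

from pyspark.sql import Row

def my_function(row: Row) -> None:
    # Placeholder for the real per-row side effect.
    print(row)

df = spark.range(10)
df.foreach(my_function)  # raises PySparkNotImplementedError under Spark Connect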

I haven't seen any problems with other transformations and actions.

One fix is to use databricks-connect, but I would like to stay decoupled from Databricks as much as possible.
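For completeness, the databricks-connect route looks roughly like this (connection details are picked up from the environment or a Databricks config profile), but it reintroduces the dependency I'd like to avoid:

from databricks.connect import DatabricksSession

# Builds a Spark Connect session against the Databricks cluster.
spark = DatabricksSession.builder.getOrCreate()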
