sql - Error with highWaterMark when creating a table in databricks - Stack Overflow

asked by admin on 2025-04-26

I have a collection of parquet files in an ADLS container, and I want to create a table that points to the folder in the container.

Usually I do spark.sql(f"create table {schema}.{table} location '{container_path}/{folder}'"), but in this case I get the error "Unable to infer schema".
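One detail worth noting: the statement above does not declare a table format, and the second error below shows that the folder already carries Delta metadata. A hedged sketch of the same statement with an explicit `using delta` clause (the schema, table, and path names here are hypothetical placeholders, and the spark.sql call is left commented out since it needs a live cluster):

```python
# Hypothetical names standing in for the real schema/table/path.
schema, table = "my_schema", "my_table"
container_path, folder = "abfss://mycontainer@myaccount.dfs.core.windows.net", "myfolder"

# Declaring the format lets Spark read the schema from the existing
# _delta_log instead of trying to infer it from the files.
sql = (f"create table {schema}.{table} "
       f"using delta "
       f"location '{container_path}/{folder}'")

# spark.sql(sql)  # would execute on a real Databricks cluster
print(sql)
```

This is only a sketch of the pattern, not a confirmed fix for the error in question.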

If I specify the schema explicitly,

sql = f"""create table {schema}.{table} (
id BIGINT GENERATED ALWAYS AS IDENTITY (START WITH 1 INCREMENT BY 1),
nf INT
)
using delta
location '{container_path}/{folder}'"""
spark.sql(sql)

I instead get the error

[DELTA_CREATE_TABLE_SCHEME_MISMATCH]

== Differences ==
- Specified metadata for field id is different from existing schema:
  Specified: {"delta.identity.start":1,"delta.identity.step":1,"delta.identity.allowExplicitInsert":false}
  Existing:  {"delta.identity.allowExplicitInsert":false,"delta.identity.start":1,"delta.identity.highWaterMark":1571,"delta.identity.step":1}
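The two metadata blobs in the error message can be diffed mechanically to confirm that the only mismatch is the highWaterMark key. A small self-contained check, using the JSON strings copied from the error above:

```python
import json

# Metadata strings copied verbatim from the DELTA_CREATE_TABLE_SCHEME_MISMATCH error.
specified = json.loads(
    '{"delta.identity.start":1,"delta.identity.step":1,'
    '"delta.identity.allowExplicitInsert":false}'
)
existing = json.loads(
    '{"delta.identity.allowExplicitInsert":false,"delta.identity.start":1,'
    '"delta.identity.highWaterMark":1571,"delta.identity.step":1}'
)

# Keys missing from one side or carrying different values.
diff = {k for k in specified.keys() | existing.keys()
        if specified.get(k) != existing.get(k)}
print(diff)  # {'delta.identity.highWaterMark'}
```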

It seems the only difference is highWaterMark. Is there a way to specify it inside the creation query? From another angle: since detailed schema information clearly exists, why am I getting the "unable to infer schema" error?
