I have a collection of parquet files in an ADLS container, and I want to create a table that points to the folder in the container.
Usually I do
spark.sql(f"create table {schema}.{table} location '{container_path}/{folder}'")
but in this case I'm getting the error "Unable to infer schema".
If I do specify the schema,
sql = f"""create table {schema}.{table} (
    id BIGINT GENERATED ALWAYS AS IDENTITY (START WITH 1 INCREMENT BY 1),
    nf INT
)
using delta
location '{container_path}/{folder}'"""
spark.sql(sql)
I instead get the error
[DELTA_CREATE_TABLE_SCHEME_MISMATCH]
== Differences ==
- Specified metadata for field id is different from existing schema:
Specified: {"delta.identity.start":1,"delta.identity.step":1,"delta.identity.allowExplicitInsert":false}
Existing: {"delta.identity.allowExplicitInsert":false,"delta.identity.start":1,"delta.identity.highWaterMark":1571,"delta.identity.step":1}
It seems the only difference is highWaterMark.
Is there a way to specify it inside the creation query?
On another note, since there is clearly detailed schema information available at that location, why am I getting the "unable to infer schema" error in the first place?
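In case it's relevant: the identity metadata shown in the error lives in the table's Delta transaction log (`_delta_log/*.json`), inside the `metaData` action's `schemaString` field, which is itself a JSON-serialized struct. A minimal plain-Python sketch of pulling it out, using the "Existing" metadata from the error message above as sample data (the log entry here is illustrative, not copied from my actual table):

```python
import json

# Illustrative metaData action as it would appear in a _delta_log commit file.
# The field metadata matches what the error message reports as "Existing".
log_entry = json.dumps({
    "metaData": {
        "schemaString": json.dumps({
            "type": "struct",
            "fields": [
                {
                    "name": "id",
                    "type": "long",
                    "nullable": True,
                    "metadata": {
                        "delta.identity.allowExplicitInsert": False,
                        "delta.identity.start": 1,
                        "delta.identity.highWaterMark": 1571,
                        "delta.identity.step": 1,
                    },
                },
                {"name": "nf", "type": "integer", "nullable": True, "metadata": {}},
            ],
        })
    }
})

# schemaString is a JSON string nested inside the JSON log entry.
schema_struct = json.loads(json.loads(log_entry)["metaData"]["schemaString"])
id_meta = next(f["metadata"] for f in schema_struct["fields"] if f["name"] == "id")
print(id_meta["delta.identity.highWaterMark"])  # 1571
```

So the highWaterMark the engine compares against is tracked per-table in the log, which is why my CREATE TABLE statement (which has no way to state it) never matches.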