I have a delta table with 2 versions:
Add txn: path = "a.parquet" numRecords = 10 deletionVector = null
Add txn: path = "a.parquet" numRecords = 10 deletionVector = (..., cardinality = 2)
Please note both transactions point to the same physical path ("a.parquet"), without any remove transaction.
From my understanding of the Delta protocol, since these are two separate logical files residing in two different versions, this describes a legal Delta table that, when queried, should return 18 rows.
Could you please confirm my understanding?
Testing on Databricks, select (*) and count(*) seem to be inconsistent: select (*) returns 18 rows, while count(*) returns 8.
Usually, when you run a delete operation on a table with enableDeletionVectors enabled, it creates a single logical transaction containing both add and remove actions in the same version. Please check the JSON commit file for the remove action and match the rows.
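To make that check concrete, here is a minimal sketch of how the add and remove actions in one commit file could be listed. It assumes the usual layout of a commit file under _delta_log (one JSON action per line, each with a single top-level key such as "add" or "remove"). The function name summarize_commit and the sample contents are hypothetical and heavily trimmed; a real commit carries more fields.

```python
import json


def summarize_commit(lines):
    """Return (kind, path, has_deletion_vector) for each add/remove action."""
    summary = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        action = json.loads(line)
        for kind in ("add", "remove"):
            if kind in action:
                body = action[kind]
                has_dv = body.get("deletionVector") is not None
                summary.append((kind, body["path"], has_dv))
    return summary


# Hypothetical, trimmed contents of a commit written by a DELETE.
sample_commit = [
    '{"commitInfo": {"operation": "DELETE"}}',
    '{"remove": {"path": "a.parquet", "dataChange": true}}',
    '{"add": {"path": "a.parquet", "dataChange": true, "deletionVector": {"cardinality": 2}}}',
]

for entry in summarize_commit(sample_commit):
    print(entry)
```

If the second version of your table shows only an add and no matching remove for "a.parquet", that is the difference from what a regular delete would normally write.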
You said that 10 records were added initially and that the next version contains an add action with 10 records and a deletion vector of cardinality 2, so you expect 18 rows in total. However, there may be situations where rows are not inserted but only updated, so you need to check the inserted, updated, and deleted rows carefully.
Another possible reason for the different row counts is that the two queries are reading the table at different Delta versions; please check that as well.
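As a rough illustration of how the expected count depends on the version being read, the sketch below replays the two commits from the question and sums numRecords minus the deletion vector cardinality over the files that are still active. It rests on several assumptions: logical files are keyed by path plus a deletion vector identifier, the "id" field is a hypothetical stand-in for the real deletion vector descriptor, and numRecords is shown as a plain field although real logs keep it inside the stats string. The name live_rows_at is also hypothetical.

```python
def live_rows_at(commits, version):
    """Replay commits 0..version and return the live row count of that snapshot."""
    active = {}
    for actions in commits[: version + 1]:
        for kind, body in actions:
            dv = body.get("deletionVector")
            # Assumption: a logical file is identified by (path, deletion vector id).
            key = (body["path"], dv["id"] if dv else None)
            if kind == "add":
                active[key] = body
            elif kind == "remove":
                active.pop(key, None)
    total = 0
    for entry in active.values():
        dv = entry.get("deletionVector")
        total += entry["numRecords"] - (dv["cardinality"] if dv else 0)
    return total


# The two versions from the question; "hypothetical-dv-id" is a placeholder.
commits = [
    [("add", {"path": "a.parquet", "numRecords": 10, "deletionVector": None})],
    [("add", {"path": "a.parquet", "numRecords": 10,
              "deletionVector": {"id": "hypothetical-dv-id", "cardinality": 2}})],
]

print(live_rows_at(commits, 0))  # 10
print(live_rows_at(commits, 1))  # 18
```

Under these assumptions, a reader pinned to version 0 sees 10 rows and a reader at version 1 sees 18, so comparing two queries is only meaningful when both read the same version.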