Description
Apache Iceberg version
1.8.1 (latest release)
Query engine
Spark
Please describe the bug 🐞
A MoR delete that writes a positional delete file does not properly update total-records in the Snapshot summary.
This can be seen in the pyiceberg example here, where a single row is deleted but total-records remains the same.
A CoW delete, where the data file is rewritten, does not have this problem: total-records is properly decremented, as shown here (although it is decremented from the previously miscalculated total-records).
I think this issue has persisted for quite a while. I found both #7463 and #6709.
#7463 shows that the delete (DELETE FROM default.t1 WHERE foo = 'b') produces an OVERWRITE snapshot with the following summary:
{
"spark.app.id": "local-1682689536619",
"changed-partition-count": "1",
"added-position-deletes": "1",
"total-equality-deletes": "0",
"total-position-deletes": "1",
"added-position-delete-files": "1",
"added-files-size": "1490",
"total-delete-files": "1",
"added-delete-files": "1",
"total-files-size": "2387",
"total-records": "3",
"total-data-files": "1"
}
where 'total-records': '3' is the same as in the previous Snapshot, even though a row has been deleted.
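To make the discrepancy concrete, here is a rough sketch in plain Python (no Iceberg or pyiceberg calls) that derives the row count a reader might expect from the summary above. The helper name `expected_live_records` is hypothetical, and the arithmetic assumes that every position delete removes one distinct live row and that no equality deletes are involved. It is not how Iceberg computes the summary.

```python
# Summary fields copied from the OVERWRITE snapshot in #7463 (values are strings).
summary = {
    "added-position-deletes": "1",
    "total-equality-deletes": "0",
    "total-position-deletes": "1",
    "total-records": "3",
    "total-data-files": "1",
}


def expected_live_records(summary: dict) -> int:
    """Hypothetical helper: rows a reader would expect to see after MoR deletes.

    Assumes each position delete targets one distinct live row and that
    there are no equality deletes, so the result is only an estimate.
    """
    total_records = int(summary["total-records"])
    position_deletes = int(summary.get("total-position-deletes", "0"))
    return total_records - position_deletes


reported = int(summary["total-records"])
expected = expected_live_records(summary)
print(f"reported total-records: {reported}, expected live rows: {expected}")
```

Under those assumptions the summary reports 3 records while only 2 rows remain readable, which is the gap this issue describes.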
Willingness to contribute
- I can contribute a fix for this bug independently
- I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- I cannot contribute a fix for this bug at this time