
Support dynamic overwrite #1287

@kevinjqliu


Feature Request / Improvement

Currently, `overwrite` is implemented as a delete followed by an append:

```python
# Step 1: delete every row matching overwrite_filter (its own snapshot)
self.delete(delete_filter=overwrite_filter, snapshot_properties=snapshot_properties)

# Step 2: fast-append the dataframe's rows as new data files
with self.update_snapshot(snapshot_properties=snapshot_properties).fast_append() as update_snapshot:
    # skip writing data files if the dataframe is empty
    if df.shape[0] > 0:
        data_files = _dataframe_to_data_files(
            table_metadata=self.table_metadata, write_uuid=update_snapshot.commit_uuid, df=df, io=self._table.io
        )
        for data_file in data_files:
            update_snapshot.append_data_file(data_file)
```
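To illustrate what the delete step involves, here is a toy in-memory model. It is not PyIceberg code; `DataFile` and `overwrite_delete_append` are hypothetical names. It assumes a copy-on-write delete: the caller must supply `overwrite_filter` up front, a file whose rows all match is dropped, and a file with only some matching rows has to be read and rewritten.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DataFile:
    # Hypothetical toy model: a data file tagged with its partition value.
    partition: str
    rows: tuple[dict, ...]


def overwrite_delete_append(
    files: list[DataFile],
    overwrite_filter: Callable[[dict], bool],
    new_files: list[DataFile],
) -> tuple[list[DataFile], int]:
    """Delete rows matching overwrite_filter, then append new_files.

    Returns the resulting file list and how many existing files were rewritten.
    """
    kept: list[DataFile] = []
    rewritten = 0
    for f in files:
        # Every row of every existing file is evaluated against the filter.
        survivors = tuple(r for r in f.rows if not overwrite_filter(r))
        if len(survivors) == len(f.rows):
            kept.append(f)  # no row matched: file kept untouched
        elif survivors:
            kept.append(DataFile(f.partition, survivors))  # partial match: rewrite
            rewritten += 1
        # else: every row matched, so the whole file is dropped
    return kept + list(new_files), rewritten
```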

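As an optimization, we can support dynamic overwrite for the case where entire partitions are replaced: the partitions to overwrite are inferred from the incoming dataframe, and all existing data files in those partitions are dropped and replaced by the new files, without requiring an explicit `overwrite_filter`.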

Here's an example from @koenvo:
https://gist.github.com/koenvo/e23bfab32c7e7810eb52f82c6304fc22
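A minimal sketch of dynamic-overwrite semantics on a toy in-memory model follows. The `DataFile` class and `dynamic_overwrite` function are hypothetical, not PyIceberg APIs, and the code is not taken from the linked gist. The set of partitions to replace is taken from the incoming files, and existing files in those partitions are dropped wholesale using partition metadata alone, without evaluating any rows.

```python
from dataclasses import dataclass


@dataclass
class DataFile:
    # Hypothetical toy model: a data file tagged with its partition value.
    partition: str
    rows: tuple[dict, ...]


def dynamic_overwrite(
    files: list[DataFile], new_files: list[DataFile]
) -> tuple[list[DataFile], list[DataFile]]:
    """Replace every partition that appears in new_files.

    Returns (resulting table files, existing files that were dropped).
    """
    # Partitions to overwrite come from the incoming data, not a user filter.
    touched = {f.partition for f in new_files}
    # Whole files are dropped by partition value; no rows are read or rewritten.
    dropped = [f for f in files if f.partition in touched]
    kept = [f for f in files if f.partition not in touched]
    return kept + list(new_files), dropped
```

Partitions absent from the incoming data are left untouched, and empty input drops nothing. In a real implementation the partition values would presumably be derived by applying the table's partition spec to the dataframe, with the drop and the append committed together in a single snapshot; both points are assumptions about how this could fit into PyIceberg.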
