Commit e5d1bb8

koenvo authored and Fokko committed
Temporary fix for filtering on empty batches (#1901)
Potential fix for #1804. Might want to write a test, but not sure yet how to reproduce without using Glue. Closes #1804.
1 parent e9745d5 commit e5d1bb8

1 file changed: 6 additions and 2 deletions
pyiceberg/io/pyarrow.py

Lines changed: 6 additions & 2 deletions
@@ -1421,11 +1421,15 @@ def _task_to_record_batches(
 
             # Apply the user filter
             if pyarrow_filter is not None:
-                current_batch = current_batch.filter(pyarrow_filter)
+                # Temporary fix until PyArrow 21 is released ( https://github.com/apache/arrow/pull/46057 )
+                table = pa.Table.from_batches([current_batch])
+                table = table.filter(pyarrow_filter)
                 # skip empty batches
-                if current_batch.num_rows == 0:
+                if table.num_rows == 0:
                     continue
 
+                current_batch = table.combine_chunks().to_batches()[0]
+
         result_batch = _to_requested_schema(
             projected_schema,
             file_project_schema,
