Description
Apache Iceberg version
0.9.0 (latest release)
Please describe the bug 🐞
Hi,
Not sure if this is a bug, but worst case this might be something for others to look up in the future.
I've created a table like the following using pyiceberg:

```python
from pyiceberg.schema import Schema
from pyiceberg.types import BooleanType, NestedField, StringType, TimestampType

schema = Schema(
    NestedField(field_id=1, name="bk_id", field_type=StringType(), required=False),
    NestedField(field_id=2, name="inference_date", field_type=TimestampType(), required=False),
    NestedField(field_id=3, name="verified", field_type=BooleanType(), required=False),
    NestedField(field_id=4, name="id", field_type=StringType(), required=True),
)
```

I've been able to do multiple appends to the table using pyiceberg with no issues.
Now, to run some tests and prepare to use the new upsert operation, I decided to append a row with `id = 'dummy_id'` and then run a scan filtering by it. When I run the query through AWS Athena I see the row; however, when scanning with `dummy = table.scan(row_filter=EqualTo("id", 'dummy_id'))` I get `list index out of range`. This seems to be because pyiceberg isn't able to retrieve the row.
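For what it's worth, the `list index out of range` message is the standard `IndexError` text Python raises when something indexes position 0 of an empty list, which suggests the scan planned zero matching files or record batches for this filter. A minimal stdlib-only illustration of that failure mode (the `batches` name is just for illustration, not pyiceberg internals):

```python
# Hypothetical empty result, standing in for whatever the scan produced:
batches = []

try:
    first = batches[0]  # indexing an empty list
except IndexError as e:
    print(e)  # → list index out of range
```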
Here is the code I have set up to replicate the issue:

```python
import pandas as pd
import pyarrow as pa
from pyiceberg.catalog import load_catalog
from pyiceberg.expressions import EqualTo

df = pa.Table.from_pydict(
    {
        "bk_id": ["BK123456"],
        "inference_date": [pd.Timestamp.now()],
        "verified": [False],
        "id": ["dummy_id"],
    }
)

catalog = load_catalog(
    "glue",
    **{
        "type": "glue",
        "warehouse": warehouse_path,
        "downcast-ns-timestamp-to-us-on-write": True,
    },
)

table_identifier = "database_name.table_name"
table = catalog.load_table(table_identifier)
table.append(df)

dummy = table.scan(row_filter=EqualTo("id", "dummy_id"))
dummy.to_arrow()
```

Is there something I'm doing wrong?
Willingness to contribute
- I can contribute a fix for this bug independently
- I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- I cannot contribute a fix for this bug at this time