Glue scan with filter throws list index out of range #1804

@Cabeda

Apache Iceberg version

0.9.0 (latest release)

Please describe the bug 🐞

Hi,

I'm not sure whether this is a bug, but at worst it may be useful for others who run into the same problem later.

I've created a table as follows using pyiceberg:

    from pyiceberg.schema import Schema
    from pyiceberg.types import BooleanType, NestedField, StringType, TimestampType

    schema = Schema(
        NestedField(field_id=1, name="bk_id", field_type=StringType(), required=False),
        NestedField(field_id=2, name="inference_date", field_type=TimestampType(), required=False),
        NestedField(field_id=3, name="verified", field_type=BooleanType(), required=False),
        NestedField(field_id=4, name="id", field_type=StringType(), required=True),
    )

I've been able to do multiple appends to the table using pyiceberg with no issues.

Now, to run some tests and prepare to use the new upsert operation, I decided to append a row with id = 'dummy_id' and then run a scan filtering on it. When I query the table through AWS Athena I see the row. However, when I run the scan with dummy = table.scan(row_filter=EqualTo("id", 'dummy_id')) I get "list index out of range". My guess is that pyiceberg isn't able to retrieve the row.

Here is the code I set up to reproduce the issue:

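import pandas as pd
import pyarrow as pa
from pyiceberg.catalog import load_catalog
from pyiceberg.expressions import EqualTo

# Single row to append; "id" is the required column used in the filter below.
df = pa.Table.from_pydict(
    {
        "bk_id": ["BK123456"],
        "inference_date": [pd.Timestamp.now()],
        "verified": [False],
        "id": ["dummy_id"],
    }
)

# warehouse_path (the warehouse location) is defined elsewhere.
catalog = load_catalog(
    "glue",
    **{
        "type": "glue",
        "warehouse": warehouse_path,
        "downcast-ns-timestamp-to-us-on-write": True,
    },
)

table_identifier = "database_name.table_name"
table = catalog.load_table(table_identifier)

table.append(df)

# The scan is lazy; the error is raised when it is materialized below.
dummy = table.scan(row_filter=EqualTo("id", "dummy_id"))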
dummy.to_arrow()
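
The message alone doesn't show where the error is raised, so a full traceback would help narrow this down. Below is a minimal sketch for capturing one using only the standard library. `capture_traceback` and `_fails` are hypothetical names for illustration, not part of pyiceberg, and `_fails` merely stands in for the `dummy.to_arrow()` call above.

```python
import traceback
from typing import Callable, Optional


def capture_traceback(fn: Callable[[], object]) -> Optional[str]:
    """Call fn(); return the formatted traceback if it raises, else None."""
    try:
        fn()
    except Exception:
        # format_exc() renders the exception currently being handled,
        # including the file and line where it was raised.
        return traceback.format_exc()
    return None


def _fails() -> None:
    # Hypothetical stand-in for the failing scan; raises the same kind of error.
    [][0]


report = capture_traceback(_fails)
print(report)
```

With the real table, the call would presumably be `capture_traceback(lambda: dummy.to_arrow())`, and the printed frames should show which part of the scan indexes into an empty list.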

Is there something I'm doing wrong?

Willingness to contribute

  • I can contribute a fix for this bug independently
  • I would be willing to contribute a fix for this bug with guidance from the Iceberg community
  • I cannot contribute a fix for this bug at this time
