[BUG] Valid column characters fail on to_arrow() or to_pandas() ArrowInvalid: No match for FieldRef.Name #584

@gwindes

Apache Iceberg version

0.6.0 (latest release)

Please describe the bug 🐞

Related to: #81

Platform: macOS (M1)
Python v3.12

pyiceberg 0.6.0
pyarrow 15.0.2
pandas 2.2.1

I believe this is a bug, but I may be misunderstanding how pyiceberg and pyarrow work with Iceberg tables, so I may be doing something wrong. However, when I sanitize the column name before writing the data, removing the characters : . - /, I can query the table just fine.
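For reference, the workaround can be sketched roughly as follows. This is a minimal illustration, not the exact code from the project: the helper name `strip_special_chars` is hypothetical, and replacing the characters with underscores (rather than dropping them) is an assumption. The example name is the column discussed in this issue.

```python
import re

# Hypothetical helper: the set of characters and the "_" substitute are
# assumptions for illustration; nothing here is part of pyiceberg.
_UNSAFE_CHARS = re.compile(r"[:.\-/]")


def strip_special_chars(column_name: str) -> str:
    """Replace ':', '.', '-' and '/' in a column name with underscores."""
    return _UNSAFE_CHARS.sub("_", column_name)


if __name__ == "__main__":
    # Apply to a column name before writing the data to the table.
    print(strip_special_chars("TEST:A1B2.RAW.ABC-GG-1-A"))
```

Note that this changes the stored column name, which is exactly what the naming requirement described in this issue rules out.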

My understanding is that TEST:A1B2.RAW.ABC-GG-1-A is a valid Iceberg column name, with the caveat that it is NOT a nested field (which I don't need). I'm able to write the data to the Iceberg table, and the metadata JSON shows the full column name TEST:A1B2.RAW.ABC-GG-1-A.

It appears to fail only when I read the data. I'm following the basic pyiceberg getting-started example:

table = catalog.load_table("A1B2.A1-301")

# neither table scan works (throws the same error):
df_pyarrow = table.scan().to_arrow()
df_panda = table.scan().to_pandas()

I created a sample project that reproduces the problem. It fails with the error pyarrow.lib.ArrowInvalid: No match for FieldRef.Name(TEST_x3AA1B2_x2ERAW_x2EABC_x2DGG_x2D1_x2DA) in TEST:A1B2.RAW.ABC-GG-1-A: double.
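The name inside FieldRef.Name(...) looks like the original column name with every character outside letters, digits, and underscores replaced by `_x` plus its hexadecimal code point. The sketch below reproduces that pattern. It is an assumption based only on the error text, not a copy of pyiceberg's or pyarrow's code, and the function name `escape_special_chars` is hypothetical.

```python
def escape_special_chars(name: str) -> str:
    """Escape characters outside [A-Za-z0-9_] as '_x' + uppercase hex code point.

    Assumption: this mirrors the pattern visible in the error message only.
    """
    parts = []
    for ch in name:
        if ch.isascii() and (ch.isalnum() or ch == "_"):
            parts.append(ch)  # keep letters, digits and underscores unchanged
        else:
            parts.append(f"_x{ord(ch):X}")  # e.g. ':' becomes '_x3A'
    return "".join(parts)


if __name__ == "__main__":
    print(escape_special_chars("TEST:A1B2.RAW.ABC-GG-1-A"))
```

If this reading is right, the scan looks up the escaped name while the data file holds the column under its original name, which would explain why no match is found.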

To clarify, these column names do need to keep this format, because it serves a very specific use case within our hardware environments. We try to follow a URI-style naming scheme for our columns and sensors.

[Image: the metadata stores the channel name as expected.]
