Description
Apache Iceberg version
0.6.0 (latest release)
Please describe the bug 🐞
Related to: #81
Platform: MacOS M1
Python v3.12
pyiceberg 0.6.0
pyarrow 15.0.2
pandas 2.2.1
I believe this is a bug, but I may be misunderstanding how pyiceberg and pyarrow work with Iceberg tables, so I may be doing something wrong. However, when I sanitize the column name before writing the data, removing the characters : . - and /, I'm able to query just fine.
My understanding is that TEST:A1B2.RAW.ABC-GG-1-A is a valid Iceberg column name, with the caveat that it is NOT a nested field (which I don't need). I'm able to write the data to the Iceberg table, and the metadata JSON shows the fully qualified name TEST:A1B2.RAW.ABC-GG-1-A.
It only appears to fail when I read the data. I'm following the basic getting-started example in the pyiceberg docs:
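For reference, a minimal sketch of the kind of sanitizing described above. The helper name sanitize_column_name and the choice of an underscore as the replacement character are illustrative assumptions, not the exact code used in the sample project:

```python
import re

# Characters that get replaced before writing: ':', '.', '-' and '/'.
_UNSAFE_CHARS = re.compile(r"[:.\-/]")


def sanitize_column_name(name: str) -> str:
    """Replace each ':', '.', '-' or '/' with an underscore (hypothetical helper)."""
    return _UNSAFE_CHARS.sub("_", name)


sanitized = sanitize_column_name("TEST:A1B2.RAW.ABC-GG-1-A")
print(sanitized)
```

With pyarrow, the sanitized names could then be applied to the table before writing, for example via Table.rename_columns. The obvious drawback is that the original URI-style name is lost.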
table = catalog.load_table("A1B2.A1-301")
# neither table scan works (throws the same error):
df_pyarrow = table.scan().to_arrow()
df_panda = table.scan().to_pandas()
I created a sample project that reproduces the problem. Both scans fail with the following error: pyarrow.lib.ArrowInvalid: No match for FieldRef.Name(TEST_x3AA1B2_x2ERAW_x2EABC_x2DGG_x2D1_x2DA) in TEST:A1B2.RAW.ABC-GG-1-A: double
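The name in the error looks like the original column name with each special character replaced by _x followed by its hexadecimal code point (for example, : becomes _x3A). The sketch below only illustrates that observation; the function name avro_style_sanitize is hypothetical, it does not handle leading digits, and it is an assumption about the encoding rather than pyiceberg's actual code:

```python
import re


def avro_style_sanitize(name: str) -> str:
    """Replace every character outside [A-Za-z0-9_] with '_x' plus its
    uppercase hex code point (illustrative assumption, not library code)."""
    return re.sub(
        r"[^A-Za-z0-9_]",
        lambda match: f"_x{ord(match.group()):X}",
        name,
    )


original = "TEST:A1B2.RAW.ABC-GG-1-A"
mangled = avro_style_sanitize(original)
print(mangled)
```

If this guess is right, the read path is looking up the sanitized name, while the data file schema still contains the original name, so the field lookup finds no match.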
Also, to clarify: these column names do need to be in this format, because the format serves a very specific use case within our hardware environments. We try to follow a URI-style naming scheme for our columns and sensors.
