[feature request] Allow Java Iceberg library to write parquet files with special character column names #10120

@kevinjqliu

Description

Feature Request / Improvement

Based on the discussion in iceberg-python/#584, we found that the Java Iceberg library "sanitizes" column names that contain special characters, transforming them before writing to Parquet.

For example, an Iceberg table with a column named `TEST:A1B2.RAW.ABC-GG-1-A` has that name transformed into `TEST_x3AA1B2_x2ERAW_x2EABC_x2DGG_x2D1_x2DA`, and the transformed name is what gets written to the Parquet files.
The same transformation is applied on both reads and writes. The behavior was introduced in #601.
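
To make the transformation concrete, here is a minimal Python sketch that reproduces the mapping in the example above. It is not the Java library's code: the function name `sanitize_column_name` is hypothetical, and the rules are inferred from the example (each character that is not a letter, digit, or underscore becomes `_x` followed by its uppercase hex code point). The handling of a leading digit is an assumption that the example does not cover.

```python
def sanitize_column_name(name: str) -> str:
    """Illustrative sketch of the column-name escaping described in this issue.

    Hypothetical helper, not part of any Iceberg library.
    """

    def escape(ch: str) -> str:
        # ASSUMPTION: a digit that needs escaping (only possible in the first
        # position) is prefixed with "_"; the issue's example does not show this.
        if ch.isdigit():
            return "_" + ch
        # Inferred from the example: ':' -> _x3A, '.' -> _x2E, '-' -> _x2D
        return "_x" + format(ord(ch), "X")

    out = []
    for i, ch in enumerate(name):
        # Letters and underscores are always kept; digits are kept except
        # in the first position.
        if ch == "_" or ch.isalpha() or (i > 0 and ch.isdigit()):
            out.append(ch)
        else:
            out.append(escape(ch))
    return "".join(out)


print(sanitize_column_name("TEST:A1B2.RAW.ABC-GG-1-A"))
# prints TEST_x3AA1B2_x2ERAW_x2EABC_x2DGG_x2D1_x2DA
```

A reader that is unaware of this escaping will look for `TEST:A1B2.RAW.ABC-GG-1-A` in the Parquet schema and not find it, which is the interoperability problem raised in iceberg-python/#584.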

I think Iceberg should (optionally) allow writing column names without this sanitization and transformation. The behavior could be made configurable, so that existing tables keep working and backward compatibility is preserved.

Query engine

None

Labels: improvement (PR that improves existing functionality), stale
