[Spec][Upstream] Mapping from DecimalType to Parquet physical type not aligned with spec #936

@HonahX

Apache Iceberg version

main (development)

Please describe the bug 🐞

According to the Parquet data type mappings in the Iceberg spec, DecimalType should map to INT32 when precision <= 9, to INT64 when precision <= 18, and to fixed otherwise.
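The rule above can be sketched as a small helper. This is only an illustration of the mapping as described in the spec, not code from PyIceberg or Arrow; the function names `expected_parquet_physical_type` and `min_bytes_for_precision` are hypothetical. The byte-length helper assumes the fixed representation is a two's-complement unscaled integer that uses the minimum number of bytes able to hold the given precision.

```python
def min_bytes_for_precision(precision: int) -> int:
    """Smallest byte count whose signed two's-complement range holds 10**precision - 1.

    Hypothetical helper for illustration only.
    """
    max_unscaled = 10**precision - 1
    num_bytes = 1
    # A signed integer of n bytes can represent values up to 2**(8n - 1) - 1.
    while max_unscaled > 2 ** (8 * num_bytes - 1) - 1:
        num_bytes += 1
    return num_bytes


def expected_parquet_physical_type(precision: int) -> str:
    """Physical type a decimal of the given precision should map to per the rule above.

    Hypothetical helper for illustration only.
    """
    if precision < 1:
        raise ValueError("precision must be at least 1")
    if precision <= 9:
        return "INT32"
    if precision <= 18:
        return "INT64"
    return "FIXED_LEN_BYTE_ARRAY"


if __name__ == "__main__":
    for p in (7, 9, 10, 18, 19, 38):
        print(p, expected_parquet_physical_type(p), min_bytes_for_precision(p))
```

Under this rule, the DecimalType(7, 0) column used in the test below would be expected to land in INT32 rather than FIXED_LEN_BYTE_ARRAY.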

However, Arrow currently writes all decimal types as fixed (FIXED_LEN_BYTE_ARRAY) in Parquet. This may not be a big issue, since the logical type is still correct, and fixing it may require upstream support.

Update: Thanks to @syun64 for providing the link to the upstream PR that fixes this.

Simple test:

from pyiceberg.catalog import load_catalog
from pyiceberg.schema import Schema
from pyiceberg.types import DecimalType, NestedField
import pyarrow as pa

rest_catalog = load_catalog(
    "rest",
    **{
        ...
    },
)


decimal_schema = Schema(NestedField(1, "decimal", DecimalType(7, 0)))
decimal_arrow_schema = pa.schema(
    [
        ("decimal", pa.decimal128(7, 0)),
    ]
)

decimal_arrow_table = pa.Table.from_pylist(
    [
        {
            "decimal": 123,
        }
    ],
    schema=decimal_arrow_schema,
)

tbl = rest_catalog.create_table(
    "pyiceberg_test.test_decimal_type", schema=decimal_arrow_schema
)

tbl.append(decimal_arrow_table)

> parquet-tools inspect 00000-0-bff20a80-0e80-4b53-ba35-2c94498fa507.parquet

############ file meta data ############
created_by: parquet-cpp-arrow version 16.1.0
num_columns: 1
num_rows: 1
num_row_groups: 1
format_version: 2.6
serialized_size: 465


############ Columns ############
decimal

############ Column(decimal) ############
name: decimal
path: decimal
max_definition_level: 1
max_repetition_level: 0
physical_type: FIXED_LEN_BYTE_ARRAY
logical_type: Decimal(precision=7, scale=0)
converted_type (legacy): DECIMAL
compression: ZSTD (space_saved: -25%)
