Closed
Description
Apache Iceberg version
main (development)
Please describe the bug 🐞
According to the Parquet data type mappings spec, DecimalType should map to INT32 when precision <= 9, INT64 when precision <= 18, and fixed otherwise.
However, Arrow currently writes all decimal types as fixed in Parquet. This may not be a big issue since the logical type is correct, and fixing it may require upstream support.
Updated: Thanks @syun64 for providing the link to the upstream PR that fixes this.
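For reference, the mapping rule described above can be written out as a small function. This is only a sketch of my reading of the spec, not code from PyIceberg or Arrow; the names `parquet_physical_type` and `min_fixed_bytes` are hypothetical, and the minimal-byte-width rule for the fixed case is an assumption about how "fixed" is sized.

```python
def min_fixed_bytes(precision: int) -> int:
    """Smallest byte count whose signed two's-complement range holds 10**precision - 1.

    Assumption: the fixed-length representation uses the minimal width.
    """
    max_unscaled = 10**precision - 1
    n = 1
    while 2 ** (8 * n - 1) - 1 < max_unscaled:
        n += 1
    return n


def parquet_physical_type(precision: int) -> str:
    """Physical type expected for a decimal of the given precision (hypothetical helper)."""
    if precision < 1:
        raise ValueError("precision must be >= 1")
    if precision <= 9:
        return "INT32"
    if precision <= 18:
        return "INT64"
    return f"FIXED_LEN_BYTE_ARRAY({min_fixed_bytes(precision)})"


for p in (7, 18, 38):
    print(p, parquet_physical_type(p))
```

Under this rule, the DecimalType(7, 0) column in the test below would be expected to be written as INT32 rather than FIXED_LEN_BYTE_ARRAY.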
Simple test:
from pyiceberg.catalog import load_catalog
from pyiceberg.schema import Schema
from pyiceberg.types import DecimalType, NestedField
import pyarrow as pa

rest_catalog = load_catalog(
    "rest",
    **{
        ...
    },
)

# Equivalent Iceberg schema, shown for reference (not used below)
decimal_schema = Schema(NestedField(1, "decimal", DecimalType(7, 0)))

decimal_arrow_schema = pa.schema(
    [
        ("decimal", pa.decimal128(7, 0)),
    ]
)

decimal_arrow_table = pa.Table.from_pylist(
    [
        {
            "decimal": 123,
        }
    ],
    schema=decimal_arrow_schema,
)

# create_table is given the Arrow schema here
tbl = rest_catalog.create_table(
    "pyiceberg_test.test_decimal_type", schema=decimal_arrow_schema
)
tbl.append(decimal_arrow_table)

> parquet-tools inspect 00000-0-bff20a80-0e80-4b53-ba35-2c94498fa507.parquet
############ file meta data ############
created_by: parquet-cpp-arrow version 16.1.0
num_columns: 1
num_rows: 1
num_row_groups: 1
format_version: 2.6
serialized_size: 465
############ Columns ############
decimal
############ Column(decimal) ############
name: decimal
path: decimal
max_definition_level: 1
max_repetition_level: 0
physical_type: FIXED_LEN_BYTE_ARRAY
logical_type: Decimal(precision=7, scale=0)
converted_type (legacy): DECIMAL
compression: ZSTD (space_saved: -25%)