Description
Apache Iceberg version
0.9.1 (latest release)
Please describe the bug 🐞
When attempting to add Parquet files to an Iceberg table using `Table.add_files`, the operation fails if a column defined as `DecimalType` in the Iceberg schema is physically stored as `FIXED_LEN_BYTE_ARRAY` in the Parquet file, even if the decimal's precision would typically map to `INT32` or `INT64` according to Iceberg's preferred Parquet mapping.
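For reference, the Iceberg spec's Parquet type mapping writes `decimal(P, S)` as `INT32` when P ≤ 9, as `INT64` when P ≤ 18, and as `FIXED_LEN_BYTE_ARRAY` otherwise, where the fixed width must be the minimum number of bytes that can store P digits. The sketch below encodes that write-side rule; the function names are hypothetical and this is not PyIceberg's internal code.

```python
def preferred_decimal_physical_type(precision: int) -> str:
    """Physical type the Iceberg spec prescribes when *writing* decimal(P, S) to Parquet."""
    if not 1 <= precision <= 38:
        raise ValueError(f"Iceberg decimals support precision 1..38, got {precision}")
    if precision <= 9:
        return "INT32"
    if precision <= 18:
        return "INT64"
    return "FIXED_LEN_BYTE_ARRAY"


def min_fixed_length(precision: int) -> int:
    """Smallest byte width whose signed two's-complement range holds any P-digit unscaled value."""
    n = 1
    while 2 ** (8 * n - 1) - 1 < 10 ** precision - 1:
        n += 1
    return n


print(preferred_decimal_physical_type(10), min_fixed_length(10))  # INT64 5
```

So for `Decimal(10, 2)` the spec's preferred storage is `INT64`, but a writer that chooses `FIXED_LEN_BYTE_ARRAY` instead still produces a valid file with a 5-byte fixed width.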
I see in the Iceberg spec that this mapping is correct on write. However, the current behaviour seems to overly restrict the physical Parquet type accepted for decimals during the file-addition process, which I believe greatly limits the kinds of Parquet files that can be "added" to an Iceberg table this way.
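A more lenient check could accept any physical type that the Parquet format itself allows for the `DECIMAL` logical type: `INT32` for precision 1 to 9, `INT64` for precision 1 to 18, `FIXED_LEN_BYTE_ARRAY` when its byte width is large enough for the precision, and `BYTE_ARRAY` with no precision limit. The function below is a hypothetical sketch of that idea, not an existing PyIceberg API.

```python
from typing import Optional


def max_precision_for_bytes(n: int) -> int:
    """Largest P such that every P-digit unscaled value fits in n signed bytes."""
    # 2**(8n-1) - 1 is never all nines, so the safe precision is its digit count minus one.
    return len(str(2 ** (8 * n - 1) - 1)) - 1


def is_acceptable_decimal_storage(
    precision: int, physical_type: str, type_length: Optional[int] = None
) -> bool:
    """Hypothetical check: is this Parquet physical type legal for DECIMAL(precision, _)?"""
    if physical_type == "INT32":
        return 1 <= precision <= 9
    if physical_type == "INT64":
        return 1 <= precision <= 18
    if physical_type == "FIXED_LEN_BYTE_ARRAY":
        # The fixed width must be wide enough to hold the declared precision.
        return type_length is not None and precision <= max_precision_for_bytes(type_length)
    if physical_type == "BYTE_ARRAY":
        return True
    return False


print(is_acceptable_decimal_storage(10, "FIXED_LEN_BYTE_ARRAY", 5))  # True
```

Whether `add_files` should also accept `BYTE_ARRAY` decimals is a separate design choice; the case in this issue only needs `FIXED_LEN_BYTE_ARRAY` to be accepted.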
Steps to Reproduce:
- Define an Iceberg table schema with a `DecimalType` column, for example, `Decimal(10, 2)`.
  - Iceberg's preferred Parquet physical type for `Decimal(10, 2)` would be `INT64`.
- Create a Parquet file where the corresponding column for this `Decimal(10, 2)` is physically stored as `FIXED_LEN_BYTE_ARRAY`. The data itself is valid for `Decimal(10, 2)`.
- Attempt to add this Parquet file to the Iceberg table using `Table.add_files`.
Behavior:
The `Table.add_files` operation fails with the following error:
`ValueError: Unexpected physical type FIXED_LEN_BYTE_ARRAY for DecimalType(10, 2) expected INT32`
This indicates a mismatch between the expected physical type (e.g., `INT64`) and the actual physical type (`FIXED_LEN_BYTE_ARRAY`) found in the Parquet file for the decimal column.
Expected Behavior:
The `Table.add_files` operation should succeed and correctly read the decimal values from the `FIXED_LEN_BYTE_ARRAY` physical storage. Either the Iceberg reader/writer should be lenient about the physical storage format of decimals, or `Table.add_files` should document these limitations.
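Reading such values back does not depend on the integer mapping: per the Parquet format spec, a `FIXED_LEN_BYTE_ARRAY` decimal holds the unscaled value as a big-endian two's-complement integer, and the scale comes from the column's `DECIMAL(P, S)` annotation. A minimal sketch of that decoding, using a hypothetical helper name:

```python
from decimal import Decimal


def decode_fixed_decimal(raw: bytes, scale: int) -> Decimal:
    """Decode one FIXED_LEN_BYTE_ARRAY decimal value into a Python Decimal."""
    # The unscaled value is a big-endian two's-complement integer.
    unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    # Build from (sign, digits, exponent) so no decimal-context rounding can occur.
    digits = tuple(int(d) for d in str(abs(unscaled)))
    return Decimal((1 if unscaled < 0 else 0, digits, -scale))


# 0x00000004d2 == 1234 unscaled; with scale 2 this is 12.34.
print(decode_fixed_decimal(bytes.fromhex("00000004d2"), 2))  # 12.34
```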
Environment:
- Python version: 3.12.9
- Parquet library and version: pyarrow 20.0.0
P.S. If this is just user error and I shouldn't be trying to do things this way, I'd be happy to hear alternatives.
Willingness to contribute
- I can contribute a fix for this bug independently
- I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- I cannot contribute a fix for this bug at this time