
Add support for pyarrow DurationType #1900


Open
0x26res opened this issue Apr 9, 2025 · 3 comments

Comments

@0x26res

0x26res commented Apr 9, 2025

Feature Request / Improvement

Currently a pa.Schema with a pa.DurationType can't be converted to an iceberg schema.

I think it should be treated the same way as a pa.Time64Type and be mapped to a time type in iceberg.

import pyarrow as pa
import pytest
from pyiceberg.catalog import Catalog
from pyiceberg.io.pyarrow import UnsupportedPyArrowTypeException


def test_iceberg_config():
    pa_schema = pa.schema(
        [
            pa.field("timestamp", pa.timestamp("us", "UTC")),
            pa.field("time", pa.time64("us")),
            pa.field("duration", pa.duration("us")),
        ],
    )
    with pytest.raises(
        UnsupportedPyArrowTypeException,
        match=r"Column 'duration' has an unsupported type: duration\[us\]",
    ):
        Catalog._convert_schema_if_needed(pa_schema)
@Fokko
Contributor

Fokko commented Apr 15, 2025

@0x26res Thanks for raising this issue. From what I understand, a duration is different from a time. Could you elaborate how this would map onto time?

@0x26res
Author

0x26res commented Apr 15, 2025

I guess in python a datetime.timedelta (aka duration) is like a datetime.time, except a timedelta value can be negative and be greater than a day.
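A minimal sketch of that difference, using only the standard library (the helper name `time_is_valid` is made up for illustration):

```python
from datetime import time, timedelta

# A timedelta may be negative or span more than one day.
negative = timedelta(hours=-1)
longer_than_a_day = timedelta(hours=25)


def time_is_valid(hour: int) -> bool:
    """Return True if datetime.time accepts this hour (hypothetical helper)."""
    try:
        time(hour=hour)
    except ValueError:
        # datetime.time only accepts hours in the range 0..23.
        return False
    return True
```

Here `time_is_valid(25)` and `time_is_valid(-1)` are both rejected, while the two `timedelta` values are constructed without error.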

In pyarrow, there isn't this constraint. You can create a time64 that represents more than 24 hours or that is negative. In that respect, duration and time64 in pyarrow are both an int64 which, together with its unit ("us", "ns"...), can be interpreted as a logical type.

The Iceberg spec for the time type is a bit loose:

Time of day, microsecond precision, without date, timezone

I guess we can either:

  • have the library convert pa.duration to an iceberg time by default
  • force the user to convert their pa.duration('us') to pa.time64('us') beforehand, if they're happy to interpret their duration as time.
  • add support for an explicit duration type in iceberg.

@jayceslesar
Contributor

This was just formally proposed to the dev mailing list via https://docs.google.com/document/d/12ghQxWxyAhSQeZyy0IWiwJ02gTqFOgfYm8x851HZFLk/edit?tab=t.0#heading=h.rt0cvesdzsj7

I think it is wise to wait for this to be officially implemented before attempting to stick it into the time type.
