Disable logical type cast of fastavro #34603

Open
wants to merge 13 commits into
base: master
from
2 changes: 2 additions & 0 deletions sdks/python/apache_beam/io/avroio.py
@@ -163,6 +163,8 @@ def __init__(
     super().__init__()
     self._source = _FastAvroSource(
         file_pattern, min_bundle_size, validate=validate)
+    # Disable fastavro's automatic logical type conversion
+    fastavro.read.LOGICAL_READERS.clear()
Collaborator

This removes all of fastavro's LOGICAL_READERS regardless of whether the caller sets as_rows, so only the underlying primitive value of each logical type is passed through and all other type information is lost.

This might break users who depend on the current LOGICAL_READERS behavior when as_rows=False. If we decide to pass timestamp-millis values as integers, a more targeted fix would be to remove only the 'long-timestamp-millis' entry from LOGICAL_READERS, and only when as_rows=True.
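To make the targeted option concrete, here is a minimal sketch of the idea. It assumes the registry is a plain dict keyed by strings such as 'long-timestamp-millis', as the diff's `LOGICAL_READERS.clear()` call suggests. The helper name `drop_logical_readers` and the stand-in registry are hypothetical and are not part of fastavro or Beam.

```python
def drop_logical_readers(registry, keys, as_rows):
    """Remove selected logical-type readers from `registry` only when as_rows is true.

    Returns the removed entries so a caller could restore them later.
    """
    if not as_rows:
        # Leave the registry untouched so as_rows=False keeps the current behavior.
        return {}
    return {key: registry.pop(key) for key in keys if key in registry}


# Stand-in for fastavro.read.LOGICAL_READERS (hypothetical entries for illustration).
registry = {
    "long-timestamp-millis": lambda data, *_: ("datetime", data),
    "int-date": lambda data, *_: ("date", data),
}

removed = drop_logical_readers(registry, ["long-timestamp-millis"], as_rows=True)
print(sorted(registry))  # ['int-date']
print(sorted(removed))   # ['long-timestamp-millis']
```

Since the real registry appears to be module-level state, any removal would likely affect every reader in the process, which is why the sketch returns the removed entries instead of discarding them.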

It is also worth considering whether we want to treat "timestamp-millis" as a Timestamp, as mentioned in #31656 (comment).

@Abacn do you have an opinion on whether the "timestamp-millis" type should be treated as an integer or as a "Timestamp" type?

Contributor

Yeah, agreed. As a general rule, we should avoid breaking changes for users.

     if as_rows:
       path = FileSystems.match([file_pattern], [1])[0].metadata_list[0].path
       with FileSystems.open(path) as fin: