Add hive configs for supported read and write formats #25147

Open
wants to merge 1 commit into base: master
Conversation

@pramodsatya pramodsatya commented May 20, 2025

Description

Adds hive configs `hive.read-formats` and `hive.write-formats` to configure the file formats supported by the hive connector for read and write operations, respectively.

Motivation and Context

With the hive connector, Presto C++ only supports reading tables in DWRF, ORC, and PARQUET formats, and writing tables in DWRF and PARQUET formats. These hive configs allow queries to fail fast at the coordinator when attempting to read from or write to tables with file formats unsupported in Presto C++.
Currently, attempting to read from a table with an unsupported file format in Presto C++ fails at the worker:

it != readerFactories().end() ReaderFactory is not registered for format text
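To fail fast at the coordinator instead, the new properties could be set in the hive catalog properties file. This is an illustrative sketch: the property names come from this PR, while the comma-separated value syntax and the format names follow the supported sets described above.

```properties
# Restrict the hive connector to the formats Presto C++ can read/write.
# An empty value (the default) leaves all Hive formats allowed.
hive.read-formats=DWRF,ORC,PARQUET
hive.write-formats=DWRF,PARQUET
```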

Release Notes

== RELEASE NOTES ==
Hive Connector Changes
* Adds hive configuration properties `hive.read-formats` and `hive.write-formats` to allow users to restrict the file formats the hive connector supports for read and write operations.

@prestodb-ci prestodb-ci added the from:IBM PR from IBM label May 20, 2025
@pramodsatya pramodsatya marked this pull request as ready for review May 20, 2025 15:20
@pramodsatya pramodsatya requested a review from a team as a code owner May 20, 2025 15:20
@pramodsatya pramodsatya requested a review from jaystarshot May 20, 2025 15:20
@prestodb-ci prestodb-ci requested review from a team, sh-shamsan and pdabre12 and removed request for a team May 20, 2025 15:20
@pramodsatya pramodsatya requested review from tdcmeehan, aditi-pandit, a team and nishithakbhaskaran and removed request for sh-shamsan, pdabre12 and a team May 20, 2025 15:20
```java
if (connectorSystemConfig.isNativeExecution()) {
    StorageFormat storageFormat = table.getStorage().getStorageFormat();
    Optional<HiveStorageFormat> hiveStorageFormat = getHiveStorageFormat(storageFormat);
    if (hiveStorageFormat.isPresent() && !(hiveStorageFormat.equals(Optional.of(DWRF))
```
Let's add this in the Hive configs. By default it is empty, which means whatever is available in Hive is fine. It can be a set of comma-separated values.
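Parsing such a comma-separated config value into a set of format names could be sketched roughly as follows. The class and method names here are hypothetical helpers for illustration, not the PR's actual implementation.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.stream.Collectors;

public class HiveFormatsParser
{
    // Parse a config value such as "DWRF,ORC,PARQUET" into a set of
    // upper-cased format names, tolerating whitespace around commas.
    public static Set<String> parseFormats(String value)
    {
        if (value == null || value.trim().isEmpty()) {
            // Empty config means no restriction: all Hive formats allowed.
            return Set.of();
        }
        return Arrays.stream(value.split(","))
                .map(String::trim)
                .map(String::toUpperCase)
                .collect(Collectors.toCollection(LinkedHashSet::new));
    }
}
```

A read or write path could then reject a table whose storage format is absent from the parsed set, producing a coordinator-side error instead of a worker failure.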

@aditi-pandit
@pramodsatya: Thanks for this code. Should we add a check for the file formats applicable on the writer side as well? Native execution only supports DWRF and Parquet writers.

@pramodsatya pramodsatya force-pushed the hive_rd_fmt branch 2 times, most recently from 6ddeb7c to 3f1bcac Compare May 26, 2025 21:43
@pramodsatya pramodsatya changed the title [native] Fail-fast for file formats unsupported by hive connector Add hive configs for supported read and write formats May 27, 2025
pramodsatya commented May 27, 2025

Thanks for the feedback @tdcmeehan and @aditi-pandit. Added hive configs for supported read and write formats, and validated that read/write operations fail for unsupported formats when these configs are set. Could you please take another look?
