Add hive configs for supported read and write formats #25147

Open
wants to merge 1 commit into base: master
Conversation

@pramodsatya pramodsatya commented May 20, 2025

Description

Adds hive configs `hive.read-formats` and `hive.write-formats` to configure the file formats supported by the hive connector for read and write operations, respectively.

Motivation and Context

With the hive connector, Presto C++ only supports reading tables in DWRF, ORC, and PARQUET formats, and writing tables in DWRF and PARQUET formats. These hive configs allow queries to fail fast at the coordinator when attempting to read from or write to tables with file formats unsupported in Presto C++.
Currently, attempting to read from a table with an unsupported file format in Presto C++ fails at the worker:

it != readerFactories().end() ReaderFactory is not registered for format text
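To fail fast at the coordinator instead, the new properties could be set in the hive catalog properties file. This is an illustrative sketch: the property names come from this PR, while the comma-separated value syntax and the format names follow the supported sets described above.

```properties
# Restrict the hive connector to the formats Presto C++ can read/write.
# An empty value (the default) leaves all Hive formats allowed.
hive.read-formats=DWRF,ORC,PARQUET
hive.write-formats=DWRF,PARQUET
```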

Release Notes

== RELEASE NOTES ==
Hive Connector Changes
* Adds hive configuration properties `hive.read-formats` and `hive.write-formats` to allow users to restrict the file formats the hive connector supports for read and write operations.

@prestodb-ci prestodb-ci added the from:IBM PR from IBM label May 20, 2025
@pramodsatya pramodsatya marked this pull request as ready for review May 20, 2025 15:20
@pramodsatya pramodsatya requested a review from a team as a code owner May 20, 2025 15:20
@pramodsatya pramodsatya requested a review from jaystarshot May 20, 2025 15:20
@prestodb-ci prestodb-ci requested review from a team, sh-shamsan and pdabre12 and removed request for a team May 20, 2025 15:20
@pramodsatya pramodsatya requested review from tdcmeehan, aditi-pandit, a team and nishithakbhaskaran and removed request for sh-shamsan, pdabre12 and a team May 20, 2025 15:20
```java
if (connectorSystemConfig.isNativeExecution()) {
    StorageFormat storageFormat = table.getStorage().getStorageFormat();
    Optional<HiveStorageFormat> hiveStorageFormat = getHiveStorageFormat(storageFormat);
    if (hiveStorageFormat.isPresent() && !(hiveStorageFormat.equals(Optional.of(DWRF))
```
Let's add this in the Hive configs. By default it is empty, which means whatever is available in Hive is fine. It can be a set of comma-separated values.
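Parsing such a comma-separated config value into a set of format names could be sketched roughly as follows. The class and method names here are hypothetical helpers for illustration, not the PR's actual implementation.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.stream.Collectors;

public class HiveFormatsParser
{
    // Parse a config value such as "DWRF,ORC,PARQUET" into a set of
    // upper-cased format names, tolerating whitespace around commas.
    public static Set<String> parseFormats(String value)
    {
        if (value == null || value.trim().isEmpty()) {
            // Empty config means no restriction: all Hive formats allowed.
            return Set.of();
        }
        return Arrays.stream(value.split(","))
                .map(String::trim)
                .map(String::toUpperCase)
                .collect(Collectors.toCollection(LinkedHashSet::new));
    }
}
```

A read or write path could then reject a table whose storage format is absent from the parsed set, producing a coordinator-side error instead of a worker failure.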

@aditi-pandit
@pramodsatya: Thanks for this code. Should we add a check for the file formats applicable on the writer side as well? Native execution only supports DWRF and Parquet writers.

@pramodsatya pramodsatya force-pushed the hive_rd_fmt branch 2 times, most recently from 6ddeb7c to 3f1bcac Compare May 26, 2025 21:43
@pramodsatya pramodsatya changed the title [native] Fail-fast for file formats unsupported by hive connector Add hive configs for supported read and write formats May 27, 2025
pramodsatya commented May 27, 2025

Thanks for the feedback @tdcmeehan and @aditi-pandit. Added hive configs for supported read and write formats, and validated that read/write operations fail for unsupported formats when these configs are set. Could you please take another look?
