Add hive configs for supported read and write formats #25147
base: master
Conversation
if (connectorSystemConfig.isNativeExecution()) {
    StorageFormat storageFormat = table.getStorage().getStorageFormat();
    Optional<HiveStorageFormat> hiveStorageFormat = getHiveStorageFormat(storageFormat);
    if (hiveStorageFormat.isPresent() && !(hiveStorageFormat.equals(Optional.of(DWRF))
Let's add this in the Hive configs. By default, it is empty, which means whatever is available in Hive is fine. It can be a set of comma separated values.
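A minimal sketch of how such a comma-separated config value could be parsed into a set of formats, as the comment suggests. The class and method names here are illustrative, not the PR's actual code; an empty value means no restriction, i.e. whatever is available in Hive is fine.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.stream.Collectors;

public class FormatConfigSketch
{
    // Hypothetical helper: split a comma-separated config value such as
    // "DWRF, ORC, PARQUET" into an upper-cased set of format names.
    // An empty or blank value yields an empty set, meaning "no restriction".
    public static Set<String> parseFormats(String value)
    {
        if (value == null || value.trim().isEmpty()) {
            return Set.of();
        }
        return Arrays.stream(value.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .map(String::toUpperCase)
                .collect(Collectors.toCollection(LinkedHashSet::new));
    }

    public static void main(String[] args)
    {
        assert parseFormats("DWRF, ORC, PARQUET").equals(Set.of("DWRF", "ORC", "PARQUET"));
        assert parseFormats("").isEmpty();
    }
}
```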
@pramodsatya: Thanks for this code. Should we add a check for the file formats applicable at the writer side as well? Native execution only supports DWRF and Parquet writers.
Force-pushed from 6ddeb7c to 3f1bcac
Thanks for the feedback @tdcmeehan, @aditi-pandit. Added hive configs for supported read and write formats, and validated that read/write operations fail for unsupported formats when these configs are set. Could you please take another look?
Thanks @pramodsatya
@@ -250,6 +254,15 @@ public ConnectorSplitSource getSplits(
            session.getRuntimeStats());
    Table table = layout.getTable(metastore, metastoreContext);

    if (!readFormats.isEmpty()) {
        StorageFormat storageFormat = table.getStorage().getStorageFormat();
        Optional<HiveStorageFormat> hiveStorageFormat = getHiveStorageFormat(storageFormat);
What does it mean to get an empty hiveStorageFormat? Shouldn't we throw in that case as well?
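A hedged sketch of the validation this review thread is converging on: when a read-format allow-list is configured, reject a storage format that cannot be resolved to a known HiveStorageFormat as well as one that is resolvable but not in the list. The method name and exception type are illustrative assumptions, not the PR's actual code (Presto itself would likely use PrestoException with a specific error code).

```java
import java.util.Optional;
import java.util.Set;

public class ReadFormatCheckSketch
{
    // Hypothetical check: an empty allow-list means every format is permitted;
    // otherwise both unresolvable and unsupported formats are rejected,
    // so the query fails fast at the coordinator rather than at the worker.
    public static void checkReadFormat(Set<String> readFormats, Optional<String> hiveStorageFormat)
    {
        if (readFormats.isEmpty()) {
            return; // no restriction configured
        }
        String format = hiveStorageFormat.orElseThrow(
                () -> new IllegalArgumentException("Table storage format could not be resolved"));
        if (!readFormats.contains(format)) {
            throw new IllegalArgumentException("Reading from storage format " + format + " is not supported");
        }
    }
}
```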
@@ -660,7 +662,12 @@ public HiveSessionProperties(HiveClientConfig hiveClientConfig, OrcFileWriterCon
                NATIVE_STATS_BASED_FILTER_REORDER_DISABLED,
                "Native Execution only. Disable stats based filter reordering.",
                false,
                true));
                true),
Can the user modify this as a session property? That shouldn't be allowed, right?
Should we have documentation for these new properties? https://github.com/prestodb/presto/blob/master/presto-docs/src/main/sphinx/presto_cpp/properties.rst
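For illustration, a catalog configuration using the two properties from this PR might look like the following. The values are chosen to match the Presto C++ support matrix described in this PR; per the discussion above, leaving a property unset or empty means no restriction.

```properties
# etc/catalog/hive.properties (illustrative values)
hive.read-formats=DWRF,ORC,PARQUET
hive.write-formats=DWRF,PARQUET
```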
Description

Adds hive configs hive.read-formats and hive.write-formats to configure the file formats supported by the Hive connector for read and write operations, respectively.

Motivation and Context

Presto C++ supports reading tables only in the DWRF, ORC, and PARQUET formats, and writing tables only in the DWRF and PARQUET formats, with the Hive connector. These hive configs allow queries to fail fast at the coordinator when attempting to read from or write to tables with unsupported file formats in Presto C++. Currently, attempting to read from a table with an unsupported file format in Presto C++ fails at the worker:
Release Notes