Skip to content

Generate ProtocolToThrift files through make and commit #25162

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

vhsu14
Copy link
Contributor

@vhsu14 vhsu14 commented May 20, 2025

Description

In #25079, ProtocolToThrift files were manually written. This PR generates these files using pythons scripts and chevron templates.

Motivation and Context

Impact

Test Plan

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* ... 
* ... 

Hive Connector Changes
* ... 
* ... 

If release note is NOT required, use:

== NO RELEASE NOTE ==

@vhsu14 vhsu14 requested a review from a team as a code owner May 20, 2025 21:53
@facebook-github-bot
Copy link
Collaborator

@vhsu14 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@vhsu14 vhsu14 force-pushed the generate-protocol-to-thrift-files branch from d505a2f to dd4a16c Compare May 21, 2025 00:12
@tdcmeehan
Copy link
Contributor

If we're using Thrift, why can't we just use either the Drift IDL generator, or have both Java and C++ auto-generate the Thrift structures from a checked in IDL?

@vhsu14
Copy link
Contributor Author

vhsu14 commented May 21, 2025

If we're using Thrift, why can't we just use either the Drift IDL generator, or have both Java and C++ auto-generate the Thrift structures from a checked in IDL?

We are using Drift to annotate classes in Java and generate an IDL for C++ to auto-gen the Thrift structures. Using a checked-in IDL has some cons in Java:

  1. While we are in the process of switching, we need to maintain two sets of classes and write toThrift/ fromThrift to convert the objects. This requires huge amounts of effort, can be prone to mistakes, and causes GC.
  2. Thrift structures don't support having class methods, so for every class that has methods, we will need to have a utils class for it.
  3. Everyone needs to be aware of the checked-in IDL. Whenever we want to modify classes in the Thrift flow, we need to make corresponding changes to the IDL.

Thanks for the comment. Let me know if you have further questions.

@tdcmeehan
Copy link
Contributor

Can we discuss the approach for code generation and overall timeline in this RFC? The protocol infrastructure has its own overhead and technical debt, and it's not really clear to me how this is going to work.

For example, some questions I have:

  • Does the existing readme in the Thrift directory apply to the current changes?
  • What is the timeline for maintaining both JSON and Thrift, or do we plan to maintain both?
  • Is the current Thrift file manually generated? Is this transitional, or going forward would it be a part of the developer workflow to regenerate the Thrift IDL using the Drift tool?

We use RFCs to get alignment on design and also to document decisions made. These are all important decisions that needs to be memorialized into the above RFC.

@shangm2
Copy link
Contributor

shangm2 commented May 21, 2025

Can we discuss the approach for code generation and overall timeline in this RFC? The protocol infrastructure has its own overhead and technical debt, and it's not really clear to me how this is going to work.

For example, some questions I have:

  • Does the existing readme in the Thrift directory apply to the current changes?
  • What is the timeline for maintaining both JSON and Thrift, or do we plan to maintain both?
  • Is the current Thrift file manually generated? Is this transitional, or going forward would it be a part of the developer workflow to regenerate the Thrift IDL using the Drift tool?

We use RFCs to get alignment on design and also to document decisions made. These are all important decisions that needs to be memorialized into the above RFC.

Hey @tdcmeehan . Let me know if the following answers your questions. And yes, we will incorporate these questions and answers into the RFC as well.

  • Does the existing readme in the Thrift directory apply to the current changes?
    Answer: if you are referring to the pipeline of converting thrift to json with the help of python script and yaml file, then yes, we still need this pipeline to generate some code, e.g. those in presto-native-execution/presto_cpp/main/thrift/ProtocolToThrift.cpp so that we dont need to write them manually.

  • What is the timeline for maintaining both JSON and Thrift, or do we plan to maintain both?
    Answer: I don't think we have a timeline to deprecate json at least for now. But we can discuss if/how we can do this once the thrift migration is in a stable state.

  • Is the current Thrift file manually generated? Is this transitional, or going forward would it be a part of the developer workflow to regenerate the Thrift IDL using the Drift tool?
    Answer: the idl file is auto generated using drift library. It is achieved when the presto-thrift-spec gets built which happens everything when you build presto. Here are two related prs:

# See the License for the specific language governing permissions and
# limitations under the License.

StructMap:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StructMap maps fields that have different names for protocol and thrift classes

fields:
infoUnion: { field_name: info }

SkipStruct:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't generate toThrift/fromThrift classes for the following structs because they are related to OperatorInfo, which is an empty struct in C++

StageExecutionId:
StageId:

WrapperStruct:
Copy link
Contributor Author

@vhsu14 vhsu14 May 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are single field structs in IDL that are directly declared in C++. For example,

In IDL:
struct ConnectorId {
1: string catalogName;
}

In C++:
using ConnectorId = std::string;

RuntimeStats:
SqlFunctionId:

Special:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are structs that require custom conversions, which are defined under the /special folder

# See the License for the specific language governing permissions and
# limitations under the License.

all: ProtocolToThrift.h ProtocolToThrift.cpp
Copy link
Contributor Author

@vhsu14 vhsu14 May 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This Makefile defines the flow for generating ProtocolToThrift files as defined in README.md. For now, it depends on presto_thrift.thrift, which is a copy of Drift generated IDL. @shangm2 is working on having it auto-generated as part of presto-thrift-spec and we will reference to the generated IDL once that is done.

@vhsu14 vhsu14 force-pushed the generate-protocol-to-thrift-files branch from 2a99cd0 to 7ec0dac Compare May 21, 2025 18:49
@vhsu14 vhsu14 force-pushed the generate-protocol-to-thrift-files branch from 7ec0dac to fd74671 Compare May 21, 2025 20:58
@facebook-github-bot
Copy link
Collaborator

@vhsu14 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Copy link
Contributor

@shangm2 shangm2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM



def main():
preprocess(sys.argv[1], "temp_presto_thrift.thrift")
Copy link
Contributor

@shangm2 shangm2 May 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to remove the "drift.recursive_reference=true" annotation from the original thrift file. This annotation is needed/generated by drift codec and we prefer not to write hacky code in drift repo in order to remove it. @vhsu14 maybe we can add some comment to emphasize this for future reference.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems little hacky.. can we add a comment here explaining it.

./presto_protocol-to-thrift-json.py presto_thrift.json ../../presto_protocol/core/presto_protocol_core.json | jq . > presto_protocol-to-thrift-json.json

presto_thrift.json: presto_thrift.thrift ./thrift2json.py
./thrift2json.py presto_thrift.thrift | jq . > presto_thrift.json
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The .thrift file is at presto-thrift-spec/target/thrift/presto-thrift-protocol.thrift



if __name__ == "__main__":
sys.exit(main())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add new line at end of file.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants