
[llm] Add a generic text only LLM runner #11343


Closed
larryliu0820 wants to merge 1 commit

Conversation

larryliu0820
Contributor

Introducing text_llm_runner. It can be used to run all text-only, decoder-only LLM models supported by ExecuTorch.

  • Metadata is read from the .pte file and used to construct the runner object.
  • examples/models/llama/runner.h[.cpp] now only contains a thin wrapper around text_llm_runner.h[.cpp].

In follow-up PRs I will move examples/models/phi-3-mini/runner to use the generic runner.

I will look into the QNN and MediaTek runners as well.
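
For context, a minimal usage sketch of the generic runner is below. The header path, function signatures, and GenerationConfig field names are assumptions inferred from the API snippets quoted later in this review, not verbatim from this PR.

// Minimal sketch, assuming the APIs quoted in the review below
// (llm::load_tokenizer, llm::create_text_llm_runner, GenerationConfig).
// The header path and exact signatures are assumptions.
#include <executorch/extension/llm/runner/text_llm_runner.h>

#include <iostream>
#include <memory>
#include <string>
#include <utility>

using namespace executorch::extension;

int main() {
  // The tokenizer is loaded separately from the model; special tokens are optional.
  std::unique_ptr<tokenizers::Tokenizer> tokenizer =
      llm::load_tokenizer("tokenizer.bin");

  // The runner reads its metadata (bos/eos ids, max sequence length,
  // kv-cache settings, ...) out of the .pte file itself.
  std::unique_ptr<llm::TextLLMRunner> runner =
      llm::create_text_llm_runner("model.pte", std::move(tokenizer));

  // Sampling parameters travel with each generate() call via GenerationConfig
  // (field name assumed) rather than being fixed in the constructor.
  llm::GenerationConfig config;
  config.max_new_tokens = 128;

  runner->generate("Tell me a story.", config,
                   [](const std::string& token) { std::cout << token; });
  return 0;
}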

Differential Revision: D75910889

ghstack-source-id: 288004703
Pull Request resolved: #11342



pytorch-bot bot commented Jun 3, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/11343

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Merge Blocking SEV

There is 1 active merge blocking SEV. Please view it below:

If you must merge, use @pytorchbot merge -f.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jun 3, 2025
@larryliu0820 larryliu0820 added release notes: llm To capture llm specific changes in release notes and removed CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. labels Jun 3, 2025
@larryliu0820 larryliu0820 force-pushed the gh/larryliu0820/65/orig branch from 4393dd1 to dc4de71 Compare June 3, 2025 23:21
@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jun 3, 2025
@mergennachin mergennachin self-requested a review June 4, 2025 00:14
Contributor

@mergennachin mergennachin left a comment


See inline

@larryliu0820 larryliu0820 force-pushed the gh/larryliu0820/65/orig branch 2 times, most recently from 4070bba to 85fb703 Compare June 5, 2025 06:00
@larryliu0820 larryliu0820 requested a review from jathu as a code owner June 5, 2025 06:00
Contributor

@mergennachin mergennachin left a comment


See inline, @larryliu0820, and thank you!

ET_EXPERIMENTAL std::unique_ptr<tokenizers::Tokenizer> load_tokenizer(
    const std::string& tokenizer_path,
    std::unique_ptr<std::vector<std::string>> special_tokens = nullptr,
Contributor


Instead of using a smart pointer, how about this?

const std::optional<std::vector<std::string>>& tokens

Contributor


That way, you're passing by const reference and not using any pointers

Contributor Author


Wouldn't that trigger a copy? I think the tokenizer will have to hold the memory of these tokens.
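
To make the trade-off in this thread concrete, here is an illustrative toy sketch (hypothetical class and member names, not code from the PR) showing where the copy would happen under each signature:

#include <memory>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for the tokenizer; only illustrates ownership.
class ExampleTokenizer {
 public:
  // Option A (as in this PR): the caller transfers ownership, so the
  // tokenizer can keep the tokens alive without copying them.
  void set_special_tokens(std::unique_ptr<std::vector<std::string>> tokens) {
    special_tokens_ = std::move(tokens);
  }

  // Option B (reviewer suggestion): pass by const reference via std::optional.
  // If the tokenizer must hold the tokens beyond this call, it has to copy
  // the whole vector (and every string inside it).
  void set_special_tokens(const std::optional<std::vector<std::string>>& tokens) {
    if (tokens.has_value()) {
      special_tokens_ =
          std::make_unique<std::vector<std::string>>(*tokens);  // deep copy
    }
  }

 private:
  std::unique_ptr<std::vector<std::string>> special_tokens_;
};

In short, Option B keeps pointers out of the call site at the cost of one copy when the callee needs to retain the data, while Option A avoids the copy by moving ownership into the callee.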

    std::unique_ptr<TextPrefiller> text_prefiller,
    std::unique_ptr<TextTokenGenerator> text_token_generator,
    std::unique_ptr<Stats> stats,
    float temperature = -1.0f);
Contributor


If it's deprecated, why include this as part of the API?

If there's user code, it can always just use create_text_llm_runner, or can we migrate the user code to use GenerationConfig directly?

Contributor Author


This is for JNI, which doesn't expose a temperature argument in its generate() API. I think @kirklandsign is going to refactor the JNI API (and demo app) to fully deprecate passing temperature to the constructor.
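
For readers following along, a toy sketch of the two places temperature can live is below (type and field names here are hypothetical, not the real runner or JNI classes):

#include <iostream>
#include <string>

// Hypothetical config type; the real GenerationConfig has more fields.
struct ToyGenerationConfig {
  float temperature = -1.0f;
};

class ToyRunner {
 public:
  // Deprecated pattern: temperature fixed at construction time, which is
  // what the current JNI generate() surface effectively relies on.
  explicit ToyRunner(float temperature = -1.0f) : ctor_temperature_(temperature) {}

  // Preferred pattern: temperature arrives with each generate() call.
  void generate(const std::string& prompt, const ToyGenerationConfig& config) {
    const float temp =
        config.temperature > 0.0f ? config.temperature : ctor_temperature_;
    std::cout << "prompt=" << prompt << " temperature=" << temp << "\n";
  }

 private:
  float ctor_temperature_;
};

int main() {
  ToyRunner legacy(/*temperature=*/0.8f);           // JNI-style construction
  legacy.generate("Hello", ToyGenerationConfig{});  // falls back to ctor value

  ToyRunner modern;                                     // no ctor temperature
  modern.generate("Hello", ToyGenerationConfig{0.7f});  // per-call temperature
  return 0;
}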

    text_token_generator,
    std::unique_ptr<::executorch::extension::llm::Stats> stats,
    float temperature = -1.0f);

std::unique_ptr<llm::TextLLMRunner> create_llama_runner(
Contributor

@mergennachin mergennachin Jun 5, 2025


Do you think we should keep the example::create_llama_runner method at all?

Should we just replace it with llm::load_tokenizer and llm::create_text_llm_runner at all the current callsites of example::create_llama_runner?

It seems like unnecessary indirection, and we would like to showcase these low-level APIs in our examples, as opposed to adding yet another layer of indirection (i.e., example::create_llama_runner).

I'm fine if you want to do this in a follow-up PR, but I think we should do it very soon.

Contributor Author


Yeah, I don't have a strong opinion on this. I can do a follow-up.
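
As a rough sketch of what the follow-up migration could look like at a callsite (signatures assumed from the snippets quoted above, not verbatim from the PR):

#include <memory>
#include <string>
#include <utility>
// Assumed header for the generic runner APIs; the example-level wrapper's
// own header include is omitted here.
#include <executorch/extension/llm/runner/text_llm_runner.h>

using namespace executorch::extension;

// Before: callsites go through the example-level wrapper.
std::unique_ptr<llm::TextLLMRunner> make_runner_via_wrapper(
    const std::string& model_path, const std::string& tokenizer_path) {
  return example::create_llama_runner(model_path, tokenizer_path);  // signature assumed
}

// After: callsites use the generic extension/llm APIs directly, which is
// what the examples are meant to showcase.
std::unique_ptr<llm::TextLLMRunner> make_runner_directly(
    const std::string& model_path, const std::string& tokenizer_path) {
  auto tokenizer = llm::load_tokenizer(tokenizer_path);
  return llm::create_text_llm_runner(model_path, std::move(tokenizer));
}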

@larryliu0820 larryliu0820 force-pushed the gh/larryliu0820/65/orig branch from 85fb703 to f721f3f Compare June 5, 2025 23:44
@larryliu0820 larryliu0820 force-pushed the gh/larryliu0820/65/orig branch from f721f3f to 3fda3a4 Compare June 5, 2025 23:56
@larryliu0820
Contributor Author

Closing; duplicate of #11342.
