verify flow #484

Merged: 11 commits into feature-genai-pa on Mar 6, 2024

Conversation

@debermudez (Contributor) commented Mar 2, 2024

  • Clean up
  • link up llm_inputs
  • link up llm_profile
  • rename llm_profile to llm_metrics
  • rename test_llm_profile to test_llm_metrics
  • handle extra args after "--" on the command line (see the example below)
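
For example, a hypothetical invocation (everything after the separator is handed to the underlying perf_analyzer; the -p and -s flags mirror ones that appear later in this PR):

    genai-perf -m my_model -- -p 10000 -s 99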

@debermudez force-pushed the dbermudez-verify-flow branch 2 times, most recently from 994ac1c to 47e10fe on March 5, 2024 01:00
@debermudez marked this pull request as ready for review on March 5, 2024 01:01
@debermudez force-pushed the dbermudez-verify-flow branch from 52f648c to 432dc3c on March 5, 2024 01:20
"-i",
type=str.lower,
choices=["http", "grpc"],
default="http",
@rmccorm4 (Contributor) commented Mar 5, 2024

This can be a later decision if needed, but my two cents from similar work: a default of gRPC + streaming works out of the box for all Triton models when using the infer endpoint, so most users won't even need to think about setting this arg. If you want to use a decoupled model in Triton on the infer endpoint (which seems like a sensible default for LLM tasks), you need gRPC streaming anyway, so it would be better not to make the user set these args every time for the common case.

You could also consider selecting the endpoint default values based on --service-kind wherever it applies.

For example, something like this:

  • triton-infer -> default = grpc streaming, localhost:8001
  • openai -> default = http, localhost:<common_openai_port>
  • triton-generate -> default = http (grpc not supported), localhost:8000
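
A minimal sketch of that defaulting logic, assuming an argparse namespace with service_kind, protocol, streaming, and url attributes (all names are illustrative, not the PR's actual code; the OpenAI port is left unset because the comment does not pin one down):

    import argparse

    # Illustrative mapping from --service-kind to suggested endpoint defaults.
    SERVICE_KIND_DEFAULTS = {
        "triton-infer": {"protocol": "grpc", "streaming": True, "url": "localhost:8001"},
        "openai": {"protocol": "http", "streaming": False, "url": None},  # common OpenAI port left unspecified
        "triton-generate": {"protocol": "http", "streaming": False, "url": "localhost:8000"},
    }

    def apply_endpoint_defaults(args: argparse.Namespace) -> argparse.Namespace:
        """Fill in endpoint settings only where the user did not set them explicitly."""
        for field, value in SERVICE_KIND_DEFAULTS[args.service_kind].items():
            if getattr(args, field, None) is None:
                setattr(args, field, value)
        return args

    args = argparse.Namespace(service_kind="triton-infer", protocol=None, streaming=None, url=None)
    apply_endpoint_defaults(args)
    print(args.protocol, args.streaming, args.url)  # grpc True localhost:8001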

@debermudez (Contributor, Author) replied:

This is something I needed to put more thought into, so I am glad you are bringing it up. I think going based on --service-kind is a good way to handle this. Thank you for the suggestion.

@debermudez (Contributor, Author) commented Mar 5, 2024

I will handle this in the next PR if that is okay with you.

Comment on lines +59 to +66

    # TODO: Once the OpenAI endpoint support is in place in PA core,
    # update the input-data option arg
    # cmd += f"--input-data {DEFAULT_INPUT_DATA_JSON} -p 10000 -s 99"
    cmd += f"--input-data ./input_data.json -p 10000 -s 99"
@rmccorm4 (Contributor) commented:

You could also uncomment this code now and just set DEFAULT_INPUT_DATA_JSON = "input_data.json" until the other filename workflow is ready.
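
A sketch of that interim change (the base command and its placement are hypothetical; only the constant name and the --input-data flags come from the diff above):

    # Hypothetical base command for illustration; the surrounding code is not shown in this thread.
    cmd = "perf_analyzer -m my_model "

    # Interim value, per the suggestion above, until the filename workflow is ready.
    DEFAULT_INPUT_DATA_JSON = "input_data.json"

    cmd += f"--input-data {DEFAULT_INPUT_DATA_JSON} -p 10000 -s 99"
    print(cmd)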

@debermudez (Contributor, Author) commented Mar 5, 2024

I will handle this in the next PR if that is okay with you.

@debermudez requested a review from @rmccorm4 on March 5, 2024 18:37
@@ -97,3 +97,10 @@ def test_arguments_model_not_provided(self):

    def test_exception_on_nonzero_exit(self):
        with pytest.raises(GenAiPAException) as e:
            run(["-m", "nonexistent_model"])

    def test_pass_through_args(self):
@rmccorm4 (Contributor) commented Mar 5, 2024

How does PA handle duplicate arguments? Which one should take priority? There should probably be tests for a handful of conflicting arguments to make sure they behave however you want them to. For example:

    genai-perf -m my_model --url localhost:8000 -- -u abc:123
    genai-perf -m my_model -- -m your_model
    genai-perf -m my_model --concurrency 10 -- --request-rate 5
    ...
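
A minimal sketch of what such a test could assert, assuming the CLI is argparse-based and collects pass-through arguments via parse_known_args (the names and the priority behavior here are assumptions, since the intended rules are exactly what is being asked):

    import argparse

    parser = argparse.ArgumentParser(prog="genai-perf")
    parser.add_argument("-m", "--model")
    parser.add_argument("--url", default="localhost:8001")

    # Tokens after "--" are not parsed as genai-perf options, so the second
    # "-m" stays on the pass-through side instead of overriding args.model.
    args, extras = parser.parse_known_args(["-m", "my_model", "--", "-m", "your_model"])
    assert args.model == "my_model"
    assert extras[-2:] == ["-m", "your_model"]
    # Note: depending on the Python version, the "--" separator itself may also
    # appear at the front of extras, which is why the PR strips extra_args[1:].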

@debermudez (Contributor, Author) replied:

More extensive CLI testing is my next task; I need to find the bugs I don't know about. That is certainly a test case that needs to be included.

    endpoint_group = parser.add_argument_group("Endpoint")

    endpoint_group.add_argument(
@rmccorm4 (Contributor) commented:

Nit for later: PA core looks like it uses --endpoint for specifying the HTTP route for OpenAI, like /v1/chat/completions, while this tool uses --endpoint to mean the same thing as --protocol. We should probably align on names where it makes sense later.

@debermudez (Contributor, Author) replied:

Let me take this as an action item for my next PR. This one has blown up in scope and I don't want to gate any other submissions much longer.

@debermudez force-pushed the dbermudez-verify-flow branch from baf41a5 to 49fb648 on March 6, 2024 00:21
Comment on lines +224 to +227

    if extra_args:
        # strip off the "--" demarking the pass through arguments
        extra_args = extra_args[1:]
        logger.info(f"Additional pass through args: {extra_args}")
@rmccorm4 (Contributor) commented Mar 6, 2024

Can be covered in a follow-up, but I think this will be fragile. If a user makes a typo, for example:

    genai-perf --typo my_model -- extra_arg_1

I think this will capture --typo as the first arg here.

There are probably more practical ways to handle this, such as either:

  • not having any separator, so all extra_args are valid
  • or having a specific named separator arg like --passthrough that can be checked via args.passthrough

We can talk more offline or in follow-up though, not a blocker for this PR.

(replace --typo with --mode, or with misspellings like --endpoin or --concurency, etc.)
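
A minimal sketch of the failure mode and the kind of guard being suggested, assuming the extras come from argparse's parse_known_args (the PR's actual parsing code is not shown in this thread):

    import argparse

    parser = argparse.ArgumentParser(prog="genai-perf")
    parser.add_argument("-m", "--model")

    # The unrecognized "--typo" lands in extras ahead of the "--" separator,
    # so blindly dropping extras[0] would remove the wrong token.
    _, extras = parser.parse_known_args(["--typo", "my_model", "--", "extra_arg_1"])

    # Defensive variant: only strip when the first token really is the separator,
    # and fail loudly otherwise.
    if extras and extras[0] == "--":
        pass_through = extras[1:]
    else:
        parser.error(f"unrecognized arguments: {' '.join(extras)}")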

@rmccorm4 (Contributor) left a comment

LGTM, most open items have been delegated to a follow-up PR.

Make sure to address the CodeQL unused-import warning though; I can re-approve after.

@rmccorm4 (Contributor) left a comment

Nice work! 🚀

@debermudez merged commit e701b84 into feature-genai-pa on Mar 6, 2024
3 checks passed
@debermudez deleted the dbermudez-verify-flow branch on March 6, 2024 05:17
debermudez added a commit that referenced this pull request Mar 12, 2024
* Clean up and link up llm_inputs

* Connect metrics module and silence transformer import warning

* Rename test_llm_profile.py to test_metrics.py

* Rename again to prefix files with llm

* Remove ./ from constant path

* Add comment for silencing and handle extra args

* Handle pass through arguments

* Parse extra args correctly and update testing

* Add logging and fix some error checking

* extra commit from rebase

* Remove unused import
debermudez added a commit that referenced this pull request on Mar 13, 2024 (same commit message as above)
mc-nv pushed a commit that referenced this pull request on Mar 13, 2024 (same commit message as above)