verify flow #484
Conversation
debermudez commented Mar 2, 2024 (edited)
- Clean up
- link up llm_inputs
- link up llm_profile
- rename llm_profile to llm_metrics
- rename test_llm_profile to test_llm_metrics
- handle extra args after "--" in the command line
Force-pushed from 994ac1c to 47e10fe
Force-pushed from 52f648c to 432dc3c
"-i", | ||
type=str.lower, | ||
choices=["http", "grpc"], | ||
default="http", |
This can be a later decision if needed, but my 2 cents from similar work is that a default of gRPC + streaming will work out of the box for all Triton models when using the infer endpoint, so most users won't even need to think about setting this arg. If you wanted to use a decoupled model in Triton on the infer endpoint (which seems like a sensible default for LLM tasks), you need gRPC streaming anyway, so it would be better not to make the user set these args every time for the common case.
You can also consider selecting the endpoint default values based on --service-kind wherever it applies, e.g. something like this:
- triton-infer -> default = grpc streaming, localhost:8001
- openai -> default = http, localhost:<common_openai_port>
- triton-generate -> default = http (grpc not supported), localhost:8000
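To sketch that idea (hypothetical helper, field names, and values; only the service-kind names come from the list above):

```python
from argparse import Namespace

# Hypothetical per-service-kind endpoint defaults. The service-kind names mirror
# the suggestion above; the field names and values are illustrative only.
SERVICE_KIND_DEFAULTS = {
    "triton-infer": {"protocol": "grpc", "streaming": True, "url": "localhost:8001"},
    "openai": {"protocol": "http", "streaming": False,
               "url": "localhost:<common_openai_port>"},  # placeholder port, as in the comment
    "triton-generate": {"protocol": "http", "streaming": False, "url": "localhost:8000"},
}


def apply_service_kind_defaults(args: Namespace) -> Namespace:
    """Fill in endpoint settings only where the user did not set them explicitly."""
    defaults = SERVICE_KIND_DEFAULTS.get(getattr(args, "service_kind", None), {})
    for name, value in defaults.items():
        if getattr(args, name, None) is None:
            setattr(args, name, value)
    return args
```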
This is something I needed to put more thought into, so I am glad you are bringing it up. I think going based on service-kind is a good method to handle this. Thank you for the suggestion.
I will handle this in the next PR if that is okay with you.
# TODO: Once the OpenAI endpoint support is in place in PA core,
# update the input-data option arg
# cmd += f"--input-data {DEFAULT_INPUT_DATA_JSON} -p 10000 -s 99"
cmd += f"--input-data ./input_data.json -p 10000 -s 99"
You could also uncomment this code now and just change DEFAULT_INPUT_DATA_JSON = "input_data.json" until the other filename workflow is ready.
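For illustration, the interim change being suggested could look roughly like this (build_cmd is a hypothetical wrapper; only the constant name and the command-line fragment come from the diff above):

```python
# Interim value until the other filename workflow is ready; the constant name
# comes from the commented-out line above, the value from the suggestion.
DEFAULT_INPUT_DATA_JSON = "input_data.json"


def build_cmd(base_cmd: str) -> str:
    # Mirrors the line under review, but routes the path through the constant
    # instead of hard-coding it.
    return base_cmd + f"--input-data {DEFAULT_INPUT_DATA_JSON} -p 10000 -s 99"
```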
I will handle this in the next PR if that is okay with you.
@@ -97,3 +97,10 @@ def test_arguments_model_not_provided(self):
    def test_exception_on_nonzero_exit(self):
        with pytest.raises(GenAiPAException) as e:
            run(["-m", "nonexistent_model"])

    def test_pass_through_args(self):
How does PA handle duplicate arguments? Which should take priority? There should probably be some tests for a handful of conflicting arguments, making sure they behave however you want them to, e.g.:
genai-perf -m my_model --url localhost:8000 -- -u abc:123
genai-perf -m my_model -- -m your_model
genai-perf -m my_model --concurrency 10 -- --request-rate 5
...
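A rough sketch of what such tests could look like (hypothetical test name and placeholder bodies; the intended behavior for conflicts still has to be decided):

```python
import pytest

# Cases taken from the examples above; the expected outcome (which side wins,
# or whether a conflict should raise) is still an open question, so the test
# body is a placeholder rather than the actual genai-perf contract.
CONFLICTING_ARG_CASES = [
    ["-m", "my_model", "--url", "localhost:8000", "--", "-u", "abc:123"],
    ["-m", "my_model", "--", "-m", "your_model"],
    ["-m", "my_model", "--concurrency", "10", "--", "--request-rate", "5"],
]


@pytest.mark.parametrize("args", CONFLICTING_ARG_CASES)
def test_conflicting_pass_through_args(args):
    # TODO: decide which value should take priority (or whether this should
    # error out) and assert that here.
    ...
```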
More extensive CLI testing is my next task; I need to find the bugs I don't know about. That is certainly a test case that needs to be included.
endpoint_group = parser.add_argument_group("Endpoint")

endpoint_group.add_argument(
nit for later: PA core looks like it has --endpoint for specifying the HTTP route for OpenAI, like /v1/chat/completions, while this tool uses --endpoint the same way as --protocol. We should probably align on names where it makes sense later.
Let me take this as an AI for my next PR. This one has blown up in scope and I don't want to gate any other submissions too much longer.
Clean up and link up llm_inputs
Force-pushed from baf41a5 to 49fb648
if extra_args:
    # strip off the "--" demarking the pass through arguments
    extra_args = extra_args[1:]
    logger.info(f"Additional pass through args: {extra_args}")
Can be covered in a follow-up, but I think this will be fragile. If a user makes a typo or something like:
genai-perf --typo my_model -- extra_arg_1
I think this will capture --typo as the first arg here.
There are probably more practical ways to handle this, such as either:
- not having any separator, so all extra_args are valid
- or having a specific named separator arg to denote the separation, like --passthrough or something that can be checked through args.passthrough
We can talk more offline or in a follow-up though, not a blocker for this PR.
(Replace --typo with --mode or --endpoin or --concurency etc.)
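A minimal standalone sketch of the concern, assuming the extras come from argparse.parse_known_args() (the diff suggests this but does not show the parser itself):

```python
import argparse

# parse_known_args() does not error on an unrecognized flag; it routes it into
# the extras list, so a typo'd option lands in front of the intended
# pass-through arguments.
parser = argparse.ArgumentParser(prog="genai-perf")
parser.add_argument("-m", "--model")

args, extra_args = parser.parse_known_args(
    ["--typo", "my_model", "--", "extra_arg_1"]
)

# extra_args begins with "--typo" here (how the "--" itself is kept varies a
# bit across Python versions), so extra_args[1:] strips the typo'd flag rather
# than the "--" marker.
print(extra_args)
```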
LGTM, most opens have been delegated to a follow-up PR. Make sure to address the CodeQL unused-import warning though; I can re-approve after.
Nice work! 🚀
* Clean up and link up llm_inputs
* Connect metrics module and silence transformer import warning
* Rename test_llm_profile.py to test_metrics.py
* Rename again to prefix files with llm
* Remove ./ from constant path
* Add comment for silencing and handle extra args
* Handle pass through arguments
* Parse extra args correctly and update testing
* Add logging and fix some error checking
* extra commit from rebase
* Remove unused import