Sagemaker client issue #53
Comments
Hey @SuchethaChintha, did you fix that?
This is probably due to using an older version of the SageMaker SDK. Updating it should fix the issue.
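One way to confirm the version theory before upgrading is to check whether the installed client exposes the streaming method named in the warning. The sketch below assumes the client in question is the boto3 `sagemaker-runtime` client (as the `SageMakerRuntime` name in the warning suggests); the helper names and the default region are hypothetical and not part of the benchmark code.

```python
def supports_response_stream(client):
    """Return True if the client exposes invoke_endpoint_with_response_stream."""
    method = getattr(client, "invoke_endpoint_with_response_stream", None)
    return callable(method)


def check_installed_boto3(region_name="us-east-1"):
    """Build a real runtime client and report whether streaming is available.

    Assumption: boto3 is installed; region_name is only a placeholder.
    """
    import boto3  # imported lazily so the helper above works without boto3

    client = boto3.client("sagemaker-runtime", region_name=region_name)
    return boto3.__version__, supports_response_stream(client)
```

If `check_installed_boto3()` reports `False`, upgrading boto3 (for example with `pip install --upgrade boto3`) is the likely next step.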
This error seems to occur because of an inconsistency between the clients: the SageMaker client keeps INTER_TOKEN_LAT as a list, whereas the OpenAI client sums the latencies before returning them.
I think it will work correctly if you modify sagemaker_client.py as follows:
metrics[common_metrics.INTER_TOKEN_LAT] = sum(time_to_next_token)
Even with this change, token_benchmark_ray.py still divides INTER_TOKEN_LAT by the number of output tokens, so the resulting metric should be calculated correctly.
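To illustrate why the list causes the failure and why summing first avoids it, here is a minimal standalone sketch. It does not use the benchmark code itself; `normalize_inter_token_latency`, the sample latencies, and the zero-token guard are assumptions made for illustration.

```python
def normalize_inter_token_latency(time_to_next_token, num_output_tokens):
    """Sum per-token latencies, then divide by the output token count."""
    total = sum(time_to_next_token)  # the proposed change: a float, not a list
    if num_output_tokens == 0:
        # Guard added for this sketch only; the benchmark may behave differently.
        return 0.0
    return total / num_output_tokens


latencies = [0.05, 0.07, 0.06]  # hypothetical per-token latencies in seconds

# Reproduce the reported failure: dividing a list in place raises TypeError.
try:
    broken = list(latencies)
    broken /= 3
except TypeError as exc:
    error_message = str(exc)

# With the sum applied first, the division works and yields a mean latency.
mean_latency = normalize_inter_token_latency(latencies, 3)
```

Here `error_message` carries the same "unsupported operand type(s) for /=" text as the traceback in the report, while `mean_latency` is an ordinary float.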
When I run token_benchmark_ray.py, I get the error below:
File "token_benchmark_ray.py", line 456, in <module>
run_token_benchmark(
File "token_benchmark_ray.py", line 297, in run_token_benchmark
summary, individual_responses = get_token_throughput_latencies(
File "token_benchmark_ray.py", line 111, in get_token_throughput_latencies
request_metrics[common_metrics.INTER_TOKEN_LAT] /= num_output_tokens
TypeError: unsupported operand type(s) for /=: 'list' and 'int'
(SageMakerClient pid=15473) Warning Or Error: 'SageMakerRuntime' object has no attribute 'invoke_endpoint_with_response_stream'
(SageMakerClient pid=15473) None