
Sagemaker client issue #53

Open
SuchethaChintha opened this issue Jun 7, 2024 · 3 comments

Comments

@SuchethaChintha

When I execute token_benchmark_ray.py, I get the error below:

```
File "token_benchmark_ray.py", line 456, in <module>
    run_token_benchmark(
File "token_benchmark_ray.py", line 297, in run_token_benchmark
    summary, individual_responses = get_token_throughput_latencies(
File "token_benchmark_ray.py", line 111, in get_token_throughput_latencies
    request_metrics[common_metrics.INTER_TOKEN_LAT] /= num_output_tokens
TypeError: unsupported operand type(s) for /=: 'list' and 'int'
(SageMakerClient pid=15473) Warning Or Error: 'SageMakerRuntime' object has no attribute 'invoke_endpoint_with_response_stream'
(SageMakerClient pid=15473) None
```
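For context, the TypeError can be reproduced in isolation: when `INTER_TOKEN_LAT` is stored as a list, the in-place division in token_benchmark_ray.py has nothing to fall back on. A minimal sketch, with a plain string key and hypothetical latency values standing in for the real `common_metrics` constants:

```python
# Minimal reproduction of the TypeError above: the SageMaker client stores
# the inter-token latency metric as a list of per-token latencies, and
# Python cannot divide a list by an int. Values here are hypothetical.
request_metrics = {"inter_token_latency": [0.031, 0.028, 0.030]}
num_output_tokens = 3

try:
    request_metrics["inter_token_latency"] /= num_output_tokens
except TypeError as err:
    print(err)  # unsupported operand type(s) for /=: 'list' and 'int'
```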

@SuchethaChintha SuchethaChintha changed the title Sagemaker client isssue Sagemaker client issue Jun 7, 2024
@Tatiats7

Hey @SuchethaChintha, did you fix that?

@vjaramillo

This is probably due to using an older version of the SageMaker SDK; the `'SageMakerRuntime' object has no attribute 'invoke_endpoint_with_response_stream'` warning suggests the installed client predates streaming support. Updating it should fix the issue.

@ryoshirahama

It seems this error occurs because INTER_TOKEN_LAT is handled inconsistently across the LLM clients.

The SageMaker client keeps INTER_TOKEN_LAT as a list:

```python
metrics[common_metrics.INTER_TOKEN_LAT] = time_to_next_token
```

The OpenAI client, on the other hand, sums the latencies before returning:

```python
metrics[common_metrics.INTER_TOKEN_LAT] = sum(time_to_next_token)  # This should be same as metrics[common_metrics.E2E_LAT]. Leave it here for now
```

I think that if you modify sagemaker_client.py as follows, it will work correctly:

```python
metrics[common_metrics.INTER_TOKEN_LAT] = sum(time_to_next_token)
```

Even with this change, INTER_TOKEN_LAT is still divided by the number of output tokens in token_benchmark_ray.py, so the correct metrics should be calculated.
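To illustrate why the fix composes with the later division, here is a minimal sketch of the two steps end to end, using a plain string key and hypothetical latency values in place of the real `common_metrics` constants:

```python
# Hypothetical per-token latencies (seconds) collected while streaming.
time_to_next_token = [0.05, 0.04, 0.06]

metrics = {}
# Proposed change in sagemaker_client.py: store the sum of the latencies,
# as the OpenAI client does, instead of the raw list.
metrics["inter_token_latency"] = sum(time_to_next_token)

# token_benchmark_ray.py later divides by the output-token count, which
# now yields the mean inter-token latency instead of raising a TypeError.
num_output_tokens = len(time_to_next_token)
metrics["inter_token_latency"] /= num_output_tokens
print(round(metrics["inter_token_latency"], 2))  # 0.05
```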
