Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add mechanisms for rate-limiting and re-prompting in case 429 errors happen #3

Merged
merged 13 commits into from
Jul 2, 2024

Conversation

franktip
Copy link
Contributor

@franktip franktip commented Jul 1, 2024

This PR contains mechanisms for dealing with API endpoints that are rate-limited, causing requests to fail with 429 errors. To work around this, we implemented:

  • a mechanism for ensuring that a specified number of milliseconds elapses between prompts
  • a mechanism for retrying the same prompt in case a 429 error happens

@@ -29,6 +29,10 @@ on:
description: "Skip slow benchmarks"
type: boolean
default: false
benchmarkMode:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure the name benchmarkMode is very intuitive to me for its purpose. So if I'm running in benchmarkMode, it means I use custom rate limiting but if I'm not running in benchmarkMode, I don't use any rate limiting? Wouldn't making this named rateLimiting make more sense? Especially since the command-line argument below is also named benchmark, which at first glance, I thought means which benchmark (i.e., projects) are you running on?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment below.

benchmark/run.ts Outdated
default: false,
demandOption: false,
description:
"use custom rate-limiting for benchmarking (if specified, this supercedes the rateLimit option)",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is there the need for superceding here? Isn't it the case that if you specify benchmark as True, then you must have a non-zero value for rateLimit? Otherwise, it's as if you don't have rate-limiting? Do you really need two options here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The confusion is caused by the fact that I implemented two ways of doing rate-limiting. First, it's possible to enforce that a specified number of milliseconds elapses between requests. Second, in "benchmark" rate-limiting mode, the interval is gradually reduced. This is because when we run an experiment, initially there are 25 concurrent workers that are each sending requests to the same API endpoint. As time goes on, some of these workers complete, so that there is less concurrency, and it's possible to speed up the remaining workers without exceeding the overall rate limit. These thresholds were determined experimentally based on the particular LLM service I'm using.

I'll simplify things by turning this into a single command-line parameter called "rateLimit" which can either be a numeric value or the string "benchmark".

src/chatmodel.ts Outdated
if (this.benchmark) {
this.rateLimiter = new BenchmarkRateLimiter();
console.log(
`Using ${this.model} at ${this.apiEndpoint} with ${this.nrAttempts} attempts and benchmark rate limit.`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I was really confused at this point then I scrolled down to the BenchmarkRateLimiter and realized that basically you have two strategies:

  • specify a rate limit during the initial configuration
  • let the benchmark rate limiter do its thing where it increasingly tries different rate limits.
    I now get it but I don't think the naming/initial descriptions in the configurations are very clear tbh.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. But the second option involves gradually decreasing the interval between successive requests

@franktip franktip merged commit 88eb3b0 into main Jul 2, 2024
5 checks passed
@franktip franktip deleted the retry-429 branch July 2, 2024 13:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants