add mechanisms for rate-limiting and re-prompting in case 429 errors happen #3
Conversation
.github/workflows/run-experiment.yml
Outdated
@@ -29,6 +29,10 @@ on:
      description: "Skip slow benchmarks"
      type: boolean
      default: false
    benchmarkMode:
Not sure the name `benchmarkMode` is very intuitive to me for its purpose. So if I'm running in `benchmarkMode`, it means I use custom rate limiting, but if I'm not running in `benchmarkMode`, I don't use any rate limiting? Wouldn't naming this `rateLimiting` make more sense? Especially since the command-line argument below is also named `benchmark`, which at first glance I thought meant which benchmark (i.e., which projects) you are running on.
See comment below.
benchmark/run.ts
Outdated
      default: false,
      demandOption: false,
      description:
        "use custom rate-limiting for benchmarking (if specified, this supersedes the rateLimit option)",
Why is there a need for superseding here? Isn't it the case that if you specify benchmark as true, then you must have a non-zero value for rateLimit? Otherwise, it's as if you don't have rate-limiting at all. Do you really need two options here?
The confusion is caused by the fact that I implemented two ways of doing rate-limiting. First, it's possible to enforce that a specified number of milliseconds elapses between requests. Second, in "benchmark" rate-limiting mode, the interval is gradually reduced. This is because when we run an experiment, initially there are 25 concurrent workers that are each sending requests to the same API endpoint. As time goes on, some of these workers complete, so that there is less concurrency, and it's possible to speed up the remaining workers without exceeding the overall rate limit. These thresholds were determined experimentally based on the particular LLM service I'm using.
I'll simplify things by turning this into a single command-line parameter called "rateLimit" which can either be a numeric value or the string "benchmark".
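For concreteness, here is a minimal sketch of what a rate limiter with a gradually decreasing interval could look like. This is not the PR's actual implementation: the thresholds, field names, and the `next` method signature are all assumptions made for illustration.

```typescript
// Sketch of a benchmark-style rate limiter: requests start widely
// spaced and the interval shrinks over time, since fewer concurrent
// workers remain as the experiment progresses. All constants here are
// hypothetical, not the experimentally determined values from the PR.
class BenchmarkRateLimiter {
  private static readonly INITIAL_INTERVAL_MS = 30_000;
  private static readonly MIN_INTERVAL_MS = 5_000;
  private static readonly STEP_MS = 5_000;
  private static readonly REQUESTS_PER_STEP = 100;

  private intervalMs = BenchmarkRateLimiter.INITIAL_INTERVAL_MS;
  private requestCount = 0;
  private nextSlot = 0; // earliest time the next request may start

  /** Wait for the next available slot, then issue the request. */
  async next<T>(request: () => Promise<T>): Promise<T> {
    const now = Date.now();
    const start = Math.max(now, this.nextSlot);
    this.nextSlot = start + this.intervalMs;
    await new Promise((resolve) => setTimeout(resolve, start - now));

    // Periodically shorten the interval, down to a floor.
    this.requestCount++;
    if (
      this.requestCount % BenchmarkRateLimiter.REQUESTS_PER_STEP === 0 &&
      this.intervalMs > BenchmarkRateLimiter.MIN_INTERVAL_MS
    ) {
      this.intervalMs -= BenchmarkRateLimiter.STEP_MS;
    }
    return request();
  }
}
```

Reserving `nextSlot` up front keeps concurrent workers from starting requests closer together than the current interval allows.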
src/chatmodel.ts
Outdated
    if (this.benchmark) {
      this.rateLimiter = new BenchmarkRateLimiter();
      console.log(
        `Using ${this.model} at ${this.apiEndpoint} with ${this.nrAttempts} attempts and benchmark rate limit.`
Ok, I was really confused at this point; then I scrolled down to the BenchmarkRateLimiter and realized that you basically have two strategies:
- specify a rate limit during the initial configuration, or
- let the benchmark rate limiter do its thing, where it progressively tries different rate limits.
I now get it, but I don't think the naming/initial descriptions in the configuration are very clear, tbh.
Yes, but the second option involves gradually decreasing the interval between successive requests.
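To make the distinction concrete, here is one way the selection could look once the options are unified into a single `rateLimit` parameter as proposed above, reusing the `BenchmarkRateLimiter` sketch from earlier; `RateLimiter` and `FixedRateLimiter` are hypothetical names, not the PR's actual code.

```typescript
interface RateLimiter {
  next<T>(request: () => Promise<T>): Promise<T>;
}

// Fixed strategy: a constant number of milliseconds between requests.
class FixedRateLimiter implements RateLimiter {
  private nextSlot = 0;
  constructor(private readonly intervalMs: number) {}
  async next<T>(request: () => Promise<T>): Promise<T> {
    const now = Date.now();
    const start = Math.max(now, this.nextSlot);
    this.nextSlot = start + this.intervalMs;
    await new Promise((resolve) => setTimeout(resolve, start - now));
    return request();
  }
}

// A numeric rateLimit selects the fixed strategy; the string
// "benchmark" selects the decreasing-interval strategy.
function makeRateLimiter(rateLimit: number | "benchmark"): RateLimiter {
  return rateLimit === "benchmark"
    ? new BenchmarkRateLimiter()
    : new FixedRateLimiter(rateLimit);
}
```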
This PR contains mechanisms for dealing with API endpoints that are rate-limited, causing requests to fail with 429 errors. To work around this, we implemented:
- an option to enforce that a specified number of milliseconds elapses between successive requests;
- a "benchmark" rate-limiting mode in which this interval is gradually decreased as concurrent workers complete;
- re-prompting, i.e., retrying requests that failed with a 429 error, up to a configurable number of attempts.
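The re-prompting side could look roughly like the sketch below. The error shape, the backoff schedule, and the helper name are assumptions made for illustration; only the idea of retrying on 429 up to a bounded number of attempts comes from the PR.

```typescript
// Illustrative retry wrapper: re-issue a request that failed with a
// 429 status, backing off between attempts. The error shape and delay
// schedule are assumptions, not the PR's actual implementation.
async function withRetry<T>(
  request: () => Promise<T>,
  nrAttempts: number,
  baseDelayMs = 1_000
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await request();
    } catch (err: any) {
      const status = err?.status ?? err?.response?.status;
      if (status !== 429 || attempt >= nrAttempts) {
        throw err; // not rate-limited, or out of attempts
      }
      const delayMs = baseDelayMs * 2 ** (attempt - 1); // exponential backoff
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```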