add mechanisms for rate-limiting and re-prompting in case 429 errors happen #3
@@ -0,0 +1,11 @@
+https://github.com/manuelmhtr/countries-and-timezones/tree/e34cb4b6832795cbac8d44f6f9c97eb1038b831b
+https://github.com/infusion/Complex.js/tree/d995ca105e8adef4c38d0ace50643daf84e0dd1c
+https://gitlab.com/autokent/crawler-url-parser/tree/202c5b25ad693d284804261e2b3815fe66e0723e
+https://gitlab.com/demsking/image-downloader/tree/19a53f652824bd0c612cc5bcd3a2eb173a16f938
+https://github.com/rainder/node-geo-point/tree/c839d477ff7a48d1fc6574495cbbc6196161f494
+https://github.com/jprichardson/node-jsonfile/tree/9c6478a85899a9318547a6e9514b0403166d8c5c
+https://github.com/chakrit/node-uneval/tree/7578dc67090f650a171610a08ea529eba9d27438
+https://github.com/swang/plural/tree/f0027d66ecb37ce0108c8bcb4a6a448d1bf64047
+https://github.com/pull-stream/pull-stream/tree/29b4868bb3864c427c3988855c5d65ad5cb2cb1c
+https://gitlab.com/cptpackrat/spacl-core/tree/fcb8511a0d01bdc206582cfacb3e2b01a0288f6a
+https://github.com/maugenst/zip-a-folder/tree/5089113647753d5086ea20f052f9d29840866ee1
@@ -121,6 +121,12 @@ if (require.main === module) {
         default: 20,
         description: "maximum length of each snippet in lines",
       },
+      maxTokens: {
+        type: "number",
+        default: 1000,
+        demandOption: false,
+        description: "maximum number of tokens in a completion",
+      },
       temperatures: {
         type: "string",
         default: "0.0",
@@ -156,6 +162,25 @@ if (require.main === module) {
         default: "./templates/retry-template.hb",
         description: "Handlebars template file to use",
       },
+      nrAttempts: {
+        type: "number",
+        default: 3,
+        description: "number of attempts to make for each request",
+      },
+      rateLimit: {
+        type: "number",
+        default: 0,
+        demandOption: false,
+        description:
+          "number of milliseconds between requests to the model (0 is no rate limit)",
+      },
+      benchmark: {
+        type: "boolean",
+        default: false,
+        demandOption: false,
+        description:
+          "use custom rate-limiting for benchmarking (if specified, this supersedes the rateLimit option)",
[Review comment] Why is there a need for superseding here? Isn't it the case that if you specify benchmark as true, then you must have a non-zero value for rateLimit? Otherwise, it's as if you don't have rate-limiting. Do you really need two options here?

[Author reply] The confusion is caused by the fact that I implemented two ways of doing rate-limiting. First, it's possible to enforce that a specified number of milliseconds elapses between requests. Second, in "benchmark" rate-limiting mode, the interval is gradually reduced. This is because when we run an experiment, initially there are 25 concurrent workers that are each sending requests to the same API endpoint. As time goes on, some of these workers complete, so that there is less concurrency, and it is possible to speed up the remaining workers without exceeding the overall rate limit. These thresholds were determined experimentally based on the particular LLM service I'm using. I'll simplify things by turning this into a single command-line parameter called "rateLimit", which can be either a numeric value or the string "benchmark".
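For concreteness, a minimal sketch of what that unified option could look like (hypothetical; this change is not part of the diff shown here, and the exact coercion logic is an assumption):

```ts
// Hypothetical sketch of the proposed single --rateLimit option,
// accepting either a number of milliseconds or the string "benchmark".
rateLimit: {
  type: "string",
  default: "0",
  description:
    'milliseconds between requests (0 disables rate-limiting), or "benchmark" for the adaptive schedule',
},

// ...and the corresponding construction of the rate limiter:
const rateLimiter =
  argv.rateLimit === "benchmark"
    ? new BenchmarkRateLimiter()
    : Number(argv.rateLimit) > 0
    ? new FixedRateLimiter(Number(argv.rateLimit))
    : undefined;
```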
+      },
     });
     const argv = await parser.argv;
@@ -166,7 +191,13 @@ if (require.main === module) {
         "Warning: --strictResponses has no effect when not using --responses"
       );
     }
-    model = new ChatModel(argv.model);
+    model = new ChatModel(
+      argv.model,
+      argv.nrAttempts,
+      argv.rateLimit,
+      argv.benchmark,
+      { max_tokens: argv.maxTokens }
+    );
   } else {
     model = MockCompletionModel.fromFile(
       argv.responses,
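For illustration, with the default flag values the model would be constructed roughly as follows (a sketch; the model name is made up):

```ts
// Equivalent construction using the defaults declared above:
const model = new ChatModel(
  "my-model",          // argv.model (hypothetical model name)
  3,                   // argv.nrAttempts (default)
  0,                   // argv.rateLimit (default: no rate limiting)
  false,               // argv.benchmark (default)
  { max_tokens: 1000 } // argv.maxTokens (default)
);
```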
@@ -1,6 +1,12 @@
 import axios from "axios";
 import { performance } from "perf_hooks";
 import { ICompletionModel } from "./completionModel";
+import {
+  retry,
+  RateLimiter,
+  BenchmarkRateLimiter,
+  FixedRateLimiter,
+} from "./promise-utils";
 
 const defaultPostOptions = {
   max_tokens: 1000, // maximum number of tokens to return
@@ -24,14 +30,33 @@ function getEnv(name: string): string {
 export class ChatModel implements ICompletionModel {
   private readonly apiEndpoint: string;
   private readonly authHeaders: string;
+  protected rateLimiter: RateLimiter | undefined;
 
   constructor(
     private readonly model: string,
+    private readonly nrAttempts: number,
+    private readonly rateLimit: number,
+    private readonly benchmark: boolean,
     private readonly instanceOptions: PostOptions = {}
   ) {
     this.apiEndpoint = getEnv("TESTPILOT_LLM_API_ENDPOINT");
     this.authHeaders = getEnv("TESTPILOT_LLM_AUTH_HEADERS");
-    console.log(`Using Chat Model API at ${this.apiEndpoint}`);
+    if (this.benchmark) {
+      this.rateLimiter = new BenchmarkRateLimiter();
+      console.log(
+        `Using ${this.model} at ${this.apiEndpoint} with ${this.nrAttempts} attempts and benchmark rate limit.`
[Review comment] OK, I was really confused at this point; then I scrolled down to BenchmarkRateLimiter and realized that basically you have two strategies: a fixed interval between requests, or benchmark mode.

[Author reply] Yes. But the second option involves gradually decreasing the interval between successive requests.
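(For reference: in the BenchmarkRateLimiter added later in this PR, the interval starts at 10000 ms, drops to 5000 ms after 150 requests, and drops again to 2500 ms after 300 requests.)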
+      );
+    } else if (this.rateLimit > 0) {
+      this.rateLimiter = new FixedRateLimiter(this.rateLimit);
+      console.log(
+        `Using ${this.model} at ${this.apiEndpoint} with ${this.nrAttempts} attempts and fixed rate of ${this.rateLimit} ms.`
+      );
+    } else {
+      this.rateLimiter = undefined;
+      console.log(
+        `Using ${this.model} at ${this.apiEndpoint} with ${this.nrAttempts} attempts and no rate limit.`
+      );
+    }
   }
 
   /**
@@ -62,11 +87,8 @@ export class ChatModel implements ICompletionModel {
 
     const postOptions = {
       model: this.model,
+      system: "You are a programming assistant.",
       messages: [
-        {
-          role: "system",
-          content: "You are a programming assistant.",
-        },
         {
           role: "user",
           content: prompt,
@@ -75,7 +97,21 @@ export class ChatModel implements ICompletionModel {
       ...options,
     };
 
-    const res = await axios.post(this.apiEndpoint, postOptions, { headers });
+    let res;
+    if (this.rateLimiter) {
+      res = await retry(
+        () =>
+          this.rateLimiter!.next(() =>
+            axios.post(this.apiEndpoint, postOptions, { headers })
+          ),
+        this.nrAttempts
+      );
+    } else {
+      res = await retry(
+        () => axios.post(this.apiEndpoint, postOptions, { headers }),
+        this.nrAttempts
+      );
+    }
 
     performance.measure(
       `llm-query:${JSON.stringify({
@@ -99,16 +135,15 @@ export class ChatModel implements ICompletionModel {
     }
 
     const completions = new Set<string>();
-    for (const choice of json.choices) {
-      const content = choice.message.content;
+    for (const choice of json.content) {
+      const content = choice.text;
       completions.add(content);
     }
     return completions;
   }
 
   /**
-   * Get completions from the LLM, extract the code fragments enclosed in a fenced code block,
-   * and postprocess them as needed; print a warning if it did not produce any
+   * Get completions from the LLM; issue a warning if it did not produce any
    *
    * @param prompt the prompt to use
    */
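For context, the updated loop reads completions from `json.content[i].text` rather than `json.choices[i].message.content`, which suggests a response body shaped roughly like the following (a sketch inferred from the code, not an official schema; the old shape resembles the OpenAI chat completions format, the new one resembles the Anthropic Messages format):

```ts
// Hypothetical response shape implied by the new parsing code:
const json = {
  content: [{ type: "text", text: "generated completion..." }],
};

// Previous shape implied by the old parsing code:
const oldJson = {
  choices: [{ message: { role: "assistant", content: "generated completion..." } }],
};
```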
@@ -0,0 +1,115 @@
+/**
+ * This function provides support for retrying the creation of a promise
+ * up to a given number of times in case the promise is rejected.
+ * This is useful for, e.g., retrying a request to a server that is temporarily unavailable.
+ *
+ */
+export async function retry<T>(
+  f: () => Promise<T>,
+  howManyTimes: number
+): Promise<T> {
+  let i = 1;
+  let promise: Promise<T> = f(); // create the promise, but don't wait for its fulfillment yet..
+  while (i <= howManyTimes) {
+    try {
+      if (i > 1) {
+        console.log(` retry ${i}/${howManyTimes}`);
+      }
+      let val: T = await promise; // throws an exception if the promise is rejected
+      return val; // if the promise was fulfilled, return another promise that is fulfilled with the same value
+    } catch (e) {
+      i++;
+      console.log(`Promise rejected with ${e}.`);
+      promise = f(); // next attempt: create the promise, but don't wait for its fulfillment yet..
+    }
+  }
+  return promise; // if the promise was rejected howManyTimes times, return the last promise
+}
+
+/**
+ * This class provides support for asynchronous rate limiting by
+ * limiting the number of requests to the server to at most one
+ * in N milliseconds. This is useful for throttling requests to
+ * a server that has a limit on the number of requests per second.
+ */
+export abstract class RateLimiter {
+  constructor(protected howManyMilliSeconds: number) {
+    this.timer = this.resetTimer();
+  }
+  /**
+   * the timer is a promise that is resolved after a certain number of milliseconds
+   * have elapsed. The timer is reset after each request.
+   */
+  private timer: Promise<void>;
+
+  /**
+   * Waits until the timer has expired, then evaluates the function that
+   * produces the promise
+   * @param p a function that produces a promise
+   * @returns the promise produced by the function p (after the timer has expired)
+   */
+  public async next<T>(p: () => Promise<T>): Promise<T> {
+    await this.timer; // wait until timer has expired
+    this.timer = this.resetTimer(); // reset timer (for the next request)
+    return p(); // return the promise
+  }
+
+  /**
+   * resets the timer
+   * @returns a promise that is resolved after the number of milliseconds
+   * specified in the constructor have elapsed
+   */
+  protected resetTimer = () =>
+    new Promise<void>((resolve, reject) => {
+      setTimeout(() => {
+        resolve();
+      }, this.howManyMilliSeconds);
+    });
+}
+
+/**
+ * A rate limiter that limits the number of requests to the server to a
+ * maximum of one per N milliseconds.
+ *
+ */
+export class FixedRateLimiter extends RateLimiter {
+  public constructor(N: number) {
+    super(N);
+  }
+}
+
+/**
+ * A custom rate limiter for use during benchmark runs. It increases
+ * the pace of requests after two designated thresholds have been reached.
+ */
+export class BenchmarkRateLimiter extends RateLimiter {
+  private requestCount: number;
+
+  private static INITIAL_PACE = 10000;
+  private static PACE_AFTER_150_REQUESTS = 5000;
+  private static PACE_AFTER_300_REQUESTS = 2500;
+
+  constructor() {
+    console.log(
+      `BenchmarkRateLimiter: initial pace is ${BenchmarkRateLimiter.INITIAL_PACE}`
+    );
+    super(BenchmarkRateLimiter.INITIAL_PACE);
+    this.requestCount = 0;
+  }
+
+  public next<T>(p: () => Promise<T>): Promise<T> {
+    this.requestCount++;
+    if (this.requestCount === 150) {
+      this.howManyMilliSeconds = BenchmarkRateLimiter.PACE_AFTER_150_REQUESTS;
+      console.log(
+        `BenchmarkRateLimiter: increasing pace to ${BenchmarkRateLimiter.PACE_AFTER_150_REQUESTS}`
+      );
+    } else if (this.requestCount === 300) {
+      this.howManyMilliSeconds = BenchmarkRateLimiter.PACE_AFTER_300_REQUESTS;
+      console.log(
+        `BenchmarkRateLimiter: increasing pace to ${BenchmarkRateLimiter.PACE_AFTER_300_REQUESTS}`
+      );
+    }
+    return super.next(p);
+  }
+}
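As a quick illustration of how these utilities compose (a hypothetical sketch, not part of the PR; the function, URLs, and pacing values are made up):

```ts
import axios from "axios";
import { retry, FixedRateLimiter } from "./promise-utils";

// Issue requests sequentially, at most one per 2000 ms, retrying each
// request up to 3 times if it is rejected (e.g., with an HTTP 429).
async function fetchAll(urls: string[]) {
  const limiter = new FixedRateLimiter(2000);
  const results = [];
  for (const url of urls) {
    // Each retry attempt goes back through the limiter, so retries are
    // rate-limited too, mirroring how ChatModel uses these utilities.
    results.push(await retry(() => limiter.next(() => axios.get(url)), 3));
  }
  return results;
}
```

Note that RateLimiter as written assumes callers await next() sequentially; concurrent callers would all observe the same pending timer and could fire at once.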
[Review comment] I'm not sure the name benchmarkMode is very intuitive for its purpose. If I'm running in benchmarkMode, it means I use custom rate limiting, but if I'm not running in benchmarkMode, I don't use any rate limiting? Wouldn't naming this rateLimiting make more sense? Especially since the command-line argument below is also named benchmark, which at first glance I thought meant which benchmark (i.e., which projects) you are running on.

[Author reply] See comment below.