add mechanisms for rate-limiting and re-prompting in case 429 errors happen #3

Merged: 13 commits (Jul 2, 2024)
11 changes: 11 additions & 0 deletions .github/benchmarks11.txt
@@ -0,0 +1,11 @@
https://github.com/manuelmhtr/countries-and-timezones/tree/e34cb4b6832795cbac8d44f6f9c97eb1038b831b
https://github.com/infusion/Complex.js/tree/d995ca105e8adef4c38d0ace50643daf84e0dd1c
https://gitlab.com/autokent/crawler-url-parser/tree/202c5b25ad693d284804261e2b3815fe66e0723e
https://gitlab.com/demsking/image-downloader/tree/19a53f652824bd0c612cc5bcd3a2eb173a16f938
https://github.com/rainder/node-geo-point/tree/c839d477ff7a48d1fc6574495cbbc6196161f494
https://github.com/jprichardson/node-jsonfile/tree/9c6478a85899a9318547a6e9514b0403166d8c5c
https://github.com/chakrit/node-uneval/tree/7578dc67090f650a171610a08ea529eba9d27438
https://github.com/swang/plural/tree/f0027d66ecb37ce0108c8bcb4a6a448d1bf64047
https://github.com/pull-stream/pull-stream/tree/29b4868bb3864c427c3988855c5d65ad5cb2cb1c
https://gitlab.com/cptpackrat/spacl-core/tree/fcb8511a0d01bdc206582cfacb3e2b01a0288f6a
https://github.com/maugenst/zip-a-folder/tree/5089113647753d5086ea20f052f9d29840866ee1
7 changes: 6 additions & 1 deletion .github/workflows/run-experiment.yml
@@ -29,6 +29,10 @@ on:
description: "Skip slow benchmarks"
type: boolean
default: false
benchmarkMode:
Collaborator:
Not sure the name benchmarkMode is very intuitive for its purpose. If I'm running in benchmarkMode, it means I use custom rate limiting, but if I'm not running in benchmarkMode, I don't use any rate limiting? Wouldn't naming this rateLimiting make more sense? Especially since the command-line argument below is also named benchmark, which at first glance I thought meant which benchmark (i.e., which projects) you are running on.

Contributor Author:
See comment below.

type: boolean
description: "Use custom rate limiting for running benchmarks"
default: false
debug_enabled:
type: boolean
description: "Run the build with tmate debugging enabled (https://github.com/marketplace/actions/debugging-with-tmate)"
@@ -175,7 +179,8 @@ jobs:
--temperatures "${{ needs.setup.outputs.temperatures }}" \
--model ${{ needs.setup.outputs.model }} \
--template ${{ needs.setup.outputs.template }} \
--retryTemplate ${{ needs.setup.outputs.retryTemplate }}"
--retryTemplate ${{ needs.setup.outputs.retryTemplate }} \
--benchmark ${{ github.event.inputs.benchmarkMode }} "
echo "command: $command"
$command
mv stats.json $outputdir
33 changes: 32 additions & 1 deletion benchmark/run.ts
@@ -121,6 +121,12 @@ if (require.main === module) {
default: 20,
description: "maximum length of each snippet in lines",
},
maxTokens: {
type: "number",
default: 1000,
demandOption: false,
description: "maximum number of tokens in a completion",
},
temperatures: {
type: "string",
default: "0.0",
@@ -156,6 +162,25 @@ if (require.main === module) {
default: "./templates/retry-template.hb",
description: "Handlebars template file to use",
},
nrAttempts: {
type: "number",
default: 3,
description: "number of attempts to make for each request",
},
rateLimit: {
type: "number",
default: 0,
demandOption: false,
description:
"number of milliseconds between requests to the model (0 is no rate limit)",
},
benchmark: {
type: "boolean",
default: false,
demandOption: false,
description:
"use custom rate-limiting for benchmarking (if specified, this supercedes the rateLimit option)",
Collaborator:
Why is there a need for superseding here? Isn't it the case that if you specify benchmark as true, then you must have a non-zero value for rateLimit? Otherwise, it's as if you don't have rate limiting. Do you really need two options here?

Contributor Author:
The confusion is caused by the fact that I implemented two ways of doing rate-limiting. First, it's possible to enforce that a specified number of milliseconds elapses between requests. Second, in "benchmark" rate-limiting mode, the interval is gradually reduced. This is because when we run an experiment, initially there are 25 concurrent workers that are each sending requests to the same API endpoint. As time goes on, some of these workers complete, so that there is less concurrency, and it's possible to speed up the remaining workers without exceeding the overall rate limit. These thresholds were determined experimentally based on the particular LLM service I'm using.

I'll simplify things by turning this into a single command-line parameter called "rateLimit" which can either be a numeric value or the string "benchmark".
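A minimal sketch of what that merged option could look like, assuming yargs's per-option coerce hook (the validation and wording here are illustrative, not the final implementation):

// Hypothetical merged option: a number of milliseconds, or the string "benchmark".
rateLimit: {
  type: "string",
  default: "0",
  coerce: (value: string): number | "benchmark" => {
    if (value === "benchmark") return value; // adaptive benchmark pacing
    const ms = Number(value);
    if (Number.isNaN(ms) || ms < 0) {
      throw new Error('--rateLimit must be a non-negative number or "benchmark"');
    }
    return ms; // fixed interval in milliseconds (0 disables rate limiting)
  },
  description:
    'milliseconds between requests (0 for none), or "benchmark" for adaptive pacing',
},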

},
});
const argv = await parser.argv;

@@ -166,7 +191,13 @@ if (require.main === module) {
"Warning: --strictResponses has no effect when not using --responses"
);
}
model = new ChatModel(argv.model);
model = new ChatModel(
argv.model,
argv.nrAttempts,
argv.rateLimit,
argv.benchmark,
{ max_tokens: argv.maxTokens }
);
} else {
model = MockCompletionModel.fromFile(
argv.responses,
55 changes: 45 additions & 10 deletions src/chatmodel.ts
@@ -1,6 +1,12 @@
import axios from "axios";
import { performance } from "perf_hooks";
import { ICompletionModel } from "./completionModel";
import {
retry,
RateLimiter,
BenchmarkRateLimiter,
FixedRateLimiter,
} from "./promise-utils";

const defaultPostOptions = {
max_tokens: 1000, // maximum number of tokens to return
@@ -24,14 +30,33 @@ function getEnv(name: string): string {
export class ChatModel implements ICompletionModel {
private readonly apiEndpoint: string;
private readonly authHeaders: string;
protected rateLimiter: RateLimiter | undefined;

constructor(
private readonly model: string,
private readonly nrAttempts: number,
private readonly rateLimit: number,
private readonly benchmark: boolean,
private readonly instanceOptions: PostOptions = {}
) {
this.apiEndpoint = getEnv("TESTPILOT_LLM_API_ENDPOINT");
this.authHeaders = getEnv("TESTPILOT_LLM_AUTH_HEADERS");
console.log(`Using Chat Model API at ${this.apiEndpoint}`);
if (this.benchmark) {
this.rateLimiter = new BenchmarkRateLimiter();
console.log(
`Using ${this.model} at ${this.apiEndpoint} with ${this.nrAttempts} attempts and benchmark rate limit.`
Collaborator:
Ok, I was really confused at this point, then I scrolled down to the BenchmarkRateLimiter and realized that basically you have two strategies:

  • specify a rate limit during the initial configuration
  • let the benchmark rate limiter do its thing, where it increasingly tries different rate limits.

I now get it, but I don't think the naming/initial descriptions in the configuration are very clear, tbh.

Contributor Author:
Yes, but the second option involves gradually decreasing the interval between successive requests.
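For concreteness, the constants in the BenchmarkRateLimiter below work out to roughly this schedule (a reading of the code in this PR, not a new setting):

// Approximate pacing implied by BenchmarkRateLimiter's thresholds:
//   requests   1–150: one request every 10000 ms
//   requests 151–300: one request every  5000 ms
//   requests   301+ : one request every  2500 ms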

);
} else if (this.rateLimit > 0) {
this.rateLimiter = new FixedRateLimiter(this.rateLimit);
console.log(
`Using ${this.model} at ${this.apiEndpoint} with ${this.nrAttempts} attempts and fixed rate of ${this.rateLimit} ms.`
);
} else {
this.rateLimiter = undefined;
console.log(
`Using ${this.model} at ${this.apiEndpoint} with ${this.nrAttempts} attempts and no rate limit.`
);
}
}

/**
@@ -62,11 +87,8 @@ export class ChatModel implements ICompletionModel {

const postOptions = {
model: this.model,
system: "You are a programming assistant.",
messages: [
{
role: "system",
content: "You are a programming assistant.",
},
{
role: "user",
content: prompt,
@@ -75,7 +97,21 @@
...options,
};

const res = await axios.post(this.apiEndpoint, postOptions, { headers });
let res;
if (this.rateLimiter) {
res = await retry(
() =>
this.rateLimiter!.next(() =>
axios.post(this.apiEndpoint, postOptions, { headers })
),
this.nrAttempts
);
} else {
res = await retry(
() => axios.post(this.apiEndpoint, postOptions, { headers }),
this.nrAttempts
);
}

performance.measure(
`llm-query:${JSON.stringify({
@@ -99,16 +135,15 @@
}

const completions = new Set<string>();
for (const choice of json.choices) {
const content = choice.message.content;
for (const choice of json.content) {
const content = choice.text;
completions.add(content);
}
return completions;
}

/**
* Get completions from the LLM, extract the code fragments enclosed in a fenced code block,
* and postprocess them as needed; print a warning if it did not produce any
* Get completions from the LLM; issue a warning if it did not produce any
*
* @param prompt the prompt to use
*/
115 changes: 115 additions & 0 deletions src/promise-utils.ts
@@ -0,0 +1,115 @@
/**
* This function provides support for retrying the creation of a promise
* up to a given number of times in case the promise is rejected.
* This is useful for, e.g., retrying a request to a server that is temporarily unavailable.
*
*/
export async function retry<T>(
f: () => Promise<T>,
howManyTimes: number
): Promise<T> {
let i = 1;
let promise: Promise<T> = f(); // create the promise, but don't wait for its fulfillment yet.
while (i <= howManyTimes) {
try {
if (i > 1) {
console.log(` retry ${i}/${howManyTimes}`);
}
let val: T = await promise; // throws an exception if the promise is rejected
return val; // if the promise was fulfilled, return another promise that is fulfilled with the same value
} catch (e) {
i++;
console.log(`Promise rejected with ${e}.`);
if (i <= howManyTimes) {
promise = f(); // next attempt: create the promise, but don't wait for its fulfillment yet.
}
}
}
return promise; // if every attempt was rejected, return the last (rejected) promise
}
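A minimal usage sketch (axios is already a dependency in this PR; the helper name fetchWithRetries is illustrative):

import axios from "axios";
import { retry } from "./promise-utils";

// Retry a flaky HTTP request up to 3 times before giving up.
async function fetchWithRetries(url: string): Promise<unknown> {
  const res = await retry(() => axios.get(url), 3);
  return res.data;
}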

/**
* This class provides support for asynchronous rate limiting by
* limiting the number of requests to the server to at most one
* in N milliseconds. This is useful for throttling requests to
* a server that has a limit on the number of requests per second.
*/
export abstract class RateLimiter {
constructor(protected howManyMilliSeconds: number) {
this.timer = this.resetTimer();
}
/**
* the timer is a promise that is resolved after a certain number of milliseconds
* have elapsed. The timer is reset after each request.
*/
private timer: Promise<void>;

/**
* Waits until the timer has expired, then evaluate the function that
* produces the promise
* @param p a function that produces a promise
* @returns returns the promise produced by the function p (after the timer has expired)
*/
public async next<T>(p: () => Promise<T>): Promise<T> {
await this.timer; // wait until timer has expired
this.timer = this.resetTimer(); // reset timer (for the next request)
return p(); // return the promise
}

/**
* resets the timer
* @returns a promise that is resolved after the number of milliseconds
* specified in the constructor have elapsed
*/
protected resetTimer = () =>
new Promise<void>((resolve) => {
setTimeout(() => {
resolve();
}, this.howManyMilliSeconds);
});
}

/**
* A rate limiter that limits the number of requests to the server to a
* maximum of one per N milliseconds.
*
*/
export class FixedRateLimiter extends RateLimiter {
public constructor(N: number) {
super(N);
}
}

/**
* A custom rate limiter for use during benchmark runs. It increases
* the pace of requests after two designated thresholds have been reached.
*/
export class BenchmarkRateLimiter extends RateLimiter {
private requestCount: number;

private static INITIAL_PACE = 10000;
private static PACE_AFTER_150_REQUESTS = 5000;
private static PACE_AFTER_300_REQUESTS = 2500;

constructor() {
console.log(
`BenchmarkRateLimiter: initial pace is ${BenchmarkRateLimiter.INITIAL_PACE}`
);
super(BenchmarkRateLimiter.INITIAL_PACE);
this.requestCount = 0;
}

public next<T>(p: () => Promise<T>): Promise<T> {
this.requestCount++;
if (this.requestCount === 150) {
this.howManyMilliSeconds = BenchmarkRateLimiter.PACE_AFTER_150_REQUESTS;
console.log(
`BenchmarkRateLimiter: increasing pace to ${BenchmarkRateLimiter.PACE_AFTER_150_REQUESTS}`
);
} else if (this.requestCount === 300) {
this.howManyMilliSeconds = BenchmarkRateLimiter.PACE_AFTER_300_REQUESTS;
console.log(
`BenchmarkRateLimiter: increasing pace to ${BenchmarkRateLimiter.PACE_AFTER_300_REQUESTS}`
);
}
return super.next(p);
}
}
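And a sketch of how the pieces compose, mirroring what ChatModel does above (the endpoint, interval, and helper name are illustrative):

import axios from "axios";
import { retry, FixedRateLimiter } from "./promise-utils";

const limiter = new FixedRateLimiter(2000); // at most one request per 2000 ms

// Each attempt first waits for the limiter's timer, then fires the request;
// the whole rate-limited call is retried up to 3 times on rejection.
async function rateLimitedPost(url: string, body: object): Promise<unknown> {
  const res = await retry(() => limiter.next(() => axios.post(url, body)), 3);
  return res.data;
}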