# CodeGen Performance Benchmark

## Table of Contents

- [Purpose](#purpose)
- [Benchmarking Tool](#benchmarking-tool)
- [Metrics Measured](#metrics-measured)
- [Prerequisites](#prerequisites)
- [Running the Performance Benchmark](#running-the-performance-benchmark)
- [Data Collection](#data-collection)

## Purpose

This guide describes how to benchmark the inference performance (throughput and latency) of a deployed CodeGen service. The results help you understand the service's capacity under load and compare different deployment configurations or models. The benchmark primarily targets Kubernetes deployments but can be adapted for Docker.

## Benchmarking Tool

We use the [GenAIEval](https://github.com/opea-project/GenAIEval/blob/main/evals/benchmark/README.md) tool for performance benchmarking. It simulates concurrent users sending requests to the service endpoint and measures throughput and latency.
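
If GenAIEval is not yet installed on the benchmark node, a typical setup looks like the following (the exact steps may differ between GenAIEval releases; check its README):

```bash
# Install GenAIEval and its dependencies on the benchmark node
git clone https://github.com/opea-project/GenAIEval.git
cd GenAIEval
pip install -r requirements.txt
pip install -e .
```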

## Metrics Measured

The benchmark reports several key performance indicators:

- **Concurrency:** Number of concurrent requests simulated.
- **End-to-End Latency:** Time from request submission to final response received (P50, P90, P99 in ms).
- **End-to-End First Token Latency:** Time from request submission to first token received (P50, P90, P99 in ms).
- **Average Next Token Latency:** Average time between subsequent generated tokens (in ms).
- **Average Token Latency:** Average time per generated token (in ms).
- **Requests Per Second (RPS):** Throughput of the service.
- **Output Tokens Per Second:** Rate of token generation.
- **Input Tokens Per Second:** Rate of token consumption.
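
As a rough consistency check on these numbers: if a run at a concurrency of 128 sustains 2 RPS with an average of 256 generated tokens per response, the expected output token rate is about 2 × 256 = 512 tokens per second.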

## Prerequisites

- A running CodeGen service accessible via an HTTP endpoint. Refer to the main [CodeGen README](../../README.md) for deployment options (Kubernetes is recommended for load balancing and scalability).
- **If using Kubernetes:**
  - A working Kubernetes cluster (refer to the OPEA Kubernetes setup guides if needed).
  - `kubectl` configured to access the cluster from the node where the benchmark will run (typically the master node).
  - Sufficient `ulimit` for network connections on the worker nodes hosting the service pods (e.g., `LimitNOFILE=65536` or higher in the containerd/Docker configuration); see the example below this list.
- **General:**
  - Python 3.8+ on the node running the benchmark script.
  - Network access from the benchmark node to the CodeGen service endpoint.
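
On containerd-based worker nodes, the open-file limit can be raised with a systemd drop-in, for example:

```bash
# Open an editor for a containerd systemd override
sudo systemctl edit containerd
# Add the following two lines in the override file:
#   [Service]
#   LimitNOFILE=65536:1048576

# Apply the change
sudo systemctl daemon-reload; sudo systemctl restart containerd
```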

## Running the Performance Benchmark

1. **Deploy CodeGen Service:** Ensure your CodeGen service is deployed and accessible. Note the service endpoint URL (e.g., obtained via `kubectl get svc` or your ingress configuration if using Kubernetes, or `http://{host_ip}:{port}` for Docker).
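
   A quick way to verify the endpoint before benchmarking is to send a single request to the gateway. The service name, port `7778`, and payload below are illustrative defaults; adjust them to match your deployment.

   ```bash
   # Locate the CodeGen gateway service and its exposed port (Kubernetes)
   kubectl get svc

   # Send one test request to the gateway endpoint
   curl http://{host_ip}:7778/v1/codegen \
     -H "Content-Type: application/json" \
     -d '{"messages": "Write a Python function that returns the n-th Fibonacci number."}'
   ```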

2. **Configure Benchmark Parameters (Optional):**
   Set environment variables to customize the test queries and output directory. The `USER_QUERIES` variable defines the number of concurrent requests for each test run.

   ```bash
   # Example: Four runs with 128 concurrent requests each
   export USER_QUERIES="[128, 128, 128, 128]"
   # Example: Output directory
   export TEST_OUTPUT_DIR="/tmp/benchmark_output"
   # Set the target endpoint URL
   export CODEGEN_ENDPOINT_URL="http://{your_service_ip_or_hostname}:{port}/v1/codegen"
   ```

   _Replace `{your_service_ip_or_hostname}:{port}` with the actual accessible URL of your CodeGen gateway service._
53 | 61 |
|
54 |
| -### Test Steps |
| 62 | +3. **Execute the Benchmark Script:** |
| 63 | + Run the script, optionally specifying the number of Kubernetes nodes involved if relevant for reporting context (the script itself runs from one node). |
| 64 | + ```bash |
| 65 | + # Clone GenAIExamples if you haven't already |
| 66 | + # cd GenAIExamples/CodeGen/benchmark/performance |
| 67 | + bash benchmark.sh # Add '-n <node_count>' if desired for logging purposes |
| 68 | + ``` |
| 69 | + _Ensure the `benchmark.sh` script is adapted to use `CODEGEN_ENDPOINT_URL` and potentially `USER_QUERIES`, `TEST_OUTPUT_DIR`._ |
55 | 70 |
|
56 |
| -Please deploy CodeGen service before benchmarking. |
| 71 | +## Data Collection |
57 | 72 |
|
58 |
| -#### Run Benchmark Test |
59 |
| - |
60 |
| -Before the benchmark, we can configure the number of test queries and test output directory by: |
61 |
| - |
62 |
| -```bash |
63 |
| -export USER_QUERIES="[128, 128, 128, 128]" |
64 |
| -export TEST_OUTPUT_DIR="/tmp/benchmark_output" |
65 |
| -``` |
66 |
| - |
67 |
| -And then run the benchmark by: |
68 |
| - |
69 |
| -```bash |
70 |
| -bash benchmark.sh -n <node_count> |
71 |
| -``` |
72 |
| - |
73 |
| -The argument `-n` refers to the number of test nodes. |
74 |
| - |
75 |
| -#### Data collection |
76 |
| - |
77 |
| -All the test results will come to this folder `/tmp/benchmark_output` configured by the environment variable `TEST_OUTPUT_DIR` in previous steps. |
| 73 | +Benchmark results will be displayed in the terminal upon completion. Detailed results, typically including raw data and summary statistics, will be saved in the directory specified by `TEST_OUTPUT_DIR` (defaulting to `/tmp/benchmark_output`). CSV files (e.g., `1_testspec.yaml.csv`) containing metrics for each run are usually generated here. |
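
To inspect the collected results after a run (exact file names and layout depend on the GenAIEval version):

```bash
# List everything the benchmark wrote
ls -R "${TEST_OUTPUT_DIR:-/tmp/benchmark_output}"

# View the summary CSV for one run (the file name shown is an example)
cat "${TEST_OUTPUT_DIR:-/tmp/benchmark_output}/1_testspec.yaml.csv"
```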