[TOPLOC](https://arxiv.org/abs/2501.16007) is a novel method for verifiable inference that enables users to verify that LLM providers are using the correct model configuration and settings. It leverages locality-sensitive hashing of intermediate activations to detect unauthorized modifications.

For the code used in our experiments, check out: https://github.com/PrimeIntellect-ai/toploc-experiments

### Installation

```bash
pip install -U toploc
```
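
Once installed, proofs are built from model activations through the package's Python API. The snippet below is a minimal sketch of how that call might look; `build_proofs_bytes` and its `decode_batching_size`, `topk`, and `skip_prefill` arguments are assumptions about the current interface rather than guaranteed names, so check the package documentation before relying on them.

```python
# Hedged sketch: building compact TOPLOC proofs from activations.
# The imported function and its arguments are assumed from the package docs
# and may differ in your installed version -- treat this as illustrative only.
import torch
from toploc import build_proofs_bytes  # assumed entry point

torch.manual_seed(0)
# One prefill activation matrix plus a few single-token decode activations.
activations = [torch.randn(5, 16, dtype=torch.bfloat16)]
activations += [torch.randn(16, dtype=torch.bfloat16) for _ in range(8)]

proofs = build_proofs_bytes(
    activations,
    decode_batching_size=4,  # decode tokens covered by each proof
    topk=8,                  # number of top activations committed per proof
    skip_prefill=False,
)
print(f"built {len(proofs)} proofs")
```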

### Features

- Detect unauthorized modifications to models, prompts, and precision settings
- 1000x reduction in storage requirements compared to full activation storage
- Validation up to 100x faster than the original inference
- Robust across different hardware configurations and implementations
- Zero false positives/negatives in empirical testing

### Key Components

#### Proof Generation
- Extracts the top-k values from the last hidden state
- Uses polynomial encoding for compact storage
- Generates a verifiable proof during inference

#### Validation
- Recalculates the top-k features
- Compares exponent and mantissa differences
- Validates against predefined error thresholds

#### Storage Requirements
- Only 258 bytes per 32 new tokens
- Compared to 262 KB for full token embeddings (Llama-3.1-8B-Instruct)
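
The sketch below restates the proof-generation and validation steps above in plain PyTorch: commit to the positions and values of the top-k activations of the last hidden state, then compare a recomputation against that commitment via exponent and mantissa differences. The helper names and thresholds are illustrative assumptions, not the library's implementation or tuned values; the real scheme also packs the commitment into a polynomial encoding.

```python
# Illustrative sketch only -- not the library's implementation. Shows the idea of
# committing to top-k activations and validating a recomputation against them.
import torch

def fp32_parts(x: torch.Tensor):
    """Split float32 values into integer exponent and mantissa fields."""
    bits = x.float().contiguous().view(torch.int32)
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    return exponent, mantissa

def commit(last_hidden: torch.Tensor, k: int = 128):
    """Proof generation (simplified): record where the k largest activations are, and their values."""
    values, indices = torch.topk(last_hidden.flatten().float(), k)
    return indices, values

def validate(proof, recomputed: torch.Tensor,
             mean_thresh: float = 10.0, median_thresh: float = 8.0) -> bool:
    """Validation (simplified): compare exponents and mantissas at the committed positions.
    (The real scheme also recomputes the top-k set and checks index overlap.)"""
    indices, committed = proof
    rec_values = recomputed.flatten().float()[indices]
    ref_exp, ref_man = fp32_parts(committed)
    rec_exp, rec_man = fp32_parts(rec_values)
    mantissa_err = (ref_man - rec_man).abs().float()
    return (bool((ref_exp == rec_exp).all())
            and mantissa_err.mean().item() <= mean_thresh
            and mantissa_err.median().item() <= median_thresh)

# Toy check: an honest recomputation passes, a tampered one fails.
torch.manual_seed(0)
hidden = torch.randn(4096)
proof = commit(hidden)
print(validate(proof, hidden.clone()))  # True
print(validate(proof, hidden * 1.01))   # False: activations were altered
```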

### Integrations

#### vLLM
TOPLOC is integrated with vLLM for efficient inference and validation as part of this repository. The integration allows you to leverage vLLM's optimized inference pipeline while maintaining verification capabilities.

#### SGLang
We maintain a [fork of SGLang](https://github.com/PrimeIntellect-ai/sglang) that includes TOPLOC integration, enabling verifiable inference with SGLang's framework.

### Development Setup

1. Install uv:
```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
source $HOME/.local/bin/env
```

2. Set up the virtual environment (this can take a while because of the C extensions):
```bash
uv venv --python 3.12
source .venv/bin/activate
uv sync --dev
```
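
After `uv sync --dev` completes, a quick import check confirms the environment is usable. This is a generic smoke test, not a script that ships with the repository; run it from the activated virtual environment.

```python
# Generic smoke test: fails with ImportError if the environment is not set up.
import toploc

print("toploc import OK")
```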

### Running Tests

Run the test suite:
```bash
uv run pytest tests
```

Run a single file:
```bash
uv run pytest tests/test_utils.py
```

Run a single test:
```bash
uv run pytest tests/test_utils.py::test_get_fp32_parts
```
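
If you add tests of your own, they follow the usual pytest conventions. The example below is purely hypothetical (its name and contents are not from this repository) and just illustrates the top-k stability property that makes the largest activations a useful locality-sensitive fingerprint:

```python
# Hypothetical pytest example: the top-k indices of an activation vector should
# be stable under tiny numerical noise.
import torch

def test_topk_indices_stable_under_small_noise():
    torch.manual_seed(0)
    hidden = torch.randn(4096)
    noisy = hidden + 1e-6 * torch.randn_like(hidden)
    k = 128
    reference = set(torch.topk(hidden, k).indices.tolist())
    recomputed = set(torch.topk(noisy, k).indices.tolist())
    assert len(reference & recomputed) >= int(0.9 * k)
```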

### Code Quality

Install pre-commit hooks:
```bash
uv run pre-commit install
```

Run linting and formatting on all files:
```bash
pre-commit run --all-files
```

# Citing

```bibtex
@misc{ong2025toploclocalitysensitivehashing,