Commit e768e54

fix: support max kv cache length. (#43)
1 parent b3665f2 commit e768e54

3 files changed

+231
-12
lines changed

examples/macbeth.sh

+210
@@ -0,0 +1,210 @@
+#!/bin/bash
+
+# This is a simple test of generating a sequence that fulfills the KV cache.
+#
+# Model: https://huggingface.co/b4rtaz/llama-3-8b-distributed-llama
+# Probably, this test will be working correctly only on MacBook Pro, due to differences in float multiplication on different CPUs.
+
+cd "$(dirname "$0")"
+cd ..
+
+# Source: https://www.opensourceshakespeare.org/views/plays/play_view.php?WorkID=macbeth&Scope=entire
+PROMPT="Duncan. What bloody man is that? He can report,
+As seemeth by his plight, of the revolt
+The newest state. 20
+
+Malcolm. This is the sergeant
+Who like a good and hardy soldier fought
+'Gainst my captivity. Hail, brave friend!
+Say to the king the knowledge of the broil
+As thou didst leave it. 25
+
+Sergeant. Doubtful it stood;
+As two spent swimmers, that do cling together
+And choke their art. The merciless Macdonwald—
+Worthy to be a rebel, for to that
+The multiplying villanies of nature 30
+Do swarm upon him—from the western isles
+Of kerns and gallowglasses is supplied;
+And fortune, on his damned quarrel smiling,
+Show'd like a rebel's whore: but all's too weak:
+For brave Macbeth—well he deserves that name— 35
+Disdaining fortune, with his brandish'd steel,
+Which smoked with bloody execution,
+Like valour's minion carved out his passage
+Till he faced the slave;
+Which ne'er shook hands, nor bade farewell to him, 40
+Till he unseam'd him from the nave to the chaps,
+And fix'd his head upon our battlements.
+
+Duncan. O valiant cousin! worthy gentleman!
+
+Sergeant. As whence the sun 'gins his reflection
+Shipwrecking storms and direful thunders break, 45
+So from that spring whence comfort seem'd to come
+Discomfort swells. Mark, king of Scotland, mark:
+No sooner justice had with valour arm'd
+Compell'd these skipping kerns to trust their heels,
+But the Norweyan lord surveying vantage, 50
+With furbish'd arms and new supplies of men
+Began a fresh assault.
+
+Duncan. Dismay'd not this
+Our captains, Macbeth and Banquo?
+
+Sergeant. Yes; 55
+As sparrows eagles, or the hare the lion.
+If I say sooth, I must report they were
+As cannons overcharged with double cracks, so they
+Doubly redoubled strokes upon the foe:
+Except they meant to bathe in reeking wounds, 60
+Or memorise another Golgotha,
+I cannot tell.
+But I am faint, my gashes cry for help.
+
+Duncan. So well thy words become thee as thy wounds;
+They smack of honour both. Go get him surgeons. 65
+[Exit Sergeant, attended]
+Who comes here?"
+
+GENERATED="Malcolm. The worthy Thane of Ross.
+Duncan. What a haste looks through a troop? and when may
+No sooner had this battle fought, than, ingrate and ungracious! 70
+Who leap'd my back, and thence hasten'd me away,
+And follows so he wins.
+
+Malcolm. As I do live, my lord, so happily prosper I, without this blow might have o'erpaid the world: 75
+He loves our Majesty as boundlessly as we
+Muster ourselves, and make a full battalion.
+Duncan. Then enter, sir, and alone with me great battles 80
+I'll strain upon thy forehead, to this day
+It is a faith, and makes the fire that burn in my veins:
+Thou hast it now, king, afield to-morrow.
+God be wi' you, father.
+
+Duncan. Farewell, farewell! or let me hear from you. 85
+[Exeunt]
+
+THE THIRD SCENE
+Macbeth, Banquo, Ross, and Angus.
+Macbeth. So fair and foul a day I have not seen.
+It is calm, and yet is all together.
+[Thunder, then rain]
+Look, how the blood of Sweden flows from hence;
+The time is free. I see the Capitol; 90
+The city of kingly eyes.
+[Thunder, then lightening]
+And the remote parts of Parliament,
+Which now behold, now can behold no more 95
+In which time will appear how much
+I have translated the flesh of Banquo
+Into a crow that strays about the capital.
+[Treble knocks]
+The Prince of Cumberland; that is a step
+On which I must fall down, or else o'erleap, 100
+For in my way it lies. Stars, hide your fires;
+Let not light see my black and deep desires:
+The eye wink at the hand; yet let that be
+Which the eye fears to look upon.
+Let it be a sin; 105
+That the so lusterous, and so bright, so good
+Should but be seen a fellow to my crime;
+And dupe so ruin'd.
+[Treble knocks]
+That my keen knife
+See you satisfied.
+[Treble knocks] 110
+Go to thy death.
+[Re-enter a Servant]
+How now, what names.
+Servant. Name. Marrow, marrow; that is the very question that I put to thee, 115
+That is the very question that I put to thee,
+Macbeth. Thou 'rt mad that thy sword is not temper'd.
+What you lack of temperance, that I lack in valour.
+Art not without ambition; but without 120
+The illness should attend it. What thou wouldst highly,
+That wouldst thou boldly, and with thy virtues else
+Wouldst thou have wildly holden; let fall thy hand;
+[To the Servant]
+Kilt him like a boar. 125
+Macbeth. From this time these woes we will re-assume:
+Not from our fingers' ends? We still have left
+A special will to thrust these thorns more firmly.
+A little more the wisely. Gently the weather. 130
+[Treble knocks]
+The Prince of Cumberland; that is a step
+On which I must fall down, or else o'erleap,
+For in my way it lies. Stars, hide your fires;
+Let not light see my black and deep desires: 135
+The eye wink at the hand; yet let that be
+Which the eye fears to look upon.
+Let it be a sin; that the so lusterous, and so bright, so good
+Should but be seen a fellow to my crime; 140
+And dupe so ruin'd.
+[Treble knocks]
+That my keen knife
+See you satisfied.
+[Treble knocks]
+Go to thy death.
+[Re-enter a Servant]
+How now, what names.
+Servant. Name. Marrow, marrow; that is the very question that I put to thee, 145
+That is the very question that I put to thee,
+Macbeth. Thou 'rt mad that thy sword is not temper'd.
+What you lack of temperance, that I lack in valour.
+Art not without ambition; but without 150
+The illness should attend it. What thou wouldst highly,
+That wouldst thou boldly, and with thy virtues else
+Wouldst thou have wildly holden; let fall thy hand;
+[To the Servant]
+Kilt him like a boar. 155
+Macbeth. From this time these woes we will re-assume:
+Not from our fingers' ends? We still have left
+A special will to thrust these thorns more firmly.
+A little more the wisely. Gently the weather. 160
+[Treble knocks]
+Come, love, and we will a while chastise
+That dares come to this.
+[Re-enter a second Servant]
+What is that which caugh your eyes? 165
+Second Servant. My young lord, I can tell.
+To think that they may see such sights!
+And yet not be the eyes itself that see but, as 'tis said, a man should be the righter part of nature; if he be such, he need not
+come behindhand too. 170
+'Tis no time to cloak our faults.
+[Re-enter a third Servant]
+The very firstlings of my heart shall be
+The firstlings of my head; I'll be their patriarch.
+Come, put on gaiter; come, come, good mother, 175
+Damned entrance of weather!
+[Thunder]
+Come, get you to my woman's breasts; And on them give, and mercy onen me, let fall your holy disinclinations.
+[Exeunt]
+Act III. SCENE 1.
+The scene opens with the arrival of the King and his entourage at the castle of the thane of Fife. King Duncan, having heard of Macbeth's new successes, asks his thanes to rejoice with him. Macbeth's great respect for the king makes him slightly uncomfortable. King Duncan's concern for Macbeth's wife and children is further evidence of the king's warmth and loving nature. Macbeth appears ill at ease, perhaps at Duncan's evident concern. His language is overly formal and self-conscious, while his wife speaks rather bluntly. After Macbeth, the king, and his attendants enter, Banquo asks how Lady Macbeth is.
+
+Macbeth. When I am gone, 180
+After life's fitful fever, he sleeps well,
+Though the powers of the strong world do set themselves
+Against his estate.
+King Duncan. So well to do!
+Had he his heart's desire, he 'd stoop
+To what humility 185
+Might become the matter.
+Macbeth. As the matter now I 've put it.
+King Duncan. Well then, 190
+Since that you are a father, show the child
+The taking off, and that which now you do 195
+Commit"
+
+echo "Generating, it can take a while..."
+
+OUTPUT=$(( ./main generate --seed 12345 --temperature 0.9 --topp 0.9 --prompt "$PROMPT" --weights-float-type q40 --buffer-float-type q80 --nthreads 8 --steps 2048 --model converter/dllama_meta-llama-3-8b_q40.bin --tokenizer converter/dllama_meta-llama3-tokenizer.t ) 2>&1)
+
+echo "$OUTPUT"
+
+if [[ $OUTPUT == *"$GENERATED"* ]]; then
+    echo "✅ Output is same"
+else
+    echo "❌ Output is different"
+fi

src/llama2-tasks.cpp

+2-2
@@ -71,8 +71,8 @@ void llamaMultiheadAtt(TASK_ARGS) {
         float* k = block->keyCache + transformer->pos * spec->kvDim;
         float* v = block->valueCache + transformer->pos * spec->kvDim;
 
-        memcpy(k, transformer->buffer->getUnit(TB_SLICED_K), spec->dim * sizeof(float));
-        memcpy(v, transformer->buffer->getUnit(TB_SLICED_V), spec->dim * sizeof(float));
+        memcpy(k, transformer->buffer->getUnit(TB_SLICED_K), spec->kvDim * sizeof(float));
+        memcpy(v, transformer->buffer->getUnit(TB_SLICED_V), spec->kvDim * sizeof(float));
     }
 }
 
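
Note on the `memcpy` change: the cache pointers above advance by `pos * spec->kvDim`, so each cache row appears to be `kvDim` floats wide. Copying `dim` floats then spills into later rows whenever `kvDim < dim`, and past the end of the buffer at the last positions, which would fit the commit title. The sketch below illustrates this under assumptions: `SpecSketch` and `storeKey` are hypothetical names, the cache is assumed to hold `seqLen * kvDim` floats, and the sizes in the usage note are illustrative only.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical, simplified stand-in for the TransformerSpec fields used in the hunk.
struct SpecSketch {
    std::size_t dim;     // width of a full activation vector
    std::size_t kvDim;   // width of one key/value row in the cache
    std::size_t seqLen;  // number of positions the cache can hold
};

// Copies `count` floats of one position's key vector into a cache laid out
// as seqLen rows of kvDim floats (an assumption about the allocation).
// Returns false, without writing, when the copy would run past the cache end.
bool storeKey(std::vector<float>& keyCache, const SpecSketch& spec,
              std::size_t pos, const float* k, std::size_t count) {
    std::size_t offset = pos * spec.kvDim;
    if (offset + count > keyCache.size()) return false;
    std::memcpy(keyCache.data() + offset, k, count * sizeof(float));
    return true;
}
```

With `dim = 4096`, `kvDim = 1024`, and `seqLen = 2048`, a call at position 2047 succeeds for `count = kvDim` and is rejected for `count = dim`. The original code had no such guard, so the oversized copy would have written out of bounds.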

src/main.cpp

+19-10
@@ -27,7 +27,9 @@ struct ProgramArgs {
     int* workerPorts;
     float temperature;
     float topp;
-    int steps;
+    pos_t steps;
+    bool benchmark;
+    unsigned long long seed;
 
     // worker
     int port;
@@ -105,16 +107,19 @@ void generate(Inference* inference, SocketPool* socketPool, Tokenizer *tokenizer
 
         // print the token as string, decode it with the Tokenizer object
         char* piece = tokenizer->decode(token, next);
-
-        printf("🔶 G %4ld ms I %4ld ms T %4ld ms S %6ld kB R %6ld kB ", generationTime, inferenceTime, transferTime, sentBytes / 1024, recvBytes / 1024);
+
+        if (args->benchmark)
+            printf("🔶 %4d G %4ld ms I %4ld ms T %4ld ms S %6ld kB R %6ld kB ", pos, generationTime, inferenceTime, transferTime, sentBytes / 1024, recvBytes / 1024);
         safePrintf(piece); // same as printf("%s", piece), but skips "unsafe" bytes
-        printf("\n");
+        if (args->benchmark)
+            printf("\n");
         fflush(stdout);
         token = next;
     }
 
     free(promptTokens);
 
+    if (!args->benchmark) printf("\n");
     printf("Generated tokens: %d\n", pos);
     printf("Avg generation time: %.2f ms\n", totalGenerationTime / (double)pos);
     printf("Avg inference time: %.2f ms\n", totalInferenceTime / (double)pos);
@@ -286,22 +291,19 @@ int run(ProgramArgs* args, void (*program)(Inference* inference, SocketPool* soc
 
     SocketPool* socketPool = SocketPool::connect(args->nWorkers, args->workerHosts, args->workerPorts);
     unsigned int nSlices = args->nWorkers + 1;
-    unsigned long long rngSeed = (unsigned int)time(NULL);
 
     TransformerSpec spec = Transformer::loadSpecFromFile(args->modelPath, nSlices, args->weightsFloatType, args->bufferFloatType);
     TransformerArch arch = getArch(&spec);
 
-    if (args->steps < 0) {
-        args->steps = spec.seqLen;
-    } else if (args->steps > spec.seqLen) {
+    if (args->steps == 0 || args->steps > spec.seqLen) {
         args->steps = spec.seqLen;
     }
 
     Tokenizer tokenizer(args->tokenizerPath, spec.vocabSize);
     Transformer transformer = Transformer::loadRootFromFile(args->modelPath, &spec, socketPool);
     Inference inference = Inference(&arch, args->nThreads, &transformer, socketPool);
 
-    Sampler sampler(spec.vocabSize, args->temperature, args->topp, rngSeed);
+    Sampler sampler(spec.vocabSize, args->temperature, args->topp, args->seed);
 
     program(&inference, socketPool, &tokenizer, &sampler, args, &spec);
 
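
Note on the `steps` clamp: the "unset" sentinel moves from `-1` to `0` and the `steps < 0` branch is dropped, which suggests `pos_t` is an unsigned type (its definition is not part of this diff, so that is an assumption). A minimal sketch of the new rule, with `clampSteps` as a hypothetical helper name:

```cpp
// Assumption: pos_t is an unsigned integer type; the real typedef is not shown in the diff.
typedef unsigned int pos_t;

// Mirrors the clamp in run(): 0 means "not set by the user", and any request
// beyond the model's sequence length is capped to that length.
pos_t clampSteps(pos_t requested, pos_t seqLen) {
    if (requested == 0 || requested > seqLen) return seqLen;
    return requested;
}
```

For a sequence length of 2048, `clampSteps(0, 2048)` and `clampSteps(4096, 2048)` both give 2048, while `clampSteps(100, 2048)` gives 100.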

@@ -350,7 +352,8 @@ int main(int argc, char *argv[]) {
     args.port = 9990;
     args.temperature = 0.8f;
     args.topp = 0.9f;
-    args.steps = -1;
+    args.steps = 0;
+    args.seed = (unsigned long long)time(NULL);
 
     if (argc > 1) {
         args.mode = argv[1];
@@ -400,6 +403,8 @@ int main(int argc, char *argv[]) {
             args.temperature = atof(argv[i + 1]);
         } else if (strcmp(argv[i], "--topp") == 0) {
             args.topp = atof(argv[i + 1]);
+        } else if (strcmp(argv[i], "--seed") == 0) {
+            args.seed = atoll(argv[i + 1]);
         } else {
             printf("Unknown option %s\n", argv[i]);
             exit(EXIT_FAILURE);
@@ -408,6 +413,10 @@ int main(int argc, char *argv[]) {
 
     if (args.mode != NULL) {
         if (strcmp(args.mode, "inference") == 0) {
+            args.benchmark = true;
+            return run(&args, generate);
+        } else if (strcmp(args.mode, "generate") == 0) {
+            args.benchmark = false;
             return run(&args, generate);
         } else if (strcmp(args.mode, "chat") == 0) {
             return run(&args, chat);
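
Note on `--seed`: the point of the new flag is that a fixed seed makes sampling repeatable, which `examples/macbeth.sh` relies on when it compares output against a stored string. The sketch below illustrates the idea under assumptions. `std::mt19937_64` stands in for the project's `Sampler` RNG, which is not shown in this diff. `parseSeed` and `draw` are hypothetical names, and `strtoull` is a suggested alternative to the commit's `atoll` for accepting the full unsigned 64-bit range.

```cpp
#include <cstdlib>
#include <random>
#include <vector>

// Parses a decimal seed from a command-line argument.
// strtoull is a suggestion here; the commit itself calls atoll.
unsigned long long parseSeed(const char* text) {
    return std::strtoull(text, nullptr, 10);
}

// Stand-in for a seeded sampler (assumption): returns the first n raw draws
// of a std::mt19937_64 initialised with the given seed.
std::vector<unsigned long long> draw(unsigned long long seed, int n) {
    std::mt19937_64 rng(seed);
    std::vector<unsigned long long> out;
    for (int i = 0; i < n; i++) out.push_back(rng());
    return out;
}
```

Two calls to `draw` with the same seed return identical sequences, so a run started with `--seed 12345` can be repeated. Bit-identical model output still depends on the floating-point behaviour of the CPU, as the script's own comment warns.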
