Has anyone run llama.cpp on a Jetson Orin SoC or other devices? How is the performance? #5059

adamydwang · 2024-01-21T04:47:17Z

I want to know the performance of 7B or 13B models on on-device chips, especially the first-token latency.

niciBume · 2024-07-23T12:53:03Z

Probably a bit late, but on a Jetson Orin AGX 64GB I get approx. 280 tokens/s on llama2 7B:

./llama-cli -m models/llama2-7b.Q4_K_M.gguf -p "Building a website can be done in 10 simple steps:\nStep 1:" -n 400 -e -ngl 100

3 replies

AndreasKunar Feb 14, 2025

Sorry, also a late reply: I just got my Yahboom Jetson Orin NX Super 16GB Dev Kit this week (the reduced-price Nano Super is not shipping, and I also wanted 16GB with a faster CPU/GPU). Here is a llama2 benchmark of it, analogous to discussion #4167.

HW: Jetson Orin NX SUPER 16GB, JetPack 6.2

llama-bench build: c48f630 (4708)

ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: Orin, compute capability 8.7, VMM: yes

model	size	params	backend	ngl	test	t/s
llama 7B Q4_0	3.56 GiB	6.74 B	CUDA	99	pp512	397.91 ± 0.24
llama 7B Q4_0	3.56 GiB	6.74 B	CUDA	99	tg128	17.64 ± 0.04
llama 7B Q8_0	6.67 GiB	6.74 B	CUDA	99	pp512	409.07 ± 6.72
llama 7B Q8_0	6.67 GiB	6.74 B	CUDA	99	tg128	12.66 ± 0.02
llama 7B F16	12.55 GiB	6.74 B	CUDA	99	pp512	504.26 ± 23.90
llama 7B F16	12.55 GiB	6.74 B	CUDA	99	tg128	7.02 ± 0.02
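Since the original question asks about first-token latency, the pp512 rates above can be turned into a rough estimate. The sketch below assumes that time to first token is dominated by prompt processing at the measured pp512 rate and ignores model loading and sampling overhead; the function name is made up for illustration and is not part of llama.cpp.

```python
# pp512 rates (tokens/s) copied from the CUDA llama-bench table above.
PP512_RATE = {
    "Q4_0": 397.91,
    "Q8_0": 409.07,
    "F16": 504.26,
}


def first_token_latency(n_prompt: int, pp_rate: float) -> float:
    """Rough seconds until the first generated token.

    Assumption: the whole prompt is processed at the measured pp512 rate;
    model load time and sampling overhead are ignored.
    """
    return n_prompt / pp_rate


for quant, rate in PP512_RATE.items():
    latency = first_token_latency(512, rate)
    print(f"{quant}: ~{latency:.2f} s for a 512-token prompt")
```

Shorter or longer prompts will not scale perfectly linearly, so this is an order-of-magnitude guide rather than a measurement.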

Approximate power consumption: ~25W during PP, ~10W during TG
(the M4 Pro Mac uses ~35W during PP, ~20W during TG)

So, net, PP is similar to my Apple M4 Pro with the 20-core GPU (~400 t/s vs. ~480 t/s on the M4 Pro for Q4_0), but TG is low, a bit lower than an M2/M3 (they have the same ~100GB/s memory bandwidth).

Its 8-core CPU-only performance is quite bad: 1/2 of an M2 and 1/3 of a Snapdragon X Elite, at a power consumption of ~20W.
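The link between TG speed and memory bandwidth can be made concrete with a back-of-the-envelope bound. The sketch below assumes that generating one token reads all model weights once from memory, so tokens/s cannot exceed bandwidth divided by model size. The 100 GB/s figure is the approximate value quoted above, not a measurement, and KV-cache traffic is ignored.

```python
GIB = 1024**3

# "~100GB/s" as quoted in the post (assumption, not measured here).
BANDWIDTH_GB_S = 100.0

# (model size in GiB, measured tg128 t/s) from the CUDA llama-bench table.
MODELS = {
    "Q4_0": (3.56, 17.64),
    "Q8_0": (6.67, 12.66),
    "F16": (12.55, 7.02),
}


def tg_upper_bound(size_gib: float, bandwidth_gb_s: float = BANDWIDTH_GB_S) -> float:
    """Tokens/s ceiling if every token reads all weights once from memory."""
    return bandwidth_gb_s * 1e9 / (size_gib * GIB)


for quant, (size_gib, measured) in MODELS.items():
    bound = tg_upper_bound(size_gib)
    share = measured / bound
    print(f"{quant}: bound ~{bound:.1f} t/s, measured {measured} t/s ({share:.0%} of bound)")
```

Under these assumptions the F16 and Q8_0 runs sit close to the bandwidth ceiling, while Q4_0 leaves more headroom.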

model	size	params	backend	threads	test	t/s
llama 7B Q4_0	3.56 GiB	6.74 B	CPU	8	pp512	37.54 ± 0.14
llama 7B Q4_0	3.56 GiB	6.74 B	CPU	8	tg128	9.00 ± 0.01

Overall I'm happy with the Orin NX 16GB, but its Ampere GPU architecture does not support fp8, which is disappointing for diffusion/ComfyUI. Price/performance is OK if you want the security of containers and to experiment with CUDA. Otherwise an M4 Mac Mini is a much better choice. Still looking forward to NVIDIA's Project DIGITS.

ggerganov Feb 14, 2025
Maintainer

Make sure to enable -fa 1 in your benches.

AndreasKunar Feb 16, 2025

@ggerganov thanks a lot for pointing this out. I wanted to keep it comparable with the Mac results, because they were run without fa. It does not make much of a difference on my Mac, but it makes a big one on NVIDIA.

So here are the updated results with -fa 1 for build 438a839 (4790). PP gets a significant boost with it!

model	size	params	backend	ngl	fa	test	t/s
llama 7B Q4_0	3.56 GiB	6.74 B	CUDA	99	1	pp512	512.69 ± 0.38
llama 7B Q4_0	3.56 GiB	6.74 B	CUDA	99	1	tg128	18.41 ± 0.02
llama 7B Q8_0	6.67 GiB	6.74 B	CUDA	99	1	pp512	512.61 ± 9.95
llama 7B Q8_0	6.67 GiB	6.74 B	CUDA	99	1	tg128	13.08 ± 0.02
llama 7B F16	12.55 GiB	6.74 B	CUDA	99	1	pp512	592.50 ± 28.56
llama 7B F16	12.55 GiB	6.74 B	CUDA	99	1	tg128	7.16 ± 0.00
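To quantify the boost, the two CUDA tables can be compared row by row. This is only a sketch: the numbers are copied from the tables in this thread, and the runs also differ in build (4708 vs. 4790) and in setup, so not all of the difference is necessarily due to -fa 1.

```python
# t/s from the first CUDA table (no -fa, build 4708).
BEFORE = {
    ("Q4_0", "pp512"): 397.91, ("Q4_0", "tg128"): 17.64,
    ("Q8_0", "pp512"): 409.07, ("Q8_0", "tg128"): 12.66,
    ("F16", "pp512"): 504.26, ("F16", "tg128"): 7.02,
}

# t/s from the updated table (-fa 1, build 4790).
AFTER = {
    ("Q4_0", "pp512"): 512.69, ("Q4_0", "tg128"): 18.41,
    ("Q8_0", "pp512"): 512.61, ("Q8_0", "tg128"): 13.08,
    ("F16", "pp512"): 592.50, ("F16", "tg128"): 7.16,
}


def speedup_pct(before: float, after: float) -> float:
    """Relative change in percent, positive when 'after' is faster."""
    return (after / before - 1.0) * 100.0


for key, before in BEFORE.items():
    quant, test = key
    print(f"{quant} {test}: {speedup_pct(before, AFTER[key]):+.1f}%")
```

The gain is concentrated in the pp512 rows (roughly 17% to 29%), while tg128 moves by only a few percent.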

Below is also the REPORTED power consumption (by jtop for the Jetson, asitop for the Mac) at a comparable llama 7B Q4_0 PP speed:

HW	pp512 t/s	PP power (W)	tg128 t/s	TG power (W)
Orin NX 16GB SUPER	~510	26	~18	26
M4 Pro 20GPU	~490	33	~55	18
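One way to read the power table is as energy efficiency, since tokens per second divided by watts gives tokens per joule. The sketch below uses the rounded values from the table, which are tool-reported by jtop and asitop rather than measured at the wall, so treat the results as approximate.

```python
# (tokens/s, reported watts) from the power table above, llama 7B Q4_0.
ROWS = {
    "Orin NX 16GB SUPER": {"pp512": (510, 26), "tg128": (18, 26)},
    "M4 Pro 20GPU": {"pp512": (490, 33), "tg128": (55, 18)},
}


def tokens_per_joule(tokens_per_s: float, watts: float) -> float:
    """Energy efficiency: (tokens/s) / (J/s) = tokens per joule."""
    return tokens_per_s / watts


for hw, tests in ROWS.items():
    for test, (tps, watts) in tests.items():
        print(f"{hw} {test}: ~{tokens_per_joule(tps, watts):.2f} tokens/J")
```

On these reported figures the Orin NX comes out ahead per joule for prompt processing, and the M4 Pro comes out ahead for token generation.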

All measurements with the Orin NX were done in "MAXN_SUPER" power-mode, to maximize performance.

UPDATED 2025-02-28: with new measurements (after a fresh install) and setting the fan-profile to "cool".

P.S.: Linux on the Jetson is a bit of a hassle. I had to completely re-install the DevKit because of some weird package version conflicts and CUDA suddenly not running anymore. You also need an x64 Linux machine to install the Jetson easily, but I only have M-series Macs and a Snapdragon X Elite PC. Hence the delayed answer.
