Serve multiple models with llamacpp server #10431
-
Or just start multiple instances lol
-
I start each model in a separate server instance, using a different port for each model. Then I make an API call to whichever server is running the model I want, providing the array of messages intended for that model. I can also use the very nice webUI that @ngxson made for any running server!
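A minimal sketch of that setup, assuming two llama-server instances were already started on ports 8080 and 8081 (the ports and the model-name keys below are placeholders). Each instance exposes an OpenAI-compatible `/v1/chat/completions` endpoint, so it can be queried independently:

```python
# Query two separate llama-server instances, one per model.
# Assumes the servers were started beforehand on ports 8080 and 8081.
import json
import urllib.request

ENDPOINTS = {
    "llama-8b": "http://127.0.0.1:8080/v1/chat/completions",  # placeholder names
    "qwen-7b": "http://127.0.0.1:8081/v1/chat/completions",
}

def chat(model: str, messages: list[dict]) -> str:
    """Send a chat request to the server instance that hosts `model`."""
    payload = json.dumps({"messages": messages}).encode("utf-8")
    req = urllib.request.Request(
        ENDPOINTS[model],
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(chat("llama-8b", [{"role": "user", "content": "Hello!"}]))
    print(chat("qwen-7b", [{"role": "user", "content": "Bonjour!"}]))
```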
-
I saw this project on r/LocalLLaMA as a workaround: https://github.com/mostlygeek/llama-swap
-
I would say either put a proxy server in front of it or use a project like llama-swap; that way you keep all the config options per server. The proxy would only need some simple routing, simple per-server queuing, and options such as starting a new model as a second server (if you have the memory available), and then you can reuse all the existing server options.
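As a rough illustration of that idea (not something llama.cpp itself provides), here is a minimal routing proxy that reads the `model` field from the JSON body and forwards the request to the matching backend. The backend names and ports are placeholders, and per-backend queuing and on-demand start-up are left out of the sketch:

```python
# Minimal routing proxy in front of several llama-server instances:
# the request is forwarded to the backend chosen by its "model" field.
import json
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

BACKENDS = {
    "model-a": "http://127.0.0.1:8080",  # placeholder backends
    "model-b": "http://127.0.0.1:8081",
}

class ProxyHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        model = json.loads(body).get("model", "")
        backend = BACKENDS.get(model)
        if backend is None:
            self.send_error(404, f"unknown model: {model}")
            return
        # Forward the request unchanged to the selected backend.
        req = urllib.request.Request(
            backend + self.path,
            data=body,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            payload = resp.read()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8000), ProxyHandler).serve_forever()
```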
-
I haven't tried using it, but there is also this: https://github.com/perk11/large-model-proxy It looks like you can even run multiple backends and have them unload automatically on a least-recently-used basis, etc.
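For illustration only, a sketch of that least-recently-used idea under a few assumptions: at most a fixed number of llama-server processes stay alive, and the one used longest ago is stopped before another is started. The model paths and port map below are made up; `-m` and `--port` are the usual llama-server flags.

```python
# LRU-style management of llama-server processes (sketch).
import subprocess
from collections import OrderedDict

MAX_LOADED = 2  # how many model servers may run at once
MODELS = {      # placeholder model paths and ports
    "model-a": ("models/model-a.gguf", 8080),
    "model-b": ("models/model-b.gguf", 8081),
    "model-c": ("models/model-c.gguf", 8082),
}
running = OrderedDict()  # name -> subprocess.Popen, ordered by last use

def ensure_running(name: str) -> int:
    """Start (or reuse) a llama-server for `name`, evicting the LRU one if needed."""
    path, port = MODELS[name]
    if name in running:
        running.move_to_end(name)  # mark as most recently used
        return port
    if len(running) >= MAX_LOADED:
        _, oldest = running.popitem(last=False)  # least recently used
        oldest.terminate()                       # unload that model
        oldest.wait()
    running[name] = subprocess.Popen(
        ["llama-server", "-m", path, "--port", str(port)]
    )
    return port
```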
-
Still seems like a nice feature, if the will to implement it is there.
-
I'm learning to use llama.cpp and have a similar need. Basically, I would like to set up rules for different prompt types and use a different model for each on one server. I'm still not sure whether the stock llama-server application supports loading/swapping models via an API call that specifies a model path. For example, I cannot change the model via the call below, even though I get a success response; maybe my usage is not correct.
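If your build of llama-server can't swap models through the API, one workaround is to run one server per model and pick the endpoint on the client side with whatever prompt-type rules you like. A tiny hypothetical sketch (the heuristics and endpoints are placeholders):

```python
# Placeholder endpoints for two separately running llama-server instances.
CODE_SERVER = "http://127.0.0.1:8080/v1/chat/completions"  # e.g. a coding model
CHAT_SERVER = "http://127.0.0.1:8081/v1/chat/completions"  # e.g. a general chat model

def pick_endpoint(prompt: str) -> str:
    """Crude rule: prompts that look like code go to the coding model."""
    if any(marker in prompt for marker in ("def ", "class ", "#include", "import ")):
        return CODE_SERVER
    return CHAT_SERVER
```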
-
Could you please share a link that explains how to load multiple models? I went through some articles but did not figure out how to pass multiple models. 😢 I also tried the commands below; none of them worked.
-
To serve multiple models using the llama.cpp server, you need to run the server in a way that supports multiple models. Here's how you can do it:
-
Is there a way to serve more than one model at the same time? (I don't think so).
Are there any plans to add this feature?