Commit fc5d01c: Update README.md
1 parent 0e15835 commit fc5d01c

File tree: 1 file changed, +34 -3 lines changed

README.md

Lines changed: 34 additions & 3 deletions
@@ -490,14 +490,15 @@ Due to discrepancies between llama.cpp and HuggingFace's tokenizers, it is requi
 
 ### Multi-modal Models
 
-`llama-cpp-python` supports the llava1.5 family of multi-modal models which allow the language model to
-read information from both text and images.
+`llama-cpp-python` supports multi-modal models such as llava1.5, which allow the language model to read information from both text and images.
 
 You'll first need to download one of the available multi-modal models in GGUF format:
 
 - [llava-v1.5-7b](https://huggingface.co/mys/ggml_llava-v1.5-7b)
 - [llava-v1.5-13b](https://huggingface.co/mys/ggml_llava-v1.5-13b)
 - [bakllava-1-7b](https://huggingface.co/mys/ggml_bakllava-1)
+- [llava-v1.6-34b](https://huggingface.co/cjpais/llava-v1.6-34B-gguf)
+- [moondream2](https://huggingface.co/vikhyatk/moondream2)
 
 Then you'll need to use a custom chat handler to load the clip model and process the chat messages and images.
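The next hunk only shows the tail end of that chat-handler example, so for context here is a minimal sketch of the setup it belongs to. It assumes the `Llava15ChatHandler` from `llama_cpp.llama_chat_format` and a locally downloaded CLIP/mmproj file; both paths are placeholders and are not part of this commit.

```python
>>> from llama_cpp import Llama
>>> from llama_cpp.llama_chat_format import Llava15ChatHandler
>>> # load the CLIP model used to embed images for the language model (placeholder path)
>>> chat_handler = Llava15ChatHandler(clip_model_path="./path/to/llava/mmproj.bin")
>>> llm = Llama(
  model_path="./path/to/llava/llama-model.gguf",
  chat_handler=chat_handler,
  n_ctx=2048,  # n_ctx should be increased to accommodate the image embedding
)
```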

@@ -509,7 +510,6 @@ Then you'll need to use a custom chat handler to load the clip model and process
   model_path="./path/to/llava/llama-model.gguf",
   chat_handler=chat_handler,
   n_ctx=2048, # n_ctx should be increased to accommodate the image embedding
-  logits_all=True, # needed to make llava work
 )
 >>> llm.create_chat_completion(
     messages = [
@@ -525,6 +525,37 @@ Then you'll need to use a custom chat handler to load the clip model and process
     ]
 )
 ```
 
+You can also pull the model from the Hugging Face Hub using the `from_pretrained` method.
+
+```python
+>>> from llama_cpp import Llama
+>>> from llama_cpp.llama_chat_format import MoondreamChatHandler
+>>> chat_handler = MoondreamChatHandler.from_pretrained(
+  repo_id="vikhyatk/moondream2",
+  filename="*mmproj*",
+)
+>>> llm = Llama.from_pretrained(
+  repo_id="vikhyatk/moondream2",
+  filename="*text-model*",
+  chat_handler=chat_handler,
+  n_ctx=2048, # n_ctx should be increased to accommodate the image embedding
+)
+>>> llm.create_chat_completion(
+    messages = [
+        {"role": "system", "content": "You are an assistant who perfectly describes images."},
+        {
+            "role": "user",
+            "content": [
+                {"type": "image_url", "image_url": {"url": "https://.../image.png"}},
+                {"type": "text", "text": "Describe this image in detail please."}
+            ]
+        }
+    ]
+)
+```
+
+**Note**: Multi-modal models also support tool calling and JSON mode.
+
 <details>
 <summary>Loading a Local Image</summary>
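As a rough illustration of the JSON-mode half of that added note: `create_chat_completion` accepts the same `response_format` parameter as the text-only chat API, and tool calling likewise goes through the usual `tools`/`tool_choice` parameters. The sketch below is not part of this commit; the image URL is the placeholder carried over from the example above.

```python
>>> llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "https://.../image.png"}},
                {"type": "text", "text": "Describe this image as JSON with keys 'subject' and 'colors'."}
            ],
        }
    ],
    response_format={"type": "json_object"},  # constrain the model's output to valid JSON
)
```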
