@@ -490,14 +490,15 @@ Due to discrepancies between llama.cpp and HuggingFace's tokenizers, it is requi
### Multi-modal Models
- `llama-cpp-python` supports the llava1.5 family of multi-modal models which allow the language model to
- read information from both text and images.
+ `llama-cpp-python` supports multi-modal models such as llava1.5, which allow the language model to read information from both text and images.
You'll first need to download one of the available multi-modal models in GGUF format:
- [llava-v1.5-7b](https://huggingface.co/mys/ggml_llava-v1.5-7b)
- [llava-v1.5-13b](https://huggingface.co/mys/ggml_llava-v1.5-13b)
- [bakllava-1-7b](https://huggingface.co/mys/ggml_bakllava-1)
+ - [llava-v1.6-34b](https://huggingface.co/cjpais/llava-v1.6-34B-gguf)
+ - [moondream2](https://huggingface.co/vikhyatk/moondream2)
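If you prefer to fetch the files yourself rather than use the `from_pretrained` helper shown further below, one option is `huggingface_hub.hf_hub_download`. This is only a sketch: the repository and filenames below are illustrative, so check each repository's file listing for the exact GGUF and mmproj names.

```python
from huggingface_hub import hf_hub_download

# Illustrative filenames -- confirm the exact names in the repo's file listing.
model_path = hf_hub_download(
    repo_id="mys/ggml_llava-v1.5-7b",
    filename="ggml-model-q4_k.gguf",
)
clip_path = hf_hub_download(
    repo_id="mys/ggml_llava-v1.5-7b",
    filename="mmproj-model-f16.gguf",
)
print(model_path, clip_path)
```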
Then you'll need to use a custom chat handler to load the clip model and process the chat messages and images.
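For reference, a minimal sketch of the local-file flow looks roughly like this; the paths are placeholders, and `Llava15ChatHandler` is the handler that pairs with the llava-1.5 models listed above.

```python
from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler

# Placeholder paths -- point these at the GGUF files downloaded above.
chat_handler = Llava15ChatHandler(clip_model_path="./path/to/llava/mmproj.bin")
llm = Llama(
    model_path="./path/to/llava/llama-model.gguf",
    chat_handler=chat_handler,
    n_ctx=2048,  # larger context to leave room for the image embedding
)
```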
@@ -509,7 +510,6 @@ Then you'll need to use a custom chat handler to load the clip model and process
    model_path="./path/to/llava/llama-model.gguf",
    chat_handler=chat_handler,
    n_ctx=2048, # n_ctx should be increased to accommodate the image embedding
-   logits_all=True, # needed to make llava work
)
>>> llm.create_chat_completion(
    messages=[
@@ -525,6 +525,37 @@ Then you'll need to use a custom chat handler to load the clip model and process
)
```
+ You can also pull the model from the Hugging Face Hub using the `from_pretrained` method.
+
+ ```python
+ >>> from llama_cpp import Llama
+ >>> from llama_cpp.llama_chat_format import MoondreamChatHandler
+ >>> chat_handler = MoondreamChatHandler.from_pretrained(
+   repo_id="vikhyatk/moondream2",
+   filename="*mmproj*",
+ )
+ >>> llm = Llama.from_pretrained(
+   repo_id="vikhyatk/moondream2",
+   filename="*text-model*",
+   chat_handler=chat_handler,
+   n_ctx=2048, # n_ctx should be increased to accommodate the image embedding
+ )
+ >>> llm.create_chat_completion(
+     messages=[
+         {"role": "system", "content": "You are an assistant who perfectly describes images."},
+         {
+             "role": "user",
+             "content": [
+                 {"type": "image_url", "image_url": {"url": "https://.../image.png"}},
+                 {"type": "text", "text": "Describe this image in detail please."}
+             ]
+         }
+     ]
+ )
+ ```
+
+ **Note**: Multi-modal models also support tool calling and JSON mode.
+
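As a rough illustration of that note, JSON mode can be combined with an image prompt by passing `response_format` to `create_chat_completion`. This is only a sketch: it assumes the `llm` constructed above, and the schema and field names are made up for the example.

```python
response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You extract structured data from images."},
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "https://.../image.png"}},
                {"type": "text", "text": "List the objects you can see."},
            ],
        },
    ],
    # Constrain the output to JSON matching an illustrative schema.
    response_format={
        "type": "json_object",
        "schema": {
            "type": "object",
            "properties": {"objects": {"type": "array", "items": {"type": "string"}}},
            "required": ["objects"],
        },
    },
)
print(response["choices"][0]["message"]["content"])
```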
<details>
<summary>Loading a Local Image</summary>