-
I want to run 8-bit GGUF files with the llama-cpp-python library. Do I need to load two files at the same time? Can you share sample code?

Replies: 1 comment
-
GGUF packs everything into one file, so you only need a single file. A lot of providers offer ready-made GGUFs nowadays; use the Hugging Face search to find a repo for your model. Once you have the repo id, you can run the download with the built-in `Llama.from_pretrained`:

```python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="bartowski/Meta-Llama-3.1-8B-Instruct-GGUF",  # HF repo id where the model files are hosted
    filename="*q8_0.gguf",  # downloads the file matching this pattern (one ending in q8_0.gguf, i.e. an 8-bit quant)
    local_dir="./ai/llm_models/",  # optional dir to save the model file (otherwise it goes into the default cache dir)
    # verbose=True,
    # n_gpu_layers=-1,  # uncomment if you installed llama-cpp-python with GPU support, otherwise it runs on CPU
    chat_format="llama-3",  # chat template; usually detected automatically from the GGUF metadata, so this may be optional
)

output = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": ""},
        {
            "role": "user",
            "content": "Hi, I'm just testing if it works.",
        },
    ],
    max_tokens=256,  # max number of tokens to generate; check each model's supported limit online
)
print(output)
```
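`create_chat_completion` returns an OpenAI-style completion dict, so `print(output)` dumps the whole structure. A minimal sketch of pulling out just the reply text, assuming the usual `choices[0]["message"]["content"]` layout:

```python
# The result follows the OpenAI chat-completion schema:
# {"id": ..., "choices": [{"message": {"role": "assistant", "content": ...}, ...}], "usage": ...}
reply = output["choices"][0]["message"]["content"]
print(reply)

# Token accounting is reported under "usage"
print(output["usage"])
```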
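If you already have the single .gguf file on disk, you can also point the constructor straight at it instead of going through `from_pretrained`. A minimal sketch, assuming a hypothetical filename under the `local_dir` used above:

```python
from llama_cpp import Llama

# Hypothetical path: adjust to the actual file that from_pretrained (or a manual download) saved
llm = Llama(
    model_path="./ai/llm_models/Meta-Llama-3.1-8B-Instruct-Q8_0.gguf",
    n_ctx=4096,         # context window to allocate
    # n_gpu_layers=-1,  # offload all layers if built with GPU support
)
```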