
How to convert .safetensors to GGUF? #12513

Asked by pandayummy in Q&A · answered by ag2s20150909

First, clone the llama.cpp repository:

git clone https://github.com/ggerganov/llama.cpp.git
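The quantize step at the end needs the llama-quantize binary. If you are not using a prebuilt binary from the project's releases page, a minimal CMake build sketch (assumes CMake and a C/C++ toolchain; the binary lands in build/bin, or build/bin/Release with MSVC):

# Build the llama.cpp tools from the cloned sources.
cmake -S llama.cpp -B build
cmake --build build --config Release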

Next, install the Python dependencies:

pip install -r llama.cpp/requirements.txt
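If you prefer to keep these packages isolated, a virtual environment works; a sketch (the activation command differs per OS):

# Optional: install the requirements into a dedicated virtual environment.
python -m venv .venv
.venv\Scripts\activate        # Windows; on Linux/macOS: source .venv/bin/activate
pip install -r llama.cpp/requirements.txt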

Convert the model to an FP16 GGUF:

python llama.cpp/convert_hf_to_gguf.py the_dir_of_model    
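To be explicit about the output, convert_hf_to_gguf.py also accepts --outtype and --outfile; a minimal sketch with a hypothetical model directory and filename:

# Hypothetical paths: convert a Hugging Face model directory (config.json,
# tokenizer, .safetensors weights) into an explicit FP16 GGUF.
python llama.cpp/convert_hf_to_gguf.py ./my-model-dir --outtype f16 --outfile my-model-F16.gguf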

Finally, quantize the GGUF:

llama-quantize.exe xxx-F16.gguf xxx-Q4_K_M.gguf Q4_K_M
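A concrete invocation might look like this (filenames are hypothetical; the trailing number is the optional nthreads argument shown in the usage below):

# Hypothetical filenames: turn the FP16 GGUF produced above into a
# 4-bit Q4_K_M quant, using 8 threads.
llama-quantize.exe Llama-3-8B-F16.gguf Llama-3-8B-Q4_K_M.gguf Q4_K_M 8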

For reference, llama-quantize.exe reports the following usage summary:
usage: llama-quantize.exe [--help] [--allow-requantize] [--leave-output-tensor] [--pure] [--imatrix] [--include-weights] [--exclude-weights] [--output-tensor-type] [--token-embedding-type] [--override-kv] model-f32.gguf [model-quant.gguf] type [nthreads]

  --allow-requantize: Allows requantizing tensors that have already been quantized. Warning: This can severely reduce quality compared…
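For example (hypothetical filenames), shrinking a model that is already quantized requires that flag, with the stated quality caveat:

# Requantize an existing Q8_0 file down to Q4_K_M; quality will be worse
# than quantizing from the original FP16 GGUF.
llama-quantize.exe --allow-requantize model-Q8_0.gguf model-Q4_K_M.gguf Q4_K_M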
