exllamav2 and IBM Granite models #460
Closed
RaRasputinRGLM started this conversation in General
Replies: 1 comment, 6 replies
-
I'm unsure what the issue is. I added support for Granite two weeks ago. It indeed doesn't have an |
-
I wanted to work with these models because the sizes are appealing for local LLMs, and they all use the StarCoder tokenizer. However, model.embed_tokens is not found in any of the safetensors files. I was thinking I could hack a solution together by extracting that tensor from a StarCoder checkpoint, but I suspect there is a faster route I'm missing due to inexperience.
I don't think this is really an exllamav2 issue, but it seems worth discussing. There is a similar issue with IBM Granite over at llama.cpp that gives some insight into how they structured their models: ggml-org/llama.cpp#7116
In short, what is the right route with exllamav2 when embed_tokens is missing but you know which tokenizer the model uses?
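Before copying tensors across checkpoints, it may be worth confirming what the Granite safetensors files actually contain: one common reason an embedding tensor appears "missing" is weight tying, where a single tensor (often named lm_head.weight or something model-specific) doubles as the embedding matrix. The sketch below lists tensor names from a .safetensors file using only the standard library, since the format is just an 8-byte length-prefixed JSON header followed by raw data. The write_stub_checkpoint helper is hypothetical, here only so the example is self-contained; the checkpoint names are illustrative, not taken from the actual Granite files.

```python
import json
import os
import struct
import tempfile

def list_tensor_names(path):
    """List tensor names stored in a .safetensors file.

    Layout: an 8-byte little-endian unsigned header length, then a JSON
    header mapping tensor names to dtype/shape/offsets, then raw data.
    """
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    # "__metadata__" is an optional reserved key, not a tensor.
    return [k for k in header if k != "__metadata__"]

def write_stub_checkpoint(path, names):
    # Hypothetical helper for illustration only: writes a header-only stub
    # with zero-length tensors so list_tensor_names() has something to read.
    header = {n: {"dtype": "F32", "shape": [0], "data_offsets": [0, 0]}
              for n in names}
    blob = json.dumps(header).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(blob)))
        f.write(blob)

if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "stub.safetensors")
    # Simulate a checkpoint that ships lm_head.weight but no embed_tokens:
    write_stub_checkpoint(path, ["lm_head.weight", "model.norm.weight"])
    names = list_tensor_names(path)
    print("model.embed_tokens.weight" in names)  # False
    print("lm_head.weight" in names)             # True
```

If the real checkpoint turns out to carry the embedding weights under a different name (or tied to the output head), remapping that name in the loader is likely a cleaner fix than splicing a tensor out of StarCoder's files.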