docs: update README on preparing quantized model #19
Conversation
Correct some outdated references to files within the llama.cpp repo and update the example to use a smaller model.
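The preparation steps the PR updates might look roughly like the following. Script and binary names are assumptions based on the current llama.cpp layout (the older `convert.py` and `quantize` names were renamed upstream), not quoted from the PR itself; verify them against the llama.cpp repo before use.

```shell
# Sketch of preparing a quantized GGUF model with llama.cpp.
# Paths and script names assume the current llama.cpp layout; check the repo.
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build && cmake --build build --config Release

# Convert a Hugging Face checkpoint to GGUF, then quantize it to 4 bits.
python convert_hf_to_gguf.py /path/to/hf-model --outfile model-f16.gguf
./build/bin/llama-quantize model-f16.gguf model-q4_0.gguf q4_0
```

A smaller base model keeps the conversion and quantization steps quick, which is presumably why the example was changed.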
@yoshoku I don't understand your commitlint. I get the line length, but it's referring to some […]
@krschacht Thank you for your contribution. llama_cpp.rb adopts conventional commits: https://www.conventionalcommits.org/en/v1.0.0/ For example, the commit message for this change might be […]
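Conventional commits put a type prefix (`feat`, `fix`, `docs`, …) and an optional scope in front of the subject line. As a minimal sketch, a first-line check could look like this; `conventional_commit?` is a hypothetical helper written for illustration, not part of commitlint or llama_cpp.rb:

```ruby
# Hypothetical checker for the Conventional Commits first-line format:
# type(scope)?: subject, kept within a typical line-length limit.
CONVENTIONAL = /\A(feat|fix|docs|style|refactor|perf|test|build|ci|chore|revert)(\([^)]+\))?!?: .+\z/

def conventional_commit?(message)
  first_line = message.lines.first.to_s.chomp
  first_line.match?(CONVENTIONAL) && first_line.length <= 100
end

puts conventional_commit?("docs: update README on preparing quantized model") # => true
puts conventional_commit?("Update README")                                    # => false
```

The PR title above ("docs: update README on preparing quantized model") already follows this shape; only the underlying commit message needed the same prefix.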
@yoshoku Ah, got it. Somehow I never knew about conventional commits! TIL :) I just updated the PR so hopefully it's ready to go. BTW, do you have other plans for this project? I was very excited to find this. I found it while looking for the ruby equivalent of Python's […] The one issue I ran into with your project was when I tried to load a model like […]
@krschacht I wanted you to fix the git commit message, not the pull request description. But I understood the gist of the pull request, so I fixed the README with you as a co-author in 8edfd6d. I am going to close this pull request, but please do not take it personally.
@yoshoku I don't mind at all! Linters... I'm not looking for points. :) I also run a project and I jump in to help get PRs over the line all the time. It's often easier. I really am interested in whether you have other plans for this project. I was thinking about trying to create a version of what you did but for CTranslate2. But maybe I'm wrong and your llama_cpp bindings can also work for T5 models? Anyway, curious where you plan to take your project.
@krschacht I think it would be a good idea to create bindings for CTranslate2, but I am pretty busy these days so I probably will not have time to do it.
I didn't realize llama.cpp had recently added support for T5. I found an interesting thread discussing the performance of CTranslate2 vs llama.cpp, and it sounds like some of the performance optimizations within CTranslate2 are already in llama.cpp, so the performance may not be that different. Anyway, I appreciate you creating this project and for sharing this insight. Consider me a motivated & interested "user" in case you ever need help with testing, bugs, implementing specific pieces, etc. I'll keep playing with llama_cpp.rb now that I have it all working!
docs: update description of preparing quantized model in usage section.