Allow for possibly non-pooled embeddings #1380


Merged · 4 commits into abetlen:main · Apr 26, 2024

Conversation

@iamlemec (Contributor) commented on Apr 24, 2024

Getting embeddings directly from "non-embedding" LLMs is pretty important for achieving SOTA performance (see the MTEB Leaderboard). This PR makes the embed functions return token-level embeddings when the pooling_type ends up being LLAMA_POOLING_TYPE_NONE, and the appropriate sequence-level embeddings otherwise. After that, the user can do whatever type of pooling they wish. This should address most of the concerns raised in #1288 too.
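
As a minimal sketch of the intended usage (the model path is a placeholder, and this assumes a GGUF embedding model whose pooling type resolves to LLAMA_POOLING_TYPE_NONE):

```python
import numpy as np
from llama_cpp import Llama

# Placeholder path; assumes a model with no built-in pooling.
llm = Llama(model_path="embedding-model.gguf", embedding=True)

# With LLAMA_POOLING_TYPE_NONE, embed() returns one vector per token
# (a list of n_tokens float lists) rather than a single flat vector.
token_embeddings = llm.embed("Hello, world!")

# Pool however you like, e.g. simple mean pooling, then normalize.
pooled = np.asarray(token_embeddings).mean(axis=0)
pooled /= np.linalg.norm(pooled)
```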

There are still some gotchas. If you request mean/cls pooling on a model that doesn't support it, it will hard crash in llama.cpp. And it's hard to know ex ante what types of pooling a model supports. So it's probably just better to wait until this gets sorted out in llama.cpp.
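
In the meantime, since you can't easily tell ex ante which shape you'll get back, one illustrative way to handle both cases from the high-level API is to branch on the shape of the result (placeholder path again):

```python
import numpy as np
from llama_cpp import Llama

llm = Llama(model_path="embedding-model.gguf", embedding=True)
emb = llm.embed("some text")

# Token-level output is a list of vectors; sequence-level is a flat vector.
if emb and isinstance(emb[0], list):
    vec = np.asarray(emb).mean(axis=0)  # no built-in pooling, so pool it yourself
else:
    vec = np.asarray(emb)
```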

Also, this changes the default for normalize to False. You typically want to normalize after pooling, so you would not want to normalize when getting token-level embeddings. The other option is defaulting to False for token-level and True for sequence-level embeddings, but that seems overly complicated.
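
Concretely, with the new default you opt in to normalization once you have a sequence-level vector (again a sketch with a placeholder path):

```python
from llama_cpp import Llama

# Assumes a model that pools to sequence level (mean or cls).
llm = Llama(model_path="embedding-model.gguf", embedding=True)

# normalize now defaults to False, so request it explicitly when you
# want a unit-length sequence embedding.
vec = llm.embed("The quick brown fox", normalize=True)
```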

Note: this depends on the llama_pooling_type function, which just got merged in ggml-org/llama.cpp@b4e4b8a.

Comment on lines +1188 to +1193
# LLAMA_API enum llama_pooling_type llama_pooling_type(const struct llama_model * model);
@ctypes_function("llama_pooling_type", [llama_model_p_ctypes], ctypes.c_int)
def llama_pooling_type(model: llama_model_p, /) -> int: ...


@abetlen (Owner):

Suggested change: delete these lines.

I added this function elsewhere in the file to match the llama.h order, so we can delete this.

@abetlen (Owner) commented on Apr 25, 2024

@iamlemec thank you so much for this!

I suppose expanding the possible return type of Embeddings is necessary to support sequences of embeddings and give users the ability to do their own pooling, so I think that's okay.

Would you mind expanding the Embedding section in the README to explain this behaviour? It's quite bare at the moment and would be very helpful if anyone has questions here.

@iamlemec (Contributor, Author) replied:
Yup, just updated the README!

@abetlen (Owner) commented on Apr 26, 2024

@iamlemec thank you that's fantastic! I'll go ahead and merge this now!

@abetlen merged commit f6ed21f into abetlen:main on Apr 26, 2024
16 checks passed
xhedit pushed a commit to xhedit/llama-cpp-conv referencing this pull request on Apr 30, 2024:

* allow for possibly non-pooled embeddings
* add more to embeddings section in README.md

Co-authored-by: Andrei <abetlen@gmail.com>