Allow for possibly non-pooled embeddings #1380


Merged · 4 commits into abetlen:main · Apr 26, 2024

Conversation

@iamlemec (Contributor) commented on Apr 24, 2024

Getting embeddings directly from "non-embedding" LLMs is pretty important for achieving SOTA performance (see the MTEB Leaderboard). This PR makes the embed functions return token-level embeddings when the pooling_type ends up being LLAMA_POOLING_TYPE_NONE, and the appropriate sequence-level embeddings otherwise. After that, the user can do whatever type of pooling they wish. This should address most of the concerns raised in #1288 too.
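
As a minimal sketch of the intended usage (the model path is a placeholder, and this assumes a GGUF embedding model whose pooling type resolves to LLAMA_POOLING_TYPE_NONE):

```python
import numpy as np
from llama_cpp import Llama

# Placeholder path; assumes a model with no built-in pooling.
llm = Llama(model_path="embedding-model.gguf", embedding=True)

# With LLAMA_POOLING_TYPE_NONE, embed() returns one vector per token
# (a list of n_tokens float lists) rather than a single flat vector.
token_embeddings = llm.embed("Hello, world!")

# Pool however you like, e.g. simple mean pooling, then normalize.
pooled = np.asarray(token_embeddings).mean(axis=0)
pooled /= np.linalg.norm(pooled)
```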

There are still some gotchas. If you request mean/cls pooling on a model that doesn't support it, it will hard crash in llama.cpp. And it's hard to know ex ante what types of pooling a model supports. So it's probably just better to wait until this gets sorted out in llama.cpp.
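
In the meantime, since you can't easily tell ex ante which shape you'll get back, one illustrative way to handle both cases from the high-level API is to branch on the shape of the result (placeholder path again):

```python
import numpy as np
from llama_cpp import Llama

llm = Llama(model_path="embedding-model.gguf", embedding=True)
emb = llm.embed("some text")

# Token-level output is a list of vectors; sequence-level is a flat vector.
if emb and isinstance(emb[0], list):
    vec = np.asarray(emb).mean(axis=0)  # no built-in pooling, so pool it yourself
else:
    vec = np.asarray(emb)
```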

Also, this changes the default for normalize to False. You typically want to normalize after pooling, so you would not want to normalize when getting token-level embeddings. The other option is defaulting to False for token-level and True for sequence-level embeddings, but that seems overly complicated.
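
Concretely, with the new default you opt in to normalization once you have a sequence-level vector (again a sketch with a placeholder path):

```python
from llama_cpp import Llama

# Assumes a model that pools to sequence level (mean or cls).
llm = Llama(model_path="embedding-model.gguf", embedding=True)

# normalize now defaults to False, so request it explicitly when you
# want a unit-length sequence embedding.
vec = llm.embed("The quick brown fox", normalize=True)
```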

Note: this depends on the llama_pooling_type function, which just got merged in ggml-org/llama.cpp@b4e4b8a.

Comment on lines +1188 to +1193
# LLAMA_API enum llama_pooling_type llama_pooling_type(const struct llama_model * model);
@ctypes_function("llama_pooling_type", [llama_model_p_ctypes], ctypes.c_int)
def llama_pooling_type(model: llama_model_p, /) -> int: ...


@abetlen (Owner):

Suggested change: delete these lines.

I added this function elsewhere in the file to match the llama.h order, so we can delete this.

@abetlen (Owner) commented on Apr 25, 2024

@iamlemec thank you so much for this!

I suppose expanding the possible return type of Embeddings is necessary to support sequences of embeddings and give users the ability to do their own pooling, so I think that's okay.

Would you mind expanding the Embedding section in the README to explain this behaviour? It's quite bare at the moment and would be very helpful if anyone has questions here.

@iamlemec (Contributor, Author) replied:
Yup, just updated the README!

@abetlen (Owner) commented on Apr 26, 2024

@iamlemec thank you that's fantastic! I'll go ahead and merge this now!

@abetlen merged commit f6ed21f into abetlen:main on Apr 26, 2024
16 checks passed
xhedit pushed a commit to xhedit/llama-cpp-conv referencing this pull request on Apr 30, 2024:

* allow for possibly non-pooled embeddings
* add more to embeddings section in README.md

Co-authored-by: Andrei <abetlen@gmail.com>