Are there any plans to support 16-bit float (fp16 and bfloat16) quantization for embeddings? I would assume it is an easier option that doesn't require separately training any codebooks, and it gives some headroom for scaling indexes without compromising recall quality.
AFAIK FP16C, which allows conversion between fp16 and fp32, should be available on most AVX CPUs. AVX512-FP16 provides the ability to perform math on fp16 directly, but it is only supported by a very small number of the latest CPUs.
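To make the FP16C path concrete, here is a rough sketch (not from any existing library, all names are made up) of computing an inner product between an fp16-stored embedding and an fp32 query, widening 8 halves at a time; compile with `-mf16c -mfma`:

```cpp
#include <immintrin.h>
#include <cstdint>
#include <cstddef>

// Illustrative only: dot product of an fp16 vector against an fp32 query,
// converting fp16 -> fp32 on the fly with FP16C (vcvtph2ps).
float ip_fp16_fp32(const uint16_t* x_fp16, const float* q, size_t d) {
    __m256 acc = _mm256_setzero_ps();
    size_t i = 0;
    for (; i + 8 <= d; i += 8) {
        // Load 8 fp16 values and widen them to 8 fp32 lanes.
        __m128i h  = _mm_loadu_si128(reinterpret_cast<const __m128i*>(x_fp16 + i));
        __m256  xf = _mm256_cvtph_ps(h);
        __m256  qf = _mm256_loadu_ps(q + i);
        acc = _mm256_fmadd_ps(xf, qf, acc);   // acc += x * q
    }
    // Horizontal sum of the 8 accumulator lanes.
    float buf[8];
    _mm256_storeu_ps(buf, acc);
    float sum = buf[0] + buf[1] + buf[2] + buf[3]
              + buf[4] + buf[5] + buf[6] + buf[7];
    // Scalar tail: _cvtsh_ss converts a single fp16 value to fp32.
    for (; i < d; ++i) sum += _cvtsh_ss(x_fp16[i]) * q[i];
    return sum;
}
```

So the distance computation stays in fp32; only storage is halved, which is where the index-size headroom comes from.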
For bf16 I am not aware of any architecture that supports vectorized math yet, so I would assume we would have to convert back and forth. That might increase search latency (if not bottlenecked on memory bandwidth), but it saves quite a bit of embedding space with little degradation in recall, if any.
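The bf16 round trip is cheap even in scalar code, since bf16 is just the top 16 bits of an IEEE-754 fp32 value. A minimal sketch (illustrative names, NaN handling omitted); a real index would vectorize this or use AVX512-BF16 where the hardware has it:

```cpp
#include <cstdint>
#include <cstring>

// fp32 -> bf16 with round-to-nearest-even, then truncate the low 16 bits.
uint16_t fp32_to_bf16(float f) {
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof(bits));
    uint32_t rounding = 0x7FFF + ((bits >> 16) & 1);
    return static_cast<uint16_t>((bits + rounding) >> 16);
}

// bf16 -> fp32 is just a left shift back into the high half of the word.
float bf16_to_fp32(uint16_t h) {
    uint32_t bits = static_cast<uint32_t>(h) << 16;
    float f;
    std::memcpy(&f, &bits, sizeof(f));
    return f;
}
```

Widening back to fp32 costs a shift per value, so the extra latency should mostly be the conversion throughput, not anything like codebook lookups.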