Description
Is your feature request related to a problem? Please describe.
Today, llama_cpp/llama_chat_format.py contains 25 chat formats and 4 chat completion handlers, which forces the different actors to contribute to this ever-growing file.
This is the case for the functionary models, whose maintainers have to keep updating the handlers to support their newer models:
- Add functionary support #784
- Integrate functionary v1.4 and v2 models + add HF AutoTokenizer as optional parameter in llama.create_completion #1078
- Fix and optimize functionary chat handler #1282
- Functionary bug fixes #1385
- Implement streaming for Functionary v2 + Bug fixes #1419
- adding missing args in create_completion for functionary chat handler #1430
- Integrate Functionary v2.5 + Refactor Functionary Code #1509
This process can be slower than their pace of release, since every change has to be approved on this repository. The amazing people behind the functionary models already have a repository with the necessary code to transform the generated content into a proper CreateChatCompletionStreamResponse, and it would make sense for this to be their responsibility.
Describe the solution you'd like
Python (>3.3) offers several ways to load code from other packages, or to let external packages contribute to a main package (for example namespace packages and entry points). This would have many advantages: model providers could maintain their own packages and rely on their own testing/versioning. A sketch of what this could look like follows.
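For illustration, here is a minimal sketch of how externally packaged chat handlers could be discovered through entry points (importlib.metadata is in the stdlib; the group= keyword requires Python 3.10+). The group name llama_cpp.chat_handlers, the package name functionary-llama-cpp, and the registry functions are hypothetical, not an existing llama-cpp-python API:

```python
# Hypothetical plugin discovery for chat handlers via entry points.
# A third-party package (e.g. a hypothetical functionary-llama-cpp)
# would declare its handler in its own pyproject.toml:
#
#   [project.entry-points."llama_cpp.chat_handlers"]
#   functionary-v2 = "functionary_llama_cpp:FunctionaryV2Handler"
#
from importlib.metadata import entry_points

# Registry mapping chat format names to handler callables.
_CHAT_HANDLERS = {}

def load_external_chat_handlers() -> None:
    """Discover chat handlers contributed by installed third-party packages."""
    for ep in entry_points(group="llama_cpp.chat_handlers"):
        # ep.load() imports the plugin's module and returns the handler object.
        _CHAT_HANDLERS[ep.name] = ep.load()

def get_chat_handler(name: str):
    """Look up a handler, loading external plugins on first use."""
    if not _CHAT_HANDLERS:
        load_external_chat_handlers()
    return _CHAT_HANDLERS[name]
```

With a mechanism along these lines, the functionary maintainers could ship and version their handlers in their own package, and they would be picked up at runtime without any change to llama_chat_format.py.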