
Feature Request: Add support for unsupported Ascend NPU devices #885

Open
leo-pony opened this issue Feb 26, 2025 · 2 comments

leo-pony commented Feb 26, 2025

I want to integrate Ascend NPU hardware capabilities into ramalama, enabling users to leverage NPUs for inference and serving of LLMs.

Ascend NPU is an AI processor that supports LLM inference engines such as vLLM, llama.cpp, and ONNX Runtime.

As mentioned above, Ascend NPU also supports AI frameworks like PyTorch and TensorFlow, which have already been integrated by Hugging Face, DeepSpeed, and libraries such as Transformers/Accelerate for training and fine-tuning.

This patch addresses two aspects (a sketch of the device-selection side follows the list):
- Add code in ramalama to build the llama.cpp CANN binary and Docker image.
- Add code in ramalama to select the vLLM Ascend Docker image.
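To make the second item concrete, here is a minimal sketch of what the detection and image-selection logic might look like. It assumes Ascend NPUs are exposed as `/dev/davinci*` character devices on Linux (as they are with the CANN driver stack); the image names and the `select_image` helper are illustrative placeholders, not ramalama's actual API.

```python
import glob

# Illustrative image names only; the real tags would be decided in the PR.
LLAMA_CPP_CANN_IMAGE = "quay.io/ramalama/cann:latest"
VLLM_ASCEND_IMAGE = "quay.io/ascend/vllm-ascend:latest"


def ascend_devices() -> list[str]:
    """Return Ascend NPU device nodes, e.g. /dev/davinci0.

    The CANN driver exposes one /dev/davinciN node per NPU, plus
    management nodes such as /dev/davinci_manager (excluded here).
    """
    return sorted(glob.glob("/dev/davinci[0-9]*"))


def select_image(runtime: str) -> str | None:
    """Pick a container image for the requested runtime if an NPU is present."""
    if not ascend_devices():
        return None  # no NPU found; fall back to CPU/GPU selection elsewhere
    if runtime == "llama.cpp":
        return LLAMA_CPP_CANN_IMAGE
    if runtime == "vllm":
        return VLLM_ASCEND_IMAGE
    return None


if __name__ == "__main__":
    print("NPUs:", ascend_devices())
    print("image:", select_image("llama.cpp"))
```

The llama.cpp side would pair with a container build that compiles llama.cpp against the CANN backend, mirroring how ramalama already ships backend-specific images for other accelerators.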

leo-pony commented Feb 26, 2025

@ericcurtin Could you share your opinion on this?

ericcurtin (Collaborator) commented

SGTM, happy to look at PRs around this.
