
Feature Request: Add support for unsupported Ascend NPU devices #885

Open
leo-pony opened this issue Feb 26, 2025 · 2 comments

leo-pony commented Feb 26, 2025

I want to integrate Ascend NPU hardware capabilities into ramalama, enabling users to leverage NPUs for inference and serving of LLMs.

Ascend NPU is an AI processor that supports LLM inference engines such as vLLM, llama.cpp, and ONNX Runtime.

As mentioned above, Ascend NPU also supports AI frameworks like PyTorch and TensorFlow, which have already been integrated by Hugging Face, DeepSpeed, and libraries such as Transformers/Accelerate for training and fine-tuning.

This patch addresses two aspects (a sketch of the device-selection side follows the list):
- Add code in ramalama to build the llama.cpp CANN binary and Docker image.
- Add code in ramalama to select the vLLM Ascend Docker image.
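To make the second item concrete, here is a minimal sketch of what the detection and image-selection logic might look like. It assumes Ascend NPUs are exposed as `/dev/davinci*` character devices on Linux (as they are with the CANN driver stack); the image names and the `select_image` helper are illustrative placeholders, not ramalama's actual API.

```python
import glob

# Illustrative image names only; the real tags would be decided in the PR.
LLAMA_CPP_CANN_IMAGE = "quay.io/ramalama/cann:latest"
VLLM_ASCEND_IMAGE = "quay.io/ascend/vllm-ascend:latest"


def ascend_devices() -> list[str]:
    """Return Ascend NPU device nodes, e.g. /dev/davinci0.

    The CANN driver exposes one /dev/davinciN node per NPU, plus
    management nodes such as /dev/davinci_manager (excluded here).
    """
    return sorted(glob.glob("/dev/davinci[0-9]*"))


def select_image(runtime: str) -> str | None:
    """Pick a container image for the requested runtime if an NPU is present."""
    if not ascend_devices():
        return None  # no NPU found; fall back to CPU/GPU selection elsewhere
    if runtime == "llama.cpp":
        return LLAMA_CPP_CANN_IMAGE
    if runtime == "vllm":
        return VLLM_ASCEND_IMAGE
    return None


if __name__ == "__main__":
    print("NPUs:", ascend_devices())
    print("image:", select_image("llama.cpp"))
```

The llama.cpp side would pair with a container build that compiles llama.cpp against the CANN backend, mirroring how ramalama already ships backend-specific images for other accelerators.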

leo-pony commented Feb 26, 2025

@ericcurtin Could you share your opinion on this?

ericcurtin (Collaborator) commented

SGTM, happy to look at PRs around this.
