
Unable to run on Linux with Intel NPU and OpenVINO #2929

Open

nilp0inter opened this issue Mar 23, 2025 · 0 comments
nilp0inter commented Mar 23, 2025

I have whisper.cpp with OpenVINO working on my setup, and it runs great with the CPU and GPU OpenVINO devices, but when I try to use the NPU device it fails like this:

$ whisper-cpp -m models/ggml-base.bin -f output.wav -l es --print-colors -oved NPU
whisper_init_from_file_with_params_no_state: loading model from 'models/ggml-base.bin'
whisper_init_with_params_no_state: use gpu    = 1
whisper_init_with_params_no_state: flash attn = 0
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw        = 0
whisper_init_with_params_no_state: backends   = 1
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51865
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head  = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 512
whisper_model_load: n_text_head   = 8
whisper_model_load: n_text_layer  = 6
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 2 (base)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs       = 99
whisper_model_load:      CPU total size =   147.37 MB
whisper_model_load: model size    =  147.37 MB
whisper_init_state: kv self size  =    6.29 MB
whisper_init_state: kv cross size =   18.87 MB
whisper_init_state: kv pad  size  =    3.15 MB
whisper_init_state: compute buffer (conv)   =   16.26 MB
whisper_init_state: compute buffer (encode) =   85.86 MB
whisper_init_state: compute buffer (cross)  =    4.65 MB
whisper_init_state: compute buffer (decode) =   96.35 MB
whisper_ctx_init_openvino_encoder_with_state: loading OpenVINO model from 'models/ggml-base-encoder-openvino.xml'
whisper_ctx_init_openvino_encoder_with_state: first run on a device may take a while ...
whisper_openvino_init: path_model = models/ggml-base-encoder-openvino.xml, device = NPU, cache_dir = models/ggml-base-encoder-openvino-cache
in openvino encoder compile routine: exception: Exception from src/inference/src/cpp/core.cpp:109:
Exception from src/inference/src/dev/plugin.cpp:53:
Exception from src/plugins/intel_npu/src/plugin/src/plugin.cpp:750:
Exception from src/plugins/intel_npu/src/compiler_adapter/src/ze_graph_ext_wrappers.cpp:361:
L0 pfnCreate2 result: ZE_RESULT_ERROR_UNKNOWN, code 0x7ffffffe - an action is required to complete the desired operation .

whisper_ctx_init_openvino_encoder_with_state: failed to init OpenVINO encoder from 'models/ggml-base-encoder-openvino.xml'

system_info: n_threads = 4 / 22 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 0 | COREML = 0 | OPENVINO = 1 | CANN = 0

main: processing 'output.wav' (2135256 samples, 133.5 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = es, task = transcribe, timestamps = 1 ...
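For reference, the failure can probably be reproduced outside whisper.cpp with a short Python sketch against the standard OpenVINO Python bindings (assuming they are installed and match the runtime whisper.cpp was built against). It lists the devices OpenVINO sees and then tries to compile the same encoder IR directly on the NPU:

import openvino as ov

core = ov.Core()
print("Available OpenVINO devices:", core.available_devices)

if "NPU" in core.available_devices:
    # Query the device name reported by the NPU plugin
    print("NPU:", core.get_property("NPU", "FULL_DEVICE_NAME"))
    # Try to compile the same encoder model whisper.cpp loads
    model = core.read_model("models/ggml-base-encoder-openvino.xml")
    compiled = core.compile_model(model, "NPU")
    print("Compiled OK on NPU")
else:
    print("NPU device not visible to OpenVINO")

If this standalone compile throws the same ZE_RESULT_ERROR_UNKNOWN from the Level Zero graph extension, the problem would seem to be in the NPU driver/compiler stack rather than in whisper.cpp itself, but I'm not sure.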

I'd appreciate any insights or suggestions from the community or maintainers, especially on how to debug or fix this further.

Thank you!
