Cortex.cpp: Local Engines and Dependencies #1117
Replies: 10 comments
---
From my perspective, we should download the CUDA toolkit separately. We support multiple engines, cortex.llamacpp and cortex.tensorrt-llm, and both need the CUDA toolkit to run. CUDA is backward compatible, so we only need the latest CUDA toolkit version supported by the nvidia-driver version.

Edit: I just checked the CUDA compatibility matrix, and it is incorrect that CUDA is always backward compatible. Related ticket: #1047

Edit 2: The image above shows forward compatibility between CUDA and the nvidia-driver version. So yes, CUDA is backward compatible within a CUDA major release.
---
I'm referring to this table to check the compatibility between the driver and the toolkit.
---
Can I verify my understanding of the issue?

**Decision**

**My initial thoughts**
This will be disk-space inefficient. However, the alternative seems to be dependency hell, which I think is even worse.

**Folder Structure**

That said, I am open to all ideas, especially @vansangpfiev's.
---
If disk-space inefficiency is acceptable, I think we can go with option 1.
---
Thanks @vansangpfiev and @dan-homebrew. I'm confirming that we agree on:

- Question 2: Storing CUDA dependencies under the corresponding engines.

Caveats:

Additional thought:
---
```
/.cortex
  /deps
    /cuda
      cuda-11.5   # or whatever versioning
  /engines
    /cortex.llamacpp
      /bin
    /cortex.tensorrt-llm
      /bin
```
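If we go with a layout like this, resolving paths at runtime stays mechanical. A minimal C++17 sketch (the root location and helper names are my assumptions, not the actual Cortex code):

```cpp
#include <cstdlib>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Assumption: ~/.cortex is the data root (Linux/macOS; Windows would differ).
fs::path CortexRoot() {
  const char* home = std::getenv("HOME");
  return fs::path(home ? home : ".") / ".cortex";
}

// e.g. EngineBinDir("cortex.llamacpp") -> ~/.cortex/engines/cortex.llamacpp/bin
fs::path EngineBinDir(const std::string& engine) {
  return CortexRoot() / "engines" / engine / "bin";
}

// e.g. CudaDepsDir("11.5") -> ~/.cortex/deps/cuda/cuda-11.5
fs::path CudaDepsDir(const std::string& cuda_version) {
  return CortexRoot() / "deps" / "cuda" / ("cuda-" + cuda_version);
}
```

The loader would then prepend the CUDA deps directory to the library search path (e.g. `LD_LIBRARY_PATH` on Linux) before loading the engine.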
---
@0xSage, here's my thought. Please correct me if I'm wrong, @nguyenhoangthuan99 @vansangpfiev.
---
For 3, I think we can handle maintenance and updates by versioning: generate a file (for example, version.txt) for each release that carries metadata for the engine version and the CUDA version. We will update the CUDA dependencies if needed.
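For illustration, a minimal sketch of what such a file might contain (the field names and placeholder values are assumptions, not a spec):

```
# version.txt (hypothetical contents, shipped with each release)
engine=cortex.llamacpp
engine_version=<engine release version>
cuda_version=<CUDA runtime version the engine was built against>
```

On update, the installer could compare the `cuda_version` in the new release's metadata against what is already on disk, and re-download the CUDA dependencies only when it has changed.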
---
@vansangpfiev @namchuai @0xSage Quick responses:

**Per-Engine Dependencies**
I also agree with @vansangpfiev: let's co-locate all CUDA dependencies with the engine folder. Simple > complex, especially since model files are >4 GB.

**Updating Engines**
I also think we need to think through the CLI and API commands (a possible shape is sketched after this comment):

**Naming**
I wonder whether it is better for us to have clearer naming for Cortex engines:

This articulates the concept of Cortex engines more clearly. Hopefully, with a clear API, the community can also step in to help build backends. We would need to reason through:
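To make the CLI/API discussion concrete, here is one possible shape for the commands (all names are assumptions on my part, not a spec; the actual surface is what #1072 should decide):

- `cortex engines list` — show installed engines and their dependency versions
- `cortex engines install cortex.llamacpp` — fetch the engine binary plus matching CUDA dependencies
- `cortex engines update cortex.llamacpp` — bump the engine and refresh dependencies per its version metadata
- `cortex engines uninstall cortex.llamacpp` — remove the engine folder and its co-located dependencies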
---
**Motivation**
Do we package the CUDA toolkit with the engine?

- Yes? Then we will have to do the same for `llamacpp`, `tensorrt-llm`, and `onnx`?
- No? Then we will download it separately.

Folder structures (e.g. if a user has `llamacpp` and `tensorrt-llm` installed at the same time)?
**Resources**

- Llamacpp release
- Currently we are downloading the toolkit dependency via `https://catalog.jan.ai/dist/cuda-dependencies/<version>/<platform>/cuda.tar.gz`
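For reference, a small sketch of how a client could fill in that template (the helper name is hypothetical; the assumption is that `<version>` is the CUDA version and `<platform>` an OS/arch tag):

```cpp
#include <string>

// Builds the CUDA dependency download URL from the template above:
//   https://catalog.jan.ai/dist/cuda-dependencies/<version>/<platform>/cuda.tar.gz
std::string CudaDepsUrl(const std::string& version, const std::string& platform) {
  return "https://catalog.jan.ai/dist/cuda-dependencies/" + version + "/" +
         platform + "/cuda.tar.gz";
}
```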
cc @vansangpfiev @nguyenhoangthuan99 @dan-homebrew
Update sub-tasks:
**Related**

- `cortex engines` commands #1072