Description
🚀 The feature, motivation and pitch
Background
Based on the LoRA documentation here, the user has to specify a local LoRA path when starting the engine. This introduces operational overhead, and we want to bring the LoRA experience up to the same level as the base model. From a UX perspective, the user should be able to pass in either a remote LoRA model or a local one. If it is a remote path, the engine should download it at runtime and then serve the request.
Workflow
Start the server with the LoRA model yard1/llama-2-7b-sql-lora-test:
python -m vllm.entrypoints.openai.api_server \
--model meta-llama/Llama-2-7b-hf \
--enable-lora \
--lora-modules sql-lora=yard1/llama-2-7b-sql-lora-test
Current results
The engine treats yard1/llama-2-7b-sql-lora-test as a local path and fails to load the adapter, since only local LoRA paths are supported today.
Expected results
The LoRA adapter should be downloaded and then loaded by the engine.
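Once this works, the registered adapter name should be usable in requests against the OpenAI-compatible server, just like the base model. A minimal sketch, assuming the server from the workflow above is running on the default localhost:8000 and the openai Python client is installed:

from openai import OpenAI

# Point the client at the locally running vLLM OpenAI-compatible server.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# "sql-lora" is the adapter name registered via --lora-modules above.
completion = client.completions.create(
    model="sql-lora",
    prompt="Write a SQL query that lists all tables.",
    max_tokens=64,
)
print(completion.choices[0].text)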
Proposed changes
- Implement get_lora_absolute_path. It should hide the complexity of the LoRA location: if the path is relative, resolve it to an absolute path; if it is remote, download the artifacts via from huggingface_hub import snapshot_download and return the snapshot path. A sketch follows this list.
- Update the workflow in vllm/vllm/lora/worker_manager.py (lines 174 to 175 at 4f0e0ea) to call get_lora_absolute_path before it tries to load the local path.
- Rename lora_local_path to lora_path to indicate that it now supports both local and remote paths.
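A minimal sketch of what get_lora_absolute_path could look like; the exact signature, module location, and the heuristic for telling a local path apart from a Hugging Face repo id are assumptions, not the final design:

import os

from huggingface_hub import snapshot_download


def get_lora_absolute_path(lora_path: str) -> str:
    # Local path (absolute or relative): resolve it to an absolute path.
    if os.path.exists(lora_path):
        return os.path.abspath(lora_path)

    # Otherwise, assume a Hugging Face Hub repo id such as
    # "yard1/llama-2-7b-sql-lora-test": download the adapter artifacts
    # and return the local snapshot directory.
    return snapshot_download(repo_id=lora_path)

The worker manager would then call this helper on lora_path before handing the result to the existing local-checkpoint loading logic.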
Future work
Support other remote storage, such as S3, in the future. This is out of the current scope.
Alternatives
No response
Additional context
Related issue: #6231 (a relative path doesn't work).