Description
🚀 The feature, motivation and pitch
Background
Based on the LoRA documentation here, the user has to specify a local LoRA path when starting the engine. This introduces operational overhead, and we want to bring the LoRA experience up to the same level as the base model. From a UX perspective, the user should be able to pass in either a remote LoRA model or a local one. If it is a remote path, the engine should download it at runtime and then serve the request.
Workflow
Start the server with the LoRA model yard1/llama-2-7b-sql-lora-test:
python -m vllm.entrypoints.openai.api_server \
--model meta-llama/Llama-2-7b-hf \
--enable-lora \
--lora-modules sql-lora=yard1/llama-2-7b-sql-lora-test
Current results
The engine treats yard1/llama-2-7b-sql-lora-test as a local path and fails to load the adapter, since only local LoRA paths are supported today.
Expected results
The LoRA adapter should be downloaded and then loaded by the engine.
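Once this works, the registered adapter name should be usable in requests against the OpenAI-compatible server, just like the base model. A minimal sketch, assuming the server from the workflow above is running on the default localhost:8000 and the openai Python client is installed:

from openai import OpenAI

# Point the client at the locally running vLLM OpenAI-compatible server.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# "sql-lora" is the adapter name registered via --lora-modules above.
completion = client.completions.create(
    model="sql-lora",
    prompt="Write a SQL query that lists all tables.",
    max_tokens=64,
)
print(completion.choices[0].text)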
Proposed changes
- Implement get_lora_absolute_path. It should hide the complexity of the LoRA location: if the path is relative, resolve it to an absolute path; if it is remote, download the artifacts via from huggingface_hub import snapshot_download and return the snapshot path. A sketch follows this list.
- Update the workflow in vllm/vllm/lora/worker_manager.py (lines 174 to 175 at 4f0e0ea) to call get_lora_absolute_path before it tries to load the local path.
- Rename lora_local_path to lora_path to indicate that it now supports both local and remote paths.
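A minimal sketch of what get_lora_absolute_path could look like; the exact signature, module location, and the heuristic for telling a local path apart from a Hugging Face repo id are assumptions, not the final design:

import os

from huggingface_hub import snapshot_download


def get_lora_absolute_path(lora_path: str) -> str:
    # Local path (absolute or relative): resolve it to an absolute path.
    if os.path.exists(lora_path):
        return os.path.abspath(lora_path)

    # Otherwise, assume a Hugging Face Hub repo id such as
    # "yard1/llama-2-7b-sql-lora-test": download the adapter artifacts
    # and return the local snapshot directory.
    return snapshot_download(repo_id=lora_path)

The worker manager would then call this helper on lora_path before handing the result to the existing local-checkpoint loading logic.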
Future work
Support other remote storage, such as S3, in the future. This is out of the current scope.
Alternatives
No response
Additional context
Related issue: #6231 (a relative path doesn't work).