
[Feature]: Support loading lora adapters from HuggingFace in runtime #6233

Closed
@Jeffwan


🚀 The feature, motivation and pitch

Background

Based on the LoRA documentation here, the user has to specify a local LoRA path when starting the engine. This introduces operational overhead, and we want to bring the LoRA experience up to the same level as the base model. From a UX perspective, the user should be able to pass in either a remote LoRA model or a local one. If it is a remote path, the engine should download it at runtime and then serve the request.

Workflow

Start the engine with the LoRA model yard1/llama-2-7b-sql-lora-test:

python -m vllm.entrypoints.openai.api_server \
    --model meta-llama/Llama-2-7b-hf \
    --enable-lora \
    --lora-modules sql-lora=yard1/llama-2-7b-sql-lora-test

Current results

(screenshot of the current error output)

Expected results

The LoRA adapter should be downloaded and loaded by the engine.

Proposed changes

  1. Implement get_lora_absolute_path. It should hide the complexity of the LoRA location: if the path is relative, resolve it and return an absolute path; if it is a remote path, download the artifacts via from huggingface_hub import snapshot_download and return the snapshot path.
  2. Update the workflow here
    expected_lora_modules.append(module)
    lora = self._lora_model_cls.from_local_checkpoint(
    to apply get_lora_absolute_path before the engine tries to load the local path.
  3. Rename lora_local_path to lora_path to indicate that it now supports both local and remote paths.
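A minimal sketch of the helper proposed in step 1, assuming huggingface_hub is installed; the function name matches the proposal, but the resolution logic shown here is illustrative, not a final implementation:

```python
import os


def get_lora_absolute_path(lora_path: str) -> str:
    """Resolve a LoRA path that may be a local (possibly relative)
    directory or a Hugging Face Hub repo id, and return a local
    absolute path the engine can load from.
    """
    # If the path exists locally, expand "~" and relative segments
    # and return the absolute form.
    expanded = os.path.abspath(os.path.expanduser(lora_path))
    if os.path.exists(expanded):
        return expanded

    # Otherwise, treat the value as a Hub repo id and download the
    # adapter artifacts, returning the local snapshot directory.
    # Imported lazily so local-only setups don't need the package.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=lora_path)
```

The call site in step 2 could then become self._lora_model_cls.from_local_checkpoint(get_lora_absolute_path(lora_path), ...), which works unchanged for both local and remote adapters.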

Future work

Support other remote storage, such as S3, in the future. This is not in the current scope.

Alternatives

No response

Additional context

Related issue: #6231 (relative paths don't work).
