
[Feat] Add runtime model management api #540

Merged: 6 commits merged into main on Dec 24, 2024
Conversation

@brosoul (Collaborator) commented Dec 23, 2024

Pull Request Description

Add runtime model management api

# start the runtime server
aibrix_runtime

# download a model from S3 via curl, e.g.
curl --location 'http://localhost:8080/v1/model/download' \
--header 'Content-Type: application/json' \
--data '{
    "model_uri": "s3://aibricks-model-artifacts/models/linhui_tmp/gpt-neo-125m/",
    "download_extra_config": {
        "ak": "xxxx",
        "sk": "xxxx",
        "endpoint": "https://s3.us-west-2.amazonaws.com",
        "region": "us-west-2",
        "allow_file_suffix": ["json", "safetensors"]
    }
}'
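
Per the review discussion below, this endpoint does not block until the download finishes; it returns the model's current status (implemented in #539). The response shape here is only a hypothetical illustration with assumed field names, not the PR's actual schema:

# hypothetical response body (field names and values assumed):
# {"model_name": "gpt-neo-125m", "status": "downloading"}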

# download a model from TOS via curl, e.g.
curl --location 'http://localhost:8080/v1/model/download' \
--header 'Content-Type: application/json' \
--data '{
    "model_uri": "tos://aibrix-artifact-testing/models/HuggingFaceTB/SmolLM-1.7B/",
    "local_dir": "/tmp/aibrix/new_path/",
    "model_name": "brosoul",
    "download_extra_config": {
        "ak": "xxxx",
        "sk": "xxxx",
        "endpoint": "https://tos-s3-cn-beijing.volces.com",
        "region": "cn-beijing",
        "allow_file_suffix": ["json", "safetensors"]
    }
}'

# list the models that exist under the `local_dir`
curl --location --request GET 'http://localhost:8080/v1/model/list' \
--header 'Content-Type: application/json' \
--data '{"local_dir": "/tmp/aibrix/new_path/"}'
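
As clarified later in the thread, if no request body is passed, the list API searches the default root directory (envs.DOWNLOADER_LOCAL_DIR):

# list models under the default root directory (no body needed)
curl --location --request GET 'http://localhost:8080/v1/model/list'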
  • Download model (screenshot)
  • List model (screenshot)

Related Issues

Resolves: #196
Part of: #521

Important: Before submitting, please complete the description above and review the checklist below.


Contribution Guidelines

We appreciate your contribution to aibrix! To ensure a smooth review process and maintain high code quality, please adhere to the following guidelines:

Pull Request Title Format

Your PR title should start with one of these prefixes to indicate the nature of the change:

  • [Bug]: Corrections to existing functionality
  • [CI]: Changes to build process or CI pipeline
  • [Docs]: Updates or additions to documentation
  • [API]: Modifications to aibrix's API or interface
  • [CLI]: Changes or additions to the Command Line Interface
  • [Misc]: For changes not covered above (use sparingly)

Note: For changes spanning multiple categories, use multiple prefixes in order of importance.

Submission Checklist

  • PR title includes appropriate prefix(es)
  • Changes are clearly explained in the PR description
  • New and existing tests pass successfully
  • Code adheres to project style and best practices
  • Documentation updated to reflect changes (if applicable)
  • Thorough testing completed, no regressions introduced

By submitting this PR, you confirm that you've read these guidelines and your changes align with the project's contribution standards.

brosoul requested a review from Jeffwan on December 23, 2024 at 17:33
@Jeffwan (Collaborator) commented Dec 23, 2024

# list the models that exist under the `local_dir`
curl --location --request GET 'http://localhost:8080/v1/model/list' \
--header 'Content-Type: application/json' \
--data '{"local_dir": "/tmp/aibrix/new_path/"}'

This is a little bit confusing. Does that mean the model list is returned by searching the target local_dir?

@Jeffwan (Collaborator) commented Dec 23, 2024

Since its job is to manage the metadata, should it return all the models no matter where they are stored, or just apply some meaningful filters? In the last PR, different artifact stores may have their own folders. What is local_dir's usage here?

@@ -120,6 +124,24 @@ async def unload_lora_adapter(request: UnloadLoraAdapterRequest, raw_request: Re
    return Response(status_code=200, content=response)


@router.post("/v1/model/download")
async def download_model(request: DownloadModelRequest):
    response = await ModelManager.model_download(request)
A collaborator commented on this diff:
This is an async call, right? From the client's perspective, how do I know when it's finished, so that I can orchestrate the model loading request afterwards?

@brosoul (Collaborator, Author) replied Dec 24, 2024:

This is an async call, right?

I initially planned to implement this async using coroutines, but later did not follow this approach 🤣.
However, I am wondering: is it necessary to ensure that all API interfaces are async? Or can they be partially async and partially sync?

How do I know when it's finished?

Keep calling the POST API until the model's status returns downloaded, since this API directly returns the model status implemented in #539. If necessary, a new process is opened in the background for downloading; the call does not wait for the download to complete before returning the result. A polling sketch is shown below.
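
A minimal sketch of that polling loop, assuming a JSON response with a field named status whose terminal value is downloaded (both the field name and the value are assumptions based on this discussion, not the PR's actual schema):

# poll the download endpoint until the model reports downloaded (requires jq)
while true; do
  status=$(curl -s --location 'http://localhost:8080/v1/model/download' \
    --header 'Content-Type: application/json' \
    --data '{"model_uri": "s3://aibricks-model-artifacts/models/linhui_tmp/gpt-neo-125m/"}' \
    | jq -r '.status')
  [ "$status" = "downloaded" ] && break
  sleep 5
done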

The collaborator replied:

Model status could be sync; model download should be async, but it introduces some complexity in orchestration. This is acceptable at the moment. I will get you involved in a meeting; the VKE team is integrating this part.

@Jeffwan (Collaborator) commented Dec 23, 2024

This is great! I left a few comments, not directly about the code, but about some interface questions.

@brosoul (Collaborator, Author) commented Dec 24, 2024

Since its job is to manage the metadata, should it return all the models no matter where they are stored, or just apply some meaningful filters? In the last PR, different artifact stores may have their own folders. What is local_dir's usage here?

local_dir and the --local-dir parameter passed during download have the same meaning: the root directory where models are stored. Multiple models can live under that root directory. The model/list API searches for all models under the root directory; if no body is passed, it searches the default root directory, which is the same default the download command uses (envs.DOWNLOADER_LOCAL_DIR). An illustrative layout is shown below.
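
For illustration only, a hypothetical local_dir holding two downloaded models (names taken from the PR's own examples) might look like this; model/list over that directory would report both:

/tmp/aibrix/new_path/        # local_dir, the root directory
├── gpt-neo-125m/            # one downloaded model
└── SmolLM-1.7B/             # another downloaded model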

@Jeffwan
Copy link
Collaborator

Jeffwan commented Dec 24, 2024

Overall it looks good to me. Let's merge this PR now. We may need additional changes later when we start to integrate with TOS models.

Jeffwan merged commit d22e739 into main on Dec 24, 2024
10 checks passed
Jeffwan deleted the linhui/runtime-model-mgm branch on December 24, 2024 at 06:08
gangmuk pushed a commit that referenced this pull request Jan 25, 2025
* refact: add download extra config into downloader

* refact: replace assert with Exception

* feat: add model management api

* fix: test cases

* fix allow_file_suffix

* fix style
Successfully merging this pull request may close these issues: Support model adapter download in AI runtime.