
[Feat] Add runtime model management api #540

Merged: 6 commits merged into main on Dec 24, 2024
Conversation

@brosoul (Collaborator) commented Dec 23, 2024

Pull Request Description

Add runtime model management api

# start the runtime server
aibrix_runtime

# download a model from S3 via curl, e.g.
curl --location 'http://localhost:8080/v1/model/download' \
--header 'Content-Type: application/json' \
--data '{
    "model_uri": "s3://aibricks-model-artifacts/models/linhui_tmp/gpt-neo-125m/",
    "download_extra_config": {
        "ak": "xxxx",
        "sk": "xxxx",
        "endpoint": "https://s3.us-west-2.amazonaws.com",
        "region": "us-west-2",
        "allow_file_suffix": ["json", "safetensors"]
    }
}'
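
Per the review discussion below, this endpoint does not block until the download finishes; it returns the model's current status (implemented in #539). The response shape here is only a hypothetical illustration with assumed field names, not the PR's actual schema:

# hypothetical response body (field names and values assumed):
# {"model_name": "gpt-neo-125m", "status": "downloading"}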

# download a model from TOS via curl, e.g.
curl --location 'http://localhost:8080/v1/model/download' \
--header 'Content-Type: application/json' \
--data '{
    "model_uri": "tos://aibrix-artifact-testing/models/HuggingFaceTB/SmolLM-1.7B/",
    "local_dir": "/tmp/aibrix/new_path/",
    "model_name": "brosoul",
    "download_extra_config": {
        "ak": "xxxx",
        "sk": "xxxx",
        "endpoint": "https://tos-s3-cn-beijing.volces.com",
        "region": "cn-beijing",
        "allow_file_suffix": ["json", "safetensors"]
    }
}'

# list the models that exist under the `local_dir`
curl --location --request GET 'http://localhost:8080/v1/model/list' \
--header 'Content-Type: application/json' \
--data '{"local_dir": "/tmp/aibrix/new_path/"}'
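
As clarified later in the thread, if no request body is passed, the list API searches the default root directory (envs.DOWNLOADER_LOCAL_DIR):

# list models under the default root directory (no body needed)
curl --location --request GET 'http://localhost:8080/v1/model/list'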
  • Download model (screenshot)
  • List model (screenshot)

Related Issues

Resolves: #196
Part of: #521

Important: Before submitting, please complete the description above and review the checklist below.


Contribution Guidelines

We appreciate your contribution to aibrix! To ensure a smooth review process and maintain high code quality, please adhere to the following guidelines:

Pull Request Title Format

Your PR title should start with one of these prefixes to indicate the nature of the change:

  • [Bug]: Corrections to existing functionality
  • [CI]: Changes to build process or CI pipeline
  • [Docs]: Updates or additions to documentation
  • [API]: Modifications to aibrix's API or interface
  • [CLI]: Changes or additions to the Command Line Interface
  • [Misc]: For changes not covered above (use sparingly)

Note: For changes spanning multiple categories, use multiple prefixes in order of importance.

Submission Checklist

  • PR title includes appropriate prefix(es)
  • Changes are clearly explained in the PR description
  • New and existing tests pass successfully
  • Code adheres to project style and best practices
  • Documentation updated to reflect changes (if applicable)
  • Thorough testing completed, no regressions introduced

By submitting this PR, you confirm that you've read these guidelines and your changes align with the project's contribution standards.

brosoul requested a review from Jeffwan on December 23, 2024 at 17:33
@Jeffwan (Collaborator) commented Dec 23, 2024

# list the models that exist under the `local_dir`
curl --location --request GET 'http://localhost:8080/v1/model/list' \
--header 'Content-Type: application/json' \
--data '{"local_dir": "/tmp/aibrix/new_path/"}'

This is a little bit confusing. Does that mean the model list is returned by searching the target local_dir?

@Jeffwan (Collaborator) commented Dec 23, 2024

Since its job is to manage the metadata, should it return all the models no matter where they are stored, or just apply some meaningful filters? In the last PR, different artifact stores may have their own folders. What is local_dir's usage here?

@@ -120,6 +124,24 @@ async def unload_lora_adapter(request: UnloadLoraAdapterRequest, raw_request: Re
    return Response(status_code=200, content=response)


@router.post("/v1/model/download")
async def download_model(request: DownloadModelRequest):
    response = await ModelManager.model_download(request)
A collaborator commented on this diff:
This is an async call, right? From the client's perspective, how do I know when it's finished, so that I can orchestrate the model loading request afterwards?

@brosoul (Collaborator, Author) replied Dec 24, 2024:

This is an async call, right?

I initially planned to implement this async using coroutines, but later did not follow this approach 🤣.
However, I am wondering: is it necessary to ensure that all API interfaces are async? Or can they be partially async and partially sync?

How do I know when it's finished?

Keep calling the POST API until the model's status returns downloaded, since this API directly returns the model status implemented in #539. If necessary, a new process is opened in the background for downloading; the call does not wait for the download to complete before returning the result. A polling sketch is shown below.
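
A minimal sketch of that polling loop, assuming a JSON response with a field named status whose terminal value is downloaded (both the field name and the value are assumptions based on this discussion, not the PR's actual schema):

# poll the download endpoint until the model reports downloaded (requires jq)
while true; do
  status=$(curl -s --location 'http://localhost:8080/v1/model/download' \
    --header 'Content-Type: application/json' \
    --data '{"model_uri": "s3://aibricks-model-artifacts/models/linhui_tmp/gpt-neo-125m/"}' \
    | jq -r '.status')
  [ "$status" = "downloaded" ] && break
  sleep 5
done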

The collaborator replied:

Model status could be sync; model download should be async, but it introduces some complexity in orchestration. This is acceptable at the moment. I will get you involved in a meeting; the VKE team is integrating this part.

@Jeffwan (Collaborator) commented Dec 23, 2024

This is great! I left a few comments, not directly about the code, but about some interface questions.

@brosoul (Collaborator, Author) commented Dec 24, 2024

Since its job is to manage the metadata, should it return all the models no matter where they are stored, or just apply some meaningful filters? In the last PR, different artifact stores may have their own folders. What is local_dir's usage here?

local_dir and the --local-dir parameter passed during download have the same meaning: the root directory where models are stored. Multiple models can live under that root directory. The model/list API searches for all models under the root directory; if no body is passed, it searches the default root directory, which is the same default the download command uses (envs.DOWNLOADER_LOCAL_DIR). An illustrative layout is shown below.
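
For illustration only, a hypothetical local_dir holding two downloaded models (names taken from the PR's own examples) might look like this; model/list over that directory would report both:

/tmp/aibrix/new_path/        # local_dir, the root directory
├── gpt-neo-125m/            # one downloaded model
└── SmolLM-1.7B/             # another downloaded model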

@Jeffwan
Copy link
Collaborator

Jeffwan commented Dec 24, 2024

Overall it looks good to me. Let's merge this PR now. We may need additional changes later when we start to integrate with TOS models.

Jeffwan merged commit d22e739 into main on Dec 24, 2024
10 checks passed
Jeffwan deleted the linhui/runtime-model-mgm branch on December 24, 2024 at 06:08
gangmuk pushed a commit that referenced this pull request Jan 25, 2025
* refact: add download extra config into downloader

* refact: replace assert with Exception

* feat: add model management api

* fix: test cases

* fix allow_file_suffix

* fix style
Successfully merging this pull request may close these issues: Support model adapter download in AI runtime.