add lora schedulers - bin pack, least latency, least throughput, random #544

Merged
merged 19 commits into from
Jan 3, 2025

Aspirin96 (Collaborator) commented Dec 27, 2024

Pull Request Description

  1. Implement a bin-packing scheduler that densely packs LoRA adapters onto pods
  2. Implement a least-latency scheduler that places a LoRA adapter on the pod with the lowest end-to-end latency
  3. Implement a least-throughput scheduler that places a LoRA adapter on the pod with the lowest request throughput
  4. Implement a random scheduler as a baseline

Related Issues

Resolves: #305, #547

Important: Before submitting, please complete the description above and review the checklist below.


Contribution Guidelines

We appreciate your contribution to aibrix! To ensure a smooth review process and maintain high code quality, please adhere to the following guidelines:

Pull Request Title Format

Your PR title should start with one of these prefixes to indicate the nature of the change:

  • [Bug]: Corrections to existing functionality
  • [CI]: Changes to build process or CI pipeline
  • [Docs]: Updates or additions to documentation
  • [API]: Modifications to aibrix's API or interface
  • [CLI]: Changes or additions to the Command Line Interface
  • [Misc]: For changes not covered above (use sparingly)

Note: For changes spanning multiple categories, use multiple prefixes in order of importance.

Submission Checklist

  • PR title includes appropriate prefix(es)
  • Changes are clearly explained in the PR description
  • New and existing tests pass successfully
  • Code adheres to project style and best practices
  • Documentation updated to reflect changes (if applicable)
  • Thorough testing completed, no regressions introduced

By submitting this PR, you confirm that you've read these guidelines and your changes align with the project's contribution standards.

Aspirin96 requested review from Jeffwan and brosoul on December 27, 2024 at 07:36
@@ -87,6 +87,8 @@ var (
metrics.NumRequestsSwapped,
metrics.AvgPromptThroughputToksPerS,
metrics.AvgGenerationThroughputToksPerS,
metrics.GPUCacheUsagePerc,
Collaborator

We need to rebase these changes once the other PR gets merged; this part seems to be shared.

Collaborator Author

Yes, the metrics code is shared. We'll rebase the scheduler branch after the router branch is merged.

Collaborator

sounds good

Jeffwan (Collaborator) commented Dec 27, 2024

I suggest renaming the title to something more meaningful, like "add lora schedulers - xxx, xxx, xxx", instead of using numbers.

Aspirin96 changed the title from "[DO NOT MERGE] add 4 new lora schedulers" to "[DO NOT MERGE] add lora schedulers - bin pack, least latency, least throughput, random" on Dec 30, 2024
Aspirin96 (Collaborator, Author)

> I suggest to rename the title to something more meaningful like add lora schedulers - xxx, xxx, xxx instead of using numbers

Very helpful suggestion! Both the router PR and the LoRA scheduler PR have been renamed.

brosoul (Collaborator) left a comment

@Jeffwan PTAL. I mainly added two changes:

  • Added a new MetricType named QueryLabel and collected LoRA-related metrics
  • Refactored updatePodMetrics to reduce its cyclomatic complexity
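A minimal sketch of how both changes could look, assuming `QueryLabel` marks a metric whose useful value is carried in a Prometheus label rather than in the sample value (for example, a LoRA info metric that lists running adapters in a label). Only the names `MetricType` and `QueryLabel` come from this PR; `Gauge`, `Sample`, `extractors`, `updateMetric`, and the label name `running_lora_adapters` are hypothetical illustrations, not aibrix's code.

```go
package main

import (
	"fmt"
	"strings"
)

// MetricType: only QueryLabel is named in the PR; Gauge is a hypothetical sibling.
type MetricType int

const (
	Gauge      MetricType = iota // value is the sample value
	QueryLabel                   // value is read from a label, not the sample value
)

// Sample is a hypothetical, simplified scraped metric sample.
type Sample struct {
	Value  float64
	Labels map[string]string
}

// extractor turns a sample into the value that gets stored for a pod.
type extractor func(s Sample, label string) (any, error)

// extractors replaces a per-type if/switch chain inside the update loop with
// a lookup table: one small function per MetricType.
var extractors = map[MetricType]extractor{
	Gauge: func(s Sample, _ string) (any, error) { return s.Value, nil },
	QueryLabel: func(s Sample, label string) (any, error) {
		v, ok := s.Labels[label]
		if !ok {
			return nil, fmt.Errorf("label %q not found", label)
		}
		return v, nil
	},
}

// updateMetric is a hypothetical per-metric step of an updatePodMetrics-style loop.
func updateMetric(t MetricType, s Sample, label string) (any, error) {
	f, ok := extractors[t]
	if !ok {
		return nil, fmt.Errorf("unsupported metric type %d", t)
	}
	return f(s, label)
}

func main() {
	// Hypothetical label name holding a comma-separated adapter list.
	s := Sample{Value: 1, Labels: map[string]string{"running_lora_adapters": "lora-a,lora-b"}}
	v, err := updateMetric(QueryLabel, s, "running_lora_adapters")
	if err != nil {
		fmt.Println(err)
		return
	}
	adapters := strings.Split(v.(string), ",")
	fmt.Println(len(adapters), adapters[0])
}
```

Moving each branch into its own small function keeps the loop body flat, which is one common way to lower cyclomatic complexity; supporting another metric type then means adding a map entry rather than another branch.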

Jeffwan (Collaborator) commented Jan 3, 2025

@brosoul Since we do use the main branch at the moment, let's merge this PR and improve it gradually?

Aspirin96 changed the title from "[DO NOT MERGE] add lora schedulers - bin pack, least latency, least throughput, random" to "add lora schedulers - bin pack, least latency, least throughput, random"
Aspirin96 merged commit b479e56 into main on Jan 3, 2025
10 checks passed
Aspirin96 deleted the binbin/scheduler branch on January 3, 2025 at 02:45

brosoul (Collaborator) commented Jan 3, 2025

> @brosoul Since we do use main branch at this moment, Let's merge this PR and gradually improve it?

lgtm

gangmuk pushed a commit that referenced this pull request Jan 25, 2025
…om (#544)

* Add random adapter scheduler

* Add leastExpectedLatency request router

* Add least latency scheduler

* Add least kv cache router

* Add bin packing scheduler (first-fit as example)

* Add least utilization scheduler (RPM, TPM, kv_cache, busy_time as utilization)

* Add least busy time (or least gpu utilization) router

* Add weighted round robin router

* Add metrics that scheduling needed (#486)

* add scheduler metrics

* add metrics into mock app

* refactor CacheUsagePerc of CPU and GPU

* add instance label into promQL

* adapt to the metrics interface

Change-Id: Icc2a017cb2db445fb760ced2c0034a65f9b37fa8

* add .vscode to gitignore

Change-Id: I36a0f54ca1c8a3c16b89c0077df77a119440bed3

* fix mock cpu_cache_usage_perc metrics

* feat: add least kv cache into route strategy

* rm router changes

* add 5 new schedulers

* rm least_utilization_scheduler

* style by gofmt

* rename to snake naming convention

* feat: add lora related metrics and add QueryLabel MetricType

---------

Co-authored-by: chenbinbin <[email protected]>
Co-authored-by: chenzuzhi <[email protected]>
Co-authored-by: brosoul <[email protected]>
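The commit log above lists a weighted round robin router, although it also contains "rm router changes", so that router may not have landed in this PR, and its implementation is not shown here. As a hypothetical illustration only, the sketch below uses the smooth weighted round-robin scheme from nginx's upstream module, which interleaves picks instead of sending bursts to the heaviest backend; the actual router may use a different variant, and the `peer` and `smoothWRR` names are made up.

```go
package main

import "fmt"

// peer is a hypothetical backend with a static weight.
type peer struct {
	name    string
	weight  int
	current int // running score used by smooth WRR
}

// smoothWRR follows nginx's smooth weighted round-robin: each pick, every
// peer's score grows by its weight, the highest score wins, and the winner's
// score drops by the total weight.
type smoothWRR struct{ peers []*peer }

func (w *smoothWRR) next() string {
	total := 0
	var best *peer
	for _, p := range w.peers {
		p.current += p.weight
		total += p.weight
		if best == nil || p.current > best.current {
			best = p
		}
	}
	if best == nil {
		return "" // no peers configured
	}
	best.current -= total
	return best.name
}

func main() {
	w := &smoothWRR{peers: []*peer{{name: "a", weight: 5}, {name: "b", weight: 1}, {name: "c", weight: 1}}}
	for i := 0; i < 7; i++ {
		fmt.Print(w.next(), " ")
	}
	fmt.Println()
}
```

With weights 5, 1, and 1, one full cycle of seven picks yields a, a, b, a, c, a, a: each backend is chosen in proportion to its weight, and the light backends are spread through the cycle rather than appended at the end.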
Development

Successfully merging this pull request may close these issues.

Implements lora scheduler to better place the adapters
5 participants