add lora schedulers - bin pack, least latency, least throughput, random #544
* add scheduler metrics
* add metrics into mock app
* refactor CacheUsagePerc of CPU and GPU
* add instance label into promQL
Change-Id: Icc2a017cb2db445fb760ced2c0034a65f9b37fa8
Change-Id: I36a0f54ca1c8a3c16b89c0077df77a119440bed3
@@ -87,6 +87,8 @@ var (
metrics.NumRequestsSwapped,
metrics.AvgPromptThroughputToksPerS,
metrics.AvgGenerationThroughputToksPerS,
metrics.GPUCacheUsagePerc,
We need to rebase these changes once the other PR gets merged. It seems this part is shared.
Yes, the metrics code is shared. We'll rebase the scheduler branch after the router branch is merged.
sounds good
I suggest renaming the title to something more meaningful like
Very helpful suggestions! Both PRs, for the routers and the LoRA schedulers, have been renamed.
@Jeffwan PTAL, I mainly added two changes:
- Add a new MetricType named QueryLabel, and collect LoRA-related metrics
- Refactor updatePodMetrics to reduce cyclomatic complexity
@brosoul Since we do use the main branch at this moment, let's merge this PR and gradually improve it?
lgtm
…om (#544)
* Add random adapter scheduler
* Add leastExpectedLatency request router
* Add least latency scheduler
* Add least kv cache router
* Add bin packing scheduler (first-fit as example)
* Add least utilization scheduler (RPM, TPM, kv_cache, busy_time as utilization)
* Add least busy time (or least gpu utilization) router
* Add weighted round robin router
* Add metrics that scheduling needed (#486)
* add scheduler metrics
* add metrics into mock app
* refactor CacheUsagePerc of CPU and GPU
* add instance label into promQL
* Adapt to the metrics interface
Change-Id: Icc2a017cb2db445fb760ced2c0034a65f9b37fa8
* add .vscode to gitignore
Change-Id: I36a0f54ca1c8a3c16b89c0077df77a119440bed3
* fix mock cpu_cache_usage_perc metrics
* feat: add least kv cache into route strategy
* rm router changes
* add 5 new schedulers
* rm least_utilization_scheduler
* style by gofmt
* rename to snake naming convention
* feat: add lora related metrics and add QueryLabel MetricType
---------
Co-authored-by: chenbinbin <[email protected]>
Co-authored-by: chenzuzhi <[email protected]>
Co-authored-by: brosoul <[email protected]>
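For readers unfamiliar with the strategies named in the commit list, here is a minimal Go sketch of two of them: first-fit bin packing and least latency. It is not the code from this PR. The Pod type, its fields, the capacity check in LeastLatency, and the function names are all hypothetical and chosen only for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Pod is a hypothetical view of a serving pod. The field names are
// illustrative assumptions, not types from the aibrix code base.
type Pod struct {
	Name         string
	Capacity     int     // assumed maximum number of adapters the pod can host
	Loaded       int     // adapters currently loaded
	AvgLatencyMs float64 // assumed recent average request latency
}

// ErrNoPod is returned when no pod has a free adapter slot.
var ErrNoPod = errors.New("no pod can host the adapter")

// FirstFit returns the first pod, in list order, that still has a free
// adapter slot. This is the classic first-fit bin packing heuristic.
func FirstFit(pods []Pod) (Pod, error) {
	for _, p := range pods {
		if p.Loaded < p.Capacity {
			return p, nil
		}
	}
	return Pod{}, ErrNoPod
}

// LeastLatency returns the pod with the lowest average latency among
// pods that still have a free adapter slot (the capacity filter is an
// assumption of this sketch).
func LeastLatency(pods []Pod) (Pod, error) {
	best := -1
	for i, p := range pods {
		if p.Loaded >= p.Capacity {
			continue // skip full pods
		}
		if best == -1 || p.AvgLatencyMs < pods[best].AvgLatencyMs {
			best = i
		}
	}
	if best == -1 {
		return Pod{}, ErrNoPod
	}
	return pods[best], nil
}

func main() {
	pods := []Pod{
		{Name: "pod-a", Capacity: 2, Loaded: 2, AvgLatencyMs: 40},
		{Name: "pod-b", Capacity: 2, Loaded: 1, AvgLatencyMs: 90},
		{Name: "pod-c", Capacity: 2, Loaded: 0, AvgLatencyMs: 55},
	}
	ff, _ := FirstFit(pods)
	ll, _ := LeastLatency(pods)
	fmt.Println(ff.Name, ll.Name) // prints "pod-b pod-c"
}
```

First-fit tends to concentrate adapters on the earliest pods in the list, which favors dense packing, while least latency spreads load toward whichever pod is currently responding fastest.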
Pull Request Description
Related Issues
Resolves: #305, #547
Important: Before submitting, please complete the description above and review the checklist below.
Contribution Guidelines (Expand for Details)
We appreciate your contribution to aibrix! To ensure a smooth review process and maintain high code quality, please adhere to the following guidelines:
Pull Request Title Format
Your PR title should start with one of these prefixes to indicate the nature of the change:
- [Bug]: Corrections to existing functionality
- [CI]: Changes to build process or CI pipeline
- [Docs]: Updates or additions to documentation
- [API]: Modifications to aibrix's API or interface
- [CLI]: Changes or additions to the Command Line Interface
- [Misc]: For changes not covered above (use sparingly)

Note: For changes spanning multiple categories, use multiple prefixes in order of importance.
Submission Checklist
By submitting this PR, you confirm that you've read these guidelines and your changes align with the project's contribution standards.