[feat]: GPU Optimizer and Simulator development app #430
Conversation
Is this part correct? It seems that old_cost is always equal to self.deployments[key].cost? https://github.com/aibrix/aibrix/blob/019afd9d3ef9ccb98f895c7969b2052c89b4133e/python/aibrix/aibrix/gpuoptimizer/loadmonitor/monitor.py#L150C2-L153C60
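For context, a minimal sketch of the pattern the question points at (hypothetical names and values, not the actual monitor.py code): if the deployment entry is mutated before the cached cost is read, the two values are always equal and any computed delta is zero.

```python
# Hypothetical reconstruction of the suspected pattern; not the actual
# monitor.py code. Names and values are illustrative only.
class Deployment:
    def __init__(self, cost: float = 0.0):
        self.cost = cost

deployments = {"llama-7b": Deployment(cost=10.0)}

def update_cost_buggy(key: str, new_cost: float) -> float:
    deployments[key].cost = new_cost      # entry mutated first...
    old_cost = deployments[key].cost      # ...so old_cost always equals new_cost
    return new_cost - old_cost            # delta is always 0.0

def update_cost_fixed(key: str, new_cost: float) -> float:
    old_cost = deployments[key].cost      # snapshot before mutating
    deployments[key].cost = new_cost
    return new_cost - old_cost
```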
```
kubectl -n envoy-gateway-system port-forward service/envoy-aibrix-system-aibrix-eg-903790dc 8888:80 1>/dev/null 2>&1 &
```
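Once the port-forward is running, requests can be exercised against the gateway on localhost:8888. A hedged smoke test in Python (the /v1/completions route and model name are assumptions; only the prompt/model payload fields are suggested by the validation code later in this diff):

```python
import requests

# Hypothetical request against the forwarded gateway. The route and the
# model name are assumptions, not confirmed by this PR.
resp = requests.post(
    "http://localhost:8888/v1/completions",
    json={"model": "llama2-7b", "prompt": "Hello, world"},
)
print(resp.status_code, resp.text)
```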
Add User
nit: we removed the need for a user here, so this can be skipped, but it's minor. We can do some cleanup later.
```python
return jsonify({"status": "error", "message": "Prompt and model are required"}), 400
# ...
arrived_at = datetime.now().timestamp()
input_tokens = get_token_count(prompt)
```
Seems the BERT model tokenizer has been used here; this won't be accurate. I highly suggest using the model's own tokenizer.
If speed is a concern, then we can use estimated tokenizers like this.
Because it's a simulator, I didn't want to import the model dependency before. I can try using the model's tokenizer later.
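As a reference for the suggested change, a minimal sketch of counting tokens with the model's own tokenizer via Hugging Face transformers (the model name is a placeholder, and this is not the simulator's actual code):

```python
# Illustrative sketch, not the simulator's actual implementation.
from transformers import AutoTokenizer

# Load the served model's own tokenizer instead of a generic BERT one.
# "meta-llama/Llama-2-7b-hf" is a placeholder model name.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

def get_token_count(prompt: str) -> int:
    # add_special_tokens=False counts only the prompt's own tokens.
    return len(tokenizer.encode(prompt, add_special_tokens=False))
```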
@@ -0,0 +1,2 @@
from .logging import DelayedLog as DelayedLog
let's use aibrix.module.path
Done. But .logging in __init__.py works.
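For illustration, the two import styles under discussion (the exact package path in the second form is a guess, not the PR's real file layout):

```python
# Current style: relative import inside the package's __init__.py.
from .logging import DelayedLog as DelayedLog

# Suggested style: absolute import via the full package path.
# The path aibrix.gpu_optimizer.utils.logging is hypothetical.
# from aibrix.gpu_optimizer.utils.logging import DelayedLog as DelayedLog
```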
"tputs": [[3, 2, 1], [5, 2, 1]], | ||
"indexes: [[512, 1024], [32, 64, 128]] | ||
} | ||
where tputs is formulated as: |
what does tputs mean?
The max throughput per input/output length pair and GPU.
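Based on that explanation, a small sketch of how such a profile could be read. The shapes force the first indexes list to key the rows and the second to key the columns, but which list is input lengths and which is output lengths is an assumption here, and this is not the actual gpu_optimizer code:

```python
# Assumed interpretation of the profile schema above; not the actual
# gpu_optimizer code. Rows follow indexes[0], columns follow indexes[1].
profile = {
    "tputs": [[3, 2, 1], [5, 2, 1]],
    "indexes": [[512, 1024], [32, 64, 128]],
}

def max_throughput(profile: dict, first_len: int, second_len: int) -> float:
    i = profile["indexes"][0].index(first_len)
    j = profile["indexes"][1].index(second_len)
    return profile["tputs"][i][j]

# e.g. the bucket at (1024, 64):
print(max_throughput(profile, 1024, 64))  # -> 2
```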
@@ -0,0 +1,96 @@
# Mélange: Cost Efficient Large Language Model Serving by Exploiting GPU Heterogeneity
Is there a need to build the solver interfaces ourselves and import the necessary code instead of using their project? We can refer to their repo links in each file. The current way is a little bit messy.
I am unsure if the current interface is sufficient, so no interface is defined for now.
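For reference, one shape such an interface could take once requirements stabilize. This is purely a hypothetical sketch; the PR deliberately defines no interface yet, and all names below are invented for illustration:

```python
# Hypothetical solver interface; nothing like this exists in the PR yet.
from abc import ABC, abstractmethod
from typing import Dict

class SolverInterface(ABC):
    """Maps per-GPU throughput profiles and a workload to a GPU allocation."""

    @abstractmethod
    def solve(self, profiles: Dict[str, dict], workload: dict) -> Dict[str, int]:
        """Return the number of replicas to run per GPU type."""
        ...

class MelangeSolver(SolverInterface):
    def solve(self, profiles, workload):
        # Would wrap the vendored Mélange ILP formulation here.
        raise NotImplementedError
```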
* Integrate Vidur as an LLM simulator.
* Test deployment.
* Fix debug log output. Support next request claim.
* Integrate GPU optimizer server that exposes customized pod metrics to guide autoscaling. Update reconciler to support metricSources.
* Bug fix: autoscaler with metricsSources now works.
* Decouple workload monitoring and visualizer. Integrate visualizer.
* Integrate ILP solver and profile into GPU optimizer. Aggregated traces are supported now.
* Add support for minimum replicas.
* Debug GPU profile benchmark, generation, and loading from file.
* Add Redis support for profile exchange.
* Bug fix: duplicate 'http://' on calling RestMetricsFetcher.FetchPodMetrics (#408). Add abstractMetricsFetcher to make CustomMetricsFetcher, ResourceMetricsFetcher, and KubernetesMetricsFetcher comply with MetricsFetcher.
* Bug fix.
* Tune the granularity of the aggregated profile.
* Adjust request trace to finer granularity. Introduce meta info and version for compatibility.
* Make request trace self-explanatory on time interval.
* Apply new request trace schema.
* Add Readme.md for demo walkthrough.
* Remove model cache.
* Remove TargetPort changes, which are not used.
* Fix deployment.
* Organize imports.
* Python 3.9 format check.
* Python 3.8 format check.
* Add a40 deployment.
* Bug fix.
* Improve benchmark stability.
* Fix Python file names. Fix Python package reference to start with aibrix.gpu_optimizer. Reuse aibrix/runtime image.
* Bump Python CI test to 3.10.

Co-authored-by: Jingyuan Zhang <[email protected]>
Co-authored-by: Ning Wang <[email protected]>
Pull Request Description
The GPU optimizer provides model autoscaling capability with heterogeneous GPU support. Specifically, the GPU optimizer monitors deployed model workloads, computes cost-efficient GPU allocations with an ILP solver over per-GPU throughput profiles, and exposes customized pod metrics to guide autoscaling.
A CPU-based vLLM simulator is included for the development demo.
Related Issues
Resolves: #435
Important: Before submitting, please complete the description above and review the checklist below.
Contribution Guidelines (Expand for Details)
We appreciate your contribution to aibrix! To ensure a smooth review process and maintain high code quality, please adhere to the following guidelines:
Pull Request Title Format
Your PR title should start with one of these prefixes to indicate the nature of the change:
- `[Bug]`: Corrections to existing functionality
- `[CI]`: Changes to build process or CI pipeline
- `[Docs]`: Updates or additions to documentation
- `[API]`: Modifications to aibrix's API or interface
- `[CLI]`: Changes or additions to the Command Line Interface
- `[Misc]`: For changes not covered above (use sparingly)

Note: For changes spanning multiple categories, use multiple prefixes in order of importance.
Submission Checklist
By submitting this PR, you confirm that you've read these guidelines and your changes align with the project's contribution standards.