
[feat]: GPU Optimizer and Simulator development app #430

Merged: 45 commits into main, Nov 27, 2024

Conversation

zhangjyr
Collaborator

@zhangjyr zhangjyr commented Nov 22, 2024

Pull Request Description

The GPU optimizer provides model autoscaling capability with heterogeneous GPU support. Specifically, the GPU optimizer:

  1. Dynamically discovers workload patterns.
  2. Uses ILP to find cost-efficient GPU combinations that satisfy SLOs.
  3. Exports customized metrics to guide the pod scaler.

A CPU-based vLLM simulator is included for the development demo.
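The ILP step above can be illustrated with a small stand-in. The sketch below brute-forces the cheapest GPU mix meeting a throughput demand; the GPU names, costs, and throughputs are hypothetical placeholders, and the real optimizer uses an ILP solver and benchmark profiles rather than enumeration.

```python
from itertools import product

# Hypothetical per-GPU (hourly cost, max throughput in req/s);
# the actual optimizer derives these from benchmark profiles.
GPUS = {"a10": (1.0, 5.0), "a100": (4.0, 25.0)}

def cheapest_mix(demand_rps, max_per_type=8):
    """Brute-force the cheapest GPU combination whose total
    throughput meets the demand (a toy stand-in for the ILP solve)."""
    best = None
    names = list(GPUS)
    for counts in product(range(max_per_type + 1), repeat=len(names)):
        tput = sum(n * GPUS[g][1] for g, n in zip(names, counts))
        if tput < demand_rps:
            continue  # SLO (throughput) constraint not met
        cost = sum(n * GPUS[g][0] for g, n in zip(names, counts))
        if best is None or cost < best[0]:
            best = (cost, dict(zip(names, counts)))
    return best
```

With the placeholder numbers above, `cheapest_mix(12.0)` picks three a10s (cost 3.0) over a single a100 (cost 4.0), which is the kind of heterogeneity-aware trade-off the optimizer makes.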

Related Issues

Resolves: #435

Important: Before submitting, please complete the description above and review the checklist below.


Contribution Guidelines (Expand for Details)

We appreciate your contribution to aibrix! To ensure a smooth review process and maintain high code quality, please adhere to the following guidelines:

Pull Request Title Format

Your PR title should start with one of these prefixes to indicate the nature of the change:

  • [Bug]: Corrections to existing functionality
  • [CI]: Changes to build process or CI pipeline
  • [Docs]: Updates or additions to documentation
  • [API]: Modifications to aibrix's API or interface
  • [CLI]: Changes or additions to the Command Line Interface
  • [Misc]: For changes not covered above (use sparingly)

Note: For changes spanning multiple categories, use multiple prefixes in order of importance.

Submission Checklist

  • PR title includes appropriate prefix(es)
  • Changes are clearly explained in the PR description
  • New and existing tests pass successfully
  • Code adheres to project style and best practices
  • Documentation updated to reflect changes (if applicable)
  • Thorough testing completed, no regressions introduced

By submitting this PR, you confirm that you've read these guidelines and your changes align with the project's contribution standards.

Jingyuan Zhang added 24 commits September 26, 2024 16:07
Support next request claim.
… guide autoscaling.

Update reconciler to support metricSources.
…/autoscaler

# Conflicts:
#	pkg/controller/podautoscaler/metrics/fetcher.go
# Conflicts:
#	.gitignore
#	pkg/controller/podautoscaler/podautoscaler_controller.go
…trics (#408), Add abstractMetricsFetcher to make CustomMetricsFetcher, ResourceMetricsFetcher, and KubernetesMetricsFetcher comply MetricsFetcher.
…jingyuan/autoscaler

# Conflicts:
#	pkg/controller/podautoscaler/metrics/fetcher.go
Introduce meta info and version for compatibility
@nwangfw
Collaborator

nwangfw commented Nov 25, 2024

Is this part correct? It seems that old_cost is always equal to self.deployments[key].cost: https://github.com/aibrix/aibrix/blob/019afd9d3ef9ccb98f895c7969b2052c89b4133e/python/aibrix/aibrix/gpuoptimizer/loadmonitor/monitor.py#L150C2-L153C60
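The suspected issue reads like the common capture-after-mutation pattern. A hypothetical minimal repro (not the monitor's actual code) contrasting the buggy and fixed orderings:

```python
class Deployment:
    def __init__(self, cost):
        self.cost = cost

deployments = {"a100": Deployment(4.0)}

def update_cost_buggy(key, new_cost):
    # Bug pattern: the deployment is mutated first, so old_cost
    # always ends up equal to the new value.
    deployments[key].cost = new_cost
    old_cost = deployments[key].cost
    return old_cost

def update_cost_fixed(key, new_cost):
    # Capture the previous value before mutating.
    old_cost = deployments[key].cost
    deployments[key].cost = new_cost
    return old_cost
```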

```
kubectl -n envoy-gateway-system port-forward service/envoy-aibrix-system-aibrix-eg-903790dc 8888:80 1>/dev/null 2>&1 &
```

Add User
Collaborator

nit: we removed the need for a user here, so this can be skipped. It's minor; we can do some cleanups later.

```python
return jsonify({"status": "error", "message": "Prompt and model are required"}), 400

arrived_at = datetime.now().timestamp()
input_tokens = get_token_count(prompt)
```
Collaborator

It seems the BERT tokenizer is being used here, which won't be accurate. I strongly suggest using each model's own tokenizer.

Collaborator

If speed is a concern, then we can use such estimated tokenizers

Collaborator Author

Because it's a simulator, I didn't want to add a model dependency. I can try using the models' tokenizers later.
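One way to reconcile the two comments above is to accept an optional model tokenizer and fall back to a cheap estimate when none is loaded. This is a hypothetical sketch, not the PR's code; the ~4-characters-per-token heuristic is an assumption often used for rough English-text estimates.

```python
def get_token_count(text, tokenizer=None):
    """Count tokens with the model's own tokenizer when available,
    falling back to a rough ~4-chars-per-token heuristic so the
    simulator avoids a hard model dependency (hypothetical fallback)."""
    if tokenizer is not None:
        # Any object with an encode() -> sequence of token ids works,
        # e.g. a Hugging Face tokenizer.
        return len(tokenizer.encode(text))
    return max(1, len(text) // 4)

# With a real model tokenizer (requires downloading model files):
# from transformers import AutoTokenizer
# tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
# input_tokens = get_token_count(prompt, tok)
```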

@@ -0,0 +1,2 @@
```python
from .logging import DelayedLog as DelayedLog
```
Collaborator

let's use the aibrix.module path

Collaborator Author

Done. That said, the relative `.logging` import in `__init__.py` works too.

"tputs": [[3, 2, 1], [5, 2, 1]],
"indexes: [[512, 1024], [32, 64, 128]]
}
where tputs is formulated as:
Collaborator

what does tputs mean?

Collaborator Author

The max throughput for each input/output length pair on each GPU type.
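Given that reply, the profile can be read as a 2-D lookup: `tputs[i][j]` is the max throughput for the i-th entry of the first index list and the j-th entry of the second. A minimal lookup sketch (the axis meaning and the round-up-to-bucket behavior are assumptions, not confirmed by the PR):

```python
import bisect

profile = {
    "tputs": [[3, 2, 1], [5, 2, 1]],
    "indexes": [[512, 1024], [32, 64, 128]],
}

def max_throughput(profile, first, second):
    """Look up max throughput for an (index0, index1) pair,
    rounding each value up to the nearest profiled bucket
    (clamped to the largest bucket)."""
    rows, cols = profile["indexes"]
    i = min(bisect.bisect_left(rows, first), len(rows) - 1)
    j = min(bisect.bisect_left(cols, second), len(cols) - 1)
    return profile["tputs"][i][j]
```

For example, `max_throughput(profile, 512, 32)` reads `tputs[0][0]`, i.e. 3.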

@@ -0,0 +1,96 @@
```python
# Mélange: Cost Efficient Large Language Model Serving by Exploiting GPU Heterogeneity
```
Collaborator

Do we need to build solver interfaces ourselves and import the necessary code, rather than using their project directly? We could refer to their repo links in each file; the current approach is a bit messy.

Collaborator Author

I am unsure whether the current interface would be sufficient, so no interface is defined for now.

Jingyuan Zhang and others added 3 commits November 26, 2024 12:14
Fix python package reference to start with aibrix.gpu_optimizer
Reuse aibrix/runtime image.
@Jeffwan Jeffwan changed the title [Misc]: GPU Optimizer and Simulator development app. [feat]: GPU Optimizer and Simulator development app. Nov 27, 2024
@Jeffwan Jeffwan changed the title [feat]: GPU Optimizer and Simulator development app. [feat]: GPU Optimizer and Simulator development app Nov 27, 2024
@Jeffwan Jeffwan merged commit 68ff292 into main Nov 27, 2024
13 checks passed
@Jeffwan Jeffwan deleted the jingyuan/autoscaler branch November 27, 2024 21:03
gangmuk pushed a commit that referenced this pull request Jan 25, 2025
* Integrate Vidur as a LLM simulator.

* Test deployment

* Fix debug log output
Support next request claim.

* Integrate gpu optimizer server that exposes customized pod metrics to guide autoscaling.
Update reconciler to support metricSources.

* bug fix: autoscaler with metricsSources now works.

* Decoupled workload monitoring and visualizer
Integrated visualizer

* Integrate ILP solver and profile to GPU optimizer. Aggregated traces are supported now.

* Add support for minimum replicas.

* Debugged GPU profile benchmark, generation, and loading from file.

* Add redis support for profile exchange.

* bug fix: Duplicate 'http://' on calling RestMetricsFetcher.FetchPodMetrics (#408), Add abstractMetricsFetcher to make CustomMetricsFetcher, ResourceMetricsFetcher, and KubernetesMetricsFetcher comply MetricsFetcher.

* bug fix

* Tuning the granularity of aggregated profile

* Adjust request trace to finer granularity.
Introduce meta info and version for compatibility

* Make request trace self-explanatory on time interval

* Apply new request trace schema.

* Add Readme.md for demo walkthrough.

* Remove model cache

* Remove TargetPort changes, which is not used.

* Fix deployment

* Organize Imports

* Python 3.9 format check

* Python 3.8 format check

* Add a40 deployment

* Bug fix

* Improve benchmark stability.

* Fix python file names.
Fix python package reference to start with aibrix.gpu_optimizer
Reuse aibrix/runtime image.

* python CI test bump to 3.10

---------

Co-authored-by: Jingyuan Zhang <[email protected]>
Co-authored-by: Ning Wang <[email protected]>
Successfully merging this pull request may close these issues.

[RFC]: Cost-efficient LLM Serving with GPU Heterogeneity