
[feat]: GPU Optimizer and Simulator development app #430

Merged: 45 commits into main, Nov 27, 2024

Conversation

zhangjyr
Collaborator

@zhangjyr zhangjyr commented Nov 22, 2024

Pull Request Description

The GPU optimizer provides model autoscaling capability with heterogeneous GPU support. Specifically, the GPU optimizer:

  1. Dynamically discovers workload patterns.
  2. Uses ILP to find cost-efficient GPU combinations that satisfy SLOs.
  3. Exports customized metrics to guide the pod scaler.

A CPU-based vLLM simulator is included for the development demo.
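The ILP step above can be illustrated with a small stand-in. The sketch below brute-forces the cheapest GPU mix meeting a throughput demand; the GPU names, costs, and throughputs are hypothetical placeholders, and the real optimizer uses an ILP solver and benchmark profiles rather than enumeration.

```python
from itertools import product

# Hypothetical per-GPU (hourly cost, max throughput in req/s);
# the actual optimizer derives these from benchmark profiles.
GPUS = {"a10": (1.0, 5.0), "a100": (4.0, 25.0)}

def cheapest_mix(demand_rps, max_per_type=8):
    """Brute-force the cheapest GPU combination whose total
    throughput meets the demand (a toy stand-in for the ILP solve)."""
    best = None
    names = list(GPUS)
    for counts in product(range(max_per_type + 1), repeat=len(names)):
        tput = sum(n * GPUS[g][1] for g, n in zip(names, counts))
        if tput < demand_rps:
            continue  # SLO (throughput) constraint not met
        cost = sum(n * GPUS[g][0] for g, n in zip(names, counts))
        if best is None or cost < best[0]:
            best = (cost, dict(zip(names, counts)))
    return best
```

With the placeholder numbers above, `cheapest_mix(12.0)` picks three a10s (cost 3.0) over a single a100 (cost 4.0), which is the kind of heterogeneity-aware trade-off the optimizer makes.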

Related Issues

Resolves: #435

Important: Before submitting, please complete the description above and review the checklist below.


Contribution Guidelines (Expand for Details)

We appreciate your contribution to aibrix! To ensure a smooth review process and maintain high code quality, please adhere to the following guidelines:

Pull Request Title Format

Your PR title should start with one of these prefixes to indicate the nature of the change:

  • [Bug]: Corrections to existing functionality
  • [CI]: Changes to build process or CI pipeline
  • [Docs]: Updates or additions to documentation
  • [API]: Modifications to aibrix's API or interface
  • [CLI]: Changes or additions to the Command Line Interface
  • [Misc]: For changes not covered above (use sparingly)

Note: For changes spanning multiple categories, use multiple prefixes in order of importance.

Submission Checklist

  • PR title includes appropriate prefix(es)
  • Changes are clearly explained in the PR description
  • New and existing tests pass successfully
  • Code adheres to project style and best practices
  • Documentation updated to reflect changes (if applicable)
  • Thorough testing completed, no regressions introduced

By submitting this PR, you confirm that you've read these guidelines and your changes align with the project's contribution standards.

Jingyuan Zhang added 24 commits September 26, 2024 16:07
Support next request claim.
… guide autoscaling.

Update reconciler to support metricSources.
…/autoscaler

# Conflicts:
#	pkg/controller/podautoscaler/metrics/fetcher.go
# Conflicts:
#	.gitignore
#	pkg/controller/podautoscaler/podautoscaler_controller.go
…trics (#408), Add abstractMetricsFetcher to make CustomMetricsFetcher, ResourceMetricsFetcher, and KubernetesMetricsFetcher comply MetricsFetcher.
…jingyuan/autoscaler

# Conflicts:
#	pkg/controller/podautoscaler/metrics/fetcher.go
Introduce meta info and version for compatibility
@nwangfw
Collaborator

nwangfw commented Nov 25, 2024

Is this part correct? It seems that old_cost is always equal to self.deployments[key].cost: https://github.com/aibrix/aibrix/blob/019afd9d3ef9ccb98f895c7969b2052c89b4133e/python/aibrix/aibrix/gpuoptimizer/loadmonitor/monitor.py#L150C2-L153C60
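The suspected issue reads like the common capture-after-mutation pattern. A hypothetical minimal repro (not the monitor's actual code) contrasting the buggy and fixed orderings:

```python
class Deployment:
    def __init__(self, cost):
        self.cost = cost

deployments = {"a100": Deployment(4.0)}

def update_cost_buggy(key, new_cost):
    # Bug pattern: the deployment is mutated first, so old_cost
    # always ends up equal to the new value.
    deployments[key].cost = new_cost
    old_cost = deployments[key].cost
    return old_cost

def update_cost_fixed(key, new_cost):
    # Capture the previous value before mutating.
    old_cost = deployments[key].cost
    deployments[key].cost = new_cost
    return old_cost
```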

```
kubectl -n envoy-gateway-system port-forward service/envoy-aibrix-system-aibrix-eg-903790dc 8888:80 1>/dev/null 2>&1 &
```

Add User
Collaborator

nit: we removed the need for a user here, so this can be skipped. It's minor; we can do some cleanups later.

```python
return jsonify({"status": "error", "message": "Prompt and model are required"}), 400

arrived_at = datetime.now().timestamp()
input_tokens = get_token_count(prompt)
```
Collaborator

It seems the BERT tokenizer is being used here, which won't be accurate. I strongly suggest using each model's own tokenizer.

Collaborator

If speed is a concern, then we can use such estimated tokenizers

Collaborator Author

Because it's a simulator, I didn't want to add a model dependency. I can try using the models' tokenizers later.
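One way to reconcile the two comments above is to accept an optional model tokenizer and fall back to a cheap estimate when none is loaded. This is a hypothetical sketch, not the PR's code; the ~4-characters-per-token heuristic is an assumption often used for rough English-text estimates.

```python
def get_token_count(text, tokenizer=None):
    """Count tokens with the model's own tokenizer when available,
    falling back to a rough ~4-chars-per-token heuristic so the
    simulator avoids a hard model dependency (hypothetical fallback)."""
    if tokenizer is not None:
        # Any object with an encode() -> sequence of token ids works,
        # e.g. a Hugging Face tokenizer.
        return len(tokenizer.encode(text))
    return max(1, len(text) // 4)

# With a real model tokenizer (requires downloading model files):
# from transformers import AutoTokenizer
# tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
# input_tokens = get_token_count(prompt, tok)
```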

@@ -0,0 +1,2 @@
```python
from .logging import DelayedLog as DelayedLog
```
Collaborator

let's use the aibrix.module path

Collaborator Author

Done. That said, the relative `.logging` import in `__init__.py` works too.

"tputs": [[3, 2, 1], [5, 2, 1]],
"indexes: [[512, 1024], [32, 64, 128]]
}
where tputs is formulated as:
Collaborator

what does tputs mean?

Collaborator Author

The max throughput for each input/output length pair on each GPU type.
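Given that reply, the profile can be read as a 2-D lookup: `tputs[i][j]` is the max throughput for the i-th entry of the first index list and the j-th entry of the second. A minimal lookup sketch (the axis meaning and the round-up-to-bucket behavior are assumptions, not confirmed by the PR):

```python
import bisect

profile = {
    "tputs": [[3, 2, 1], [5, 2, 1]],
    "indexes": [[512, 1024], [32, 64, 128]],
}

def max_throughput(profile, first, second):
    """Look up max throughput for an (index0, index1) pair,
    rounding each value up to the nearest profiled bucket
    (clamped to the largest bucket)."""
    rows, cols = profile["indexes"]
    i = min(bisect.bisect_left(rows, first), len(rows) - 1)
    j = min(bisect.bisect_left(cols, second), len(cols) - 1)
    return profile["tputs"][i][j]
```

For example, `max_throughput(profile, 512, 32)` reads `tputs[0][0]`, i.e. 3.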

@@ -0,0 +1,96 @@
```python
# Mélange: Cost Efficient Large Language Model Serving by Exploiting GPU Heterogeneity
```
Collaborator

Do we need to build solver interfaces ourselves and import the necessary code, rather than using their project directly? We could refer to their repo links in each file; the current approach is a bit messy.

Collaborator Author

I am unsure whether the current interface would be sufficient, so no interface is defined for now.

Jingyuan Zhang and others added 3 commits November 26, 2024 12:14
Fix python package reference to start with aibrix.gpu_optimizer
Reuse aibrix/runtime image.
@Jeffwan Jeffwan changed the title [Misc]: GPU Optimizer and Simulator development app. [feat]: GPU Optimizer and Simulator development app. Nov 27, 2024
@Jeffwan Jeffwan changed the title [feat]: GPU Optimizer and Simulator development app. [feat]: GPU Optimizer and Simulator development app Nov 27, 2024
@Jeffwan Jeffwan merged commit 68ff292 into main Nov 27, 2024
13 checks passed
@Jeffwan Jeffwan deleted the jingyuan/autoscaler branch November 27, 2024 21:03
gangmuk pushed a commit that referenced this pull request Jan 25, 2025
* Integrate Vidur as a LLM simulator.

* Test deployment

* Fix debug log output
Support next request claim.

* Integrate gpu optimizer server that exposes customized pod metrics to guide autoscaling.
Update reconciler to support metricSources.

* bug fix: autoscaler with metricsSources now works.

* Decoupled workload monitoring and visualizer
Integrated visualizer

* Integrate ILP solver and profile to GPU optimizer. Aggregated traces are supported now.

* Add support for minimum replicas.

* Debugged GPU profile benchmark, generation, and loading from file.

* Add redis support for profile exchange.

* bug fix: Duplicate 'http://' on calling RestMetricsFetcher.FetchPodMetrics (#408), Add abstractMetricsFetcher to make CustomMetricsFetcher, ResourceMetricsFetcher, and KubernetesMetricsFetcher comply MetricsFetcher.

* bug fix

* Tuning the granularity of aggregated profile

* Adjust request trace to finer granularity.
Introduce meta info and version for compatibility

* Make request trace self-explanatory on time interval

* Apply new request trace schema.

* Add Readme.md for demo walkthrough.

* Remove model cache

* Remove TargetPort changes, which is not used.

* Fix deployment

* Organize Imports

* Python 3.9 format check

* Python 3.8 format check

* Add a40 deployment

* Bug fix

* Improve benchmark stability.

* Fix python file names.
Fix python package reference to start with aibrix.gpu_optimizer
Reuse aibrix/runtime image.

* python CI test bump to 3.10

---------

Co-authored-by: Jingyuan Zhang <[email protected]>
Co-authored-by: Ning Wang <[email protected]>
Successfully merging this pull request may close these issues.

[RFC]: Cost-efficient LLM Serving with GPU Heterogeneity