[router] Use string instead of token ids #673

Open
gaocegege opened this issue Feb 14, 2025 · 2 comments
Labels: area/gateway, kind/enhancement, priority/critical-urgent

Comments

@gaocegege
Collaborator

gaocegege commented Feb 14, 2025

Ref #641 (comment)

Currently, we use token IDs to support prefix cache-aware routing, which requires tokenizing the input first. This adds several microseconds of latency to each request, and the benefits aren't substantial.

In Q&A scenarios, matching on the input string could be equivalent to matching on token IDs. For RAG scenarios, we might get better performance from something like CacheBlend than from relying solely on token IDs.

Therefore, I propose using strings in the router.
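To make the proposal concrete, here is a minimal sketch of prefix matching directly on the raw prompt string, with no tokenizer involved. The names `chunkSize`, `prefixHashes`, and `sharedPrefixBlocks` are hypothetical and do not come from the router code; the 16-byte block size is an arbitrary assumption.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// chunkSize is a hypothetical block size in bytes; a real router would tune it.
const chunkSize = 16

// prefixHashes splits the raw prompt into fixed-size byte chunks and returns
// one running hash per complete chunk. Each hash covers the whole prefix up to
// the end of its chunk, so prompts with equal prefixes share leading hashes.
func prefixHashes(prompt string) []uint64 {
	h := fnv.New64a()
	var out []uint64
	for i := 0; i+chunkSize <= len(prompt); i += chunkSize {
		h.Write([]byte(prompt[i : i+chunkSize])) // extends the running hash
		out = append(out, h.Sum64())              // Sum64 does not reset the state
	}
	return out
}

// sharedPrefixBlocks counts how many leading chunk hashes two prompts share.
func sharedPrefixBlocks(a, b []uint64) int {
	n := 0
	for n < len(a) && n < len(b) && a[n] == b[n] {
		n++
	}
	return n
}

func main() {
	cached := prefixHashes("You are a helpful assistant. Answer briefly. What is Go?")
	incoming := prefixHashes("You are a helpful assistant. Answer briefly. What is Rust?")
	fmt.Println("shared blocks:", sharedPrefixBlocks(cached, incoming))
}
```

A router could prefer the pod whose cached hashes share the most leading blocks with the incoming request. Note that byte chunks will not align with the inference engine's token blocks, so this only approximates which KV-cache blocks can actually be reused.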

@Jeffwan
Collaborator

Jeffwan commented Feb 14, 2025

Introducing tokenization also brings some complexity in tokenizer management (if we want every model to use its own tokenizer). We need to weigh the benefits and at least make this part configurable for now.
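One way the "configurable" part could look is sketched below. Everything here is an assumption for illustration: the `Tokenizer` interface, the `"string"` and `"whitespace"` kinds, and the `Config` type are hypothetical and are not the gateway's actual API. A real per-model setup would load each model's own tokenizer instead of the whitespace stand-in.

```go
package main

import (
	"fmt"
	"strings"
)

// Tokenizer is a hypothetical interface that turns a prompt into routing units.
type Tokenizer interface {
	Tokenize(text string) []string
}

// rawString treats the whole prompt as one unit, i.e. no tokenization at all.
type rawString struct{}

func (rawString) Tokenize(text string) []string { return []string{text} }

// whitespace stands in for a real model tokenizer in this sketch.
type whitespace struct{}

func (whitespace) Tokenize(text string) []string { return strings.Fields(text) }

// Config maps model names to tokenizer kinds, with a default for the rest.
type Config struct {
	Default  string
	PerModel map[string]string
}

// For returns the tokenizer configured for the given model.
func (c Config) For(model string) (Tokenizer, error) {
	kind, ok := c.PerModel[model]
	if !ok {
		kind = c.Default
	}
	switch kind {
	case "string":
		return rawString{}, nil
	case "whitespace":
		return whitespace{}, nil
	default:
		return nil, fmt.Errorf("unknown tokenizer kind %q", kind)
	}
}

func main() {
	cfg := Config{Default: "string", PerModel: map[string]string{"model-a": "whitespace"}}
	for _, model := range []string{"model-a", "model-b"} {
		tok, err := cfg.For(model)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(model, len(tok.Tokenize("what is prefix caching")))
	}
}
```

The per-model map is where the management complexity shows up: every entry implies a tokenizer that has to be shipped, loaded, and kept in sync with the model it serves.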

@Jeffwan Jeffwan added kind/enhancement New feature or request area/gateway priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. labels Feb 14, 2025
@gangmuk
Collaborator

gangmuk commented Feb 20, 2025

Cross-posting my comments here for future consideration.


We can make it pluggable. It should be a system-wide setting that is not changed at runtime; otherwise, it will mess up the entire cache.

I think TokenizeInputText shouldn't live in each routing algorithm implementation. Currently, it is called in each Route function. It can be decoupled and done once in a common execution path (somewhere in gateway.go) before Route is called.
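A rough sketch of that decoupling, under stated assumptions: the `Router` interface, `Gateway` type, `NewGateway`, `splitWords`, and the toy `stickyRouter` below are all hypothetical and only illustrate the shape. The input-splitting function is fixed when the gateway is constructed and applied once per request, so routing algorithms only ever see its output.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Router is a hypothetical interface: implementations receive already-split
// input units and never tokenize on their own.
type Router interface {
	Route(units []string, pods []string) (string, error)
}

// stickyRouter is a toy router that keeps sending prompts with the same first
// unit to the same pod. It is not safe for concurrent use.
type stickyRouter struct {
	assigned map[string]string
	next     int
}

func (r *stickyRouter) Route(units, pods []string) (string, error) {
	if len(units) == 0 || len(pods) == 0 {
		return "", errors.New("no input units or no pods")
	}
	if pod, ok := r.assigned[units[0]]; ok {
		return pod, nil
	}
	pod := pods[r.next%len(pods)] // assign new keys round-robin
	r.next++
	r.assigned[units[0]] = pod
	return pod, nil
}

// Gateway holds the system-wide split function. The field is unexported and
// set only in NewGateway, so it cannot be swapped while requests are served.
type Gateway struct {
	split   func(string) []string
	routers map[string]Router
}

func NewGateway(split func(string) []string, routers map[string]Router) *Gateway {
	return &Gateway{split: split, routers: routers}
}

// Handle is the common execution path: split the prompt once, then route.
func (g *Gateway) Handle(strategy, prompt string, pods []string) (string, error) {
	r, ok := g.routers[strategy]
	if !ok {
		return "", fmt.Errorf("unknown routing strategy %q", strategy)
	}
	return r.Route(g.split(prompt), pods)
}

// splitWords is a stand-in for either a tokenizer or a raw-string splitter.
func splitWords(s string) []string { return strings.Fields(s) }

func main() {
	g := NewGateway(splitWords, map[string]Router{
		"sticky": &stickyRouter{assigned: map[string]string{}},
	})
	pod, err := g.Handle("sticky", "hello world", []string{"pod-a", "pod-b"})
	fmt.Println(pod, err)
}
```

Because every router shares the same `split` function, switching between raw strings and token IDs becomes a single construction-time choice rather than a change in each Route implementation.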

Tokenization itself has two minor issues:

- Overhead (not sure how much before testing).
- Debugging with the raw text is easier than looking at token IDs (so I used detokenization on my side when I debugged my routing implementation).

I am not sure these are critical enough to justify supporting different kinds of input representation (raw string, tokenization method 1, tokenization method 2, etc.).


4 participants