Ollama defaults to a context size of 2048 tokens, and Goose often exceeds that window; when this happens, Ollama truncates the input, which leads to suboptimal results from the LLM.

Output from Ollama when this occurs:

```
time=2025-02-16T13:27:35.103Z level=WARN source=runner.go:129 msg="truncating input prompt" limit=2048 prompt=3520 keep=4 new=2048
```

When sending a request to Ollama, the context size can be specified by adding the following to the payload:

```json
"options": { "num_ctx": 4096 }
```
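For example, a minimal sketch of a request that sets a larger window (the `/api/generate` endpoint and the `options` field are part of Ollama's API; the model name and prompt are placeholders, not taken from this issue):

```sh
# Request a 4096-token context for this call; "llama3.2" is only an example
# model name, substitute whichever model Goose is configured to use.
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Summarize the plan for this repo.",
  "options": { "num_ctx": 4096 }
}'
```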
The recommendation (seen in the above link) is to create your own Ollama model with a larger context size and point to it through the same API. This seems convoluted and not the right solution for this problem. A short-term solution, assuming you're running an Ollama service locally, would be to update your existing instance with the desired context window; see the sketch below.
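One possible way to do that (a sketch, assuming a local `ollama` CLI and an already-pulled model; "llama3.2" is only an example name) is to set the parameter in an interactive session and save it back over the existing model:

```sh
# Open an interactive session with the model Goose points at
ollama run llama3.2

# Inside the session, raise the context window and overwrite the model entry:
#   /set parameter num_ctx 4096
#   /save llama3.2
# Exit with /bye; later requests to this model will use the larger context.
```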
If the above work goes through, we could add a helper in Goose to set the environment variable, but I'll defer to the main Goose team to decide whether that's appropriate.