What happened?
Bug: Rate limiter decrements by incorrect token count for /v1/responses endpoint
Description
When using the /v1/responses endpoint with team-per-model TPM rate limiting enabled, the x-ratelimit-model_per_team-remaining-tokens header decreases by only ~2 tokens per request, regardless of actual token consumption reported in total_tokens.
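To make the discrepancy concrete, here is a minimal illustrative sketch (not LiteLLM's internal rate-limiter code) of how the remaining-tokens counter should move versus how it is observed to move; the 2,000,000 starting budget and the 35 total_tokens per request are taken from the table below, and the decrement of 2 is the observed behavior:

```python
# Illustrative model only -- not LiteLLM's actual implementation.
TPM_LIMIT = 2_000_000          # team-per-model TPM budget (from the table below)
TOTAL_TOKENS_PER_REQUEST = 35  # usage.total_tokens reported per request

def expected_remaining(remaining: int, total_tokens: int) -> int:
    """How the header should move: subtract the tokens actually consumed."""
    return remaining - total_tokens

def observed_remaining(remaining: int) -> int:
    """How the header actually moves: it drops by ~2, ignoring usage."""
    return remaining - 2

remaining = TPM_LIMIT
for _ in range(3):
    remaining = observed_remaining(remaining)
    print(remaining)  # 1999998, 1999996, 1999994 -- matches the table below
```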
Environment
- Affected: Original LiteLLM
- Not affected: Stably fork
Steps to Reproduce
Run the following curl command multiple times:
curl -sD - -o - <LITELLM_PROXY_URL>/v1/responses \
-H "Authorization: Bearer <API_KEY>" \
-H "Content-Type: application/json" \
-d '{
"model": "gemini-3-flash-preview",
"input": "hello"
}' \
| grep -oE '"total_tokens"[[:space:]]*:[[:space:]]*[0-9]+|x-ratelimit-model_per_team-remaining-tokens:[[:space:]]*[0-9]+'
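The same check can be scripted. This is a sketch using the standard requests library; the proxy URL and API key are placeholders, not values from this report, and must be substituted:

```python
import requests

PROXY_URL = "http://localhost:4000"  # placeholder: replace with your LiteLLM proxy URL
API_KEY = "sk-..."                   # placeholder: replace with your virtual key

for i in range(3):
    resp = requests.post(
        f"{PROXY_URL}/v1/responses",
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        json={"model": "gemini-3-flash-preview", "input": "hello"},
    )
    remaining = resp.headers.get("x-ratelimit-model_per_team-remaining-tokens")
    total_tokens = resp.json().get("usage", {}).get("total_tokens")
    print(f"request {i + 1}: remaining={remaining} total_tokens={total_tokens}")
```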
Expected Behavior
x-ratelimit-model_per_team-remaining-tokens should decrease by the value reported in total_tokens (~35 tokens per request).
Actual Behavior
| Request | remaining-tokens | total_tokens | Actual Decrease |
|---|---|---|---|
| 1 | 1,999,998 | 35 | - |
| 2 | 1,999,996 | 35 | 2 |
| 3 | 1,999,994 | 35 | 2 |
The rate limiter only decrements by 2 tokens instead of the actual 35 tokens consumed.
Impact
- Rate limiting is ineffective for /v1/responses endpoint
- Users can consume significantly more tokens than their rate limit should allow (see the rough arithmetic after this list)
- TPM quotas are not being enforced correctly
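As a rough illustration of the scale, assuming the 2,000,000-token budget and ~35 total_tokens per request from the table above:

```python
# Rough scale of the enforcement gap, using the numbers reported above.
TPM_LIMIT = 2_000_000
ACTUAL_TOKENS_PER_REQUEST = 35   # total_tokens per request (from the table)
OBSERVED_DECREMENT = 2           # what the limiter actually subtracts

requests_allowed_correct = TPM_LIMIT // ACTUAL_TOKENS_PER_REQUEST  # ~57,142
requests_allowed_buggy = TPM_LIMIT // OBSERVED_DECREMENT           # 1,000,000
tokens_actually_consumed = requests_allowed_buggy * ACTUAL_TOKENS_PER_REQUEST

print(requests_allowed_correct, requests_allowed_buggy, tokens_actually_consumed)
# Before the limiter trips, a team can consume ~35,000,000 tokens,
# roughly 17.5x its intended 2,000,000-token budget.
```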
Additional Context
- This issue is specific to the /v1/responses endpoint with model_per_team TPM rate limiting
- Requires team-per-model TPM rate limiting to be configured
Relevant log output
=== Request 1 ===
x-ratelimit-model_per_team-remaining-tokens: 1999998
"total_tokens":104
=== Request 2 ===
x-ratelimit-model_per_team-remaining-tokens: 1999996
"total_tokens":104
=== Request 3 ===
x-ratelimit-model_per_team-remaining-tokens: 1999994
"total_tokens":104What part of LiteLLM is this about?
SDK (litellm Python package)
What LiteLLM version are you on?
v1.80.11