LiteLLM made us accidentally make our product free for a week

For six and a half days this June, our usage billing stopped charging anyone. Our agents ran normally, our billing system checked balances, providers carried on charging us, but the deduction at the end never happened. This slipped through our alert systems, and one of our engineers noticed it by accident.

FutureSearch is our research-agent platform. Through it, users run conversations with an agent in the app, and from there (or from our Python SDK or our Claude integrations) they schedule tasks: forecasters, agents, multi-agents, and more. Users have a balance, topped up directly or granted via subscription, and as their research runs, we deduct from it.

To attribute each model call back to the right user, we attach a small bundle of metadata to every call: account_id=<uuid> to point at the right balance, conversation=<uuid> for app conversations, and task=<uuid> for scheduled agent tasks. We use LiteLLM for the routing, fallback, retry, and accounting in front of those calls; our billing pipeline assumes LiteLLM persists those tags, exactly as we sent them, into the request_tags JSONB column of its LiteLLM_SpendLogs table. Thirty seconds after each task finishes, the spend is computed by a worker running:

SELECT SUM(spend)
FROM "LiteLLM_SpendLogs"
WHERE "startTime" >= NOW() - INTERVAL '3 hours'
  AND request_tags @> '["task=<task_id>"]'::jsonb;

The resulting spend is multiplied by the user's tier markup, and we deduct that many cents from their balance. If the query returns nothing, the worker logs no cost data found and returns.

On 2026-06-04 at 09:30 UTC, we shipped a LiteLLM upgrade from v1.80.5.rc.2 to v1.83.14-stable.patch.3, to pick up a fix in the OpenAI Responses parallel-tool-call bridge. Staging and production tests passed, confirming the parallel tool call fix worked. We moved on.

We didn't notice that the upgrade included a security feature called allow_client_tags, introduced in v1.83.10 (released 2026-04-27, five weeks before our deploy):

# Strip caller-supplied routing/budget tags unless the admin has opted
# this key or team in via metadata.allow_client_tags=True. Tags drive
# tag-based routing and tag budget attribution. Accepting them from
# untrusted callers lets an attacker reach restricted deployments or
# misattribute spend to a victim team's tag.
_admin_allow_client_tags = False
for _admin_meta in (
    user_api_key_dict.metadata,
    user_api_key_dict.team_metadata,
):
    if isinstance(_admin_meta, dict) and _admin_meta.get("allow_client_tags") is True:
        _admin_allow_client_tags = True
        break
if not _admin_allow_client_tags:
    for _meta_key in ("metadata", "litellm_metadata"):
        _user_meta = data.get(_meta_key)
        if isinstance(_user_meta, dict) and "tags" in _user_meta:
            _user_meta.pop("tags", None)
    if "tags" in data:
        data.pop("tags", None)

Starting in v1.83.10, the proxy strips any tags the caller sends (from metadata.tags, litellm_metadata.tags, or a root-level tags field) unless an admin has set allow_client_tags: true on the API key or team metadata. The request still returns 200 with no caller-side signal. There is a WARNING log line on each strip and a clear "Breaking changes" entry in the v1.83.10 release notes. Each of these slipped under our radar.

Our keys weren't opted in, so the proxy stripped our tags. The LLM calls still went through and we were still billed by the providers; our LiteLLM_SpendLogs rows now carried only request_tags = ["User-Agent: python-httpx"], the User-Agent string being one LiteLLM derives server-side after the caller-tag strip. The billing worker logged no cost data found on every task and exited without deducting anything.

We caught it because one of our engineers was tracking the cost of an experiment they had just run, opened their go-to Grafana cost dashboard, and got nothing back. They assumed the dashboard was broken. We found that it was working fine, and was correctly reporting zero recorded spend. From our internal billing ledger:

Day	Usage deductions	Cents deducted
2026-06-03 (day before the bug shipped)	265	-$1,189.44
2026-06-04 (day the bug shipped)	103	-$47.96
2026-06-05	2	-$1.00
2026-06-06, 07, 08	0	(no rows at all)
2026-06-09	1	$0.00

And from LiteLLM, for the last twenty-four hours before we found the bug:

74,636 LLM requests served.
$1,507.95 of raw spend incurred.
0 of those rows carrying any of the metadata we'd attached to them.

To fix this, we've introduced the monitoring that we should have had at the start. A Grafana alert now fires when one hour of tagged spend in LiteLLM_SpendLogs falls below $5, and the billing worker now raises a Sentry exception whenever a completed task's spend query comes back empty. Both went in on the day we found the bug, while LiteLLM was still stripping tags, so we could verify them against the live broken state. We bumped LiteLLM to v1.86.4; upstream had already removed the gate in commit eb142b90 on 2026-05-12, but the fix wasn't backported to the v1.83 line we'd pinned. And we tightened the re-notify interval on critical cost alerts from eight hours to one.

Billing started charging again at 18:23 UTC on 2026-06-10, when the first tagged row landed in LiteLLM_SpendLogs after the new version rolled out.

Plenty of this is on us. Every alert in our cost stack watched for "this number is going up too fast," and none watched for "this number went to zero." That asymmetry is easy to fall into when overcharging is the failure you've been worrying about. The billing worker made this worse by logging no cost data found and exiting cleanly instead of complaining loudly about it. Also, we bumped a major-version dependency without paying close enough attention to behaviour changes to the part of LiteLLM our billing leans on.

We also picked the wrong version. The upstream fix had already been sitting in mainline since 2026-05-12, three weeks before our deployment, but instead we went with an older version. We tend to be cautious about LiteLLM upgrades in a way we aren't about the rest of our stack. This is partly due to past behaviour drift in the proxy, but also due to being stung by the malicious PyPI release one of our engineers first flagged in March. Sitting a release or two behind had become habit, and this time it kept us from looking at the release that had already fixed the issue.

Still, the upstream change is the bigger problem. request_tags is the feature LiteLLM advertises for cost attribution, comprising tag-based routing, tag budgets, spend-by-tag dashboards, per-tag spend reports. A default-on behaviour change that drops its data with no error is a footgun, especially when the opt-in is easy to miss. In an area as critical as billing, it was a disaster. That it landed at v1.83.10, a patch on the active 1.83 line where operators don't expect breaking changes, only made it harder to catch. To BerriAI's credit, the gate was removed about two months later, which is timely enough.

If you ran tasks or chatted with our agent between June 4 and June 10, the LLM cost of your work landed on us. We're not going back to bill any of it. And we've emailed affected users directly. If you got more done than you expected last week, you're welcome.

What's hard to ignore is that this is the third LiteLLM-related thing we've written about this year. The previous two were the PyPI supply chain attack in March, and the follow-up analysis of who was exposed.