OpenAI’s financials: a case study of claims vs. reality

September 18, 2024

Shortly after reports of OpenAI’s $3.4B ARR came out on June 12, FutureSearch released estimates of how much of that revenue came from ChatGPT Plus, Team, and Enterprise, and how much from the API, based on estimated user counts for each.

Exactly 3 months later, on Sept 12, The Information reported on a leaked OpenAI email in which COO Brad Lightcap claimed more than 10M ChatGPT Plus subscribers, and 1M on “higher-priced plans.”

Figure: OpenAI ARR. How well did this result, from June 12, 2024, hold up?

Taking this leak at face value, what did we get right and wrong in our June numbers?

  • On ChatGPT Plus paid users: Our 7.7M looks accurate, much more so than other estimates

  • On API revenue: Our ~$500M ARR looks accurate, validating our claim that it was significantly lower than was being reported in mainstream news

  • On ChatGPT Enterprise: Our 1.2M was possibly accurate, probably too high

  • On ChatGPT Team: Our 900k was way too high

How did we beat others on ChatGPT Plus subscribers and API revenue, and what did we get wrong on Teams? Our early track record is public, so in that spirit, here’s what we got right and wrong this time.

What we got right

First, our ChatGPT Plus paid subscriber numbers required good judgment on a few key questions:

  • Which data points to believe

  • How to model growth since those data points

  • How to infer global subscribers given US subscribers

Here, the forecasting judgment in FutureSearch software and its human copilots performed well. What assumptions are most consistent with the (sometimes dubious) data? What reference class, e.g. “viral consumer apps” or “AI products” or “personal assistants” best describes ChatGPT Plus?

The simplest story was that (a) the ratio of subscribers to free-tier users is high compared to most consumer software; (b) global subscribers are substantial; and (c) growth is still rapid, if slowing down.

7.7M subscribers, at 5.2% monthly growth, reaches about 9.0M after three months, close to the 10M number. Given the 10M number, our mistake here was thinking that growth had slowed more from the initial phase than it actually had, i.e. we thought ChatGPT Plus was further along the “s-curve” of adoption than it was.
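The compounding behind that comparison can be checked with a short sketch. It assumes a constant monthly growth rate over the three months and treats the leaked “more than 10M” as exactly 10M; the function and constant names are ours, for illustration only.

```python
# Compound-growth check of the ChatGPT Plus numbers quoted above.
JUNE_SUBSCRIBERS = 7.7e6   # our June 12 estimate
SEPT_SUBSCRIBERS = 10e6    # leaked figure, treated here as exactly 10M
MONTHLY_GROWTH = 0.052     # monthly growth rate quoted above
MONTHS = 3                 # June 12 to Sept 12


def project(start: float, monthly_rate: float, months: int) -> float:
    """Subscribers after compounding monthly_rate for `months` months."""
    return start * (1 + monthly_rate) ** months


def implied_monthly_rate(start: float, end: float, months: int) -> float:
    """Constant monthly rate that takes `start` to `end` in `months` months."""
    return (end / start) ** (1 / months) - 1


projected = project(JUNE_SUBSCRIBERS, MONTHLY_GROWTH, MONTHS)
required = implied_monthly_rate(JUNE_SUBSCRIBERS, SEPT_SUBSCRIBERS, MONTHS)

print(f"Projected Sept subscribers: {projected / 1e6:.2f}M")
print(f"Monthly growth needed to reach 10M: {required:.1%}")
```

Under these assumptions the projection lands just under 9M, and reaching 10M from 7.7M in three months would take roughly 9% monthly growth rather than 5.2%.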

But other estimates we saw were much lower. We’re by far the closest of any we’ve seen.

Second, our API revenue looks robust. While Sept 12’s leak did not mention the API, by exclusion, $2.7B from non-API revenue, given $3.4B ARR as of June 12, lightly corroborates our ~$500M ARR number from June.

Notably, this is more evidence that widely reported numbers, such as The Information’s report that API ARR hit $1B as early as March 2024, were wrong. If that were true, it’s inconceivable that total revenue was only $3.4B as of June 12 - that would imply ChatGPT Plus revenue massively slowed down from March to June, and then took off only afterward.

How did we get API revenue right? We loosely bounded the ChatGPT Enterprise, Team, and Plus revenue, and backed the API number out of the total. While not high confidence, this was the best method available - directly estimating parameters like the number of queries per day could not be bounded within even an order of magnitude. (Anecdotally, other serious estimators used the same method.)
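As a rough illustration of the back-out method, here is a minimal sketch. The seat counts are our June estimates from this post and the $20/month Plus price is public, but the Team and Enterprise per-seat prices below are illustrative assumptions (the Enterprise price in particular is hypothetical), so the output is not our published figure.

```python
# Residual ("back it out") estimate of API ARR: total minus subscriptions.
TOTAL_ARR = 3.4e9  # reported total ARR as of June 12

# product -> (paying seats, assumed USD per seat per month)
# Team and Enterprise prices are illustrative assumptions, not sourced figures.
PRODUCTS = {
    "Plus": (7.7e6, 20.0),
    "Team": (0.9e6, 25.0),        # assumed: $5/month more than Plus
    "Enterprise": (1.2e6, 50.0),  # hypothetical per-seat price
}


def annual_revenue(seats: float, price_per_month: float) -> float:
    """Annualized revenue for a product at a flat per-seat monthly price."""
    return seats * price_per_month * 12


subscription_arr = sum(annual_revenue(s, p) for s, p in PRODUCTS.values())
api_arr = TOTAL_ARR - subscription_arr  # whatever is left is attributed to the API

print(f"Subscription ARR: ${subscription_arr / 1e9:.2f}B")
print(f"Residual API ARR: ${api_arr / 1e9:.2f}B")
```

The residual moves a lot when the assumed prices or seat counts change, which is why this method gives a loose bound rather than a high-confidence number.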

The key judgment here was, in an area of high uncertainty, to anchor to plausible ancillary data and reject specious numbers that undermine more credible ones. And our “tacit judgment” corroborated it - reports of high spend from companies like TikTok did not look credible. The only scaled product then - or today! - that uses the OpenAI API is GitHub Copilot.

Think of the Google Maps API, or the Stripe API, or the Twilio API. Huge API businesses generally leave obvious traces of their usage in consumer and enterprise products.

What we got wrong

Our big miss was on ChatGPT Team, at least taking the leak at face value. (It’s possible the leak of the email missed context - perhaps the 1M “higher-priced plans” referred only to ChatGPT Enterprise.)

We estimated 900k subscribers as of June 12. If ChatGPT Enterprise + ChatGPT Team (and edu) paying subscribers today are only “more than 1M”, i.e. certainly less than 2M and probably less than 1.5M, then likely 90%+ of them are ChatGPT Enterprise. ChatGPT Team is not taking off like we thought.

This same source, OpenAI COO Brad Lightcap, reported 600k ChatGPT Enterprise seats in April, and 150k seats in January. So under any assumptions, the growth of ChatGPT Enterprise has massively slowed, from 4x in 3 months (~60% monthly growth) to only 2x in the subsequent 5 months (15% monthly growth). And if even 10% of the reported 1M “higher-priced plans” is Teams, growth has slowed even further.
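The monthly rates in that paragraph follow from the seat counts, as this small sketch shows. It assumes constant monthly growth within each period; the last scenario takes the “10% of the 1M is Teams” case literally, leaving 900k Enterprise seats in September.

```python
# Implied monthly growth of ChatGPT Enterprise seats between reported data points.


def implied_monthly_rate(start: float, end: float, months: int) -> float:
    """Constant monthly rate that takes `start` to `end` in `months` months."""
    return (end / start) ** (1 / months) - 1


# January (150k) to April (600k): 4x in 3 months.
jan_to_apr = implied_monthly_rate(150_000, 600_000, 3)

# A doubling over the following 5 months, the generous reading of the leak.
doubling_in_5_months = implied_monthly_rate(1, 2, 5)

# Scenario: 10% of the 1M "higher-priced plans" are Team, so 900k are Enterprise.
apr_to_sept_900k = implied_monthly_rate(600_000, 900_000, 5)

print(f"Jan to Apr:             {jan_to_apr:.1%} per month")
print(f"2x in 5 months:         {doubling_in_5_months:.1%} per month")
print(f"600k to 900k in 5 mos.: {apr_to_sept_900k:.1%} per month")
```

In the 900k scenario the implied rate falls to roughly 8% per month, well below the 15% figure for a full doubling.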

How did we get this wrong? FutureSearch looked far and wide for any credible Team subscriber data, found none, and fell back on a forecaster “outside view”, i.e. “comp to benchmark”. For the most similar companies with Plus, Teams, and Enterprise tiers, what were their Plus:Team and Team:Enterprise ratios?

We then extrapolated Plus and Enterprise numbers backward, to when they were the same age that Teams was in June 2024 - 5 months old. We applied these 1:2 Plus:Teams, and 3:2 Enterprise:Team ratios to the subscriber count for Plus and Enterprise when they were 5 months old.

This gave 980k paying Teams users. If the true value is closer to 100k, what went wrong?

First, our benchmarking was based on very few data points. FutureSearch found almost no highly reliable Team:Plus and Enterprise:Team ratios. Our 50% confidence intervals spanned a factor of 2. A 95% confidence interval would have spanned a whole order of magnitude!
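To see how a factor-of-2 interval at 50% confidence widens at 95%, here is a minimal sketch. It assumes the uncertainty is log-normal, which is our modeling choice for this illustration and not something the original forecast specified.

```python
import math
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def z_for(confidence: float) -> float:
    """Z-score bounding a central interval with the given coverage."""
    return STANDARD_NORMAL.inv_cdf(0.5 + confidence / 2)


def sigma_from_ratio(ratio: float, confidence: float) -> float:
    """Log-space sigma for which the central interval's upper/lower ratio is `ratio`."""
    return math.log(ratio) / (2 * z_for(confidence))


def interval_ratio(sigma: float, confidence: float) -> float:
    """Upper/lower ratio of the central interval for a log-normal with `sigma`."""
    return math.exp(2 * z_for(confidence) * sigma)


sigma = sigma_from_ratio(2.0, 0.50)     # 50% interval spans a factor of 2
ratio_95 = interval_ratio(sigma, 0.95)  # width of the matching 95% interval

print(f"log-space sigma: {sigma:.3f}")
print(f"95% interval spans a factor of about {ratio_95:.1f}")
```

Under the log-normal assumption the 95% interval spans a factor of roughly 7.5, which is close to an order of magnitude; a heavier-tailed distribution would make it wider still.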

Second, we neglected the inside view. FutureSearch itself pays for individual ChatGPT Plus subscriptions for our team, not the “Team” plan. Why pay $5/month more, and deal with the hassle of migrating individual subscriptions?

We could have corroborated this inside view with things like a lack of Twitter mentions of ChatGPT Team. Or one of my favorite forecasting techniques from my time running Google’s prediction market: ping ten people and ask them how many people they know who are using it. This is weak evidence, but not weaker than our other methods.

Track records

This result, our public track record, and corroboration from insiders generally validate our approach and numbers. We have had misses: see for example our April 2024 forecast on the Trump immunity SCOTUS case, where we failed to put a bunch of probability mass on “partly immune and partly not immune”.

We encourage you to treat claims on the internet like we do: trust them as much as their track records warrant!