Some rare examples of AIs being underconfident

Excessive caution shows up as a failure mode in frontier forecasting agents

One of the questions in the BTF-2 benchmark asked a Claude Opus 4.6 agent, in October 2025, whether turnout in the 2025 NYC mayoral general election would exceed 1.3 million total ballots. Claude found good evidence and did the math correctly. The primary had already drawn 1.1 million ballots. The historical primary-to-general ratio was 1.22. That gives roughly 1.34 million general-election ballots, above the threshold.
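
A minimal sketch of that arithmetic (the figures are the rounded ones from the rationale; the variable names are ours, not the agent's):

```python
# Rough version of the calculation Opus wrote in its rationale.
primary_ballots = 1_100_000       # 2025 NYC mayoral primary turnout
primary_to_general_ratio = 1.22   # historical primary-to-general ratio
threshold = 1_300_000             # resolution threshold for the question

estimated_general = primary_ballots * primary_to_general_ratio
print(f"estimated general-election ballots: {estimated_general:,.0f}")  # ~1,342,000
print("clears the threshold:", estimated_general > threshold)           # True
```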

Opus wrote that calculation in its rationale, looked at it, called it "unstable across cycles," and gave a final forecast of 25%. The actual general-election turnout was over 2.0 million ballots, clearing the threshold by more than 1.5x. So why did Opus get such a bad score on this question? Why does it sometimes do good research but reach bad conclusions?
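
To make "bad score" concrete: we don't reproduce BTF-2's exact scoring rule here, but under a simple Brier-style rule (squared error between the forecast probability and the 0/1 outcome), landing on the wrong side of 50% is expensive. The 0.75 forecast below is just an illustrative number consistent with the agent's own arithmetic:

```python
def brier(p: float, outcome: int) -> float:
    """Squared error between a forecast probability and the realized 0/1 outcome (lower is better)."""
    return (p - outcome) ** 2

# The question resolved YES (turnout cleared 1.3 million), so outcome = 1.
print(brier(0.25, 1))  # 0.5625 -- worse than a know-nothing 50% forecast (0.25)
print(brier(0.75, 1))  # 0.0625 -- an illustrative forecast matching the agent's own math
```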

The agent computed the right answer and then walked away from it

This example came from expert human forecasters auditing 130 of an Opus 4.6 agent's worst calls on BTF-2, analyzed further in our paper. Most of the time, the worry with AI forecasting is overconfidence, especially given the old forecasting adage that "things don't happen".

We were surprised to see errors in the opposite direction, specifically in Opus 4.6, though we couldn't rule out the same pattern in GPT-5.4 and Gemini-3.1-pro agents. The agent does good research, derives the right pathway, names the correct precedent, and then assigns a probability that contradicts its own analysis, apparently because it judges its conclusion too extreme.

A few other examples from the audit:

Opus was asked whether the UNSC would adopt a ceasefire resolution without a US veto by December 31. It named the right pathway in its rationale ("US sponsors a resolution endorsing its own peace plan, like 2735"), cited the 2735 precedent, and noted Russia's public support for Trump's 20-point plan. Then it gave an 8% chance. On November 17, Resolution 2803 was adopted 13-0-2 through exactly that pathway.

On the Argentine peso, Opus gave an 85% chance that the BCRA rate would depreciate at least 8% by year-end. In its forecast, the depreciation case got seven sourced paragraphs; the election-reversal case, what actually happened, got one bullet, even though the midterm election was eleven days away with Milei leading. LLA won on October 26, the peso rallied roughly 10% the next day, and the year-end depreciation came in around 5%, well short of the threshold.

On US-Venezuela talks, Opus was asked whether either government would confirm direct bilateral contact by December 31. It found the October 6 diplomatic cutoff, named the reversal pathway, even acknowledged that "escalation itself creates pressure for diplomatic engagement," then gave a 10% chance. Trump called Maduro on November 21. (More on this in how AI takes stated positions as durable commitments.)

In each case, the research was good, but the final probability suggests that Claude didn't really believe its own research over its priors.

Maybe this is working as intended? It could be a safety feature, ensuring that Claude doesn't go off the rails when the evidence it gathers is surprising or unusual. But it makes Claude a much worse forecaster, and a worse advisor when you ask about decisions and scenarios you might face.

This is worth watching for in any setting where the analysis and the bottom-line conclusion diverge: when they disagree, the analysis is sometimes better than the conclusion. (Which is annoying, because it's tempting to skip the lengthy analysis and read only the conclusion.)

Also see: the paper, the BTF-2 leaderboard, the BTF-2 dataset release, the forecasting API, AI takes people at their word, agents sometimes catastrophize, run agents twice, and the effort paradox.