Research

Publications

arXiv:2604.26106

Evaluating Strategic Reasoning in Forecasting Agents

Tom Liptay, Dan Schwarz, Rafael Poyiadzi, Jack Wildman, Nikos I. Bosse

April 2026

arXiv:2601.22444

Automating Forecasting Question Generation and Resolution for AI Evaluation

Nikos I. Bosse, Peter Mühlbacher, Jack Wildman, Lawrence Phillips, Dan Schwarz

January 2026

arXiv:2506.21558

Bench to the Future: A Pastcasting Benchmark for Forecasting Agents

Jack Wildman, Nikos I. Bosse, Daniel Hnyk, Peter Mühlbacher, Finn Hambly, Jon Evans, Dan Schwarz, Lawrence Phillips

June 2025

arXiv:2506.06287

Deep Research Bench: Evaluating AI Web Research Agents

Nikos I. Bosse, Jon Evans, Robert G. Gambee, Daniel Hnyk, Peter Mühlbacher, Lawrence Phillips, Dan Schwarz, Jack Wildman

May 2025

arXiv:2409.14913

Towards a Realistic Long-Term Benchmark for Open-Web Research Agents

Peter Mühlbacher, Nikos I. Bosse, Lawrence Phillips

Research Articles

June 20, 2026

The Wealth of the Richest People in AI

June 18, 2026

How the Claude Fable Ban Shifts the Anthropic IPO and Valuation Forecast in 2026

June 18, 2026

How the US vs. Anthropic Standoff on Claude Fable Ended

June 18, 2026

Gemini 3.5 Pro Release Date and Frontier Forecast

June 18, 2026

Meta Conscripted 6,500 Engineers to Make AI Training Data: What Happens Next

May 28, 2026

Anthropic Revenue and Valuation in 2026 Leading to IPO

May 27, 2026

OpenAI Revenue, Losses, and IPO Valuation: Forecasts Through Late 2027

May 27, 2026

Waymo Profitability Forecast: Rides, Margins, and Losses Through 2027

May 3, 2026

Claude can miss the motives of politicians

May 2, 2026

AI takes people at their word

April 30, 2026

Opus 4.6 does better research, Gemini 3.1 has better judgment

April 29, 2026

Do Frontier AI Models Know What They Don't Know?

April 28, 2026

Rafael Poyiadzi

AI Forecasting Benchmark Dataset: 1,417 Hard Questions (BTF-2)

April 8, 2026

Anthropic IPO Valuation Forecast After the $30B Run Rate Announcement

April 2, 2026

Which AI Researchers Have the Most Valuable Skills?

April 1, 2026

SpaceX IPO Price and Stock Forecast to 2027

March 31, 2026

Anthropic and OpenAI IPO Dates and Valuations: Forecasts and Odds

February 18, 2026

Peter Mühlbacher

Reasoning Effort Scaling: Claude 4.6 Gains, GPT-5 and Gemini Don't

February 12, 2026

Peter Mühlbacher

Higher effort settings in LLMs can reduce accuracy

February 5, 2026

Rafael Poyiadzi

Replacing human data labeling with LLMs in active learning

February 2, 2026

Peter Mühlbacher, Lawrence Phillips

Evaluating AI Forecasting At Scale

November 13, 2025

Backtesting with LLMs: Evaluating Quantitative Models and Forecasts

November 12, 2025

Forecasting Stock Revenue and Margins 5-10 Years Out: A Superforecasting Approach

November 11, 2025

Calculate Intrinsic Value for Any Stock with DCF Analysis

November 10, 2025

Automating Warren Buffett

November 6, 2025

5-Year 10-K Assessment: How Reliable Are S&P 500 Managers?

November 2, 2025

How to Analyze 10-Q Filings with LLMs

October 29, 2025

How to Analyze Earnings Calls with LLMs: 6 Patterns That Move Forecasts

October 28, 2025

How to Analyze 10-K Filings With LLMs (Without Getting Fooled)

October 28, 2025

How to Summarize Earnings Releases with AI Without Missing What Matters

October 15, 2025

Can AI Forecast Stocks Through Fundamental Analysis?

June 25, 2025

Deep Research Bench Leaderboard: LLM Web Research Agent Rankings

June 11, 2025

Peter Mühlbacher, Lawrence Phillips

Bench to the Future: A Pastcasting Benchmark for Forecasting Agents

May 16, 2025

Peter Mühlbacher, Lawrence Phillips

Deep Research Bench: How Well Do AI Research Agents Actually Search the Web?

May 12, 2025

OpenAI Revenue, Losses, and Profitability in 2026: Full Financial Breakdown

April 3, 2025

Lawrence Phillips

AI 2027 Report: A Forecast Critique of the Superintelligence Timeline

April 3, 2025

OpenAI Revenue Projections to 2027: Can It Reach $100B ARR?

September 24, 2024

Peter Mühlbacher, Lawrence Phillips

A Realistic Benchmark for Open-Web Research Agents

September 18, 2024

OpenAI's Financials: ChatGPT Subscribers and API Revenue, Estimates vs. Reality

September 13, 2024

Peter Mühlbacher

How Reasoning Models Compare on Real-World White-Collar Tasks

September 12, 2024

Peter Mühlbacher, Lawrence Phillips

Contra Papers Claiming Superhuman AI Forecasting

August 27, 2024

Lawrence Phillips

Is OpenAI Profitable? Financial Forecasts & Margins Analysis

June 12, 2024

Lawrence Phillips

OpenAI Revenue 2024: Complete ARR, API vs ChatGPT Breakdown

April 2, 2024

Lawrence Phillips, Peter Mühlbacher

The Rationale-Shaped Hole at the Heart of Forecasting
