Research

Publications

arXiv:2601.22444

Automating Forecasting Question Generation and Resolution for AI Evaluation

Nikos I. Bosse, Peter Mühlbacher, Jack Wildman, Lawrence Phillips, Dan Schwarz

January 2026

arXiv:2506.21558

Bench to the Future: A Pastcasting Benchmark for Forecasting Agents

Jack Wildman, Nikos I. Bosse, Daniel Hnyk, Peter Mühlbacher, Finn Hambly, Jon Evans, Dan Schwarz, Lawrence Phillips

June 2025

arXiv:2506.06287

Deep Research Bench: Evaluating AI Web Research Agents

Nikos I. Bosse, Jon Evans, Robert G. Gambee, Daniel Hnyk, Peter Mühlbacher, Lawrence Phillips, Dan Schwarz, Jack Wildman

May 2025

arXiv:2409.14913

Towards a Realistic Long-Term Benchmark for Open-Web Research Agents

Peter Mühlbacher, Nikos I. Bosse, Lawrence Phillips

Research Articles

April 8, 2026

How the $30B run rate boosted Anthropic's forecasted IPO valuation

April 2, 2026

Which AI Researchers Have the Most Valuable Skills?

April 1, 2026

A $1.75 Trillion IPO Would Be Overpaying 30% for SpaceX

March 31, 2026

Anthropic and OpenAI IPO timelines and valuations

February 18, 2026

Peter Mühlbacher

More reasoning tokens helps Claude, but not GPT or Gemini

February 12, 2026

Peter Mühlbacher

Higher effort settings in LLMs can reduce accuracy

February 5, 2026

Rafael Poyiadzi

Replacing human data labeling with LLMs in active learning

February 2, 2026

Peter Mühlbacher · Lawrence Phillips

Evaluating AI Forecasting At Scale

November 13, 2025

Backtesting with LLMs: Evaluating Quantitative Models and Forecasts

November 12, 2025

Forecasting Stock Revenue and Margins 5-10 Years Out: A Superforecasting Approach

November 6, 2025

5-Year 10-K Assessment: How Reliable Are S&P 500 Managers?

October 15, 2025

Can AI Forecast Stocks Through Fundamental Analysis?

June 25, 2025

Deep Research Bench Leaderboard: LLM Web Research Agent Rankings

June 11, 2025

Peter Mühlbacher · Lawrence Phillips

Bench to the Future: A Pastcasting Benchmark for Forecasting Agents

May 16, 2025

Peter Mühlbacher · Lawrence Phillips

Deep Research Bench: How Well Do AI Research Agents Actually Search the Web?

May 12, 2025

OpenAI Revenue, Losses, and Profitability in 2026: Full Financial Breakdown

April 3, 2025

Lawrence Phillips

Where the AI 2027 Report Gets It Wrong on Superintelligence Timelines

April 3, 2025

OpenAI Revenue Projections to 2027: Can It Reach $100B ARR?

September 24, 2024

Peter Mühlbacher · Lawrence Phillips

A Realistic Benchmark for Open-Web Research Agents

September 18, 2024

OpenAI's financials: a Case Study of claims vs. reality

September 13, 2024

Peter Mühlbacher

How Reasoning Models Compare on Real-World White-Collar Tasks

September 12, 2024

Peter Mühlbacher · Lawrence Phillips

Contra Papers Claiming Superhuman AI Forecasting

August 27, 2024

Lawrence Phillips

Is OpenAI Profitable? Financial Forecasts & Margins Analysis

June 12, 2024

Lawrence Phillips

OpenAI Revenue 2024: Complete ARR, API vs ChatGPT Breakdown

April 2, 2024

Lawrence Phillips · Peter Mühlbacher

The Rationale-Shaped Hole at the Heart of Forecasting
