Publications
arXiv:2601.22444
Automating Forecasting Question Generation and Resolution for AI Evaluation
January 2026

arXiv:2506.21558
Bench to the Future: A Pastcasting Benchmark for Forecasting Agents
June 2025

arXiv:2506.06287
Deep Research Bench: Evaluating AI Web Research Agents
May 2025

arXiv:2409.14913
Towards a Realistic Long-Term Benchmark for Open-Web Research Agents
September 2024

Research Articles
Which AI Researchers Have the Most Valuable Skills?
A $1.75 Trillion IPO Would Be Overpaying 30% for SpaceX
Anthropic and OpenAI IPO timelines and valuations
More reasoning tokens help Claude, but not GPT or Gemini
Higher effort settings in LLMs can reduce accuracy
Replacing human data labeling with LLMs in active learning
Evaluating AI Forecasting At Scale
Backtesting with LLMs: Evaluating Quantitative Models and Forecasts
Forecasting Stock Revenue and Margins 5-10 Years Out: A Superforecasting Approach
5-Year 10-K Assessment: How Reliable Are S&P 500 Managers?
Can AI Forecast Stocks Through Fundamental Analysis?
Deep Research Bench Leaderboard: LLM Web Research Agent Rankings
Bench to the Future: A Pastcasting Benchmark for Forecasting Agents
Deep Research Bench: How Well Do AI Research Agents Actually Search the Web?
OpenAI Revenue, Losses, and Profitability in 2026: Full Financial Breakdown
Where the AI 2027 Report Gets It Wrong on Superintelligence Timelines
OpenAI Revenue Projections to 2027: Can It Reach $100B ARR?
A Realistic Benchmark for Open-Web Research Agents
OpenAI's financials: a Case Study of claims vs. reality
How Reasoning Models Compare on Real-World White-Collar Tasks
Contra Papers Claiming Superhuman AI Forecasting
Is OpenAI Profitable? Financial Forecasts & Margins Analysis
OpenAI Revenue 2024: Complete ARR, API vs ChatGPT Breakdown
The Rationale-Shaped Hole at the Heart of Forecasting