Publications
arXiv:2604.26106
Evaluating Strategic Reasoning in Forecasting Agents
April 2026
arXiv:2601.22444
Automating Forecasting Question Generation and Resolution for AI Evaluation
January 2026
arXiv:2506.21558
Bench to the Future: A Pastcasting Benchmark for Forecasting Agents
June 2025
arXiv:2506.06287
Deep Research Bench: Evaluating AI Web Research Agents
May 2025
arXiv:2409.14913
Towards a Realistic Long-Term Benchmark for Open-Web Research Agents
September 2024
Research Articles
How the US vs. Anthropic Standoff on Claude Fable Will End
Gemini 3.5 Pro Release Date and Frontier Forecast
Meta Conscripted 6,500 Engineers to Make AI Training Data: What Happens Next
Anthropic Revenue and Valuation in 2026 Leading to IPO
OpenAI Revenue, Losses, and IPO Valuation: Forecasts Through Late 2027
Waymo Profitability Forecast: Rides, Margins, and Losses Through 2027
Claude can miss the motives of politicians
AI takes people at their word
Opus 4.6 does better research, Gemini 3.1 has better judgment
Do Frontier AI Models Know What They Don't Know?
AI Forecasting Benchmark Dataset: 1,417 Hard Questions (BTF-2)
Anthropic IPO Valuation Forecast After the $30B Run Rate Announcement
Which AI Researchers Have the Most Valuable Skills?
SpaceX IPO Price and Stock Forecast of Reversion to Fundamentals
Anthropic and OpenAI IPO Dates and Valuations: Updated Forecasts
Reasoning Effort Scaling: Claude 4.6 Gains, GPT-5 and Gemini Don't
Higher effort settings in LLMs can reduce accuracy
Replacing human data labeling with LLMs in active learning
Evaluating AI Forecasting At Scale
Backtesting with LLMs: Evaluating Quantitative Models and Forecasts
Forecasting Stock Revenue and Margins 5-10 Years Out: A Superforecasting Approach
Calculate Intrinsic Value for Any Stock with DCF Analysis
Automating Warren Buffett
5-Year 10-K Assessment: How Reliable Are S&P 500 Managers?
How to Analyze 10-Q Filings with LLMs
How to Analyze Earnings Calls with LLMs: 6 Patterns That Move Forecasts
How to Analyze 10-K Filings With LLMs (Without Getting Fooled)
How to Summarize Earnings Releases with AI Without Missing What Matters
Can AI Forecast Stocks Through Fundamental Analysis?
Deep Research Bench Leaderboard: LLM Web Research Agent Rankings
Bench to the Future: A Pastcasting Benchmark for Forecasting Agents
Deep Research Bench: How Well Do AI Research Agents Actually Search the Web?
OpenAI Revenue, Losses, and Profitability in 2026: Full Financial Breakdown
AI 2027 Report: A Forecast Critique of the Superintelligence Timeline
OpenAI Revenue Projections to 2027: Can It Reach $100B ARR?
A Realistic Benchmark for Open-Web Research Agents
OpenAI's Financials: ChatGPT Subscribers and API Revenue, Estimates vs. Reality
How Reasoning Models Compare on Real-World White-Collar Tasks
Contra Papers Claiming Superhuman AI Forecasting
Is OpenAI Profitable? Financial Forecasts & Margins Analysis
OpenAI Revenue 2024: Complete ARR, API vs ChatGPT Breakdown
The Rationale-Shaped Hole at the Heart of Forecasting