Forecast
forecast takes a DataFrame of questions and produces calibrated forecasts for each row. It supports three modes:
- Binary: probability (0 to 100) of a YES resolution for yes/no questions like "Will X happen?"
- Numeric: percentile estimates (p10 through p90) for continuous quantities like "What will the price/value/count be?"
- Date: percentile date estimates (p10 through p90, as YYYY-MM-DD) for timing questions like "When will X happen?"
Accuracy is measured on the public BTF-2 leaderboard and described in the Strategic Reasoning paper. The benchmark questions, ground-truth resolutions, and SOTA agent rationales are released as a Hugging Face dataset.
Forecast types
Binary
from pandas import DataFrame
from futuresearch.ops import forecast
questions = DataFrame([
    {
        "question": "Will the US Federal Reserve cut rates by at least 25bp before July 1, 2027?",
        "resolution_criteria": "Resolves YES if the Fed announces at least one rate cut of 25bp or more at any FOMC meeting between now and June 30, 2027.",
    },
])
result = await forecast(input=questions, forecast_type="binary")
print(result.data[["question", "probability", "rationale"]])
| Column | Type | Description |
|---|---|---|
| probability | int | 0 to 100, calibrated probability of YES resolution. Clamped to [3, 97]; even near-certain outcomes retain residual uncertainty. |
| rationale | str | Detailed reasoning with citations from web research |
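The clamping described in the table can be illustrated with a minimal sketch. The helper names `clamp_probability` and `to_fraction` are hypothetical and not part of the library, and the rounding step is an assumption: the source only states that the final value is an integer in [3, 97].

```python
def clamp_probability(raw_percent: float, low: int = 3, high: int = 97) -> int:
    """Round a raw percentage and clamp it to [low, high].

    Assumption: rounding to the nearest integer happens before clamping;
    the engine's actual behavior is not documented here.
    """
    return max(low, min(high, round(raw_percent)))


def to_fraction(probability: int) -> float:
    """Convert the 0-100 integer column into a 0-1 probability."""
    return probability / 100


print(clamp_probability(99.6))  # 97
print(clamp_probability(0.4))   # 3
print(to_fraction(clamp_probability(42.0)))  # 0.42
```

Converting to a 0-1 fraction is usually the first step before computing scoring rules or expected values from the probability column.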
Numeric
result = await forecast(
    input=DataFrame([
        {
            "question": "What will the price of Brent crude oil be on December 31, 2026?",
            "resolution_criteria": "Closing spot price of Brent crude oil (ICE) on Dec 31, 2026.",
        },
    ]),
    forecast_type="numeric",
    output_field="price",
    units="USD per barrel",
)
print(result.data[["price_p10", "price_p25", "price_p50", "price_p75", "price_p90"]])
| Column | Type | Description |
|---|---|---|
| {output_field}_p10 … {output_field}_p90 | float | 10th, 25th, 50th, 75th, and 90th percentile estimates. Monotonically non-decreasing: p10 ≤ p25 ≤ p50 ≤ p75 ≤ p90. |
| units | str | The units provided as a parameter |
| rationale | str | Detailed reasoning with citations |
Schema: engine/services/forecast/data_types.py:83-105.
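As a rough sketch of how the percentile columns might be consumed downstream, the helpers below check the monotonicity guarantee and compute the p10 to p90 spread for one row. The function names are hypothetical and the row values are made up for illustration; only the column naming pattern comes from the table above.

```python
PERCENTILES = ("p10", "p25", "p50", "p75", "p90")


def percentile_values(row: dict, output_field: str) -> list:
    """Collect {output_field}_p10 ... {output_field}_p90 from one result row."""
    return [row[f"{output_field}_{p}"] for p in PERCENTILES]


def is_monotonic(values: list) -> bool:
    """True when each percentile is >= the one before it."""
    return all(a <= b for a, b in zip(values, values[1:]))


def central_80_width(row: dict, output_field: str) -> float:
    """Width of the p10-p90 interval, a rough measure of forecast uncertainty."""
    return row[f"{output_field}_p90"] - row[f"{output_field}_p10"]


# Made-up row for illustration only, not a real forecast.
row = {
    "price_p10": 58.0, "price_p25": 66.0, "price_p50": 74.0,
    "price_p75": 83.0, "price_p90": 95.0, "units": "USD per barrel",
}

assert is_monotonic(percentile_values(row, "price"))
print(central_80_width(row, "price"), row["units"])  # 37.0 USD per barrel
```

Assuming `result.data` is a pandas DataFrame, as the examples here suggest, `result.data.to_dict("records")` yields rows in the dict shape these helpers expect.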
Date
result = await forecast(
    input=DataFrame([
        {
            "question": "When will Anthropic IPO?",
            "resolution_criteria": "Date Anthropic common shares first trade on a public exchange.",
        },
    ]),
    forecast_type="date",
    output_field="ipo_date",
)
print(result.data[["ipo_date_p10", "ipo_date_p50", "ipo_date_p90", "rationale"]])
| Column | Type | Description |
|---|---|---|
| {output_field}_p10 … {output_field}_p90 | str | YYYY-MM-DD percentile estimates, or the literal "never" for percentiles in the indefinite future |
| rationale | str | Detailed reasoning with citations |
Schema: engine/services/forecast/data_types.py:63-80.
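Because date percentiles arrive as strings and may be the literal "never", downstream code generally needs to handle both cases. The sketch below shows one way to do that with the standard library. The helper names and the sample row are hypothetical; mapping "never" to `None` is a design choice made for this example, not library behavior.

```python
from datetime import date
from typing import Optional


def parse_date_percentile(value: str) -> Optional[date]:
    """Parse a YYYY-MM-DD percentile; return None for the literal "never"."""
    if value == "never":
        return None
    return date.fromisoformat(value)


def days_until(value: str, today: date) -> Optional[int]:
    """Days from `today` to the percentile date, or None for "never"."""
    parsed = parse_date_percentile(value)
    return None if parsed is None else (parsed - today).days


# Made-up values for illustration only, not a real forecast.
row = {
    "ipo_date_p10": "2026-09-15",
    "ipo_date_p50": "2027-11-01",
    "ipo_date_p90": "never",
}
today = date(2026, 1, 1)
for column, value in row.items():
    print(column, days_until(value, today))
```

Returning `None` keeps "never" distinct from any real date, so sorting or averaging code has to treat it explicitly rather than silently coercing it.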
Batch context
When all rows share common framing, pass it via context instead of repeating it in every row:
result = await forecast(
    input=geopolitics_questions,
    forecast_type="binary",
    context="Focus on EU regulatory and diplomatic sources. Assume all questions resolve by end of 2027.",
)
Leave context empty when rows are self-contained. A well-specified question with resolution criteria needs no additional instruction.
Input columns
The input DataFrame should contain at minimum a question column. All columns are passed to the research agents and forecasters.
| Column | Required | Purpose |
|---|---|---|
| question | Yes | The question to forecast |
| resolution_criteria | Recommended | Exactly how the outcome is determined |
| resolution_date | Optional | When the question closes |
| background | Optional | Additional context the forecasters should know |
Column names are not enforced. Research agents infer meaning from content, so a column named scenario instead of question works fine.
Parameters
| Name | Type | Description |
|---|---|---|
| input | DataFrame | Rows to forecast, one question per row |
| forecast_type | "binary" \| "numeric" \| "date" | Type of forecast to produce |
| effort_level | "LOW" \| "HIGH" \| None | See Effort and cost below. Defaults to None (auto-resolved by row count). |
| context | str \| None | Optional batch-level instructions that apply to every row |
| output_field | str \| None | Name of the quantity being forecast (required for numeric and date, e.g. "price", "launch_date") |
| units | str \| None | Units for the forecast (required for numeric, e.g. "USD per barrel", "billions USD") |
| session | Session | Optional, auto-created if omitted |
The forecast_type enum is defined in engine/services/forecast/data_types.py (ForecastType.BINARY | NUMERIC | DATE); effort_level in the same file (ForecastEffortLevel.LOW | HIGH).
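The conditional requirements in the parameter table (output_field for numeric and date, units for numeric) can be checked before submitting a batch. The following is a minimal sketch of such a pre-flight check. `check_forecast_args` is a hypothetical helper, not part of futuresearch, and the library may validate differently or raise its own errors.

```python
from typing import List, Optional

FORECAST_TYPES = ("binary", "numeric", "date")


def check_forecast_args(
    forecast_type: str,
    output_field: Optional[str] = None,
    units: Optional[str] = None,
) -> List[str]:
    """Return a list of problems; an empty list means the combination looks valid."""
    problems = []
    if forecast_type not in FORECAST_TYPES:
        problems.append(f"unknown forecast_type: {forecast_type!r}")
    if forecast_type in ("numeric", "date") and not output_field:
        problems.append("output_field is required for numeric and date forecasts")
    if forecast_type == "numeric" and not units:
        problems.append("units is required for numeric forecasts")
    return problems


print(check_forecast_args("binary"))                       # []
print(check_forecast_args("date", output_field="ipo_date"))  # []
print(check_forecast_args("numeric", output_field="price"))
```

The last call reports the missing units, which is cheaper to catch locally than after a batch has started.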
Effort and cost
effort_level trades cost for accuracy:
| Effort | Per-row time | Per-row cost |
|---|---|---|
| LOW (default for batches) | ~3 to 5 min | $0.09 to $0.20 |
| HIGH (default for single) | ~5 to 10 min | ~$1.20 |
When effort_level is None, the engine resolves it from the row count: HIGH for row_count <= 1 and LOW otherwise (engine/services/forecast/effort.py:18-27). One-off questions get the more accurate forecast, while large batches stay affordable. See /forecast for worked examples.
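The default-resolution rule and the cost table above can be combined into a quick budget estimate. This is a sketch only: `resolve_effort` mirrors the documented rule but is not the engine's code, `estimate_cost` is a hypothetical helper, and the approximate HIGH price of $1.20 is treated as a fixed figure.

```python
from typing import Optional, Tuple

# Per-row cost ranges in USD, copied from the table above.
# HIGH is listed as "~$1.20", so both ends of its range use that figure.
COST_PER_ROW = {"LOW": (0.09, 0.20), "HIGH": (1.20, 1.20)}


def resolve_effort(row_count: int, effort_level: Optional[str] = None) -> str:
    """Apply the documented default: HIGH for row_count <= 1, LOW otherwise."""
    if effort_level is not None:
        return effort_level
    return "HIGH" if row_count <= 1 else "LOW"


def estimate_cost(row_count: int, effort_level: Optional[str] = None) -> Tuple[float, float]:
    """Return a (min, max) cost estimate in USD for the whole batch."""
    low, high = COST_PER_ROW[resolve_effort(row_count, effort_level)]
    return (round(row_count * low, 2), round(row_count * high, 2))


print(resolve_effort(1))           # HIGH
print(resolve_effort(200))         # LOW
print(estimate_cost(200))          # (18.0, 40.0)
print(estimate_cost(200, "HIGH"))  # (240.0, 240.0)
```

For a 200-row batch, forcing HIGH costs roughly six to thirteen times the LOW default, which is the trade-off the automatic rule is designed to avoid.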
Via MCP
MCP tool: futuresearch_forecast
| Parameter | Type | Description |
|---|---|---|
| data | list[object] | Inline data as a list of row objects |
| artifact_id | string | Alternatively, an artifact ID from a previous upload |
| forecast_type | "binary" \| "numeric" \| "date" | Type of forecast to produce |
| effort_level | "LOW" \| "HIGH" | Optional. Defaults: HIGH for a single question, LOW for multiple. |
| context | string | Optional batch-level context for all questions |
| output_field | string | Name of the quantity (required for numeric and date) |
| units | string | Units (required for numeric) |
Provide either data or artifact_id, not both. See the MCP server reference for the rest of the lifecycle (progress, results, status).
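To illustrate the "either data or artifact_id" rule, the sketch below assembles an arguments object for the futuresearch_forecast tool. `build_forecast_arguments` is a hypothetical helper; the parameter names come from the table above, but how the arguments are actually sent depends on the MCP client in use, which is not covered here.

```python
import json
from typing import Any, Dict, List, Optional


def build_forecast_arguments(
    forecast_type: str,
    data: Optional[List[Dict[str, Any]]] = None,
    artifact_id: Optional[str] = None,
    **optional: Any,
) -> Dict[str, Any]:
    """Build tool arguments, enforcing exactly one of data or artifact_id."""
    if (data is None) == (artifact_id is None):
        raise ValueError("provide exactly one of data or artifact_id")
    arguments: Dict[str, Any] = {"forecast_type": forecast_type}
    if data is not None:
        arguments["data"] = data
    else:
        arguments["artifact_id"] = artifact_id
    # Drop optional parameters that were left unset (None).
    arguments.update({k: v for k, v in optional.items() if v is not None})
    return arguments


arguments = build_forecast_arguments(
    "numeric",
    data=[{"question": "What will the price of Brent crude oil be on December 31, 2026?"}],
    output_field="price",
    units="USD per barrel",
    context=None,
)
print(json.dumps(arguments, indent=2))
```

Omitting unset optional parameters lets the server apply its own defaults, such as the automatic effort level.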
Related docs
Guides
- Turn Claude into an Accurate Forecaster: binary, numeric, and date forecasting for any question.
- Find Profitable Prediction Market Trades: Polymarket and Kalshi screening.
- Forecast Outcomes for a List of Entities: one outcome per row across a list.
- Value a Private Company: sum-of-the-parts forecasting.
Case studies
- Forecast When Anthropic and OpenAI Will IPO: date mode.
- Forecast Anthropic and OpenAI IPO Valuations: numeric, high effort.
- Forecast a SpaceX Sum-of-the-Parts Valuation: numeric, multi-segment.
- Forecast AI Researcher Seed Valuations: numeric across 116 entities.