Forecast

forecast takes a DataFrame of questions and produces calibrated forecasts for each row. It supports three modes:

Binary: probability (0 to 100) that a YES/NO question like "Will X happen?" resolves YES
Numeric: percentile estimates (p10 through p90) for continuous quantities like "What will the price/value/count be?"
Date: percentile date estimates (p10 through p90, as YYYY-MM-DD) for timing questions like "When will X happen?"

Accuracy is measured on the public BTF-2 leaderboard and described in the Strategic Reasoning paper. The benchmark questions, ground-truth resolutions, and SOTA agent rationales are released as a Hugging Face dataset. For a real-world test of the same methodology, see how an S&P 500 paper portfolio built from FutureSearch forecasts has performed.

Forecast types

Binary

from pandas import DataFrame
from futuresearch.ops import forecast

questions = DataFrame([
    {
        "question": "Will the US Federal Reserve cut rates by at least 25bp before July 1, 2027?",
        "resolution_criteria": "Resolves YES if the Fed announces at least one rate cut of 25bp or more at any FOMC meeting between now and June 30, 2027.",
    },
])

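# Top-level await works in a notebook or async REPL. In a script, call forecast
# from an async function and run it with asyncio.run().
result = await forecast(input=questions, forecast_type="binary")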
print(result.data[["question", "probability", "rationale"]])

| Column | Type | Description |
| --- | --- | --- |
| `probability` | int | 0 to 100, calibrated probability of YES resolution. Clamped to [3, 97]; even near-certain outcomes retain residual uncertainty. |
| `rationale` | str | Detailed reasoning with citations from web research |
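
Downstream code often wants the probability as a fraction rather than a 0 to 100 integer, and may want to know when a value sits at the clamp boundary. The following is a minimal sketch of both; `yes_fraction` and `is_at_clamp` are hypothetical helper names, not part of the futuresearch package.

```python
def yes_fraction(probability: int) -> float:
    """Convert a 0-100 probability into a 0.0-1.0 fraction."""
    if not 0 <= probability <= 100:
        raise ValueError(f"probability out of range: {probability}")
    return probability / 100


def is_at_clamp(probability: int, low: int = 3, high: int = 97) -> bool:
    """True when the value sits on the documented [3, 97] clamp boundary."""
    return probability <= low or probability >= high


# Illustrative value only, not a real forecast.
fraction = yes_fraction(97)
at_boundary = is_at_clamp(97)
```

A value at the boundary may mean the forecasters' unclamped estimate was more extreme than the reported one.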

Numeric

# Reuses the DataFrame and forecast imports from the Binary example.
result = await forecast(
    input=DataFrame([
        {
            "question": "What will the price of Brent crude oil be on December 31, 2026?",
            "resolution_criteria": "Closing spot price of Brent crude oil (ICE) on Dec 31, 2026.",
        },
    ]),
    forecast_type="numeric",
    output_field="price",
    units="USD per barrel",
)
print(result.data[["price_p10", "price_p25", "price_p50", "price_p75", "price_p90"]])

| Column | Type | Description |
| --- | --- | --- |
| `{output_field}_p10` … `{output_field}_p90` | float | 10th, 25th, 50th, 75th, and 90th percentile estimates. Monotonically non-decreasing: p10 ≤ p25 ≤ p50 ≤ p75 ≤ p90. |
| `units` | str | The units passed in the `units` parameter |
| `rationale` | str | Detailed reasoning with citations |

Schema: engine/services/forecast/data_types.py:83-105.
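
The percentile columns can be read back by name and checked against the ordering guarantee above. This is a small sketch that works on one result row represented as a plain dict; the helper names and the sample values are illustrative assumptions, not output from a real forecast.

```python
PERCENTILES = ("p10", "p25", "p50", "p75", "p90")


def percentile_values(row: dict, output_field: str) -> list[float]:
    """Collect the five percentile columns for one row, in ascending order."""
    return [row[f"{output_field}_{p}"] for p in PERCENTILES]


def is_monotonic(values: list[float]) -> bool:
    """Check p10 <= p25 <= p50 <= p75 <= p90."""
    return all(a <= b for a, b in zip(values, values[1:]))


# Made-up numbers standing in for one row of result.data.
row = {
    "price_p10": 58.0,
    "price_p25": 66.0,
    "price_p50": 74.0,
    "price_p75": 83.0,
    "price_p90": 92.0,
}
values = percentile_values(row, "price")
central_80_width = values[-1] - values[0]  # width of the p10-p90 interval
```

The p10 to p90 width is a quick measure of how uncertain the forecast is, in the same units as the forecast itself.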

Date

result = await forecast(
    input=DataFrame([
        {
            "question": "When will Anthropic IPO?",
            "resolution_criteria": "Date Anthropic common shares first trade on a public exchange.",
        },
    ]),
    forecast_type="date",
    output_field="ipo_date",
)
print(result.data[["ipo_date_p10", "ipo_date_p50", "ipo_date_p90", "rationale"]])

| Column | Type | Description |
| --- | --- | --- |
| `{output_field}_p10` … `{output_field}_p90` | str | `YYYY-MM-DD` percentile estimates, or the literal `"never"` for percentiles in the indefinite future |
| `rationale` | str | Detailed reasoning with citations |

Schema: engine/services/forecast/data_types.py:63-80.
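
Because a date percentile can be either an ISO date string or the literal `"never"`, code that consumes these columns needs to handle both. One possible approach, sketched with a hypothetical `parse_percentile` helper and made-up sample values, maps `"never"` to `None`:

```python
from datetime import date
from typing import Optional


def parse_percentile(value: str) -> Optional[date]:
    """Return a date for YYYY-MM-DD strings, or None for the literal 'never'."""
    if value == "never":
        return None
    return date.fromisoformat(value)


# Made-up values standing in for one row of result.data.
row = {
    "ipo_date_p10": "2026-09-15",
    "ipo_date_p50": "2027-11-01",
    "ipo_date_p90": "never",
}
parsed = {column: parse_percentile(value) for column, value in row.items()}
```

Representing `"never"` as `None` keeps the parsed values sortable only after filtering, so callers should decide explicitly how to rank an open-ended percentile.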

Batch context

When all rows share common framing, pass it via context instead of repeating it in every row:

# geopolitics_questions is a DataFrame of binary questions, built like `questions` above.
result = await forecast(
    input=geopolitics_questions,
    forecast_type="binary",
    context="Focus on EU regulatory and diplomatic sources. Assume all questions resolve by end of 2027.",
)

Leave context empty when rows are self-contained. A well-specified question with resolution criteria needs no additional instruction.

Input columns

The input DataFrame needs at least one column that holds the question, conventionally named question. All columns are passed to the research agents and forecasters.

| Column | Required | Purpose |
| --- | --- | --- |
| `question` | Yes | The question to forecast |
| `resolution_criteria` | Recommended | Exactly how the outcome is determined |
| `resolution_date` | Optional | When the question closes |
| `background` | Optional | Additional context the forecasters should know |

Column names are not enforced. Research agents infer meaning from content, so a column named scenario instead of question works fine.

Parameters

| Name | Type | Description |
| --- | --- | --- |
| `input` | DataFrame | Rows to forecast, one question per row |
| `forecast_type` | `"binary"` \| `"numeric"` \| `"date"` | Type of forecast to produce |
| `effort_level` | `"LOW"` \| `"HIGH"` \| `None` | See Effort and cost below. Defaults to `None` (auto-resolved by row count). |
| `context` | str \| None | Optional batch-level instructions that apply to every row |
| `output_field` | str \| None | Name of the quantity being forecast (required for `numeric` and `date`, e.g. `"price"`, `"launch_date"`) |
| `units` | str \| None | Units for the forecast (required for `numeric`, e.g. `"USD per barrel"`, `"billions USD"`) |
| `session` | Session | Optional, auto-created if omitted |

The forecast_type enum is defined in engine/services/forecast/data_types.py (ForecastType.BINARY | NUMERIC | DATE); effort_level in the same file (ForecastEffortLevel.LOW | HIGH).
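
The per-type requirements in the table can be checked before submitting a batch. The sketch below is a client-side pre-flight check under the rules stated above; `check_forecast_args` is a hypothetical helper, and the engine's own validation may differ.

```python
from typing import Optional

FORECAST_TYPES = ("binary", "numeric", "date")


def check_forecast_args(
    forecast_type: str,
    output_field: Optional[str] = None,
    units: Optional[str] = None,
) -> list[str]:
    """Return a list of problems; an empty list means the arguments look valid."""
    problems = []
    if forecast_type not in FORECAST_TYPES:
        problems.append(f"unknown forecast_type: {forecast_type!r}")
    if forecast_type in ("numeric", "date") and not output_field:
        problems.append("output_field is required for numeric and date forecasts")
    if forecast_type == "numeric" and not units:
        problems.append("units is required for numeric forecasts")
    return problems


numeric_problems = check_forecast_args("numeric")
date_problems = check_forecast_args("date", output_field="ipo_date")
```

Returning all problems at once, rather than raising on the first, makes it easier to fix a call in one pass.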

Effort and cost

effort_level trades cost for accuracy:

| Effort | Per-row time | Per-row cost |
| --- | --- | --- |
| `LOW` (default for batches) | ~3 to 5 min | $0.09 to $0.20 |
| `HIGH` (default for single) | ~5 to 10 min | ~$1.20 |

When effort_level is None, the engine resolves it from the row count: HIGH for row_count <= 1 and LOW otherwise (engine/services/forecast/effort.py:18-27). One-off questions get the more accurate setting, while large batches stay affordable. See /forecast for worked examples.
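
To budget a batch before running it, the default-effort rule and the per-row costs from the table can be combined. This is a rough sketch that mirrors the documented rule; it does not call the engine, the helper names are hypothetical, and the approximate `HIGH` cost is treated as a fixed $1.20 per row.

```python
from typing import Optional

# (lower, upper) per-row cost in USD, taken from the effort table.
COST_PER_ROW = {"LOW": (0.09, 0.20), "HIGH": (1.20, 1.20)}


def resolve_effort(row_count: int, effort_level: Optional[str] = None) -> str:
    """Apply the documented default: HIGH for row_count <= 1, LOW otherwise."""
    if effort_level is not None:
        return effort_level
    return "HIGH" if row_count <= 1 else "LOW"


def estimate_cost(
    row_count: int, effort_level: Optional[str] = None
) -> tuple[float, float]:
    """Return a (lower, upper) cost estimate in USD for the whole batch."""
    lower, upper = COST_PER_ROW[resolve_effort(row_count, effort_level)]
    return (round(row_count * lower, 2), round(row_count * upper, 2))


batch_estimate = estimate_cost(50)            # defaults to LOW
forced_high_estimate = estimate_cost(50, "HIGH")
```

Under these figures, forcing `HIGH` on a 50-row batch costs several times more than the default, which is the trade-off the automatic rule is designed to avoid.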

Via MCP

MCP tool: futuresearch_forecast

| Parameter | Type | Description |
| --- | --- | --- |
| `data` | list[object] | Inline data as a list of row objects |
| `artifact_id` | string | Alternatively, an artifact ID from a previous upload |
| `forecast_type` | `"binary"` \| `"numeric"` \| `"date"` | Type of forecast to produce |
| `effort_level` | `"LOW"` \| `"HIGH"` | Optional. Defaults: `HIGH` for a single question, `LOW` for multiple. |
| `context` | string | Optional batch-level context for all questions |
| `output_field` | string | Name of the quantity (required for `numeric` and `date`) |
| `units` | string | Units (required for `numeric`) |

Provide either data or artifact_id, not both. See the MCP server reference for the rest of the lifecycle (progress, results, status).
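
The "either data or artifact_id" rule is easy to enforce when assembling the tool arguments. The sketch below builds the arguments object for a futuresearch_forecast call as a plain dict; `build_forecast_arguments` is a hypothetical helper, and how the dict is sent depends on the MCP client in use.

```python
from typing import Any, Optional


def build_forecast_arguments(
    forecast_type: str,
    data: Optional[list[dict]] = None,
    artifact_id: Optional[str] = None,
    **options: Any,
) -> dict:
    """Assemble tool arguments, requiring exactly one of data or artifact_id."""
    if (data is None) == (artifact_id is None):
        raise ValueError("provide exactly one of data or artifact_id")
    arguments: dict = {"forecast_type": forecast_type}
    if data is not None:
        arguments["data"] = data
    else:
        arguments["artifact_id"] = artifact_id
    # Drop unset optional parameters so the server applies its own defaults.
    arguments.update({k: v for k, v in options.items() if v is not None})
    return arguments


arguments = build_forecast_arguments(
    "numeric",
    data=[{"question": "What will the price of Brent crude oil be on December 31, 2026?"}],
    output_field="price",
    units="USD per barrel",
    effort_level=None,
)
```

Leaving `effort_level` out of the arguments when it is unset lets the documented default (`HIGH` for one question, `LOW` for several) apply.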

Related docs

Guides

Turn Claude into an Accurate Forecaster: binary, numeric, and date forecasting for any question.
Find Profitable Prediction Market Trades: Polymarket and Kalshi screening.
Forecast Outcomes for a List of Entities: one outcome per row across a list.
Value a Private Company: sum-of-the-parts forecasting.

Case studies

Forecast When Anthropic and OpenAI Will IPO: date mode.
Forecast Anthropic and OpenAI IPO Valuations: numeric, high effort.
Forecast a SpaceX Sum-of-the-Parts Valuation: numeric, multi-segment.
Forecast AI Researcher Seed Valuations: numeric across 116 entities.

Long-form research