Forecast
forecast takes a DataFrame of questions and produces calibrated forecasts for each row. It supports five modes:
- Binary: probability (0 to 100) of YES for yes/no questions like "Will X happen?"
- Numeric: percentile estimates (p10 through p90) for continuous quantities like "What will the price/value/count be?"
- Date: percentile date estimates (p10 through p90, as YYYY-MM-DD) for timing questions like "When will X happen?"
- Categorical: one probability per outcome for a mutually exclusive, exhaustive set like "Which candidate wins: A, B, C or Other?" (probabilities sum to 100)
- Thresholded: one probability per threshold condition on a single outcome, like oil above $80 / $90 / $100 (probabilities non-increasing across the conditions)
Categorical and thresholded are grouped modes: each row carries its own set of options, all forecast jointly with a single rationale. Both require effort_level="HIGH".
Accuracy is measured on the public BTF-2 leaderboard and described in the Strategic Reasoning paper. The benchmark questions, ground-truth resolutions, and SOTA agent rationales are released as a Hugging Face dataset. For a real-world test of the same methodology, see how an S&P 500 paper portfolio built from FutureSearch forecasts has performed.
Forecast types
Binary
from pandas import DataFrame
from futuresearch.ops import forecast

questions = DataFrame([
    {
        "question": "Will the US Federal Reserve cut rates by at least 25bp before July 1, 2027?",
        "resolution_criteria": "Resolves YES if the Fed announces at least one rate cut of 25bp or more at any FOMC meeting between now and June 30, 2027.",
    },
])

# forecast is awaited, so run this in a notebook cell or inside an async function.
result = await forecast(input=questions, forecast_type="binary")
print(result.data[["question", "probability", "rationale"]])
| Column | Type | Description |
|---|---|---|
| probability | int | 0 to 100, calibrated probability of YES resolution. Clamped to [3, 97]; even near-certain outcomes retain residual uncertainty. |
| rationale | str | Detailed reasoning with citations from web research |
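Because probability is an integer percentage rather than a fraction, it usually needs dividing by 100 before scoring. The following is a minimal sketch, assuming you record the outcome yourself once a question resolves; brier_score is a hypothetical helper and not part of futuresearch.

```python
def brier_score(probability_pct: int, resolved_yes: bool) -> float:
    """Squared error between a 0-100 forecast and the 0/1 outcome (lower is better)."""
    p = probability_pct / 100.0              # the probability column is a percentage
    outcome = 1.0 if resolved_yes else 0.0
    return (p - outcome) ** 2


# With forecasts clamped to [3, 97], the largest possible penalty is 0.97 ** 2.
worst_case = brier_score(97, resolved_yes=False)
coin_flip = brier_score(50, resolved_yes=True)
```

A forecast of 50 always scores 0.25, which makes a convenient baseline when comparing a batch of resolved questions.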
Numeric
result = await forecast(
    input=DataFrame([
        {
            "question": "What will the price of Brent crude oil be on December 31, 2026?",
            "resolution_criteria": "Closing spot price of Brent crude oil (ICE) on Dec 31, 2026.",
        },
    ]),
    forecast_type="numeric",
    output_field="price",
    units="USD per barrel",
)
print(result.data[["price_p10", "price_p25", "price_p50", "price_p75", "price_p90"]])
| Column | Type | Description |
|---|---|---|
| {output_field}_p10 … {output_field}_p90 | float | 10th, 25th, 50th, 75th, and 90th percentile estimates. Monotonically non-decreasing: p10 ≤ p25 ≤ p50 ≤ p75 ≤ p90. |
| units | str | The units string passed in via the units parameter |
| rationale | str | Detailed reasoning with citations |
Schema: engine/services/forecast/data_types.py:83-105.
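The five percentiles can be read as points on a cumulative distribution. As a rough sketch, the helper below linearly interpolates between them to estimate the chance of exceeding a level. Linear interpolation is an assumption made here for illustration, prob_above is a hypothetical helper, and the sample row contains made-up numbers rather than a real forecast.

```python
from bisect import bisect_right
from typing import Optional

PERCENTILES = (10, 25, 50, 75, 90)


def prob_above(row: dict, output_field: str, threshold: float) -> Optional[float]:
    """Estimate P(value > threshold), in percent, by linear interpolation.

    Returns None outside [p10, p90], where the five percentiles say
    nothing about the shape of the tails.
    """
    values = [row[f"{output_field}_p{p}"] for p in PERCENTILES]
    if values != sorted(values):
        raise ValueError("percentiles must be non-decreasing")
    if threshold < values[0] or threshold > values[-1]:
        return None
    i = bisect_right(values, threshold)
    if i == len(values):                     # threshold sits exactly on p90
        return 100.0 - PERCENTILES[-1]
    lo_v, hi_v = values[i - 1], values[i]
    lo_p, hi_p = PERCENTILES[i - 1], PERCENTILES[i]
    cdf = lo_p + (hi_p - lo_p) * (threshold - lo_v) / (hi_v - lo_v)
    return 100.0 - cdf


# Illustrative values only.
sample = {"price_p10": 60.0, "price_p25": 68.0, "price_p50": 75.0,
          "price_p75": 84.0, "price_p90": 95.0}
at_median = prob_above(sample, "price", 75.0)
between = prob_above(sample, "price", 79.5)
in_tail = prob_above(sample, "price", 100.0)
```

When the levels of interest are known up front, the thresholded mode forecasts them directly instead of relying on interpolation.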
Date
result = await forecast(
    input=DataFrame([
        {
            "question": "When will Anthropic IPO?",
            "resolution_criteria": "Date Anthropic common shares first trade on a public exchange.",
        },
    ]),
    forecast_type="date",
    output_field="ipo_date",
)
print(result.data[["ipo_date_p10", "ipo_date_p50", "ipo_date_p90", "rationale"]])
| Column | Type | Description |
|---|---|---|
| {output_field}_p10 … {output_field}_p90 | str | YYYY-MM-DD percentile estimates, or the literal "never" for percentiles in the indefinite future |
| rationale | str | Detailed reasoning with citations |
Schema: engine/services/forecast/data_types.py:63-80.
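Since a percentile can be either a date string or the literal "never", downstream code needs to handle both. A minimal sketch, in which parse_percentile_date is a hypothetical helper and the sample row holds illustrative values only:

```python
from datetime import date
from typing import Optional


def parse_percentile_date(value: str) -> Optional[date]:
    """Parse a YYYY-MM-DD percentile; map the literal "never" to None."""
    if value == "never":
        return None
    return date.fromisoformat(value)


# Illustrative values only, not a real forecast.
row = {"ipo_date_p10": "2026-09-15", "ipo_date_p50": "2027-11-01", "ipo_date_p90": "never"}
parsed = {name: parse_percentile_date(value) for name, value in row.items()}
```

Mapping "never" to None keeps date arithmetic on the other percentiles straightforward while forcing callers to decide how to treat the open-ended case.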
Categorical
For questions with a fixed set of mutually exclusive outcomes. Each row names an input column (via categories_field) holding that row's options as a JSON array of strings. The outcomes are researched together and forecast jointly, so the probabilities are coherent and sum to 100. Make the set exhaustive: add an explicit "Other" option when the listed candidates don't cover every possibility.
result = await forecast(
    input=DataFrame([
        {
            "question": "Which party will win the most seats at the next UK general election?",
            "resolution_criteria": "Party with the most seats in the House of Commons after the next general election.",
            "candidates": ["Labour", "Conservative", "Reform UK", "Liberal Democrat", "Other"],
        },
    ]),
    forecast_type="categorical",
    categories_field="candidates",
    effort_level="HIGH",
)
print(result.data[["probabilities", "rationale"]])
| Column | Type | Description |
|---|---|---|
| probabilities | str | JSON object mapping each outcome to its probability (0 to 100). Mutually exclusive and exhaustive: the values sum to 100. |
| rationale | str | One joint rationale covering all outcomes, with citations |
Each row's categories_field column must hold 2 to 50 unique options. Categorical is HIGH effort only.
Schema: engine/services/forecast/data_types.py (build_grouped_response_schema).
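The option limits and the JSON-encoded output can both be handled with a few lines of client code. The sketch below assumes you want to check the 2 to 50 unique options rule before submitting and decode the probabilities string afterwards; validate_options and parse_probabilities are hypothetical helpers, and the sample string is illustrative rather than a real forecast.

```python
import json


def validate_options(options: list) -> None:
    """Client-side check mirroring the documented limit: 2 to 50 unique options."""
    if not 2 <= len(options) <= 50:
        raise ValueError(f"expected 2 to 50 options, got {len(options)}")
    if len(set(options)) != len(options):
        raise ValueError("options must be unique")


def parse_probabilities(raw: str) -> dict:
    """Decode the probabilities column, a JSON object stored as a string."""
    return json.loads(raw)


validate_options(["A", "B", "C", "Other"])

# Illustrative output string only.
probs = parse_probabilities('{"A": 45, "B": 30, "C": 15, "Other": 10}')
favourite = max(probs, key=probs.get)
total = sum(probs.values())
```

Running the validation before calling forecast surfaces duplicate or missing options without spending a HIGH-effort forecast on a malformed row.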
Thresholded
For a single uncertain quantity, forecast the probability that it clears each of several thresholds. Each row names an input column (via thresholds_field) holding its conditions as a JSON array of numbers or strings, ordered from least strict to most strict. The conditions are nested, so the probabilities are non-increasing down the list.
result = await forecast(
    input=DataFrame([
        {
            "question": "What will the price of Brent crude oil be on December 31, 2026?",
            "resolution_criteria": "Closing spot price of Brent crude oil (ICE) on Dec 31, 2026.",
            "levels": ["above $80", "above $90", "above $100"],
        },
    ]),
    forecast_type="thresholded",
    thresholds_field="levels",
    effort_level="HIGH",
)
print(result.data[["probabilities", "rationale"]])
| Column | Type | Description |
|---|---|---|
| probabilities | str | JSON object mapping each condition to its probability (0 to 100). Non-increasing across the listed order, since each condition is stricter than the last. |
| rationale | str | One joint rationale covering all thresholds, with citations |
Each row's thresholds_field column must hold 2 to 50 unique conditions in least-strict-to-most-strict order; the engine treats the labels as opaque and cannot reorder them for you. Thresholded is HIGH effort only.
Schema: engine/services/forecast/data_types.py (build_grouped_response_schema).
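Because the conditions are nested, differences between consecutive probabilities give the chance of landing in each band between thresholds. The following sketch assumes the keys of the probabilities object come back in the same least-strict-to-most-strict order as the input; bucket_probabilities is a hypothetical helper and the sample string is illustrative only.

```python
import json


def bucket_probabilities(raw: str) -> dict:
    """Turn nested threshold probabilities into mutually exclusive bands."""
    probs = json.loads(raw)                  # dicts keep the JSON key order
    labels = list(probs)
    values = [probs[label] for label in labels]
    if any(a < b for a, b in zip(values, values[1:])):
        raise ValueError("probabilities must be non-increasing")
    buckets = {f"not {labels[0]}": 100 - values[0]}
    for i, label in enumerate(labels):
        is_last = i + 1 == len(labels)
        next_value = 0 if is_last else values[i + 1]
        name = label if is_last else f"{label} but not {labels[i + 1]}"
        buckets[name] = values[i] - next_value
    return buckets


# Illustrative output string only.
bands = bucket_probabilities('{"above $80": 40, "above $90": 20, "above $100": 8}')
```

The resulting bands are mutually exclusive and sum to 100, which makes them directly comparable to a categorical forecast over the same ranges.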
Batch context
When all rows share common framing, pass it via context instead of repeating it in every row:
result = await forecast(
    input=geopolitics_questions,
    forecast_type="binary",
    context="Focus on EU regulatory and diplomatic sources. Assume all questions resolve by end of 2027.",
)
Leave context empty when rows are self-contained. A well-specified question with resolution criteria needs no additional instruction.
Input columns
The input DataFrame should contain at minimum a question column. All columns are passed to the research agents and forecasters.
When forecasting prediction-market or contest questions, pass in the relevant fields from that platform (resolution criteria, close date, creation date, and current price) to ensure a high-quality forecast. See Prediction-market questions below for the per-platform field mappings, API endpoints, and a worked example.
| Column | Required | Purpose |
|---|---|---|
| question | Yes | The question to forecast |
| resolution_criteria | Recommended | Exactly how the outcome is determined, verbatim from the source when one exists: Polymarket description, Kalshi rules_primary + rules_secondary, Metaculus resolution_criteria + fine_print |
| resolution_date | Optional | When the question closes or resolves: Polymarket endDate, Kalshi close_time, Metaculus scheduled_close_time |
| background | Optional | Additional context the forecasters should know |
| market_creation_date | Recommended for prediction markets | When the market or question was created: Polymarket createdAt, Kalshi created_time, Metaculus created_at |
| market_price | Recommended for prediction markets | The current market price or community forecast, with its as-of date: Polymarket outcomePrices, Kalshi yes_bid/yes_ask mid, Metaculus community prediction |
Column names are not enforced. Research agents infer meaning from content, so a column named scenario instead of question works fine.
Self-contained questions need none of the optional columns; {"question": "When will Anthropic IPO?"} is a perfectly good row. The optional columns matter when the question has an external source of truth; include them whenever they exist.
Prediction-market questions (Polymarket, Kalshi, Metaculus)
When a question lives on a prediction market or forecasting platform, forecast quality depends on passing the market's own definition of the question. Fetch the market from the platform API and pass its fields through verbatim, without paraphrasing the resolution criteria. The clauses that look like boilerplate (official data source, delay handling, early-close conditions) are often the ones that decide the outcome.
| Input column | Polymarket (Gamma API) | Kalshi (events API) | Metaculus (API token required) |
|---|---|---|---|
| question | market.question | event title + child market yes_sub_title | title |
| resolution_criteria | market.description (already includes the fine print) | rules_primary, plus rules_secondary when non-empty | resolution_criteria + fine_print |
| resolution_date | endDate | close_time | scheduled_close_time |
| background | event title + event description | event title, sub_title, category | description |
| market_creation_date | createdAt | created_time | created_at |
| market_price | first element of outcomePrices, or the bestBid/bestAsk mid | mid of yes_bid_dollars/yes_ask_dollars (dollar strings; last_price_dollars goes stale on thin markets) | community prediction |
Endpoints:
- Polymarket: GET https://gamma-api.polymarket.com/events?slug={event-slug} returns the event with a nested markets[] array. No auth.
- Kalshi: GET https://external-api.kalshi.com/trade-api/v2/events/{EVENT_TICKER}?with_nested_markets=true, where the event ticker is the last URL path segment, uppercased. No auth.
- Metaculus: GET https://www.metaculus.com/api/posts/{id}/ with an Authorization: Token <token> header (free account required; unauthenticated requests return 403).
Most Polymarket and Kalshi event URLs map to many child markets (one per candidate, threshold, or date bucket), so make sure you build a row from the specific child market you mean to forecast.
import json
from datetime import date

import httpx
from pandas import DataFrame
from futuresearch.ops import forecast

event = httpx.get(
    "https://gamma-api.polymarket.com/events",
    params={"slug": "world-cup-winner"},
).json()[0]
market = event["markets"][0]  # pick the child market you want
yes_price = json.loads(market["outcomePrices"])[0]

questions = DataFrame([{
    "question": market["question"],
    "resolution_criteria": market["description"],  # verbatim, don't paraphrase
    "resolution_date": market["endDate"],
    "background": event["title"],
    "market_creation_date": market["createdAt"],
    "market_price": f"{yes_price} (Yes, as of {date.today()})",
}])
result = await forecast(input=questions, forecast_type="binary")
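The same mapping can be applied to Kalshi. The sketch below builds one input row from an already-fetched event and one of its child markets, using only the field names from the mapping table above. It makes no network call; kalshi_row is a hypothetical helper, the placement of sub_title and category on the event object is an assumption, and the sample dictionaries are invented for illustration.

```python
from datetime import date


def kalshi_row(event: dict, market: dict, as_of: date) -> dict:
    """Build one forecast input row from a Kalshi event and one child market."""
    rules = market["rules_primary"]
    if market.get("rules_secondary"):        # append only when non-empty
        rules += "\n\n" + market["rules_secondary"]
    # Bid and ask arrive as dollar strings, so convert before averaging.
    mid = (float(market["yes_bid_dollars"]) + float(market["yes_ask_dollars"])) / 2
    parts = [event.get("title"), event.get("sub_title"), event.get("category")]
    return {
        "question": f'{event["title"]}: {market["yes_sub_title"]}',
        "resolution_criteria": rules,        # verbatim, not paraphrased
        "resolution_date": market["close_time"],
        "background": ", ".join(p for p in parts if p),
        "market_creation_date": market["created_time"],
        "market_price": f"{mid:.3f} (Yes mid, as of {as_of})",
    }


# Invented sample data, shaped after the field names in the mapping table.
sample_event = {"title": "Fed decision in March", "sub_title": "March meeting", "category": "Economics"}
sample_market = {
    "yes_sub_title": "Cut of 25bp or more",
    "rules_primary": "Resolves Yes if the target rate is lowered by at least 25bp.",
    "rules_secondary": "",
    "close_time": "2026-03-18T18:00:00Z",
    "created_time": "2025-12-01T15:00:00Z",
    "yes_bid_dollars": "0.41",
    "yes_ask_dollars": "0.45",
}
row = kalshi_row(sample_event, sample_market, as_of=date(2026, 1, 15))
```

Passing the as-of date explicitly keeps the row reproducible when markets are fetched once and forecast later.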
For screening many markets at once, see Find Profitable Prediction Market Trades.
Parameters
| Name | Type | Description |
|---|---|---|
| input | DataFrame | Rows to forecast, one question per row |
| forecast_type | "binary" \| "numeric" \| "date" \| "categorical" \| "thresholded" | Type of forecast to produce |
| effort_level | "LOW" \| "HIGH" \| None | See Effort and cost below. Defaults to None (auto-resolved by row count). categorical and thresholded require "HIGH". |
| context | str \| None | Optional batch-level instructions that apply to every row |
| output_field | str \| None | Name of the quantity being forecast (required for numeric and date, e.g. "price", "launch_date") |
| units | str \| None | Units for the forecast (required for numeric, e.g. "USD per barrel", "billions USD") |
| categories_field | str \| None | Name of the input column holding each row's outcomes as a JSON array of strings, 2 to 50 unique options (required for categorical) |
| thresholds_field | str \| None | Name of the input column holding each row's threshold conditions as a JSON array of numbers or strings, least strict to most strict (required for thresholded) |
| session | Session | Optional, auto-created if omitted |
The forecast_type enum is defined in engine/services/forecast/data_types.py (ForecastType.BINARY | NUMERIC | DATE | CATEGORICAL | THRESHOLDED); effort_level in the same file (ForecastEffortLevel.LOW | HIGH).
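The per-type requirements in the table can be checked before a call is made. This is a sketch of a client-side pre-check, not the library's own validation; check_forecast_kwargs is a hypothetical helper, and treating effort_level=None as invalid for grouped modes is a conservative assumption.

```python
REQUIRED = {
    "binary": (),
    "numeric": ("output_field", "units"),
    "date": ("output_field",),
    "categorical": ("categories_field",),
    "thresholded": ("thresholds_field",),
}
GROUPED = {"categorical", "thresholded"}


def check_forecast_kwargs(forecast_type: str, **kwargs) -> list:
    """Return a list of problems; an empty list means the call looks valid."""
    if forecast_type not in REQUIRED:
        return [f"unknown forecast_type: {forecast_type!r}"]
    problems = [
        f"{name} is required for {forecast_type}"
        for name in REQUIRED[forecast_type]
        if not kwargs.get(name)
    ]
    # Assumption: require an explicit "HIGH" rather than relying on the default.
    if forecast_type in GROUPED and kwargs.get("effort_level") != "HIGH":
        problems.append(f'{forecast_type} requires effort_level="HIGH"')
    return problems


missing_units = check_forecast_kwargs("numeric", output_field="price")
grouped_ok = check_forecast_kwargs(
    "categorical", categories_field="candidates", effort_level="HIGH"
)
```

Returning a list instead of raising on the first problem lets a caller report every missing parameter at once.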
Effort and cost
effort_level trades cost for accuracy:
| Effort | Per-row time | Per-row cost |
|---|---|---|
| LOW (default for batches) | ~3 to 5 min | $0.09 to $0.20 |
| HIGH (default for single) | ~5 to 10 min | ~$1.20 |
When effort_level=None, the engine resolves the effort from the row count: HIGH for row_count <= 1 and LOW otherwise (engine/services/forecast/effort.py:18-27). One-off questions get the most accurate forecast by default; large batches stay affordable. See /forecast for worked examples.
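The default rule and the per-row figures in the table are enough for a rough budget estimate before submitting a batch. The sketch below mirrors the documented rule on the client side; resolve_effort and estimate_cost are hypothetical helpers, and the cost ranges are the approximate per-row figures from the table, not a billing guarantee.

```python
from typing import Optional, Tuple

# Approximate per-row cost range in USD, taken from the table above.
COST_PER_ROW = {"LOW": (0.09, 0.20), "HIGH": (1.20, 1.20)}


def resolve_effort(row_count: int, effort_level: Optional[str] = None) -> str:
    """Mirror the documented default: HIGH for row_count <= 1, LOW otherwise."""
    if effort_level is not None:
        return effort_level
    return "HIGH" if row_count <= 1 else "LOW"


def estimate_cost(row_count: int, effort_level: Optional[str] = None) -> Tuple[float, float]:
    """Return a (low, high) estimate in USD for the whole batch."""
    low, high = COST_PER_ROW[resolve_effort(row_count, effort_level)]
    return (round(low * row_count, 2), round(high * row_count, 2))


batch_default = estimate_cost(200)           # resolves to LOW
batch_high = estimate_cost(200, "HIGH")      # explicit override
```

Comparing the two estimates shows what overriding the default to HIGH would cost for a large batch.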
Via MCP
MCP tool: futuresearch_forecast
| Parameter | Type | Description |
|---|---|---|
| data | list[object] | Inline data as a list of row objects |
| artifact_id | string | Alternatively, an artifact ID from a previous upload |
| forecast_type | "binary" \| "numeric" \| "date" \| "categorical" \| "thresholded" | Type of forecast to produce |
| effort_level | "LOW" \| "HIGH" | Optional. Defaults: HIGH for a single question, LOW for multiple. categorical and thresholded require HIGH. |
| context | string | Optional batch-level context for all questions |
| output_field | string | Name of the quantity (required for numeric and date) |
| units | string | Units (required for numeric) |
| categories_field | string | Name of the column holding each row's outcomes as a JSON array of strings (required for categorical) |
| thresholds_field | string | Name of the column holding each row's threshold conditions as a JSON array, least strict to most strict (required for thresholded) |
Provide either data or artifact_id, not both. See the MCP server reference for the rest of the lifecycle (progress, results, status).
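The either-or rule for data and artifact_id can be enforced when assembling the tool arguments. A minimal sketch, assuming the arguments are passed to futuresearch_forecast as a plain JSON-serializable object; build_forecast_arguments is a hypothetical helper, and rejecting calls that supply neither source is an assumption beyond the stated rule.

```python
import json
from typing import Optional


def build_forecast_arguments(
    forecast_type: str,
    data: Optional[list] = None,
    artifact_id: Optional[str] = None,
    **options,
) -> dict:
    """Assemble arguments for the futuresearch_forecast tool."""
    if (data is None) == (artifact_id is None):
        raise ValueError("provide exactly one of data or artifact_id")
    arguments = {"forecast_type": forecast_type}
    if data is not None:
        arguments["data"] = data
    else:
        arguments["artifact_id"] = artifact_id
    # Drop unset optional parameters so the tool applies its own defaults.
    arguments.update({k: v for k, v in options.items() if v is not None})
    return arguments


arguments = build_forecast_arguments(
    "numeric",
    data=[{"question": "What will the price of Brent crude oil be on December 31, 2026?"}],
    output_field="price",
    units="USD per barrel",
    effort_level=None,
)
payload = json.dumps(arguments)
```

Omitting effort_level when it is None leaves the default resolution (HIGH for a single question, LOW for multiple) to the tool.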
Related docs
Guides
- Turn Claude into an Accurate Forecaster: binary, numeric, and date forecasting for any question.
- Find Profitable Prediction Market Trades: Polymarket and Kalshi screening.
- Forecast Outcomes for a List of Entities: one outcome per row across a list.
- Value a Private Company: sum-of-the-parts forecasting.
Case studies
- Forecast When Anthropic and OpenAI Will IPO: date mode.
- Forecast Anthropic and OpenAI IPO Valuations: numeric, high effort.
- Forecast a SpaceX Sum-of-the-Parts Valuation: numeric, multi-segment.
- Forecast AI Researcher Seed Valuations: numeric across 116 entities.