Forecast

forecast takes a DataFrame of questions and produces calibrated forecasts for each row. It supports five modes:

  • Binary: probability (0 to 100) of YES/NO questions like "Will X happen?"
  • Numeric: percentile estimates (p10 through p90) for continuous quantities like "What will the price/value/count be?"
  • Date: percentile date estimates (p10 through p90, as YYYY-MM-DD) for timing questions like "When will X happen?"
  • Categorical: one probability per outcome for a mutually exclusive, exhaustive set like "Which candidate wins: A, B, C or Other?" (probabilities sum to 100)
  • Thresholded: one probability per threshold condition on a single outcome, like oil above $80 / $90 / $100 (probabilities non-increasing across the conditions)

Categorical and thresholded are grouped modes: each row carries its own set of options, all forecast jointly with a single rationale. Both require effort_level="HIGH".

Accuracy is measured on the public BTF-2 leaderboard and described in the Strategic Reasoning paper. The benchmark questions, ground-truth resolutions, and SOTA agent rationales are released as a Hugging Face dataset. For a real-world test of the same methodology, see how an S&P 500 paper portfolio built from FutureSearch forecasts has performed.

Forecast types

Binary

from pandas import DataFrame
from futuresearch.ops import forecast

questions = DataFrame([
    {
        "question": "Will the US Federal Reserve cut rates by at least 25bp before July 1, 2027?",
        "resolution_criteria": "Resolves YES if the Fed announces at least one rate cut of 25bp or more at any FOMC meeting between now and June 30, 2027.",
    },
])

result = await forecast(input=questions, forecast_type="binary")
print(result.data[["question", "probability", "rationale"]])

| Column | Type | Description |
| --- | --- | --- |
| probability | int | Calibrated probability of YES resolution, on a 0 to 100 scale. Clamped to [3, 97]; even near-certain outcomes retain residual uncertainty. |
| rationale | str | Detailed reasoning with citations from web research |
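
The clamp described in the table can be pictured with a short sketch. This is not the engine's code: `clamp_probability` is a hypothetical helper, and only the bounds 3 and 97 come from the table above.

```python
def clamp_probability(raw: float, low: int = 3, high: int = 97) -> int:
    """Round a 0-100 probability and clamp it to [low, high].

    Hypothetical helper for illustration; the bounds mirror the documented
    [3, 97] range, everything else is an assumption.
    """
    return max(low, min(high, round(raw)))


print(clamp_probability(99.6))  # 97
print(clamp_probability(0.4))   # 3
print(clamp_probability(42.2))  # 42
```

In practice this means a returned `probability` of 3 or 97 should be read as "at or beyond the bound", not as a precise estimate.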

Numeric

result = await forecast(
    input=DataFrame([
        {
            "question": "What will the price of Brent crude oil be on December 31, 2026?",
            "resolution_criteria": "Closing spot price of Brent crude oil (ICE) on Dec 31, 2026.",
        },
    ]),
    forecast_type="numeric",
    output_field="price",
    units="USD per barrel",
)
print(result.data[["price_p10", "price_p25", "price_p50", "price_p75", "price_p90"]])

| Column | Type | Description |
| --- | --- | --- |
| {output_field}_p10 … {output_field}_p90 | float | 10th, 25th, 50th, 75th, and 90th percentile estimates. Monotonically non-decreasing: p10 ≤ p25 ≤ p50 ≤ p75 ≤ p90. |
| units | str | The units passed in the units parameter |
| rationale | str | Detailed reasoning with citations |

Schema: engine/services/forecast/data_types.py:83-105.
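
One way to use the five percentiles downstream is to interpolate between them to estimate the probability that the outcome falls below some value. The sketch below uses plain piecewise-linear interpolation, which is an assumption of this example and not something the engine is documented to do. The helper names and the sample percentile values are hypothetical.

```python
from bisect import bisect_right

PERCENTILES = (10, 25, 50, 75, 90)


def is_monotonic(values: list[float]) -> bool:
    """Check the documented invariant p10 <= p25 <= p50 <= p75 <= p90."""
    return all(a <= b for a, b in zip(values, values[1:]))


def prob_below(threshold: float, values: list[float]) -> float:
    """Approximate P(outcome <= threshold) in percent.

    Linear interpolation between the five percentile points. Outside the
    p10 to p90 range the result is clipped to 10 or 90, because the
    forecast says nothing about the shape of the tails.
    """
    if len(values) != len(PERCENTILES) or not is_monotonic(values):
        raise ValueError("expected five non-decreasing percentile values")
    if threshold <= values[0]:
        return 10.0
    if threshold >= values[-1]:
        return 90.0
    i = bisect_right(values, threshold)  # values[i-1] <= threshold < values[i]
    lo_v, hi_v = values[i - 1], values[i]
    lo_p, hi_p = PERCENTILES[i - 1], PERCENTILES[i]
    return lo_p + (hi_p - lo_p) * (threshold - lo_v) / (hi_v - lo_v)


# Placeholder percentile values for illustration only, not a real forecast.
price = [60.0, 68.0, 75.0, 83.0, 92.0]
print(prob_below(79.0, price))  # 62.5
```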

Date

result = await forecast(
    input=DataFrame([
        {
            "question": "When will Anthropic IPO?",
            "resolution_criteria": "Date Anthropic common shares first trade on a public exchange.",
        },
    ]),
    forecast_type="date",
    output_field="ipo_date",
)
print(result.data[["ipo_date_p10", "ipo_date_p50", "ipo_date_p90", "rationale"]])

| Column | Type | Description |
| --- | --- | --- |
| {output_field}_p10 … {output_field}_p90 | str | YYYY-MM-DD percentile estimates, or the literal "never" for percentiles in the indefinite future |
| rationale | str | Detailed reasoning with citations |

Schema: engine/services/forecast/data_types.py:63-80.
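
Because a date percentile can be either a YYYY-MM-DD string or the literal "never", downstream code needs to handle both. A minimal sketch, assuming the result row has been converted to a plain dict; `parse_date_percentile` and `parse_date_row` are hypothetical helpers, and the sample values are placeholders.

```python
from datetime import date
from typing import Optional

PERCENTILE_KEYS = ("p10", "p25", "p50", "p75", "p90")


def parse_date_percentile(value: str) -> Optional[date]:
    """Return a date, or None when the value is the literal "never"."""
    if value.strip().lower() == "never":
        return None
    return date.fromisoformat(value)  # expects YYYY-MM-DD


def parse_date_row(row: dict, output_field: str) -> dict:
    """Parse whichever {output_field}_pXX columns are present in the row."""
    parsed = {}
    for key in PERCENTILE_KEYS:
        column = f"{output_field}_{key}"
        if column in row:
            parsed[key] = parse_date_percentile(row[column])
    return parsed


# Placeholder values for illustration only, not a real forecast.
row = {
    "ipo_date_p10": "2027-01-01",
    "ipo_date_p50": "2027-01-31",
    "ipo_date_p90": "never",
}
parsed = parse_date_row(row, "ipo_date")
print((parsed["p50"] - parsed["p10"]).days)  # 30
print(parsed["p90"])  # None
```

Mapping "never" to `None` keeps date arithmetic explicit: any comparison involving an open-ended percentile has to be handled deliberately instead of silently sorting as a string.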

Categorical

For questions with a fixed set of mutually exclusive outcomes. The categories_field parameter names an input column that holds each row's options as a JSON array of strings. The outcomes are researched together and forecast jointly, so the probabilities are coherent and sum to 100. Make the set exhaustive: add an explicit "Other" option when the listed candidates don't cover every possibility.

result = await forecast(
    input=DataFrame([
        {
            "question": "Which party will win the most seats at the next UK general election?",
            "resolution_criteria": "Party with the most seats in the House of Commons after the next general election.",
            "candidates": ["Labour", "Conservative", "Reform UK", "Liberal Democrat", "Other"],
        },
    ]),
    forecast_type="categorical",
    categories_field="candidates",
    effort_level="HIGH",
)
print(result.data[["probabilities", "rationale"]])

| Column | Type | Description |
| --- | --- | --- |
| probabilities | str | JSON object mapping each outcome to its probability (0 to 100). Mutually exclusive and exhaustive: the values sum to 100. |
| rationale | str | One joint rationale covering all outcomes, with citations |

Each row's categories_field column must hold 2 to 50 unique options. Categorical is HIGH effort only.

Schema: engine/services/forecast/data_types.py (build_grouped_response_schema).
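
The constraints above (2 to 50 unique options in, a JSON object summing to 100 out) are easy to check client-side. The following is a sketch with hypothetical helper names and made-up probabilities; it is not the engine's own validation.

```python
import json


def validate_options(options: list[str]) -> None:
    """Check the documented input rule: 2 to 50 unique options per row."""
    if not 2 <= len(options) <= 50:
        raise ValueError("a row needs between 2 and 50 options")
    if len(set(options)) != len(options):
        raise ValueError("options must be unique")


def parse_probabilities(raw: str) -> dict:
    """Decode the probabilities column and check that it sums to 100."""
    probs = json.loads(raw)
    total = sum(probs.values())
    if abs(total - 100) > 1e-6:
        raise ValueError(f"probabilities sum to {total}, expected 100")
    return probs


validate_options(["A", "B", "C", "Other"])

# Placeholder output for illustration only, not a real forecast.
raw = '{"A": 50, "B": 30, "C": 15, "Other": 5}'
probs = parse_probabilities(raw)
print(max(probs, key=probs.get))  # A
```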

Thresholded

For a single uncertain quantity, forecast the probability that it clears each of several thresholds. The thresholds_field parameter names an input column that holds each row's conditions as a JSON array of numbers or strings, ordered from least strict to most strict. Because the conditions are nested, the probabilities are non-increasing down the list.

result = await forecast(
    input=DataFrame([
        {
            "question": "What will the price of Brent crude oil be on December 31, 2026?",
            "resolution_criteria": "Closing spot price of Brent crude oil (ICE) on Dec 31, 2026.",
            "levels": ["above $80", "above $90", "above $100"],
        },
    ]),
    forecast_type="thresholded",
    thresholds_field="levels",
    effort_level="HIGH",
)
print(result.data[["probabilities", "rationale"]])

| Column | Type | Description |
| --- | --- | --- |
| probabilities | str | JSON object mapping each condition to its probability (0 to 100). Non-increasing across the listed order, since each condition is stricter than the last. |
| rationale | str | One joint rationale covering all thresholds, with citations |

Each row's thresholds_field column must hold 2 to 50 unique conditions in least-strict-to-most-strict order; the engine treats the labels as opaque and cannot reorder them for you. Thresholded is HIGH effort only.

Schema: engine/services/forecast/data_types.py (build_grouped_response_schema).
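
Since the conditions are nested, differencing adjacent probabilities gives the probability of each band between thresholds. The sketch below assumes the JSON object preserves the order in which the conditions were listed; `bucket_probabilities` is a hypothetical helper and the numbers are placeholders.

```python
import json


def bucket_probabilities(raw: str) -> dict:
    """Turn nested threshold probabilities into mutually exclusive bands.

    Assumes the keys of the JSON object appear in the listed
    least-strict-to-most-strict order.
    """
    probs = json.loads(raw)
    labels = list(probs)
    values = list(probs.values())
    # Documented invariant: probabilities are non-increasing down the list.
    if any(a < b for a, b in zip(values, values[1:])):
        raise ValueError("threshold probabilities must be non-increasing")
    buckets = {f"not {labels[0]}": 100 - values[0]}
    for i in range(len(labels) - 1):
        buckets[f"{labels[i]} but not {labels[i + 1]}"] = values[i] - values[i + 1]
    buckets[labels[-1]] = values[-1]
    return buckets


# Placeholder output for illustration only, not a real forecast.
raw = '{"above $80": 55, "above $90": 30, "above $100": 12}'
for band, p in bucket_probabilities(raw).items():
    print(f"{band}: {p}")
# not above $80: 45
# above $80 but not above $90: 25
# above $90 but not above $100: 18
# above $100: 12
```

The bands sum to 100 by construction, which makes them directly comparable to a categorical forecast over the same ranges.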

Batch context

When all rows share common framing, pass it via context instead of repeating it in every row:

result = await forecast(
    input=geopolitics_questions,
    forecast_type="binary",
    context="Focus on EU regulatory and diplomatic sources. Assume all questions resolve by end of 2027.",
)

Leave context empty when rows are self-contained. A well-specified question with resolution criteria needs no additional instruction.

Input columns

The input DataFrame should contain at minimum a question column. All columns are passed to the research agents and forecasters.

When forecasting prediction-market or contest questions, pass in the relevant fields from that platform (resolution criteria, close date, creation date, and current price) to ensure a high-quality forecast. See Prediction-market questions below for the per-platform field mappings, API endpoints, and a worked example.

| Column | Required | Purpose |
| --- | --- | --- |
| question | Yes | The question to forecast |
| resolution_criteria | Recommended | Exactly how the outcome is determined, verbatim from the source when one exists: Polymarket description, Kalshi rules_primary + rules_secondary, Metaculus resolution_criteria + fine_print |
| resolution_date | Optional | When the question closes or resolves: Polymarket endDate, Kalshi close_time, Metaculus scheduled_close_time |
| background | Optional | Additional context the forecasters should know |
| market_creation_date | Recommended for prediction markets | When the market or question was created: Polymarket createdAt, Kalshi created_time, Metaculus created_at |
| market_price | Recommended for prediction markets | The current market price or community forecast, with its as-of date: Polymarket outcomePrices, Kalshi yes_bid/yes_ask mid, Metaculus community prediction |

Column names are not enforced. Research agents infer meaning from content, so a column named scenario instead of question works fine.

Self-contained questions need none of the optional columns; {"question": "When will Anthropic IPO?"} is a perfectly good row. The optional columns matter when the question has an external source of truth; include them whenever they exist.

Prediction-market questions (Polymarket, Kalshi, Metaculus)

When a question lives on a prediction market or forecasting platform, forecast quality depends on passing the market's own definition of the question. Fetch the market from the platform API and pass its fields through verbatim, without paraphrasing the resolution criteria. The clauses that look like boilerplate (official data source, delay handling, early-close conditions) are often the ones that decide the outcome.

| Input column | Polymarket (Gamma API) | Kalshi (events API) | Metaculus (API token required) |
| --- | --- | --- | --- |
| question | market.question | event title + child market yes_sub_title | title |
| resolution_criteria | market.description (already includes the fine print) | rules_primary, plus rules_secondary when non-empty | resolution_criteria + fine_print |
| resolution_date | endDate | close_time | scheduled_close_time |
| background | event title + event description | event title, sub_title, category | description |
| market_creation_date | createdAt | created_time | created_at |
| market_price | first element of outcomePrices, or the bestBid/bestAsk mid | mid of yes_bid_dollars/yes_ask_dollars (dollar strings; last_price_dollars goes stale on thin markets) | community prediction |

Endpoints:

  • Polymarket: GET https://gamma-api.polymarket.com/events?slug={event-slug} returns the event with a nested markets[] array. No auth.
  • Kalshi: GET https://external-api.kalshi.com/trade-api/v2/events/{EVENT_TICKER}?with_nested_markets=true, where the event ticker is the last URL path segment, uppercased. No auth.
  • Metaculus: GET https://www.metaculus.com/api/posts/{id}/ with an Authorization: Token <token> header (free account required; unauthenticated requests return 403).

Most Polymarket and Kalshi event URLs map to many child markets (one per candidate, threshold, or date bucket), so make sure you build a row from the specific child market you mean to forecast.

import json
from datetime import date

import httpx
from pandas import DataFrame
from futuresearch.ops import forecast

event = httpx.get(
    "https://gamma-api.polymarket.com/events",
    params={"slug": "world-cup-winner"},
).json()[0]

market = event["markets"][0]  # pick the child market you want
yes_price = json.loads(market["outcomePrices"])[0]

questions = DataFrame([{
    "question": market["question"],
    "resolution_criteria": market["description"],  # verbatim, don't paraphrase
    "resolution_date": market["endDate"],
    "background": event["title"],
    "market_creation_date": market["createdAt"],
    "market_price": f"{yes_price} (Yes, as of {date.today()})",
}])

result = await forecast(input=questions, forecast_type="binary")
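
The Kalshi column of the mapping table can be sketched the same way. This example works on already-fetched dictionaries and makes no network call. Which fields sit on the event and which on the child market is an assumption here, as are the `kalshi_row` helper and every sample value; check a real API response before relying on it.

```python
def kalshi_row(event: dict, market: dict, as_of: str) -> dict:
    """Build one forecast input row from a Kalshi event and child market.

    Assumes title, sub_title and category live on the event and the
    remaining fields on the child market.
    """
    # Prices arrive as dollar strings; use the bid/ask mid, not last price.
    mid = (float(market["yes_bid_dollars"]) + float(market["yes_ask_dollars"])) / 2

    rules = market["rules_primary"]
    if market.get("rules_secondary"):  # append only when non-empty
        rules += "\n\n" + market["rules_secondary"]

    background_parts = [event.get("title"), event.get("sub_title"), event.get("category")]
    return {
        "question": f'{event["title"]}: {market["yes_sub_title"]}',
        "resolution_criteria": rules,  # verbatim, don't paraphrase
        "resolution_date": market["close_time"],
        "background": ", ".join(p for p in background_parts if p),
        "market_creation_date": market["created_time"],
        "market_price": f"{mid:.3f} (Yes mid, as of {as_of})",
    }


# Made-up payload for illustration only.
event = {"title": "Example event", "sub_title": "Example subtitle", "category": "Economics"}
market = {
    "yes_sub_title": "Option A",
    "rules_primary": "Primary rules text.",
    "rules_secondary": "",
    "close_time": "2027-01-01T00:00:00Z",
    "created_time": "2026-01-01T00:00:00Z",
    "yes_bid_dollars": "0.40",
    "yes_ask_dollars": "0.44",
}
row = kalshi_row(event, market, as_of="2026-06-01")
print(row["market_price"])  # 0.420 (Yes mid, as of 2026-06-01)
```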

For screening many markets at once, see Find Profitable Prediction Market Trades.

Parameters

| Name | Type | Description |
| --- | --- | --- |
| input | DataFrame | Rows to forecast, one question per row |
| forecast_type | "binary", "numeric", "date", "categorical", or "thresholded" | Type of forecast to produce |
| effort_level | "LOW", "HIGH", or None | See Effort and cost below. Defaults to None (auto-resolved by row count). categorical and thresholded require "HIGH". |
| context | str or None | Optional batch-level instructions that apply to every row |
| output_field | str or None | Name of the quantity being forecast (required for numeric and date, e.g. "price", "launch_date") |
| units | str or None | Units for the forecast (required for numeric, e.g. "USD per barrel", "billions USD") |
| categories_field | str or None | Name of the input column holding each row's outcomes as a JSON array of strings, 2 to 50 unique options (required for categorical) |
| thresholds_field | str or None | Name of the input column holding each row's threshold conditions as a JSON array of numbers or strings, least strict to most strict (required for thresholded) |
| session | Session | Optional, auto-created if omitted |

The forecast_type enum is defined in engine/services/forecast/data_types.py (ForecastType.BINARY | NUMERIC | DATE | CATEGORICAL | THRESHOLDED); effort_level in the same file (ForecastEffortLevel.LOW | HIGH).
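
The per-type requirements in the table can be checked before submitting a batch. This is a client-side sketch built only from the rules listed above; `check_forecast_args` is a hypothetical helper, and the SDK may report these problems differently.

```python
from typing import Optional

# Which optional parameters each forecast type requires, per the table above.
REQUIRED = {
    "binary": [],
    "numeric": ["output_field", "units"],
    "date": ["output_field"],
    "categorical": ["categories_field"],
    "thresholded": ["thresholds_field"],
}
GROUPED = ("categorical", "thresholded")


def check_forecast_args(
    forecast_type: str,
    effort_level: Optional[str] = None,
    **params: Optional[str],
) -> list[str]:
    """Return a list of problems; an empty list means the arguments look valid."""
    if forecast_type not in REQUIRED:
        return [f"unknown forecast_type: {forecast_type!r}"]
    problems = [
        f"{forecast_type} requires {name}"
        for name in REQUIRED[forecast_type]
        if not params.get(name)
    ]
    if forecast_type in GROUPED and effort_level != "HIGH":
        problems.append(f'{forecast_type} requires effort_level="HIGH"')
    return problems


print(check_forecast_args("numeric", output_field="price"))
# ['numeric requires units']
print(check_forecast_args("categorical", categories_field="candidates"))
# ['categorical requires effort_level="HIGH"']
print(check_forecast_args("binary"))
# []
```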

Effort and cost

effort_level trades cost for accuracy:

| Effort | Per-row time | Per-row cost |
| --- | --- | --- |
| LOW (default for batches) | ~3 to 5 min | $0.09 to $0.20 |
| HIGH (default for single) | ~5 to 10 min | ~$1.20 |

When effort_level is None, the engine resolves it from the row count: HIGH for row_count <= 1 and LOW otherwise (engine/services/forecast/effort.py:18-27). One-off questions get the most accurate forecast; large batches stay affordable. See /forecast for worked examples.
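
The default rule and the cost table combine into a quick budget estimate. The sketch below mirrors the documented rule (HIGH for one row, LOW otherwise) and the per-row figures from the table; the function names are hypothetical, and the HIGH figure is approximate in the source, so treat the result as a rough range.

```python
from typing import Optional

# Per-row cost range in USD, taken from the table above (HIGH is approximate).
COST_PER_ROW = {"LOW": (0.09, 0.20), "HIGH": (1.20, 1.20)}


def resolve_effort(effort_level: Optional[str], row_count: int) -> str:
    """Apply the documented default: HIGH for a single row, LOW for many."""
    if effort_level is not None:
        return effort_level
    return "HIGH" if row_count <= 1 else "LOW"


def estimate_cost(row_count: int, effort_level: Optional[str] = None):
    """Return (resolved effort, low estimate, high estimate) for a batch."""
    effort = resolve_effort(effort_level, row_count)
    low, high = COST_PER_ROW[effort]
    return effort, round(low * row_count, 2), round(high * row_count, 2)


print(estimate_cost(1))            # ('HIGH', 1.2, 1.2)
print(estimate_cost(200))          # ('LOW', 18.0, 40.0)
print(estimate_cost(200, "HIGH"))  # ('HIGH', 240.0, 240.0)
```

The last line shows why grouped modes deserve a budget check: categorical and thresholded batches always run at HIGH, so a 200-row batch costs roughly an order of magnitude more than the LOW default.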

Via MCP

MCP tool: futuresearch_forecast

| Parameter | Type | Description |
| --- | --- | --- |
| data | list[object] | Inline data as a list of row objects |
| artifact_id | string | Alternatively, an artifact ID from a previous upload |
| forecast_type | "binary", "numeric", "date", "categorical", or "thresholded" | Type of forecast to produce |
| effort_level | "LOW" or "HIGH" | Optional. Defaults: HIGH for a single question, LOW for multiple. categorical and thresholded require HIGH. |
| context | string | Optional batch-level context for all questions |
| output_field | string | Name of the quantity (required for numeric and date) |
| units | string | Units (required for numeric) |
| categories_field | string | Name of the column holding each row's outcomes as a JSON array of strings (required for categorical) |
| thresholds_field | string | Name of the column holding each row's threshold conditions as a JSON array, least strict to most strict (required for thresholded) |

Provide either data or artifact_id, not both. See the MCP server reference for the rest of the lifecycle (progress, results, status).
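
As a sketch of the "either data or artifact_id" rule, the helper below assembles an arguments object for the futuresearch_forecast tool. `build_forecast_arguments` is hypothetical, and how the arguments are actually sent depends on your MCP client; only the parameter names come from the table above.

```python
import json
from typing import Optional


def build_forecast_arguments(
    forecast_type: str,
    data: Optional[list] = None,
    artifact_id: Optional[str] = None,
    **options,
) -> dict:
    """Assemble tool arguments, enforcing exactly one of data / artifact_id."""
    if (data is None) == (artifact_id is None):
        raise ValueError("provide exactly one of data or artifact_id")
    args = {"forecast_type": forecast_type}
    if data is not None:
        args["data"] = data
    else:
        args["artifact_id"] = artifact_id
    # Drop unset optional parameters so the payload stays minimal.
    args.update({k: v for k, v in options.items() if v is not None})
    return args


arguments = build_forecast_arguments(
    "numeric",
    data=[{"question": "What will the price of Brent crude oil be on December 31, 2026?"}],
    output_field="price",
    units="USD per barrel",
    context=None,
)
print(json.dumps(arguments, indent=2))
```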

Related docs

Guides

  • Turn Claude into an Accurate Forecaster: binary, numeric, and date forecasting for any question.
  • Find Profitable Prediction Market Trades: Polymarket and Kalshi screening.
  • Forecast Outcomes for a List of Entities: one outcome per row across a list.
  • Value a Private Company: sum-of-the-parts forecasting.

Case studies

  • Forecast When Anthropic and OpenAI Will IPO: date mode.
  • Forecast Anthropic and OpenAI IPO Valuations: numeric, high effort.
  • Forecast a SpaceX Sum-of-the-Parts Valuation: numeric, multi-segment.
  • Forecast AI Researcher Seed Valuations: numeric across 116 entities.

Long-form research

  • Anthropic and OpenAI IPO Timelines and Valuations
  • A $1.75 Trillion IPO Would Be Overpaying 30% for SpaceX
  • Which AI Researchers Have the Most Valuable Skills?
  • Forecasting Polymarket Questions with AI
  • Strategic Reasoning paper (arXiv:2604.26106)