FutureSearch by futuresearch

Rank

rank takes a DataFrame and a natural-language scoring criterion, dispatches web research agents to compute a score for each row, and returns the DataFrame sorted by that score. The sort key does not need to exist in your data. Agents derive it at runtime by searching the web, reading pages, and reasoning over what they find.

Examples

from futuresearch.ops import rank

result = await rank(
    task="Score by likelihood to need data integration solutions",
    input=leads_dataframe,
    field_name="integration_need_score",
    ascending_order=False,  # highest first
)
print(result.data.head())
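
The snippets on this page use top-level await, which works in a notebook or an async REPL but is a syntax error in a plain Python script. A minimal sketch of the script form follows. To keep it runnable offline, rank_stub is a hypothetical stand-in for rank (it does no research and returns a plain dict, not the real result object), and the two lead rows are made up.

```python
import asyncio


async def rank_stub(task, input, field_name, ascending_order=True):
    """Hypothetical stand-in for futuresearch.ops.rank; does no web research."""
    await asyncio.sleep(0)  # yield to the event loop, as a real network call would
    return {
        "field_name": field_name,
        "rows": len(input),
        "ascending_order": ascending_order,
    }


async def main():
    # Made-up input; the real rank call takes a DataFrame here.
    leads = [{"company": "Acme"}, {"company": "Globex"}]
    # In a real script, import rank from futuresearch.ops and await it instead.
    return await rank_stub(
        task="Score by likelihood to need data integration solutions",
        input=leads,
        field_name="integration_need_score",
        ascending_order=False,
    )


# asyncio.run creates an event loop, runs main() to completion, then closes the loop.
result = asyncio.run(main())
print(result)
```

Only the asyncio.run(main()) wrapper is the point here; everything the stub returns is illustrative.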

The task can be as specific as you want. You can describe the metric in detail, list which sources to use, and explain how to resolve ambiguities.

result = await rank(
    task="""
        Score 0-100 by likelihood to adopt research tools in the next 12 months.

        High scores: teams actively publishing, hiring researchers, or with
        recent funding for R&D. Low scores: pure trading shops, firms with
        no public research output.

        Consult the company's website, job postings, and LinkedIn profile for information.
    """,
    input=investment_firms,
    field_name="research_adoption_score",
    ascending_order=False,  # highest first
)
print(result.data.head())

Structured output

If you want more than just a number, pass a Pydantic model.

Note that you don't need to specify fields for reasoning, explanation, or sources. That information is included automatically.

from pydantic import BaseModel, Field

class AcquisitionScore(BaseModel):
    fit_score: float = Field(description="0-100, strategic alignment with our business")
    annual_revenue_usd: int = Field(description="Their estimated annual revenue in USD")

result = await rank(
    task="Score acquisition targets by product-market fit and revenue quality",
    input=potential_acquisitions,
    field_name="fit_score",
    response_model=AcquisitionScore,
    ascending_order=False,  # highest first
)
print(result.data.head())

Now every row has both fit_score and annual_revenue_usd fields, each of which includes its own explanation.

When you pass a response model, make sure it declares a field whose name matches field_name; otherwise you'll get an error. The field_type parameter is ignored when you pass a response model.
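
The requirement above can be checked locally before any agents run. The sketch below is not part of the FutureSearch SDK: check_field_name is a hypothetical helper, and FakeModel is a duck-typed stand-in so the block runs without Pydantic installed. It relies on Pydantic v2 exposing declared fields as model_fields on the class (v1 used __fields__).

```python
def check_field_name(response_model, field_name):
    """Raise ValueError if field_name is not declared on the response model."""
    # Pydantic v2 lists declared fields in `model_fields`; v1 used `__fields__`.
    fields = getattr(response_model, "model_fields", None)
    if fields is None:
        fields = getattr(response_model, "__fields__", {})
    if field_name not in fields:
        raise ValueError(
            f"field_name {field_name!r} is not a field of "
            f"{response_model.__name__}; declared fields: {sorted(fields)}"
        )


class FakeModel:
    """Stand-in for a Pydantic v2 model class, used only for this demo."""
    model_fields = {"fit_score": None, "annual_revenue_usd": None}


check_field_name(FakeModel, "fit_score")  # matches a declared field, no error

try:
    check_field_name(FakeModel, "score")  # not declared, so this raises
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
```

With a real model such as AcquisitionScore, the call would be check_field_name(AcquisitionScore, "fit_score").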

What you get back

rank returns your DataFrame sorted by the score, with the score column added. The agent's reasoning and the web sources it used are attached automatically, so you do not need to request them.

  • <field_name> — the score the agent computed for each row (the column you named, such as research_adoption_score)
  • the reasoning behind each score and the sources the agent read, included automatically
  • rows ordered by the score (ascending_order=False puts the highest first)

With a response_model, every field you define is added as its own column, and each carries its own reasoning and sources.

ranked = result.data        # already sorted by field_name
top_10 = result.data.head(10)

Example

Estimate residential building-permit processing time for a list of Texas cities, fastest first (from the permit-times case study).

Input:

| city | county | population |
|---|---|---|
| Houston | Harris | 2,390,125 |
| San Antonio | Bexar | 1,526,656 |
| Dallas | Dallas | 1,326,087 |
| Fort Worth | Tarrant | 1,008,106 |
| Austin | Travis | 993,588 |

Output (sorted by estimated_days, ascending):

| city | estimated_days |
|---|---|
| San Antonio | 2 |
| Austin | 5 |
| Fort Worth | 7 |
| Houston | 8 |
| Dallas | 8 |

estimated_days does not exist in the input. The agent derives it per row by researching each city's permitting process, and every row keeps the agent's reasoning and the sources behind its estimate.
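
To make the ordering step concrete, here is a small local illustration using the scores from the example output above. It only mimics the final sort; it does none of the research, and order_rows is a hypothetical helper, not FutureSearch code. How rank itself breaks ties is not stated on this page. In this sketch, Python's stable sort keeps tied rows in input order.

```python
# Scores copied from the example output above, listed in input order.
rows = [
    {"city": "Houston", "estimated_days": 8},
    {"city": "San Antonio", "estimated_days": 2},
    {"city": "Dallas", "estimated_days": 8},
    {"city": "Fort Worth", "estimated_days": 7},
    {"city": "Austin", "estimated_days": 5},
]


def order_rows(rows, field_name, ascending_order=True):
    """Sort rows by field_name; ascending_order=False puts the highest first."""
    return sorted(rows, key=lambda row: row[field_name], reverse=not ascending_order)


fastest_first = [row["city"] for row in order_rows(rows, "estimated_days")]
slowest_first = [
    row["city"] for row in order_rows(rows, "estimated_days", ascending_order=False)
]
print(fastest_first)
print(slowest_first)
```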

Parameters

| Name | Type | Description |
|---|---|---|
| task | str | Instructions for the agent describing how to find your metric |
| session | Session | Optional, auto-created if omitted |
| input | DataFrame | Your data |
| field_name | str | Column name for the metric |
| field_type | str | The type of the field (default: "float") |
| response_model | BaseModel | Optional response model for multiple output fields |
| ascending_order | bool | True = lowest first (default); False = highest first |
| preview | bool | True = process only a few rows |

Performance

| Rows | Time | Cost |
|---|---|---|
| 10 | ~3 min | ~$0.30 to $0.60 |
| 50 | ~6 min | ~$1.50 to $3 |
| 100 | ~11 min | ~$3 to $6 |

You are charged 3 to 6¢ per row (see pricing); the per-row cost rises with how much web research each row needs. Time is wall-clock, and because rows are researched concurrently it scales sublinearly, not in proportion to row count.
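
The cost column in the table above is the row count multiplied by the 3 to 6¢ per-row range. A minimal sketch of that arithmetic, assuming the range holds for your rows (estimate_cost_usd is a hypothetical helper, and actual charges depend on how much research each row needs):

```python
CENTS_PER_ROW_LOW = 3   # lower bound quoted above
CENTS_PER_ROW_HIGH = 6  # upper bound quoted above


def estimate_cost_usd(rows):
    """Return (low, high) estimated cost in dollars for a given row count."""
    if rows < 0:
        raise ValueError("rows must be non-negative")
    return rows * CENTS_PER_ROW_LOW / 100, rows * CENTS_PER_ROW_HIGH / 100


for n in (10, 50, 100):
    low, high = estimate_cost_usd(n)
    print(f"{n} rows: ${low:.2f} to ${high:.2f}")
```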

Via MCP

MCP tool: futuresearch_rank

| Parameter | Type | Description |
|---|---|---|
| csv_path | string | Path to input CSV file |
| task | string | How to score each row |
| field_name | string | Column name for the score |
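
MCP clients normally build the request for you, but it can help to see what a call with these three parameters looks like on the wire. Under the MCP specification, a tool invocation is a JSON-RPC 2.0 tools/call request carrying the tool name and an arguments object. The sketch below only constructs that payload; the CSV path and task text are made-up values, and whether the tool accepts parameters beyond the three listed above is not stated here.

```python
import json

request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "futuresearch_rank",
        "arguments": {
            "csv_path": "leads.csv",  # hypothetical path to your input file
            "task": "Score by likelihood to need data integration solutions",
            "field_name": "integration_need_score",
        },
    },
}

# Serialize the request as it would be sent to the MCP server.
payload = json.dumps(request, indent=2)
print(payload)
```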

Related docs

Guides

  • Sort a Dataset Using Web Data

Case Studies

  • Score Leads from Fragmented Data
  • Score Leads Without CRM History
  • Research and Rank Permit Times

Blog posts

  • Ranking by Data Fragmentation Risk
  • Rank Leads Like an Analyst