
Merge Costs and Speed

Run 5 merge experiments to empirically measure the cost cascade across increasing match difficulty. Exact and fuzzy matches are free; only semantic matches that require LLM reasoning incur costs.

| Metric | Value |
| --- | --- |
| Total merges | 5 |
| Total cost | $0.06 |
| Total time | 2.1 minutes |

Claude Code

Add FutureSearch to Claude Code if you haven't already:

claude mcp add futuresearch --scope project --transport http https://mcp.futuresearch.ai/mcp

Tell Claude to run each experiment with inline-generated data:

Create a test dataset of 10 companies with exact names, then merge them.
Then create a version with typos and merge again. Then test semantic
matching (Instagram to Meta, YouTube to Alphabet). Then test pharma
subsidiaries (Genentech to Roche, MSD to Merck). Show costs for each.

Claude runs five merge experiments:

Tool: futuresearch_merge (Experiment 1: Exact matches)
→ 10/10 matched, 6s, $0.00

Tool: futuresearch_merge (Experiment 2: Fuzzy/typo matches)
→ 10/10 matched, 13s, $0.00

Tool: futuresearch_merge (Experiment 3: Semantic matches)
→ 10/10 matched, 62s, $0.05

Tool: futuresearch_merge (Experiment 4: Pharma subsidiaries)
→ 13/13 matched, 38s, $0.01

Tool: futuresearch_merge (Experiment 5: Email domain matching)
→ 5/5 matched, 9s, $0.00
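As a sanity check, the per-experiment figures above add up to the totals reported in the summary table:

```python
# Cross-check the reported totals against the five experiments above.
costs = [0.00, 0.00, 0.05, 0.01, 0.00]   # dollars per experiment
times = [6, 13, 62, 38, 9]               # seconds per experiment

total_cost = sum(costs)
total_minutes = sum(times) / 60

print(f"Total cost: ${total_cost:.2f}")        # $0.06
print(f"Total time: {total_minutes:.1f} min")  # 2.1 min
```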

Claude.ai

Add the FutureSearch connector if you haven't already. Then upload your company tables and ask Claude:

Create a test dataset of 10 companies with exact names, then merge them. Then create a version with typos and merge again. Then test semantic matching (Instagram to Meta, YouTube to Alphabet). Show costs for each.

Exact and fuzzy matches are free. Only rows requiring LLM reasoning cost ~$0.002/row.

Web App

Go to futuresearch.ai/app, upload your company tables, and enter:

Merge the tables based on company name and ticker. Match companies to their stock tickers.

Exact and fuzzy matches are free. Only rows requiring LLM reasoning cost ~$0.002/row. Web search fallback costs ~$0.01/row.
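The per-row rates above support a back-of-envelope cost estimate before running a large merge. The helper below is hypothetical (not part of the SDK); only the rates come from this page.

```python
# Back-of-envelope merge cost estimate using the rates quoted above:
# exact and fuzzy matches are free, LLM reasoning is ~$0.002/row,
# and web search fallback is ~$0.01/row. estimate_merge_cost is an
# illustrative helper, not an SDK function.
def estimate_merge_cost(total_rows: int,
                        llm_fraction: float,
                        web_fraction: float = 0.0) -> float:
    LLM_RATE = 0.002   # dollars per row needing LLM reasoning
    WEB_RATE = 0.01    # dollars per row needing web search
    return (total_rows * llm_fraction * LLM_RATE
            + total_rows * web_fraction * WEB_RATE)

# e.g. 10,000 rows where 20% need LLM reasoning and 2% need web search:
print(f"${estimate_merge_cost(10_000, 0.20, 0.02):.2f}")  # $6.00
```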

Python SDK

The FutureSearch SDK implements a cost-optimized merge cascade. This example empirically measures the cost of each matching strategy across 5 experiments.

pip install futuresearch
export FUTURESEARCH_API_KEY=your_key_here  # Get one at futuresearch.ai/app/api-key
import asyncio
import pandas as pd
from futuresearch import create_session, get_billing_balance
from futuresearch.ops import merge

async def measure_merge(name, task, left_table, right_table, **kwargs):
    # Snapshot the billing balance before and after to isolate this merge's cost.
    balance_before = await get_billing_balance()
    async with create_session(name=name) as session:
        result = await merge(
            task=task,
            session=session,
            left_table=left_table,
            right_table=right_table,
            **kwargs,
        )
    balance_after = await get_billing_balance()
    cost = balance_before.current_balance_dollars - balance_after.current_balance_dollars
    return result.data, cost

async def main():
    # Small sample tables (placeholder revenue values) for two of the experiments.
    companies_exact = pd.DataFrame({"company": ["Apple Inc", "Microsoft", "Meta Platforms"]})
    revenue_exact = pd.DataFrame({
        "company_name": ["Apple Inc", "Microsoft", "Meta Platforms"],
        "revenue": [1.0, 2.0, 3.0],
    })
    companies_semantic = pd.DataFrame({"company": ["Apple Inc", "Microsoft", "Instagram"]})

    # Exact matches: $0.00
    result, cost = await measure_merge(
        "Exact matches only",
        "Match companies by name.",
        companies_exact, revenue_exact,
        merge_on_left="company", merge_on_right="company_name",
    )

    # Semantic matches: ~$0.03
    result, cost = await measure_merge(
        "Semantic matches",
        "Match companies. Instagram and WhatsApp are owned by Meta.",
        companies_semantic, revenue_exact,
        merge_on_left="company", merge_on_right="company_name",
    )

asyncio.run(main())

Results

| Experiment | Match Type | Cost | Accuracy |
| --- | --- | --- | --- |
| Exact strings | Exact only | $0.00 | 100% |
| Typos/case | Exact + Fuzzy | $0.00 | 100% |
| Semantic (Instagram→Meta) | Exact + LLM | $0.05 | 100% |
| Pharma (Genentech→Roche) | Exact + Fuzzy + LLM | $0.01 | 100% |
| Email domains | LLM (domain) | $0.00 | 100% |

The cascade strategy:

| Strategy | Cost | Example |
| --- | --- | --- |
| Exact match | Free | "Apple Inc" to "Apple Inc" |
| Fuzzy match | Free | "Microsft" to "Microsoft" |
| LLM reasoning | ~$0.002/row | "Instagram" to "Meta Platforms" |
| Web search | ~$0.01/row | Obscure or stale data |
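The cascade logic can be sketched in a few lines. This is an illustrative reconstruction, not the FutureSearch implementation: the threshold, helper names, and use of difflib are assumptions; in practice the third tier would call an LLM rather than return a sentinel.

```python
from difflib import SequenceMatcher

def normalize(name: str) -> str:
    return name.lower().strip()

def fuzzy_ratio(a: str, b: str) -> float:
    # Cheap string similarity standing in for a real fuzzy matcher.
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def cascade_match(left: str, candidates: list[str], fuzzy_threshold: float = 0.85):
    """Return (match, tier) where tier is 'exact', 'fuzzy', or 'llm'."""
    # Tier 1: exact match after normalization -- free.
    for c in candidates:
        if normalize(left) == normalize(c):
            return c, "exact"
    # Tier 2: fuzzy match for typos and casing -- free.
    best = max(candidates, key=lambda c: fuzzy_ratio(left, c))
    if fuzzy_ratio(left, best) >= fuzzy_threshold:
        return best, "fuzzy"
    # Tier 3: no cheap match; defer to LLM reasoning (~$0.002/row),
    # with web search (~$0.01/row) as the final fallback.
    return None, "llm"

candidates = ["Apple Inc", "Microsoft", "Meta Platforms"]
print(cascade_match("Apple Inc", candidates))   # exact tier
print(cascade_match("Microsft", candidates))    # fuzzy tier
print(cascade_match("Instagram", candidates))   # falls through to LLM tier
```

Only rows that fall through both free tiers ever touch the paid ones, which is why the experiments with clean or merely typo-ridden names cost $0.00.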

Key finding: exact and fuzzy matches are free; only rows that require LLM reasoning incur costs. Providing merge_on_left/merge_on_right hints reduces costs by letting the cascade skip LLM reasoning for more rows.