Merge Thousands of Records

Matching 2,246 people to their personal websites requires understanding names, affiliations, and URL patterns at a scale where each match may need web research to verify. This case study demonstrates semantic record matching at production scale.

Metric            Value
Rows processed    2,246
Matched           2,243 (99.9%)
Total cost        $35.41
Time              12.5 minutes
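As a quick sanity check, the headline numbers above imply useful per-row figures (a sketch using only the table values; no additional data assumed):

```python
rows = 2246
matched = 2243
total_cost = 35.41   # USD
minutes = 12.5

print(f"match rate:   {matched / rows:.1%}")                 # ~99.9%
print(f"cost per row: ${total_cost / rows:.4f}")             # ~$0.0158
print(f"throughput:   {rows / (minutes * 60):.1f} rows/s")   # ~3.0
```

At roughly 1.6 cents and a third of a second per row, per-match web research stays practical at this scale.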

Claude Code

Add FutureSearch to Claude Code if you haven't already:

claude mcp add futuresearch --scope project --transport http https://mcp.futuresearch.ai/mcp

With both CSVs in your working directory, tell Claude:

Merge the people CSV with the websites CSV. Match each person to their
personal website(s).

Claude calls FutureSearch's merge MCP tool:

Tool: futuresearch_merge
├─ task: "Match each person to their website(s)."
├─ left_csv: "/Users/you/people.csv"
└─ right_csv: "/Users/you/websites.csv"

→ Submitted: 2,246 rows for merging.
  Session: https://futuresearch.ai/sessions/2a929529-2d92-4410-a6a7-ce8713c5d465
  Task ID: 2a92...

Tool: futuresearch_progress
├─ task_id: "2a92..."
→ Running: 0/2246 complete (30s elapsed)

...

Tool: futuresearch_progress
→ Completed: 2246/2246 (0 failed) in 747s.

Tool: futuresearch_results
├─ task_id: "2a92..."
├─ output_path: "/Users/you/people_with_websites.csv"
→ Saved 2246 rows to /Users/you/people_with_websites.csv

2,243 of 2,246 rows matched (99.9%).

Claude.ai

Add the FutureSearch connector if you haven't already. Then upload both the people CSV and websites CSV and ask Claude:

Merge the people CSV with the websites CSV. Match each person to their personal website(s).

Web App

Go to futuresearch.ai/app, upload both the people CSV and websites CSV, and enter:

Merge the people CSV with the websites CSV. Match each person to their personal website(s).

Python SDK

pip install futuresearch
export FUTURESEARCH_API_KEY=your_key_here  # Get one at futuresearch.ai/app/api-key

import asyncio
import pandas as pd
from futuresearch import create_session
from futuresearch.ops import merge

left_df = pd.read_csv("merge_websites_input_left_2246.csv")
right_df = pd.read_csv("merge_websites_input_right_2246.csv")

async def main():
    async with create_session(name="Website Matching") as session:
        result = await merge(
            session=session,
            task="Match each person to their website(s).",
            left_table=left_df,
            right_table=right_df,
        )
        return result.data

merged = asyncio.run(main())
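Once `asyncio.run(main())` returns, the merged DataFrame can be persisted and checked like any pandas result. A minimal sketch, using a toy stand-in for the output (the `website` column name is an assumption; inspect `merged.columns` for the actual schema of your run):

```python
import pandas as pd

# Toy stand-in for the merged output (real runs produce 2,246 rows;
# the "website" column name is an assumed output column).
merged = pd.DataFrame({
    "name": ["Ada Lovelace", "Alan Turing", "Grace Hopper"],
    "website": ["https://ada.example", None, "https://grace.example"],
})

# Persist the result and report the match rate.
merged.to_csv("people_with_websites.csv", index=False)

matched = merged["website"].notna().sum()
print(f"{matched} of {len(merged)} rows matched ({matched / len(merged):.1%})")
# → 2 of 3 rows matched (66.7%)
```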

Results

Most matches resolved via LLM reasoning over name, email, and URL patterns. Harder cases triggered automatic web search to verify person-to-website relationships. In total, the run consumed 54M tokens across 4,233 LLM requests.

Cost grows super-linearly with row count because each additional row increases the candidate pool for every match:

Rows      Cost
100       $0.00
200       $0.14
400       $0.29
800       $2.32
1,600     $16.60
2,246     $26.80
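One way to see the super-linear trend is to compare how much cost grows each time the row count grows, using the figures from the table above (a sketch; only the tabulated values are assumed):

```python
rows  = [100, 200, 400, 800, 1600, 2246]
costs = [0.00, 0.14, 0.29, 2.32, 16.60, 26.80]

# For each step, compare the row-count multiplier to the cost multiplier.
for i in range(1, len(rows)):
    if costs[i - 1] > 0:  # skip the $0.00 baseline
        print(f"{rows[i-1]}→{rows[i]} rows: "
              f"{rows[i] / rows[i-1]:.2f}x rows, "
              f"{costs[i] / costs[i-1]:.2f}x cost")
# e.g. 400→800 rows: 2.00x rows, 8.00x cost
```

Between 400 and 1,600 rows, doubling the input multiplies cost by 7–8x, consistent with the candidate pool for each match growing with the table sizes.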