# Merge Thousands of Records
Matching 2,246 people to their personal websites requires understanding names, affiliations, and URL patterns at a scale where each match may need web research to verify. This case study demonstrates semantic record matching at production scale.
| Metric | Value |
|---|---|
| Rows processed | 2,246 |
| Matched | 2,243 (99.9%) |
| Total cost | $35.41 |
| Time | 12.5 minutes |
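From the metrics above, the run works out to under two cents per row. A quick sanity check on the arithmetic (all numbers taken from the table):

```python
rows = 2246
total_cost = 35.41  # USD, from the metrics table
elapsed_min = 12.5  # minutes

cost_per_row = total_cost / rows   # cost per person matched
rows_per_min = rows / elapsed_min  # throughput

print(f"${cost_per_row:.4f} per row, {rows_per_min:.0f} rows/min")
```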
Add FutureSearch to Claude Code if you haven't already:

```bash
claude mcp add futuresearch --scope project --transport http https://mcp.futuresearch.ai/mcp
```
With both CSVs in your working directory, tell Claude:

```
Merge the people CSV with the websites CSV. Match each person to their
personal website(s).
```
Claude calls FutureSearch's merge MCP tool:

```
Tool: futuresearch_merge
├─ task: "Match each person to their website(s)."
├─ left_csv: "/Users/you/people.csv"
└─ right_csv: "/Users/you/websites.csv"
→ Submitted: 2,246 rows for merging.
  Session: https://futuresearch.ai/sessions/2a929529-2d92-4410-a6a7-ce8713c5d465
  Task ID: 2a92...

Tool: futuresearch_progress
├─ task_id: "2a92..."
→ Running: 0/2246 complete (30s elapsed)

...

Tool: futuresearch_progress
→ Completed: 2246/2246 (0 failed) in 747s.

Tool: futuresearch_results
├─ task_id: "2a92..."
├─ output_path: "/Users/you/people_with_websites.csv"
→ Saved 2246 rows to /Users/you/people_with_websites.csv
```
2,243 of 2,246 matched (99.9%). View the session.
Add the FutureSearch connector if you haven't already. Then upload both the people CSV and websites CSV and ask Claude:
```
Merge the people CSV with the websites CSV. Match each person to their personal website(s).
```
Go to futuresearch.ai/app, upload both the people CSV and websites CSV, and enter:
```
Merge the people CSV with the websites CSV. Match each person to their personal website(s).
```
```bash
pip install futuresearch
export FUTURESEARCH_API_KEY=your_key_here  # Get one at futuresearch.ai/app/api-key
```
```python
import asyncio

import pandas as pd

from futuresearch import create_session
from futuresearch.ops import merge

left_df = pd.read_csv("merge_websites_input_left_2246.csv")
right_df = pd.read_csv("merge_websites_input_right_2246.csv")

async def main():
    async with create_session(name="Website Matching") as session:
        result = await merge(
            session=session,
            task="Match each person to their website(s).",
            left_table=left_df,
            right_table=right_df,
        )
        return result.data

merged = asyncio.run(main())
```
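The returned `merged` is a plain DataFrame, so standard pandas tooling applies for post-processing. A sketch, assuming the matched URL lands in a column named `website` (the actual column name depends on your input CSVs), with a toy DataFrame standing in for the real result:

```python
import pandas as pd

# Toy stand-in for the merged result; in practice use the
# `merged` DataFrame returned by asyncio.run(main()) above.
merged = pd.DataFrame({
    "name": ["Ada Lovelace", "Alan Turing", "Grace Hopper"],
    "website": ["https://ada.example", None, "https://grace.example"],
})

# Count rows where a website was found and report the match rate.
matched = merged["website"].notna().sum()
rate = matched / len(merged)
print(f"{matched}/{len(merged)} matched ({rate:.1%})")

merged.to_csv("people_with_websites.csv", index=False)
```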
## Results
Most matches resolved via LLM reasoning on name/email/URL patterns. Harder cases triggered automatic web search to verify person-to-website relationships. At this scale, 54M tokens were consumed across 4,233 LLM requests.
Cost grows super-linearly with row count because each additional row increases the candidate pool for every match:
| Rows | Cost |
|---|---|
| 100 | $0.00 |
| 200 | $0.14 |
| 400 | $0.29 |
| 800 | $2.32 |
| 1,600 | $16.60 |
| 2,246 | $26.80 |
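The table above is consistent with roughly quadratic scaling. A quick log-log fit over the non-zero rows makes the exponent concrete (numbers taken from the cost table; the fitted exponent describes this dataset only, not a guarantee for other workloads):

```python
import math

# (rows, cost in USD) from the cost table; the $0.00 row is
# skipped because log(0) is undefined.
data = [(200, 0.14), (400, 0.29), (800, 2.32), (1600, 16.60), (2246, 26.80)]

xs = [math.log(r) for r, _ in data]
ys = [math.log(c) for _, c in data]
n = len(data)
mx, my = sum(xs) / n, sum(ys) / n

# Least-squares slope of log(cost) vs log(rows) estimates the
# scaling exponent k in cost ~ rows^k.
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
    (x - mx) ** 2 for x in xs
)
print(f"cost ~ rows^{slope:.1f}")
```

A slope above 1 confirms the super-linear growth: doubling the row count more than doubles the cost, because every new row is a candidate for every other row's match.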