Merge Costs and Speed
Run five merge experiments to measure how the cost cascade behaves as match difficulty increases. Exact and fuzzy matches are free; only semantic matches that require LLM reasoning incur costs.
| Metric | Value |
|---|---|
| Total merges | 5 |
| Total cost | $0.06 |
| Total time | 2.1 minutes |
Add FutureSearch to Claude Code if you haven't already:
```shell
claude mcp add futuresearch --scope project --transport http https://mcp.futuresearch.ai/mcp
```
Tell Claude to run each experiment with inline-generated data:
```
Create a test dataset of 10 companies with exact names, then merge them.
Then create a version with typos and merge again. Then test semantic
matching (Instagram to Meta, YouTube to Alphabet). Then test pharma
subsidiaries (Genentech to Roche, MSD to Merck). Show costs for each.
```
Claude runs five merge experiments:
```
Tool: futuresearch_merge (Experiment 1: Exact matches)
→ 10/10 matched, 6s, $0.00

Tool: futuresearch_merge (Experiment 2: Fuzzy/typo matches)
→ 10/10 matched, 13s, $0.00

Tool: futuresearch_merge (Experiment 3: Semantic matches)
→ 10/10 matched, 62s, $0.05

Tool: futuresearch_merge (Experiment 4: Pharma subsidiaries)
→ 13/13 matched, 38s, $0.01

Tool: futuresearch_merge (Experiment 5: Email domain matching)
→ 5/5 matched, 9s, $0.00
```
Add the FutureSearch connector if you haven't already. Then upload your company tables and ask Claude:
```
Create a test dataset of 10 companies with exact names, then merge them. Then create a version with typos and merge again. Then test semantic matching (Instagram to Meta, YouTube to Alphabet). Show costs for each.
```
Exact and fuzzy matches are free. Only rows requiring LLM reasoning cost ~$0.002/row.
Go to futuresearch.ai/app, upload your company tables, and enter:
```
Merge the tables based on company name and ticker. Match companies to their stock tickers.
```
Exact and fuzzy matches are free. Only rows requiring LLM reasoning cost ~$0.002/row. Web search fallback costs ~$0.01/row.
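These per-row rates can be turned into a rough budget before running a merge. A minimal sketch of the arithmetic (the `estimate_merge_cost` helper and the example row counts are illustrative, built only from the approximate prices quoted above; actual costs vary with task difficulty):

```python
# Rough per-row rates from the pricing above (approximate, not guaranteed).
RATE_LLM_PER_ROW = 0.002   # rows resolved by LLM reasoning
RATE_WEB_PER_ROW = 0.01    # rows that fall back to web search

def estimate_merge_cost(n_exact: int, n_fuzzy: int, n_llm: int, n_web: int) -> float:
    """Estimate merge cost in dollars; exact and fuzzy rows are free."""
    return n_llm * RATE_LLM_PER_ROW + n_web * RATE_WEB_PER_ROW

# A hypothetical 1,000-row merge: 900 rows match exactly or fuzzily,
# 90 need LLM reasoning, and 10 need web search.
cost = estimate_merge_cost(n_exact=850, n_fuzzy=50, n_llm=90, n_web=10)
print(f"${cost:.2f}")  # → $0.28
```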
The FutureSearch SDK implements a cost-optimized merge cascade. This example empirically measures the cost of each matching strategy across 5 experiments.
```shell
pip install futuresearch
export FUTURESEARCH_API_KEY=your_key_here  # Get one at futuresearch.ai/app/api-key
```
```python
import asyncio
import pandas as pd
from futuresearch import create_session, get_billing_balance
from futuresearch.ops import merge

async def measure_merge(name, task, left_table, right_table, **kwargs):
    """Run a merge and report its cost by diffing the billing balance."""
    balance_before = await get_billing_balance()
    async with create_session(name=name) as session:
        result = await merge(
            task=task,
            session=session,
            left_table=left_table,
            right_table=right_table,
            **kwargs,
        )
    balance_after = await get_billing_balance()
    cost = balance_before.current_balance_dollars - balance_after.current_balance_dollars
    return result.data, cost
```
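The measurement calls below assume a few small test tables already exist. A sketch of what they might look like (these rows and revenue figures are illustrative placeholders, not the exact data from the experiments above):

```python
import pandas as pd

# Left tables: company names as each experiment presents them.
companies_exact = pd.DataFrame({"company": ["Apple Inc", "Microsoft", "Alphabet"]})
companies_semantic = pd.DataFrame({"company": ["Instagram", "WhatsApp", "YouTube"]})

# Right table: revenue keyed by canonical company name (figures illustrative).
revenue_exact = pd.DataFrame({
    "company_name": ["Apple Inc", "Microsoft", "Alphabet", "Meta Platforms"],
    "revenue_usd_bn": [383, 212, 307, 135],
})
```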
```python
# Exact matches: $0.00
result, cost = await measure_merge(
    "Exact matches only",
    "Match companies by name.",
    companies_exact, revenue_exact,
    merge_on_left="company", merge_on_right="company_name",
)

# Semantic matches: ~$0.05
result, cost = await measure_merge(
    "Semantic matches",
    "Match companies. Instagram and WhatsApp are owned by Meta.",
    companies_semantic, revenue_exact,
    merge_on_left="company", merge_on_right="company_name",
)
```
Results
| Experiment | Match Type | Cost | Accuracy |
|---|---|---|---|
| Exact strings | Exact only | $0.00 | 100% |
| Typos/case | Exact + Fuzzy | $0.00 | 100% |
| Semantic (Instagram→Meta) | Exact + LLM | $0.05 | 100% |
| Pharma (Genentech→Roche) | Exact + Fuzzy + LLM | $0.01 | 100% |
| Email domains | LLM (domain) | $0.00 | 100% |
The cascade strategy:
| Strategy | Cost | Example |
|---|---|---|
| Exact match | Free | "Apple Inc" to "Apple Inc" |
| Fuzzy match | Free | "Microsft" to "Microsoft" |
| LLM reasoning | ~$0.002/row | "Instagram" to "Meta Platforms" |
| Web search | ~$0.01/row | Obscure or stale data |
Key finding: exact and fuzzy matches are free; only rows that require LLM reasoning incur costs. Providing `merge_on_left`/`merge_on_right` hints reduces cost by letting the cascade resolve more rows without LLM reasoning.