Fuzzy Match Across Tables
Matching companies to stock tickers, or CEOs to their companies, requires a cascade of strategies from exact matching through LLM reasoning to web search. This case study runs 5 merge experiments on 438 S&P 500 companies, testing each strategy independently.
| Metric | Value |
|---|---|
| Total merges | 5 |
| Rows per merge | 438 |
| Total cost | $3.67 |
| Total time | 7.1 minutes |
Add FutureSearch to Claude Code if you haven't already:
claude mcp add futuresearch --scope project --transport http https://mcp.futuresearch.ai/mcp
With the company CSVs in your working directory, tell Claude to run each experiment. For the company-to-ticker merge:
Merge company_info.csv with valuations.csv. The first table has company names,
the second has stock tickers. Match companies to their stock tickers.
Claude calls FutureSearch's merge MCP tool:
Tool: futuresearch_merge
├─ task: "Merge the tables based on company name and ticker"
├─ left_csv: "/Users/you/company_info.csv"
└─ right_csv: "/Users/you/valuations.csv"
→ Submitted: 438 rows for merging.
Session: https://futuresearch.ai/sessions/d7819b7e-c48d-49e5-9f6e-55d972b85467
...
Tool: futuresearch_results
→ Saved 438 rows to /Users/you/merged.csv
Add the FutureSearch connector if you haven't already. Then upload company_info.csv and valuations.csv and ask Claude:
Merge company_info.csv with valuations.csv. The first table has company names, the second has stock tickers. Match companies to their stock tickers.
Go to futuresearch.ai/app, upload company_info.csv and valuations.csv, and enter:
Merge company_info.csv with valuations.csv. The first table has company names, the second has stock tickers. Match companies to their stock tickers.
pip install futuresearch
export FUTURESEARCH_API_KEY=your_key_here # Get one at futuresearch.ai/app/api-key
import asyncio
import pandas as pd
from futuresearch import create_session
from futuresearch.ops import merge
companies = pd.read_csv("company_info.csv")
valuations = pd.read_csv("valuations.csv")
async def main():
# Experiment 1: Clean data (exact matches)
async with create_session(name="Exact Match") as session:
result = await merge(
session=session,
task="Merge the tables on company name",
left_table=companies,
right_table=valuations,
merge_on_left="company",
merge_on_right="company",
)
# Experiment 2: Company name to ticker (LLM match)
async with create_session(name="LLM Match") as session:
result = await merge(
session=session,
task="Merge the tables based on company name and ticker",
left_table=companies,
right_table=valuations,
)
asyncio.run(main())
Results
Results across all 5 experiments:
| Experiment | Accuracy | Cost | Time |
|---|---|---|---|
| 0% noise (baseline) | 100% | $0.00 | 6s |
| 5% character corruption | 100% | $0.10 | 23s |
| 10% character corruption | 100% | $0.34 | 43s |
| Company name to ticker (LLM) | 100% | $1.01 | 203s |
| CEO name to company (Web) | 96.3% | $2.22 | 151s |
The cascade escalates automatically: exact matches are free, fuzzy matches handle typos for free, LLM reasoning handles semantic matches at ~$0.002/row, and web search is used only for stale or obscure data at ~$0.01/row. For the 10% noise experiment, 26.5% of rows matched exactly, 30.8% via fuzzy matching (both free), and only 42.7% required LLM reasoning.