Fuzzy Match Across Tables

Matching companies to stock tickers, or CEOs to their companies, requires a cascade of strategies from exact matching through LLM reasoning to web search. This case study runs 5 merge experiments on 438 S&P 500 companies, testing each strategy independently.
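The internals of the merge tool aren't shown in this case study, but the first two tiers of such a cascade can be sketched with Python's stdlib difflib. This is an illustration only; the function name and cutoff are assumptions, and the LLM and web-search tiers are left as a stub:

```python
import difflib

def cascade_match(name, candidates, fuzzy_cutoff=0.8):
    """Illustrative matching cascade: exact -> fuzzy -> escalate."""
    # Tier 1: exact match (free)
    if name in candidates:
        return name, "exact"
    # Tier 2: fuzzy string match via stdlib difflib (free)
    close = difflib.get_close_matches(name, candidates, n=1, cutoff=fuzzy_cutoff)
    if close:
        return close[0], "fuzzy"
    # Tier 3+: escalate to LLM reasoning, then web search (not shown here)
    return None, "escalate"

print(cascade_match("Appl Inc.", ["Apple Inc.", "Microsoft Corp."]))
# → ('Apple Inc.', 'fuzzy')
```

Only rows that fall through the cheap tiers incur LLM or web-search cost, which is what keeps the per-row price low in the experiments below.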

Metric           Value
Total merges     5
Rows per merge   438
Total cost       $3.67
Total time       7.1 minutes

Claude Code

Add FutureSearch to Claude Code if you haven't already:

claude mcp add futuresearch --scope project --transport http https://mcp.futuresearch.ai/mcp

With the company CSVs in your working directory, tell Claude to run each experiment. For the company-to-ticker merge:

Merge company_info.csv with valuations.csv. The first table has company names,
the second has stock tickers. Match companies to their stock tickers.

Claude calls FutureSearch's merge MCP tool:

Tool: futuresearch_merge
├─ task: "Merge the tables based on company name and ticker"
├─ left_csv: "/Users/you/company_info.csv"
└─ right_csv: "/Users/you/valuations.csv"

→ Submitted: 438 rows for merging.
  Session: https://futuresearch.ai/sessions/d7819b7e-c48d-49e5-9f6e-55d972b85467

...

Tool: futuresearch_results
→ Saved 438 rows to /Users/you/merged.csv

Claude.ai

Add the FutureSearch connector if you haven't already. Then upload company_info.csv and valuations.csv and ask Claude:

Merge company_info.csv with valuations.csv. The first table has company names, the second has stock tickers. Match companies to their stock tickers.

Web App

Go to futuresearch.ai/app, upload company_info.csv and valuations.csv, and enter:

Merge company_info.csv with valuations.csv. The first table has company names, the second has stock tickers. Match companies to their stock tickers.

Python SDK

pip install futuresearch
export FUTURESEARCH_API_KEY=your_key_here  # Get one at futuresearch.ai/app/api-key

import asyncio
import pandas as pd
from futuresearch import create_session
from futuresearch.ops import merge

companies = pd.read_csv("company_info.csv")
valuations = pd.read_csv("valuations.csv")

async def main():
    # Experiment 1: Clean data (exact matches)
    async with create_session(name="Exact Match") as session:
        result = await merge(
            session=session,
            task="Merge the tables on company name",
            left_table=companies,
            right_table=valuations,
            merge_on_left="company",
            merge_on_right="company",
        )

    # Experiment 2: Company name to ticker (LLM match)
    async with create_session(name="LLM Match") as session:
        result = await merge(
            session=session,
            task="Merge the tables based on company name and ticker",
            left_table=companies,
            right_table=valuations,
        )

asyncio.run(main())
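Once the merged output is saved, it's worth checking how many rows actually found a match. A minimal post-check, assuming the output has a `ticker` column that is empty for unmatched rows (column names here are hypothetical, not the SDK's guaranteed schema):

```python
import pandas as pd

# Stand-in for the merged output; in practice, load the saved file,
# e.g. pd.read_csv("merged.csv").
merged = pd.DataFrame({
    "company": ["Apple Inc.", "Unknown Co."],
    "ticker": ["AAPL", None],
})

# Fraction of rows where a ticker was found
coverage = merged["ticker"].notna().mean()
print(f"match coverage: {coverage:.0%}")  # → match coverage: 50%
```

A coverage check like this surfaces unmatched rows quickly, which is how the accuracy figures below would be audited.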

Results

Results across all 5 experiments:

Experiment                     Accuracy   Cost    Time
0% noise (baseline)            100%       $0.00   6s
5% character corruption        100%       $0.10   23s
10% character corruption       100%       $0.34   43s
Company name to ticker (LLM)   100%       $1.01   203s
CEO name to company (Web)      96.3%      $2.22   151s

The cascade escalates automatically: exact matches are free, fuzzy matches handle typos for free, LLM reasoning handles semantic matches at ~$0.002/row, and web search is used only for stale or obscure data at ~$0.01/row. For the 10% noise experiment, 26.5% of rows matched exactly, 30.8% via fuzzy matching (both free), and only 42.7% required LLM reasoning.