Fuzzy Match Across Tables

Go to futuresearch.ai/app, upload company_info.csv and valuations.csv, and enter:

Merge company_info.csv with valuations.csv. The first table has company names, the second has stock tickers. Match companies to their stock tickers.

438 rows matched with 100% accuracy using a cascade of exact, fuzzy, LLM, and web search strategies.

In Claude.ai, add the everyrow connector if you haven't already. Then upload company_info.csv and valuations.csv and ask Claude:

Merge company_info.csv with valuations.csv. The first table has company names, the second has stock tickers. Match companies to their stock tickers.

438 rows matched with 100% accuracy using a cascade of exact, fuzzy, LLM, and web search strategies.

Claude Code handles exact-key merges natively by writing pandas code. Scaling up to fuzzy matching, then semantic matching, then a web-search fallback calls for an approach where each strategy is tested independently and the cascade is evaluated empirically.
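The exact-key baseline that Claude Code writes itself is a one-line pandas merge; a minimal sketch with made-up rows shows where it stops working:

```python
import pandas as pd

# Hypothetical sample rows: the left table carries company names, the right
# carries tickers keyed by a slightly different spelling of one name.
companies = pd.DataFrame({"company": ["Apple Inc.", "Alphabet Inc."]})
valuations = pd.DataFrame(
    {"company": ["Apple", "Alphabet Inc."], "ticker": ["AAPL", "GOOGL"]}
)

# An exact-key merge only joins rows whose names agree character-for-character,
# so "Apple Inc." vs "Apple" goes unmatched and its ticker comes back NaN.
merged = companies.merge(valuations, on="company", how="left")
print(merged)
```

Everything past that first tier is where the cascade earns its keep.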

Here, we run 5 merge experiments on 438 S&P 500 companies, testing the cascade from exact matching to web search.

| Metric | Value |
| --- | --- |
| Total merges | 5 |
| Rows per merge | 438 |
| Total cost | $3.67 |
| Total time | 7.1 minutes |

Add everyrow to Claude Code if you haven't already:

claude mcp add futuresearch --scope project --transport http https://mcp.futuresearch.ai/mcp

With the company CSVs in your working directory, tell Claude to run each experiment. For the company-to-ticker merge:

Merge company_info.csv with valuations.csv. The first table has company names,
the second has stock tickers. Match companies to their stock tickers.

Claude calls everyrow's merge MCP tool:

Tool: everyrow_merge
├─ task: "Merge the tables based on company name and ticker"
├─ left_csv: "/Users/you/company_info.csv"
└─ right_csv: "/Users/you/valuations.csv"

→ Submitted: 438 rows for merging.
  Session: https://futuresearch.ai/sessions/d7819b7e-c48d-49e5-9f6e-55d972b85467

...

Tool: everyrow_results
→ Saved 438 rows to /Users/you/merged.csv

Results across all 5 experiments:

| Experiment | Accuracy | Cost | Time |
| --- | --- | --- | --- |
| 0% noise (baseline) | 100% | $0.00 | 6s |
| 5% character corruption | 100% | $0.10 | 23s |
| 10% character corruption | 100% | $0.34 | 43s |
| Company name to ticker (LLM) | 100% | $1.01 | 203s |
| CEO name to company (Web) | 96.3% | $2.22 | 151s |

The cascade escalates automatically: exact matches are free, fuzzy matches handle typos for free, LLM reasoning handles semantic matches at ~$0.002/row, and web search is used only for stale or obscure data at ~$0.01/row.
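Those per-row rates imply a simple cost model. A sketch for intuition: the rates come from the text above, but the tier counts below are hypothetical, not measured:

```python
# Per-row cost by cascade tier, from the rates quoted above: exact and fuzzy
# matching are free, LLM reasoning ~$0.002/row, web search ~$0.01/row.
TIER_COST_PER_ROW = {"exact": 0.0, "fuzzy": 0.0, "llm": 0.002, "web": 0.01}

def estimated_cost(rows_by_tier: dict[str, int]) -> float:
    """Sum per-row rates over the tier that resolved each row."""
    return sum(TIER_COST_PER_ROW[tier] * n for tier, n in rows_by_tier.items())

# Hypothetical split of 438 rows across tiers:
print(round(estimated_cost({"exact": 200, "fuzzy": 138, "llm": 80, "web": 20}), 2))
# → 0.36  (80 * $0.002 + 20 * $0.01; the free tiers contribute nothing)
```

The takeaway: total cost is driven almost entirely by how many rows fall through to the LLM and web tiers.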

The everyrow SDK implements a merge cascade (Exact, Fuzzy, LLM, Web) that automatically uses the simplest method that works for each row. This notebook tests the cascade across 5 experiments with increasing difficulty.

| Metric | Value |
| --- | --- |
| Total merges | 5 |
| Rows per merge | 438 |

Install the SDK and set your API key:

pip install everyrow
export EVERYROW_API_KEY=your_key_here  # Get one at futuresearch.ai/api-key

import asyncio
import pandas as pd
from everyrow import create_session
from everyrow.ops import merge

companies = pd.read_csv("company_info.csv")
valuations = pd.read_csv("valuations.csv")

async def main():
    # Experiment 1: Clean data (exact matches)
    async with create_session(name="Exact Match") as session:
        result = await merge(
            session=session,
            task="Merge the tables on company name",
            left_table=companies,
            right_table=valuations,
            merge_on_left="company",
            merge_on_right="company",
        )

    # Experiment 2: Company name to ticker (LLM match)
    async with create_session(name="LLM Match") as session:
        result = await merge(
            session=session,
            task="Merge the tables based on company name and ticker",
            left_table=companies,
            right_table=valuations,
        )

    # Experiments 3-5 (character corruption and CEO-to-company web matching)
    # follow the same pattern with different inputs.

asyncio.run(main())

| Experiment | Matched | Accuracy | Cost |
| --- | --- | --- | --- |
| 0% noise | 100% | 100% | $0.13 |
| 5% noise | 100% | 100% | $0.32 |
| 10% noise | 100% | 100% | $0.44 |
| LLM (company to ticker) | 100% | 100% | $1.00 |
| Web (CEO matching) | 95.7% | 96.7% | $3.69 |

The cascade optimizes cost automatically. For the 10% noise experiment, 26.5% of rows matched exactly, 30.8% via fuzzy matching (both free), and only 42.7% required LLM reasoning.