
Deduplicate CRM Records

Go to futuresearch.ai/app, upload case_01_crm_data.csv, and enter:

Deduplicate this CRM dataset. Two entries are duplicates if they include data for the same legal entity.

500 records resolved to about 146 unique entities (70.8% duplicates removed). Results take about 7 minutes.

Add the everyrow connector if you haven't already. Then upload case_01_crm_data.csv and ask Claude:

Deduplicate this CRM dataset. Two entries are duplicates if they include data for the same legal entity.

500 records resolved to about 146 unique entities (70.8% duplicates removed). Results take about 7 minutes.

Claude Code can find exact duplicates. But what if "PANW", "Pallow Alto", and "Paloalto Networks" are all the same company? And "W-Mart", "Wall-Mart", and "WMT Corp" are all Walmart?

Here, we get Claude Code to deduplicate 500 messy CRM records down to unique companies.

| Metric | Value |
| --- | --- |
| Records processed | 500 |
| Unique entities | 146 |
| Duplicates removed | 354 (70.8%) |
| Cost | $1.38 |
| Time | 7.0 minutes |
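
The duplicate rate in the table follows directly from the record and entity counts. A minimal sketch of that arithmetic, using only the two counts reported above (the variable names are illustrative):

```python
# Counts taken from the metrics reported for this run.
records_processed = 500
unique_entities = 146

# Every record beyond the first in each entity counts as a removed duplicate.
duplicates_removed = records_processed - unique_entities
duplicate_rate = duplicates_removed / records_processed

print(f"{duplicates_removed} duplicates removed ({duplicate_rate:.1%})")
# prints: 354 duplicates removed (70.8%)
```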

Add everyrow to Claude Code if you haven't already:

claude mcp add futuresearch --scope project --transport http https://mcp.futuresearch.ai/mcp

Download case_01_crm_data.csv. Tell Claude:

Deduplicate this CRM dataset. Two entries are duplicates if they include data
for the same legal entity.

Claude calls everyrow's dedupe MCP tool:

Tool: everyrow_dedupe
├─ equivalence_relation: "Two entries are duplicates if they include data for the same legal entity."
└─ input_csv: "/Users/you/case_01_crm_data.csv"

→ Submitted: 500 rows for deduplication.
  Session: https://futuresearch.ai/sessions/0f6aa459-6e83-4df0-b9e8-bdb8ec594d91
  Task ID: 0f6a...

Tool: everyrow_progress
├─ task_id: "0f6a..."
→ Running: 0/500 complete (30s elapsed)

...

Tool: everyrow_progress
→ Completed: 500/500 (0 failed) in 422s.

Tool: everyrow_results
├─ task_id: "0f6a..."
├─ output_path: "/Users/you/crm_deduplicated.csv"
→ Saved 500 rows to /Users/you/crm_deduplicated.csv

500 records resolved to 146 unique entities. View the session.

| Cluster | Records | Variants |
| --- | --- | --- |
| Palo Alto Networks | 8 | Pallow Alto, PANW, Paloalto Networks, Palo Alto Net Inc |
| Walmart | 8 | W-Mart, Wall-Mart, WMT Corp, Wallmart, Wal-Mart Stores |
| Uber | 8 | Ubar, Ubr, Uber Tech, Uber Corporation |
| ServiceNow | 6 | Service Now, Service-Now, SerivceNow, Service Now Inc |
| Nike | 4 | Nyke, Nike Corp, Nike Incorporated, Nike Inc. |

The output includes equivalence_class_id and selected columns. Filter to selected == True to get one record per entity. The system uses embeddings for initial clustering, then LLM pairwise comparison for accuracy.
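
The filtering step can be sketched with pandas. This is a rough illustration rather than the tool's own code: the inline sample stands in for crm_deduplicated.csv, the `company_name` column is hypothetical, and only `equivalence_class_id` and `selected` are taken from the output described above.

```python
import io

import pandas as pd

# Hypothetical sample standing in for crm_deduplicated.csv.
# Only equivalence_class_id and selected are documented output columns;
# company_name is an assumed input column used for illustration.
sample_csv = io.StringIO(
    "company_name,equivalence_class_id,selected\n"
    "Palo Alto Networks,0,True\n"
    "PANW,0,False\n"
    "Pallow Alto,0,False\n"
    "Walmart,1,True\n"
    "W-Mart,1,False\n"
)
df = pd.read_csv(sample_csv)

# Normalize to a boolean mask so the filter works whether the column
# was parsed as booleans or as "True"/"False" strings.
selected_mask = df["selected"].astype(str).str.lower() == "true"
unique_entities = df[selected_mask]

# Sanity check: at most one selected row per equivalence class.
assert unique_entities["equivalence_class_id"].is_unique

print(len(df), "rows ->", len(unique_entities), "unique entities")
```

To run this on the real output, replace `sample_csv` with the path passed as `output_path`.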

The everyrow SDK's dedupe() resolves messy CRM records to unique entities using semantic matching.

| Metric | Value |
| --- | --- |
| Records processed | 500 |
| Unique entities | 124 |
| Cost | $3.52 |
| Time | 102 seconds |

pip install everyrow
export EVERYROW_API_KEY=your_key_here  # Get one at futuresearch.ai/api-key

import asyncio
import pandas as pd
from everyrow import create_session
from everyrow.ops import dedupe

data = pd.read_csv("case_01_crm_data.csv")

async def main():
    # Run the dedupe operation inside a named session.
    async with create_session(name="CRM Deduplication") as session:
        result = await dedupe(
            session=session,
            input=data,
            # Plain-language rule that defines when two rows are duplicates.
            equivalence_relation="Two entries are duplicates if they include data for the same legal entity.",
        )
        # Keep only the rows marked as selected.
        deduplicated = result.data[result.data["selected"]]
        return deduplicated

clean_data = asyncio.run(main())

500 records reduced to 124 unique companies. The output includes equivalence_class_id, equivalence_class_name, and selected columns. The system handles ticker symbols (PANW to Palo Alto Networks), nicknames (Big Blue to IBM), and typos (Wallmart to Walmart).
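
Before dropping the unselected rows, it can help to review which variants landed in each cluster. A minimal sketch, assuming the result is a pandas DataFrame with the three output columns named above; the sample rows and the `company_name` column are hypothetical:

```python
import pandas as pd

# Hypothetical rows shaped like result.data. company_name is an assumed
# input column; the other three columns are the documented outputs.
result_data = pd.DataFrame(
    {
        "company_name": ["Palo Alto Networks", "PANW", "Big Blue", "IBM", "Wallmart", "Walmart"],
        "equivalence_class_id": [0, 0, 1, 1, 2, 2],
        "equivalence_class_name": ["Palo Alto Networks", "Palo Alto Networks", "IBM", "IBM", "Walmart", "Walmart"],
        "selected": [True, False, False, True, False, True],
    }
)

# One row per cluster: how many records it absorbed and which spellings.
clusters = (
    result_data.groupby(["equivalence_class_id", "equivalence_class_name"])
    .agg(records=("company_name", "size"), variants=("company_name", list))
    .sort_values("records", ascending=False)
)

print(clusters)
```

Sorting by `records` puts the largest clusters first, which is usually where a wrong merge is easiest to spot.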