FutureSearch Logofuturesearch
  • Blog
  • Solutions
  • Research
  • Docs
  • Evals
  • Company
  • Get Researchers
FutureSearch Logo

General inquiry? You can reach us at hello@futuresearch.ai.

Company

Team & CareersPressPrivacy PolicyTerms of Service

Developers

SDK DocsAPI ReferenceCase StudiesGitHub

Follow Us

X (Twitter)@dschwarz26LinkedIn
FutureSearchdocs
Your research team
Installation
  • All install methods
  • Claude.ai
  • Claude Cowork
  • Claude Code
  • Web App
  • Python SDK
  • Skill
  • MCP Server
Reference
  • API Key
  • classify
  • dedupe
  • forecast
  • merge
  • rank
  • agent_map
  • screen
  • Progress Monitoring
  • Chaining Operations
Guides
  • LLM-Powered Data Labeling
  • Add a Column via Web Research
  • Classify and Label Rows
  • Deduplicate Training Data
  • Filter a Dataset Intelligently
  • Join Tables Without Shared Keys
  • Rank Data by External Metrics
  • Resolve Duplicate Entities
  • Scale Deduplication to 20K Rows
Case Studies
  • Deduplicate Contact Lists
  • Deduplicate CRM Records
  • Enrich Contacts with Company Data
  • Fuzzy Match Across Tables
  • Link Records Across Medical Datasets
  • LLM Cost vs. Accuracy
  • Merge Costs and Speed
  • Merge Thousands of Records
  • Multi-Stage Lead Qualification
  • Research and Rank Web Data
  • Run 10,000 LLM Web Research Agents
  • Score Cold Leads via Web Research
  • Score Leads from Fragmented Data
  • Screen 10,000 Rows
  • Screen Job Listings
  • Screen Stocks by Economic Sensitivity
  • Screen Stocks by Investment Thesis
FutureSearchby futuresearch
by futuresearch

Merge

merge left-joins two DataFrames using LLM-powered agents to resolve the key mapping instead of requiring exact or fuzzy key matches. Agents resolve semantic relationships by reasoning over the data and, when needed, searching the web for external information to establish matches: subsidiaries, regional names, abbreviations, and product-to-parent-company mappings.

Examples

from everyrow.ops import merge

result = await merge(
    task="Match each software product to its parent company",
    left_table=software_products,   # table being enriched — all rows kept
    right_table=approved_vendors,    # lookup/reference table — columns appended
    # merge_on_left/merge_on_right omitted: auto-detection handles most cases.
    # Only specify them when you expect exact string matches on specific columns
    # or want to draw agent attention to them.
)
print(result.data.head())

For ambiguous cases, add context. Here merge_on_left/merge_on_right are set because the column names ("sponsor", "company") are too generic for auto-detection:

result = await merge(
    task="""
        Match clinical trial sponsors to parent pharma companies.

        Watch for:
        - Subsidiaries (Genentech → Roche, Janssen → J&J)
        - Regional names (MSD is Merck outside the US)
        - Abbreviations (BMS → Bristol-Myers Squibb)
    """,
    left_table=trials,              # table being enriched — all rows kept
    right_table=pharma_companies,   # lookup table
    merge_on_left="sponsor",        # specified: draws agent attention to this column
    merge_on_right="company",       # specified: draws agent attention to this column
)
print(result.data.head())

What you get back

A DataFrame with all left table columns plus matched right table columns. Rows that don't match get nulls for the right columns (like a left join).

Parameters

Name Type Description
task str How to match the tables
left_table DataFrame The table being enriched — all its rows are kept in the output (LEFT JOIN).
right_table DataFrame The lookup/reference table — its columns are appended to matches; unmatched left rows get nulls.
merge_on_left Optional[str] Only set if you expect exact string matches on this column or want to draw agent attention to it. Auto-detected if omitted.
merge_on_right Optional[str] Only set if you expect exact string matches on this column or want to draw agent attention to it. Auto-detected if omitted.
relationship_type Optional[str] "many_to_one" (default) — multiple left rows can match one right row. "one_to_one" — unique matching between left and right rows. "one_to_many" — one left row can match multiple right rows. "many_to_many" — multiple left rows can match multiple right rows. For one_to_many and many_to_many, multiple matches are joined with " | " in each added column.
use_web_search Optional[str] "auto" (default), "yes", or "no". Controls whether agents use web search to resolve matches.
session Session Optional, auto-created if omitted

Performance

Size Time Cost
100 × 50 ~3 min ~$2
2,000 × 50 ~8 min ~$9
1,000 × 1,000 ~12 min ~$15

Via MCP

MCP tool: everyrow_merge

Parameter Type Description
left_csv_path string Path to the table being enriched (left join)
right_csv_path string Path to the lookup/reference table
task string How to match rows across tables

Related docs

Guides

  • Fuzzy Join Without Matching Keys

Case Studies

  • LLM Merging at Scale
  • Match Software Vendors to Requirements
  • Merge Contacts with Company Data
  • Merge Overlapping Contact Lists

Blog posts

  • Software Supplier Matching
  • HubSpot Contact Merge
  • CRM Merge Workflow