Link Records Across Medical Datasets

Go to futuresearch.ai/app, upload papers_700.csv and trials_200.csv, and enter:

Match these PubMed papers to the clinical trials they report results for. A paper matches a trial if it describes the results of that trial. Look for matching interventions/drugs, conditions, study design, and outcomes. Drug names may appear as brand or generic. Not every paper has a matching trial.

73 paper-trial matches found with 84.7% F1 score. Results take about 7.5 minutes.

Add the everyrow connector if you haven't already. Then upload papers_700.csv and trials_200.csv and ask Claude:

Match these PubMed papers to the clinical trials they report results for. A paper matches a trial if it describes the results of that trial. Look for matching interventions/drugs, conditions, study design, and outcomes. Drug names may appear as brand or generic. Not every paper has a matching trial.

73 paper-trial matches found with 84.7% F1 score. Results take about 7.5 minutes.

Claude Code is great at reading a paper abstract and matching it to a clinical trial. But with 700 papers and 200 trials, exhaustive matching means evaluating up to 140,000 candidate pairs, each of which can hinge on drug aliases, rewritten trial titles, and study design terminology.

Here, we get Claude Code to match PubMed papers to the clinical trials they report results for.

Metric          Value
Papers          700
Trials          200
Matched pairs   73
F1 Score        84.7%
Total cost      $27.81
Time            7.5 minutes

Add everyrow to Claude Code if you haven't already:

claude mcp add futuresearch --scope project --transport http https://mcp.futuresearch.ai/mcp

With the papers and trials CSVs in your working directory, tell Claude:

Match these PubMed papers to the clinical trials they report results for.
A paper matches a trial if it describes the results of that trial. Look for
matching interventions/drugs, conditions, study design, and outcomes. Drug
names may appear as brand or generic. Not every paper has a matching trial.

Claude calls everyrow's merge MCP tool with many-to-one relationship:

Tool: everyrow_merge
├─ task: "Match publications to the clinical trial they report results for..."
├─ left_csv: "/Users/you/papers_700.csv"
├─ right_csv: "/Users/you/trials_200.csv"
└─ relationship_type: "many_to_one"

→ Submitted: 700 rows for merging.
  Session: https://futuresearch.ai/sessions/d02d59b7-29fd-4e23-b35c-38c6a9096c34
  Task ID: d02d...

Tool: everyrow_progress
→ Running: 0/700 complete (30s elapsed)

...

Tool: everyrow_progress
→ Completed: 700/700 (0 failed) in 448s.

Tool: everyrow_results
→ Saved 700 rows to /Users/you/matched_trials.csv

73 paper-trial matches found. View the session.

Scored against 64 gold-labeled pairs:

Metric           Value
True positives   58
False positives  15
False negatives  6
Precision        79.5%
Recall           90.6%
F1 Score         84.7%
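The reported precision, recall, and F1 follow directly from the counts above. A quick sanity check in plain Python:

```python
# Counts from the gold-label evaluation above.
tp, fp, fn = 58, 15, 6

precision = tp / (tp + fp)   # 58 / 73
recall = tp / (tp + fn)      # 58 / 64
f1 = 2 * precision * recall / (precision + recall)

print(f"Precision: {precision:.1%}")  # 79.5%
print(f"Recall:    {recall:.1%}")     # 90.6%
print(f"F1 Score:  {f1:.1%}")         # 84.7%
```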

627 papers were correctly left unmatched (distractors with no corresponding trial). The many-to-one relationship correctly models that multiple papers can report results from the same trial.
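The many-to-one constraint can also be checked mechanically on the merged output. A minimal sketch in pandas, using toy data; the `pmid` column name is an assumption, while `nct_id` matches the SDK example later on this page:

```python
import pandas as pd

# Hypothetical miniature of the merge output: each paper (pmid) carries
# the nct_id it was matched to, or NaN when no trial corresponds.
merged = pd.DataFrame({
    "pmid": [101, 102, 103, 104],
    "nct_id": ["NCT001", "NCT001", None, "NCT002"],
})
matched = merged.dropna(subset=["nct_id"])

# Many-to-one holds when each paper maps to at most one trial (rows are
# unique by pmid) while several papers may share the same trial.
assert matched["pmid"].is_unique

# pandas can enforce the same property when joining trial details back in:
# validate="m:1" raises MergeError if nct_id is not unique on the right.
trials = pd.DataFrame({
    "nct_id": ["NCT001", "NCT002"],
    "trial_title": ["Trial A", "Trial B"],
})
joined = matched.merge(trials, on="nct_id", validate="m:1")
```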

The everyrow SDK's merge() handles semantic matching across medical terminology, drug aliases, and study design descriptions. This notebook demonstrates matching papers to clinical trials with gold-label evaluation.

Metric     Value
Papers     700
Trials     200
F1 Score   87.2%
Cost       ~$20
Install the SDK and set your API key:

pip install everyrow
export EVERYROW_API_KEY=your_key_here  # Get one at futuresearch.ai/api-key
import asyncio
import pandas as pd
from everyrow import create_session
from everyrow.ops import merge

trials_df = pd.read_csv("trials_200.csv")
papers_df = pd.read_csv("papers_700.csv")

async def main():
    async with create_session(name="Clinical Trials to Papers") as session:
        result = await merge(
            session=session,
            task="""
                Match publications to the clinical trial they report results for.
                Look for matching interventions/drugs, conditions, study design,
                outcomes, and sponsor/institution. Drug names may appear as brand
                or generic. Not every paper has a matching trial.
            """,
            left_table=papers_df,
            right_table=trials_df,
        )
        return result.data

merged = asyncio.run(main())
matched = merged.dropna(subset=["nct_id"])
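To reproduce the gold-label scoring on the SDK output, compare predicted (paper, trial) pairs against the labeled pairs as sets. A sketch; the shape of the gold-label data and the `pmid` column name are assumptions:

```python
def score(predicted: set, gold: set) -> dict:
    """Precision/recall/F1 over sets of (paper_id, trial_id) pairs."""
    tp = len(predicted & gold)
    fp = len(predicted - gold)
    fn = len(gold - predicted)
    precision = tp / (tp + fp) if predicted else 0.0
    recall = tp / (tp + fn) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}

# Usage, assuming a gold_df with the same pair columns as the merge output:
# predicted = set(zip(matched["pmid"], matched["nct_id"]))
# gold = set(zip(gold_df["pmid"], gold_df["nct_id"]))
# print(score(predicted, gold))
```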
Metric     EveryRow   Claude Code Only
F1 Score   87.2%      74.5%
Precision  84.1%      100%
Recall     90.6%      ~59%

EveryRow maintains accuracy as datasets grow by dynamically allocating more agents. Its higher recall (90.6% vs ~59%) comes from finding matches that require deeper semantic understanding of medical terminology.