Link Records Across Medical Datasets

Matching PubMed papers to the clinical trials they report results for means evaluating up to 140,000 candidate pairs (700 papers × 200 trials) while accounting for drug aliases, rewritten trial titles, and varying study design terminology. This case study demonstrates semantic record linkage across medical databases.

Metric	Value
Papers	700
Trials	200
Matched pairs	73
F1 Score	84.7%
Total cost	$27.81
Time	7.5 minutes

Add FutureSearch to Claude Code if you haven't already:

claude mcp add futuresearch --scope project --transport http https://mcp.futuresearch.ai/mcp

With the papers and trials CSVs in your working directory, tell Claude:

Match these PubMed papers to the clinical trials they report results for.
A paper matches a trial if it describes the results of that trial. Look for
matching interventions/drugs, conditions, study design, and outcomes. Drug
names may appear as brand or generic. Not every paper has a matching trial.

Claude calls FutureSearch's merge MCP tool with a many-to-one relationship:

Tool: futuresearch_merge
├─ task: "Match publications to the clinical trial they report results for..."
├─ left_csv: "/Users/you/papers_700.csv"
├─ right_csv: "/Users/you/trials_200.csv"
└─ relationship_type: "many_to_one"

→ Submitted: 700 rows for merging.
  Session: https://futuresearch.ai/sessions/d02d59b7-29fd-4e23-b35c-38c6a9096c34
  Task ID: d02d...

Tool: futuresearch_progress
→ Running: 0/700 complete (30s elapsed)

...

Tool: futuresearch_progress
→ Completed: 700/700 (0 failed) in 448s.

Tool: futuresearch_results
→ Saved 700 rows to /Users/you/matched_trials.csv

73 paper-trial matches found. View the session.

Add the FutureSearch connector if you haven't already. Then upload papers_700.csv and trials_200.csv and ask Claude:

Match these PubMed papers to the clinical trials they report results for. A paper matches a trial if it describes the results of that trial. Look for matching interventions/drugs, conditions, study design, and outcomes. Drug names may appear as brand or generic. Not every paper has a matching trial.

Go to futuresearch.ai/app, upload papers_700.csv and trials_200.csv, and enter:

Match these PubMed papers to the clinical trials they report results for. A paper matches a trial if it describes the results of that trial. Look for matching interventions/drugs, conditions, study design, and outcomes. Drug names may appear as brand or generic. Not every paper has a matching trial.

pip install futuresearch
export FUTURESEARCH_API_KEY=your_key_here  # Get one at futuresearch.ai/app/api-key

import asyncio
import pandas as pd
from futuresearch import create_session
from futuresearch.ops import merge

trials_df = pd.read_csv("trials_200.csv")
papers_df = pd.read_csv("papers_700.csv")

async def main():
    async with create_session(name="Clinical Trials to Papers") as session:
        result = await merge(
            session=session,
            task="""
                Match publications to the clinical trial they report results for.
                Look for matching interventions/drugs, conditions, study design,
                outcomes, and sponsor/institution. Drug names may appear as brand
                or generic. Not every paper has a matching trial.
            """,
            left_table=papers_df,
            right_table=trials_df,
        )
        return result.data

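# Run the merge and collect the merged table
merged = asyncio.run(main())
# Keep only rows where a trial was linked (rows with a missing nct_id are dropped)
matched = merged.dropna(subset=["nct_id"])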
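To evaluate output like this against hand-labeled data, one option is to treat predictions and gold labels as sets of (paper, trial) pairs and compare them. The sketch below is illustrative only: the `pmid` field name, the toy rows, and the toy gold set are assumptions, not part of the case study's data or the FutureSearch SDK.

```python
def score_pairs(predicted, gold):
    """Count true positives, false positives, and false negatives.

    Both arguments are sets of (paper_id, trial_id) tuples.
    """
    return {
        "tp": len(predicted & gold),   # predicted pairs that are in the gold set
        "fp": len(predicted - gold),   # predicted pairs that are not in the gold set
        "fn": len(gold - predicted),   # gold pairs that were never predicted
    }


# Toy stand-in for merged output; "pmid" is a hypothetical paper-ID field.
toy_rows = [
    {"pmid": "111", "nct_id": "NCT00000001"},
    {"pmid": "222", "nct_id": "NCT00000001"},
    {"pmid": "333", "nct_id": None},           # left unmatched
    {"pmid": "444", "nct_id": "NCT00000002"},
]

# Hypothetical gold labels for the same toy papers.
toy_gold = {
    ("111", "NCT00000001"),
    ("222", "NCT00000001"),
    ("333", "NCT00000003"),
}

# Only rows with a linked trial count as predictions.
predicted_pairs = {
    (row["pmid"], row["nct_id"]) for row in toy_rows if row["nct_id"] is not None
}

counts = score_pairs(predicted_pairs, toy_gold)
print(counts)
```

With a pandas DataFrame such as `matched`, the predicted set could be built the same way by zipping whichever column identifies a paper in your CSV with the `nct_id` column.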

Results

Scored against 64 gold-labeled pairs:

Metric	Value
True positives	58
False positives	15
False negatives	6
Precision	79.5%
Recall	90.6%
F1 Score	84.7%
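
The precision, recall, and F1 figures in the table follow from the three counts using the standard definitions. A minimal sketch that reproduces them (the function name is illustrative):

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from confusion counts."""
    precision = tp / (tp + fp)   # share of predicted matches that are correct
    recall = tp / (tp + fn)      # share of gold pairs that were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return precision, recall, f1


# Counts from the results table: 58 true positives, 15 false positives, 6 false negatives.
precision, recall, f1 = precision_recall_f1(tp=58, fp=15, fn=6)
print(f"Precision {precision:.1%}  Recall {recall:.1%}  F1 {f1:.1%}")
```

Note that true positives plus false positives (58 + 15) equals the 73 matched pairs, and true positives plus false negatives (58 + 6) equals the 64 gold-labeled pairs.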

The remaining 627 of the 700 papers were left unmatched; the dataset includes distractor papers with no corresponding trial. The many-to-one relationship reflects that multiple papers can report results from the same trial.
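
To make the many-to-one constraint concrete: each paper links to at most one trial, while a trial may be linked from several papers. The following sketch checks that property on a toy list of matches; the IDs are made up for illustration and are not taken from the case study.

```python
from collections import Counter

# Toy (paper_id, trial_id) matches; all IDs are hypothetical.
toy_matches = [
    ("111", "NCT00000001"),
    ("222", "NCT00000001"),   # second paper reporting on the same trial
    ("444", "NCT00000002"),
]

trials_per_paper = Counter(paper_id for paper_id, _ in toy_matches)
papers_per_trial = Counter(trial_id for _, trial_id in toy_matches)

# Many-to-one holds when no paper appears with more than one trial.
is_many_to_one = all(count == 1 for count in trials_per_paper.values())

# Trials that are reported on by more than one paper.
multi_paper_trials = {
    trial_id: count for trial_id, count in papers_per_trial.items() if count > 1
}

print(is_many_to_one, multi_paper_trials)
```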

Metric	FutureSearch	Claude Code Only
F1 Score	87.2%	74.5%
Precision	84.1%	100%
Recall	90.6%	~59%

FutureSearch maintains accuracy as datasets grow by dynamically allocating more agents. Its higher recall (90.6% vs ~59%) comes from finding matches that require deeper semantic understanding of medical terminology.
