TL;DR: Fuzzy string matching breaks down quickly with real-world data. LLMs and LLM-web-agents can merge data, but it takes some elbow grease to get good accuracy without costing an arm and a leg. I built a solution in the everyrow.io SDK that implements a tiered merge approach (exact → fuzzy → LLM → web) that optimizes for cost.
I, like every data engineer, have tried to join two tables that don't have a shared key. Maybe it's companies, or people's names, or addresses, or parts. Those strings never match well enough, but pandas.merge() or VLOOKUP requires exact matches:
# This only works when "company" columns match exactly
merged = pd.merge(left_table, right_table, on="company")
Consider these variations that represent the same company:
| Table A | Table B |
|---|---|
| Microsoft Corporation | Microsoft Corp. |
| Apple Inc. | AAPL |
| Meta | Facebook |
A standard merge would miss all of these. So next I would try fuzzy merge libraries like fuzzywuzzy or rapidfuzz, which compute string similarity scores:
from rapidfuzz import fuzz
similarity = fuzz.ratio("Microsoft Corporation", "Microsoft Corp.")
# Returns ~85% similarity
This effectively calculates an edit distance between the strings (like the Levenshtein distance) and normalizes it to get a similarity score between 0 and 1:
A similarity ≥ threshold qualifies as a fuzzy match.
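In code, a thresholded fuzzy lookup could look like this (a minimal sketch using rapidfuzz; fuzzy_lookup is just an illustrative helper, and rapidfuzz scores on a 0–100 scale, so the 0–1 threshold is scaled up):
from rapidfuzz import fuzz, process

def fuzzy_lookup(name, candidates, threshold=0.9):
    # extractOne returns (best_match, score, index), or None if nothing clears the cutoff
    best = process.extractOne(
        name, candidates,
        scorer=fuzz.ratio,
        score_cutoff=threshold * 100,  # rapidfuzz scores range from 0 to 100
    )
    return best[0] if best else None

candidates = ["Microsoft Corporation", "Apple Inc.", "Meta"]
fuzzy_lookup("Micposoft Corporation", candidates)  # -> "Microsoft Corporation" (one typo, ~0.95 similarity)
fuzzy_lookup("AAPL", candidates)                   # -> None: no string-level similarity to "Apple Inc."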
But this won't work even on my three-company example. There's no threshold that will match "Facebook" to "Meta" without also introducing huge numbers of false positives. And if an optimal threshold even exists, it varies from dataset to dataset.
If you really want high accuracy, like I do, then world knowledge is sometimes required, like "McGraw Hill" ↔ "S&P Global".
LLMs make much, much higher quality semantic matching possible. First, you can use embeddings, which convert text into high-dimensional vectors where semantically similar entities cluster together. This works well for simple relationships, and the OpenAI embeddings API is extremely cheap. But the quality still isn't great.
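For reference, the embedding route looks something like this. This is a minimal sketch assuming the official OpenAI Python client; the model name and the "take the argmax" matching step are illustrative, not the SDK's method:
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts):
    # One API call embeds a whole batch of strings
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

left = embed(["Microsoft Corporation", "Apple Inc.", "Meta"])
right = embed(["Microsoft Corp.", "AAPL", "Facebook"])

# Cosine similarity between every left/right pair
left_n = left / np.linalg.norm(left, axis=1, keepdims=True)
right_n = right / np.linalg.norm(right, axis=1, keepdims=True)
similarity = left_n @ right_n.T

# For each left entity, take the most similar right entity
best = similarity.argmax(axis=1)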
Full-powered LLMs can fundamentally do this as well as humans. And LLM web agents can do this as well as humans with access to Google. So I started running experiments with these.
The first thing you immediately notice is that LLM web agents are overkill for most rows. I needed to use the simplest method that works for each row, so I implemented a hierarchical cascade, escalating to more expensive methods only when it looks like the cheaper ones would fail (sketched after the list below):
- Exact string match: Basically a lookup-table
- Fuzzy string match: Levenshtein-based with a high threshold of 0.9
- LLM merge: A cheap LLM, run in batches, finds a match if it is confident
- LLM-Web-Agent merge: Browse the web to find out which entities really correspond to each other
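Conceptually, the cascade for a single row looks something like this. This is a simplified sketch, not the SDK internals; llm_match and web_agent_match are placeholders for the two LLM-based tiers:
from rapidfuzz import fuzz, process

def cascade_match(left_value, right_values, fuzzy_threshold=0.9):
    # 1. Exact string match: basically a lookup table
    if left_value in right_values:
        return left_value, "exact"
    # 2. Fuzzy string match: Levenshtein-based with a high threshold
    best = process.extractOne(left_value, right_values,
                              scorer=fuzz.ratio,
                              score_cutoff=fuzzy_threshold * 100)
    if best:
        return best[0], "fuzzy"
    # 3. Cheap LLM, run in batches: only returns a match if it is confident
    candidate = llm_match(left_value, right_values)  # placeholder
    if candidate:
        return candidate, "llm"
    # 4. Web agent: browse the web to resolve what the entities really are
    return web_agent_match(left_value, right_values), "web"  # placeholder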
Cost and Accuracy of the Progressive Intelligence Approach
Let's look at a real-world example.
I used a dataset of 438 S&P 500 companies with columns for company name, ticker, fair value, price, market cap, shares, and CEO. This gave us ground truth for validation.
data = pd.read_csv("data/companies.csv")
# Columns: company, ticker, fair_value, price, mkt_cap, shares, CEO
For each experiment, I split the data into two tables, corrupted values, and removed columns. The original dataset then serves as ground truth for the different experiments.
We ran three experiments, each testing a different merging challenge:
- Fuzzy matching with noise from corrupt company names
- LLM merge without common columns by matching companies to tickers
- Web merge for association by matching CEOs to companies
Note that this third one is not a typical join. CEOs and companies are not the same entity under different names; the merge builds a semantic relationship between them. LLMs are strictly needed for this.
My solution is now part of the everyrow.io SDK, so re-doing this experiment is pretty straightforward:
from everyrow import create_session
from everyrow.ops import merge
from everyrow.generated.models import LLMEnum
async with create_session(name="Merge companies") as session:
    task = await merge(
        session=session,
        task="...",  # Natural language description of what you're merging (helps the LLM, not necessary in general)
        left_table=left_df,  # table to merge into, as a pandas DataFrame
        right_table=right_df,  # table to merge from, as a pandas DataFrame
        merge_on_left="...",  # Optional: specify merge column
        merge_on_right="...",  # Optional: specify merge column
        merge_model=LLMEnum.GEMINI_3_FLASH_MINIMAL,  # Optional: specify LLM model (default: GEMINI_3_FLASH_MINIMAL)
    )
    result = await task.await_result()  # result is the DataFrame with the merged tables
Parameters:
- task: Natural language description of what you're merging
- left_table / right_table: Pandas DataFrames
- merge_on_left / merge_on_right: Optional column names for string/fuzzy matching
The result includes a research column with justifications for each match, allowing you to audit the matching logic.
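Assuming the result comes back as a pandas DataFrame, as in the snippet above, a quick way to spot-check those justifications is:
# The research column holds a short justification for each matched row
print(result["research"].head())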
You will also see a link to your session in the everyrow.io UI, where you can watch the merge process in real time.
Experiment 1: Fuzzy String Matching Under Noise
In our first experiment, we progressively corrupt the data to simulate real-world data quality issues. We corrupt company names with a simple randomization function:
import random
import string

def randomize_string(text, p=0.1):
    """Replace each character with probability p."""
    result = []
    for char in text:
        if random.random() < p:
            result.append(random.choice(string.ascii_letters))
        else:
            result.append(char)
    return ''.join(result)

def randomize_string_column(df, column, p=0.1):
    # Assumed helper: applies randomize_string to every value in the given column
    df = df.copy()
    df[column] = df[column].apply(lambda s: randomize_string(s, p))
    return df

left_table = data.copy()
left_table.drop(columns=["fair_value"], inplace=True)
right_table = data[["company", "fair_value"]]
right_table = randomize_string_column(right_table, "company", p=0.1)
The noise level p is the probability that a character gets corrupted (removed or replaced). With p=0.1 (i.e. 10%), "Microsoft" might become "Micposoft" or "Micrsft".
Running the experiment for the different noise levels, we use these arguments:
result = await merge(
    session=session,
    task="Merge the tables on company name",
    left_table=left_table,
    right_table=right_table,
    merge_on_left="company",
    merge_on_right="company",
)
Let us look at the distribution of matching methods, accuracy, and price across all rows.
| Noise Level | Matched | Exact Match | Fuzzy Match | LLM Match | Web Match | Accuracy | Price |
|---|---|---|---|---|---|---|---|
| 0% | 100% | 100% | 0% | 0% | 0% | 100% | $0.13 |
| 5% | 100% | 49.8% | 30.6% | 19.6% | 0% | 100% | $0.32 |
| 10% | 100% | 26.5% | 30.8% | 42.7% | 0% | 100% | $0.44 |
(The $0.13 at 0% noise, where no LLM matching is needed, is overhead: the tool still uses an LLM to set up the problem.) At 10% character corruption, exact string matching handles only 27% of rows. Fuzzy matching catches another 30%, but 43% require LLM matching to resolve.
Let us look at some examples where fuzzy matching failed but the LLM succeeded:
| company name | company name (with noise) |
|---|---|
| Universal Health Services | Univebsal Health ServucQv |
| Boston Scientific | BostonNSchentifia |
| FactSet | FuctSet |
As we see, for short company names ("FactSet"), a single wrong letter is enough to get below the fuzzy string threshold of 0.9.
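You can check this directly with rapidfuzz (scores are on a 0–100 scale):
from rapidfuzz import fuzz

fuzz.ratio("FactSet", "FuctSet")  # ≈ 86, below the 0.9 (i.e. 90) cutoff
fuzz.ratio("Universal Health Services", "Univebsal Health ServucQv")  # ≈ 84, also below the cutoff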
Despite the heavy corruption, we achieved 100% accuracy with 0% false positives across all noise levels. The cascade automatically escalated to more sophisticated methods as needed, at a correspondingly higher cost.
Link to public session (p=0)
Link to public session (p=0.05)
Link to public session (p=0.1)
Experiment 2: LLM Merge for Semantic Relationships
Here we look at a more complex scenario: the two tables share no common column or identifier, and we have to match purely based on the table contents.
Left table:
| company | price | mkt_cap |
|---|---|---|
| Apple Inc. | 24.71 | 541.49B |
| Microsoft | 28.93 | 412.32B |
| Alphabet | 31.22 | 298.17B |
| ... | ... | ... |
Right Table:
| ticker | fair_value |
|---|---|
| AAPL | 1593.98 |
| MSFT | 1392.16 |
| GOOGL | 1428.66 |
| ... | ... |
This is where semantic matching can shine. We can give the LLM a hint that the ticker is an important column to identify the company:
result = await merge(
    session=session,
    task="Merge the tables based on company name and ticker",
    left_table=left_table,
    right_table=right_table,
)
Here we left out the merge columns, which automatically skips the exact and fuzzy string matching steps. Instead, the merge cascade goes straight to LLM reasoning and, if that fails, falls back to web verification.
| Matched | Exact Match | Fuzzy Match | LLM Match | Web Match | Accuracy | False Positives | Price |
|---|---|---|---|---|---|---|---|
| 100% | 0% | 0% | 99.8% | 0.2% | 100% | 0% | $1.00 |
437 of 438 rows matched using pure LLM reasoning. One edge case required web verification.
This should not be surprising: company tickers are stable, and LLMs have memorized them well.
Link to public session
Experiment 3: Web Merge for Dynamic Data
Let us now look at a case where the data is dynamic and web verification is needed for several entities.
From our dataset, the right table keeps only the CEO names, which we match back to the companies.
Left table:
| company | price | mkt_cap |
|---|---|---|
| Apple Inc. | 24.71 | 541.49B |
| Microsoft | 28.93 | 412.32B |
| Alphabet | 31.22 | 298.17B |
| ... | ... | ... |
Right table:
| CEO |
|---|
| Tim Cook |
| Satya Nadella |
| Sundar Pichai |
| ... |
In this case, we do not want to specify merge columns and we adapt the prompt slightly:
result = await merge(
    session=session,
    task="Merge the CEO to the company information, use web search if unsure",
    left_table=left_table,
    right_table=right_table,
)
What is interesting to vary here, however, is the LLM model: the ability to pick out the right row in a large table (the "needle in a haystack" problem) differs between models, as do their confidence in a match and how well they follow the user prompt.
Therefore we compared GPT-5-MINI, Gemini-3-FLASH, and Claude-4.5-Sonnet, all with minimal CoT. Note, however, that these models only do the semantic matching; the web search itself is done by web agents with fixed LLM choices.
This time, the results show that the task was considerably harder for the web agents:
| LLM | Matched | LLM Match | Web Match | Accuracy | False Positives | Cost |
|---|---|---|---|---|---|---|
| GPT-5-MINI | 95.7% | 59.8% | 35.8% | 96.7% | 3.2% | $3.69 |
| GEMINI-3-FLASH | 100% | 89.9% | 1.1% | 87.7% | 12.3% | $0.75 |
| CLAUDE-4.5-SONNET | 99.5% | 69.6% | 29.9% | 91.6% | 8.4% | $9.57 |
We see that the agents now use web support in a significant number of cases, but how often varies greatly with the LLM choice, as do their willingness to commit to a match, the accuracy, and the false-positive rate.
Link to public session (GPT-5-MINI)
Link to public session (GEMINI-3-FLASH)
Link to public session (CLAUDE-4.5-SONNET)
You (or Claude Code) can easily try this yourself, or learn more, via the everyrow.io SDK docs.