Blog

The litellm 1.82.7 and 1.82.8 supply chain attack on PyPI hit 47,000 downloads in 46 minutes. We analyzed all 2,337 dependent packages - 88% had version specs that allowed the compromised versions.

litellm 1.82.8 Supply Chain Attack on PyPI (March 2026)

March 24, 2026·

Callum McMahon

litellm version 1.82.8 on PyPI contains a malicious .pth file that harvests SSH keys, cloud credentials, and secrets on every Python startup, then attempts lateral movement across Kubernetes clusters. It was first reported to PyPI by FutureSearch, and that report led to the package being quarantined.
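
Why a .pth file can run on every startup: Python's standard `site` module reads `.pth` files in site-packages and executes any line that begins with `import` followed by a space or tab. The sketch below only illustrates that rule; the function name `executable_pth_lines` is hypothetical, and it is not a description of the actual payload or a complete scanner.

```python
def executable_pth_lines(pth_text):
    """Return the lines of a .pth file that Python's site module would execute.

    site.py treats lines starting with "import " or "import\t" as code
    and runs them at interpreter startup; other non-comment lines are
    treated as paths to add to sys.path.
    """
    return [
        line
        for line in pth_text.splitlines()
        if line.startswith(("import ", "import\t"))
    ]


if __name__ == "__main__":
    # Illustrative file contents, not taken from the real package.
    sample = "# a comment\n/some/extra/path\nimport os; print('runs at startup')\n"
    for line in executable_pth_lines(sample):
        print("would execute:", line)
```

Running this over the `.pth` files in an environment's site-packages is one rough way to see which ones carry executable lines, though legitimate packages use the same mechanism.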

Compromised just by starting an MCP Server in Cursor

March 24, 2026·

Callum McMahon

A malicious litellm release on PyPI compromised our machine through an MCP server's unpinned dependency. No prompt injection, no LLM trickery, just a poisoned package auto-downloaded by uvx.

JavaScript Thinks Everything's a Date

March 18, 2026·

JavaScript's Date.parse() will turn almost any string into a date. We tested V8, SpiderMonkey, and JavaScriptCore with surprising inputs and documented every quirk, from ISO 8601 UTC gotchas to why date-fns parseISO rejects what new Date() accepts.

How AI Agents Optimize SEO Using Google Search Console Data

March 18, 2026·

We run an SEO pipeline that reads Google Search Console data, spawns an Opus agent per page, and proposes title and description changes. Each agent reads the history of every experiment we've run on that page. Over time, the suggestions get better.

How We Built a Marketing Pipeline with Claude Code

March 11, 2026·

We built a pipeline that scans 18 community sources every morning, classifies opportunities with a 13-question rubric, and drafts responses. 2-3% signal rate. The hard part isn't running it - it's knowing what to look for.

Are Your MCP Servers Leaking Docker Containers?

March 6, 2026·

Docker-based MCP servers leave behind zombie containers because the Docker daemon keeps them alive after Claude Code exits. Switching from docker run to uvx eliminates the problem entirely.

How to Debug AI Agents by Analyzing Their Own Traces with LLMs

March 3, 2026·

Peter Mühlbacher

We built a Claude Code skill that reviews our AI agent traces and catches issues we'd miss ourselves. Here's how it works, and why it only became possible now.

Caution: Read the Docs for Claude 4.6's Effort Parameter

March 2, 2026·

Peter Mühlbacher

Here's what Claude's effort parameter actually controls. For Opus and Sonnet 4.6, high effort primarily increases reasoning depth, but also...

How to Run Claude Code as a Kubernetes CronJob

We run Claude Code in Kubernetes for long-running marketing CronJobs. This originally sounded like a terrible idea, but after running it for a few months, we think it's a genuinely valid engineering approach - for the right kind of work.

Using Claude Code as a Workflow Engine

How we replaced Airflow and CI pipelines with Claude Code skills and subagents. Markdown files define multi-step workflows, agents execute each phase, and outputs land as plain files in GitHub.

Unleashing AI forecasters on Kalshi prediction markets

A case study using FutureSearch to run hundreds of parallel research agents and generate forecasts with full rationales across 100 Kalshi prediction markets.

Can AI Beat Kalshi? Simulating a Prediction Market Portfolio

We take our AI forecaster's probability estimates, compare them to live Kalshi order books, and build a simulated portfolio to benchmark whether the forecasts are actually accurate.

MCP structuredContent: How to Return Large Results Without Flooding the Context Window

Rafael Poyiadzi

Instead of dumping thousands of rows into the MCP tool response, split the audience: content for the model (text summary), structuredContent for the user (interactive widget at zero token cost), and a download URL for the sandbox.

OpenAI is a textbook example of Conway's Law

OpenAI's Responses and Chat Completions APIs have inexplicable inconsistencies. A real-world example of Conway's Law, where org structure dictates software design.

How to Upload Large Files to an MCP Server Without Filling the Context Window

February 25, 2026·

Rafael Poyiadzi

Inlining data in MCP tool calls eats the LLM's context window. We show how to use presigned URLs so Claude can upload files directly to your server, keeping the context clean with a 36-character artifact ID.

LLM API Differences That Break Your Code: Anthropic vs OpenAI vs Google

February 24, 2026·

LLM APIs look interchangeable on paper. In practice, they diverge in subtle ways that break your code. We document the provider-specific quirks we've hit while running thousands of LLM calls per day across Anthropic, Google, and OpenAI.

Ask LLM Agents to Classify Problems Before Starting

February 17, 2026·

Christoph Sträter

Before merging datasets, LLM agents should classify whether the join is one-to-one, one-to-many, or many-to-many. Getting cardinality wrong leads to duplicated rows, missing matches, and broken pipelines. Here's how to classify merge problems automatically.
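
For exact-match keys, the cardinality check can be sketched without an LLM. This is a minimal illustration under that assumption, not the agent-based method the post describes: the function name `classify_join` is hypothetical, it ignores fuzzy or missing keys, and it reports many-to-one as a fourth label for the mirrored case.

```python
from collections import Counter


def classify_join(left_keys, right_keys):
    """Classify the cardinality of an exact-match join on two key columns."""
    left_counts = Counter(left_keys)
    right_counts = Counter(right_keys)
    # Only keys present on both sides produce joined rows.
    shared = left_counts.keys() & right_counts.keys()
    # A shared key that repeats on a side makes that side the "many" side.
    left_many = any(left_counts[k] > 1 for k in shared)
    right_many = any(right_counts[k] > 1 for k in shared)
    if left_many and right_many:
        return "many-to-many"
    if left_many:
        return "many-to-one"
    if right_many:
        return "one-to-many"
    # Also returned when no keys are shared at all.
    return "one-to-one"


if __name__ == "__main__":
    customers = ["c1", "c2", "c3"]
    orders = ["c1", "c1", "c2"]
    print(classify_join(customers, orders))
```

Knowing the label before merging makes it possible to flag a many-to-many join, the case that silently multiplies rows, before any output is written.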

How Much Does Deep Research Cost? A Model-by-Model Breakdown

February 12, 2026·

Peter Mühlbacher

We benchmarked the cost and speed of deep research across ChatGPT, Gemini, Perplexity, and Grok. See which model gives the best answers per dollar on Deep Research Bench.

Using LLMs for Data Cleaning At Scale

February 6, 2026·

Rafael Poyiadzi

Learn how to deduplicate tens of thousands of rows using LLMs at minimum cost and high accuracy.

How AI Finds Fuzzy Duplicates in Large Datasets

January 19, 2026·

Nikos Bosse

Semantic deduplication uses AI to catch duplicates that exact matching misses. Learn how fuzzy matching detects entries like "IBM" and "International Business Machines" as the same entity across thousands of rows.

How LLM Agents Solve the Table Merging Problem

January 16, 2026·

Christoph Sträter

Learn how to merge tables without a common key using AI. This tutorial walks through fuzzy matching, entity resolution, and joining datasets where VLOOKUP and exact-match joins fail.

How to Test if Founder-Led Companies Outperform: A Qualitative Ranking Method

January 15, 2026·

Learn how to evaluate companies by criteria like founder alignment, moat strength, and capital allocation. Score and compare stocks by any criteria to evaluate and test a custom investment thesis.

How to Rank S&P 500 Companies by Risk of Management Turnover

January 14, 2026·

Which companies have had the most C-suite churn over the last 10 years? I researched all S&P 500 companies to find out.

Top Frontier AI Companies 2026: Rankings & Predictions

January 8, 2026·

Who leads the AI race in 2026? We rank OpenAI, Anthropic, Google DeepMind, Meta AI, and xAI across model quality, data, compute, talent, and R&D. Includes predictions of Anthropic's rise ahead of its March 2026 surge in revenue.

Calculate Intrinsic Value for Any Stock with DCF Analysis

November 11, 2025·

We calculate intrinsic value the way Finance 101 teaches: forecasting revenue, margins, and shareholder payouts over the life of every company. By projecting actual cash flows 10+ years out with probabilistic forecasts, we can sort all stocks by discount to fair value without anchoring on market prices.

AI 2027 Scenario: Revisiting AGI Forecasts in 2026

October 19, 2025·

Six months after the AI 2027 report predicted a fast AGI timeline, we revisit the forecasts alongside Karpathy's critiques. How have the original predictions held up, and why are timelines shifting?

Stockfisher: DCF Stock Screener with 10-Year Cash Flow Forecasts

October 14, 2025·

Stockfisher enables value investors to screen the entire market for the highest long-term returns based on detailed 10-year cash flow forecasts. For the first time, compare 3,000+ companies apples-to-apples with rigorous fundamental analysis at quantitative scale.

A Guide for LLM Assisted Web Research

June 26, 2025·

Practical guide to building LLM-powered web research agents — covering search strategies, source evaluation, and synthesis.

Superhuman Coders in AI 2027 - Not So Fast

May 1, 2025·

A critical look at the AI 2027 report's claims about superhuman coding — why the timeline is likely too aggressive.

How Tariffs Will Increase Prices on American-Made Products: Cost Impact Analysis

April 18, 2025·

Even American-made products rely on imported components, steel, and aluminum. Our analysis breaks down how tariffs increase the real cost of a Ford F-150 by $2,600-$3,800 and a Tesla Model 3 by $1,900-$2,400, with data on every major input.

Apple's Plan to Power Siri with ChatGPT was a Predictable Failure

March 10, 2025·

Lawrence Phillips

OpenAI declined Apple's offer to power Siri with ChatGPT. Learn why the partnership failed and what it means for Apple Intelligence and AI integration in iOS.

How Deep Research Agents Fail: Lessons from OpenAI, Gemini, and Perplexity

February 28, 2025·

Developers of agents must reckon with two types of failures: giving up too early (lack of persistence) and repeating failed approaches (lack of adaptation). Analysis of OpenAI and Perplexity's Deep Research products helps those building or working with agents understand how to balance these tradeoffs.

OpenAI Deep Research: Honest Analysis and Real Limitations in 2025

February 19, 2025·

OpenAI's Deep Research tool was initially impressive when released in February 2025, but it actually underperformed the later release of ChatGPT-o3+search. Careful analysis of 6 strange failures shows the subtle unreliability of "Deep Research" style products.

The Death and Life of Prediction Markets at Google

November 11, 2024·

The inside story of how prediction markets were built, killed, and revived at Google — published in Asterisk Magazine.

How to Integrate AI Into Forecasting

June 9, 2024·

Lawrence Phillips

Video presentation on integrating AI into forecasting workflows — covering practical approaches and lessons learned.

The Human v Bots Forecasting Tournament

January 8, 2024·