
Beyond Regex: Semantic Filtering for Job Search

Finding remote-friendly senior roles with disclosed salaries using AI

[Figure: a 1,000-row table of postings (Company, Description), e.g. "Airtable: Async team, 8+ yrs, $185-220K...", passes through SCREEN, leaving 148 qualified rows (14.8%).]

I was looking for a new role. Like many engineers, I have specific requirements:

  1. Remote-friendly: I want to work from home, or at least have hybrid options
  2. Senior-level: 5+ years experience, or titles like Senior/Staff/Principal
  3. Salary disclosed: I don't want to waste time on roles that pay below market

The challenge? Job boards are noisy. Hacker News "Who's Hiring" threads contain hundreds of postings per month, but most don't meet all three criteria. Manually scanning each one takes hours.

I used Claude Code to pull 1,000 job postings from the last 5 months of "Who's Hiring" threads via the API. Now I had a spreadsheet, but how to filter it?
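For readers who want to reproduce the data pull, here is a minimal sketch. It assumes the official Hacker News Firebase API (the post only says "the API", so this choice is a guess), and the helper names `fetch_item`, `clean_text`, and `thread_postings` are hypothetical. Each top-level comment on a "Who's Hiring" thread is treated as one posting.

```python
import html
import json
import re
from typing import Callable, Dict, List, Optional
from urllib.request import urlopen

# Assumption: the official HN Firebase endpoint for a single item.
ITEM_URL = "https://hacker-news.firebaseio.com/v0/item/{id}.json"


def fetch_item(item_id: int) -> Optional[dict]:
    """Fetch one HN item (story or comment); the API returns null for unknown ids."""
    with urlopen(ITEM_URL.format(id=item_id), timeout=30) as resp:
        return json.load(resp)


def clean_text(raw_html: str) -> str:
    """Convert HN comment HTML to plain text."""
    text = raw_html.replace("<p>", "\n")   # HN separates paragraphs with <p>
    text = re.sub(r"<[^>]+>", "", text)    # drop remaining tags (links, italics)
    return html.unescape(text).strip()     # decode entities such as &amp;


def thread_postings(
    thread_id: int,
    fetch: Callable[[int], Optional[dict]] = fetch_item,
) -> List[Dict[str, str]]:
    """Return one row per live top-level comment on the thread."""
    thread = fetch(thread_id) or {}
    rows = []
    for kid_id in thread.get("kids", []):
        item = fetch(kid_id)
        # Skip missing, deleted, dead, or empty comments.
        if not item or item.get("deleted") or item.get("dead") or not item.get("text"):
            continue
        rows.append({"id": str(kid_id), "description": clean_text(item["text"])})
    return rows
```

Calling `thread_postings(<thread id>)` for each monthly thread and concatenating the rows gives the spreadsheet. The `fetch` parameter is injectable so the parsing can be tried without network access.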

The regex trap

The obvious first approach is keyword matching:

# Naive approach
is_remote = "remote" in description.lower()
is_senior = "senior" in title.lower() or "5+ years" in description
has_salary = "$" in description

This fails badly:

  • "No remote work available" contains "remote" but means the opposite
  • "Senior year intern" contains "senior" but isn't a senior role
  • "$0 in funding" contains "$" but isn't a salary
  • "Competitive salary" or "DOE" technically mentions compensation but doesn't disclose it
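To see the failures concretely, the sketch below wraps the three keyword checks in a function (the name `naive_match` is mine, not from any library) and runs them on the counterexamples listed above.

```python
def naive_match(title: str, description: str) -> dict:
    """Apply the three keyword checks from the naive approach to one posting."""
    return {
        "is_remote": "remote" in description.lower(),
        "is_senior": "senior" in title.lower() or "5+ years" in description,
        "has_salary": "$" in description,
    }


# (title, description, flag that the keyword check gets wrong)
counterexamples = [
    ("Engineer", "No remote work available", "is_remote"),
    ("Senior year intern", "Summer internship", "is_senior"),
    ("Engineer", "$0 in funding, bootstrapped", "has_salary"),
]

for title, description, flag in counterexamples:
    result = naive_match(title, description)
    # Each flag comes back True even though the posting fails that criterion.
    print(f"{description!r}: {flag} = {result[flag]}")
```

The "Competitive salary" case fails the other way around: the `"$"` check happens to reject it, but loosening the check to match the word "salary" would let it through without any number being disclosed.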

A regex-based approach achieves 68% precision, leaving hundreds of false positives to manually review.

So I tried something different: I ran the 1,000 postings through everyrow.io/screen with these criteria:

Qualifies if ALL THREE are met:

  1. Remote-friendly: Explicitly allows remote work, hybrid, WFH, or distributed work
  2. Senior-level: Title includes Senior/Staff/Lead/Principal/Director/Architect, OR requires 5+ years experience
  3. Salary disclosed: Specific numbers provided (e.g., "$150K", "$120-180k")

Mark False if: On-site only, junior role, "competitive" salary, or any criterion unclear
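One way to read the rule above is as a three-field verdict where "unclear" counts as a failure. The sketch below is only an illustration of that logic; the `Verdict` type and its field names are hypothetical and are not everyrow.io's actual schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Verdict:
    # None means the posting did not make this criterion clear.
    remote_friendly: Optional[bool]
    senior_level: Optional[bool]
    salary_disclosed: Optional[bool]


def qualifies(v: Verdict) -> bool:
    """A posting qualifies only if all three criteria are explicitly True."""
    flags = (v.remote_friendly, v.senior_level, v.salary_disclosed)
    return all(flag is True for flag in flags)


print(qualifies(Verdict(True, True, True)))   # all three met
print(qualifies(Verdict(True, True, None)))   # salary unclear, so it fails
```

Treating `None` the same as `False` is what keeps "Competitive salary" and "DOE" postings out of the results.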

What survived

Metric | Value
Total screened | 1,000
Qualified | 148
Pass rate | 14.8%
Processing time | ~20 minutes
Cost | $2.90

So roughly 1 in 7 HN job postings meet all three criteria. Not bad!
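The arithmetic behind those figures is quick to check. This sketch uses only the numbers reported above; the per-posting costs are derived, not separately measured.

```python
total_screened = 1_000
qualified = 148
cost_usd = 2.90

pass_rate = qualified / total_screened          # fraction of postings that pass
one_in_n = total_screened / qualified           # "1 in N" postings qualifies
cost_per_posting = cost_usd / total_screened    # cost to screen one posting
cost_per_qualified = cost_usd / qualified       # cost per qualifying posting found

print(f"Pass rate: {pass_rate:.1%}")
print(f"Roughly 1 in {one_in_n:.1f} postings qualifies")
print(f"Cost per posting screened: ${cost_per_posting:.4f}")
print(f"Cost per qualified posting: ${cost_per_qualified:.3f}")
```

That works out to well under a cent per posting screened and about two cents per qualifying role.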

Pass vs. fail

I spot-checked some of the results, and the screen seems to be working as expected:

Pass: Adobe | Senior Software Engineer | $173,500 - $331,050 | Remote-friendly

Fail (salary not disclosed): StartupCo | Staff Engineer | San Francisco (Remote OK) | "Competitive compensation"

Fail (on-site only): BigCorp | Senior Developer | $180,000 | New York (Must be in office)

Fail (not senior-level): TechCo | Software Engineer | $90,000 - $110,000 | Remote

Is it worth it?

Approach | Cost | Time | Precision
Regex filtering | Free | 20 hrs to build | 68%
Manual review | $125 @ $25/hr | 5 hrs | ~95%
DIY LLM API | ~$4 | 20+ hrs dev time | ~80%
everyrow.io/screen | $2.90 | 6 min | >90%

The DIY approach is tempting, but a naive implementation actually costs more: everyrow.io uses batching and prompt optimizations that most one-off scripts skip. Add the engineering time for a consistent schema, error handling, and retries, and it's not close.
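To give a sense of what that engineering involves, here is a minimal sketch of a DIY screener with batching, schema validation, and retries. It is not how everyrow.io works internally. `call_model` is a hypothetical function you would supply to wrap whichever LLM API you use, and the prompt wording and JSON field names are assumptions.

```python
import json
import time
from typing import Callable, Dict, List

REQUIRED_KEYS = ("remote_friendly", "senior_level", "salary_disclosed")

CRITERIA = (
    "For each job posting, decide three booleans. Mark a criterion false "
    "if it is not explicitly met or is unclear."
)


def build_prompt(batch: List[str]) -> str:
    """Pack several postings into one request to amortize the instructions."""
    numbered = "\n".join(f"{i + 1}. {text}" for i, text in enumerate(batch))
    return (
        f"{CRITERIA}\n\nPostings:\n{numbered}\n\n"
        f"Reply with a JSON list of {len(batch)} objects with boolean keys: "
        + ", ".join(REQUIRED_KEYS)
    )


def parse_batch_reply(reply: str, expected: int) -> List[Dict[str, bool]]:
    """Check that the reply holds one well-formed verdict per posting."""
    data = json.loads(reply)  # raises json.JSONDecodeError (a ValueError)
    if not isinstance(data, list) or len(data) != expected:
        raise ValueError("wrong number of verdicts")
    for row in data:
        if not isinstance(row, dict) or any(
            not isinstance(row.get(k), bool) for k in REQUIRED_KEYS
        ):
            raise ValueError("verdict is missing a boolean field")
    return data


def screen(
    postings: List[str],
    call_model: Callable[[str], str],  # hypothetical: prompt in, reply text out
    batch_size: int = 20,
    max_retries: int = 3,
    backoff_s: float = 1.0,
) -> List[bool]:
    """Return one qualifies/does-not-qualify flag per posting, in order."""
    results: List[bool] = []
    for start in range(0, len(postings), batch_size):
        batch = postings[start:start + batch_size]
        prompt = build_prompt(batch)
        for attempt in range(max_retries):
            try:
                verdicts = parse_batch_reply(call_model(prompt), len(batch))
                break
            except ValueError:
                # Malformed reply: retry with exponential backoff, then give up.
                if attempt == max_retries - 1:
                    raise
                time.sleep(backoff_s * 2 ** attempt)
        results.extend(all(v[k] for k in REQUIRED_KEYS) for v in verdicts)
    return results
```

Even this sketch retries only on malformed replies. A real script would also need to handle rate limits, timeouts, and partial progress, which is where the development hours go.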

The same approach works for any filtering task where keywords fail you: lead qualification, content moderation, investment screening, supplier vetting. Try it yourself: the free tier covers 50 rows.


Data: 1,000 job postings from Hacker News "Who's Hiring" threads (August 2024 - January 2025). Session URL: everyrow.io/sessions/aecfddce-da1a-43b3-ad15-a05af0a9ae72