AI Sentiment Index

Methodology — how raw headlines become a daily index: scoring, model choice, pipeline, and limitations.

Every headline is scored for its stance toward AI—not the sentiment of its words. This page explains how, and where it falls short.

01 · Scoring

How scoring works

Claude Haiku (claude-haiku-4-5-20251001) reads each headline and scores its stance toward AI on a scale from −1.0 (anti-AI) to +1.0 (pro-AI). It scores the title and summary together — summaries carry context that titles drop — and returns a single number. A day’s index value is the mean across that day’s headlines; scores within ±0.05 of zero count as neutral in the daily tallies.
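A minimal sketch of the per-headline step, assuming the model is asked for a bare number. Several pieces are hypothetical: the prompt wording, the helpers `build_prompt`, `parse_score`, and `label`, and the clamping of stray out-of-range replies. The scale, the title-plus-summary input, and the ±0.05 neutral band come from the paragraph above. Treating the band's edges as neutral is an assumption. The API call itself is left out.

```python
import re

NEUTRAL_BAND = 0.05  # scores within +/-0.05 of zero are tallied as neutral

def build_prompt(title: str, summary: str) -> str:
    """Hypothetical prompt; the production wording is not published here."""
    return (
        "Rate this news item's stance toward AI on a scale from -1.0 "
        "(anti-AI) to +1.0 (pro-AI). Judge stance toward AI, not the tone "
        "of the words. Reply with a single number only.\n"
        f"Title: {title}\n"
        f"Summary: {summary}"
    )

# Optional sign, then digits with an optional decimal part (e.g. -0.6, .5, 1).
_NUMBER = re.compile(r"[-+]?\d*\.?\d+")

def parse_score(reply: str) -> float:
    """Take the first number in the reply and clamp it to [-1.0, 1.0]."""
    match = _NUMBER.search(reply)
    if match is None:
        raise ValueError(f"no score in model reply: {reply!r}")
    return max(-1.0, min(1.0, float(match.group())))

def label(score: float) -> str:
    """Bucket a score for the daily tallies (band edges count as neutral)."""
    if abs(score) <= NEUTRAL_BAND:
        return "neutral"
    return "positive" if score > 0 else "negative"
```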

Stance is not word valence, and the distinction does the real work:

Scores positive

“Anthropic Wins Court Order Pausing Ban”

A win for an AI company — even though “ban” and “court” sound negative.

Scores negative

“AI Replaces 500 Jobs”

Negative for AI sentiment, even though it demonstrates AI capability.

Funding rounds, launches, and breakthroughs generally score positive; bans, lawsuits, safety failures, and job losses generally score negative; neutral reporting lands near zero.

The daily mean is headline-weighted: every headline counts once, so an outlet that publishes more moves that day’s index more. That is deliberate — the index measures the tone of AI coverage as it actually lands, volume included — but it means a shift can come from who published, not only from what changed. The per-source means on the leaderboard are the volume-neutral view.
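The gap between the headline-weighted mean and the per-source view can be sketched in a few lines. Rows are assumed to be dicts with hypothetical `source` and `score` keys, and the real `daily_scores` columns may differ.

```python
from collections import defaultdict
from statistics import mean

NEUTRAL_BAND = 0.05

def daily_aggregate(rows: list[dict]) -> dict:
    """Aggregate one day's rows (at least one), each assumed {"source": str, "score": float}."""
    scores = [row["score"] for row in rows]
    by_source: dict[str, list[float]] = defaultdict(list)
    for row in rows:
        by_source[row["source"]].append(row["score"])
    return {
        # Headline-weighted: each headline counts once, so busy outlets weigh more.
        "mean": mean(scores),
        "positive": sum(s > NEUTRAL_BAND for s in scores),
        "negative": sum(s < -NEUTRAL_BAND for s in scores),
        "neutral": sum(abs(s) <= NEUTRAL_BAND for s in scores),
        # Volume-neutral view: one mean per outlet, as on the leaderboard.
        "per_source": {src: mean(vals) for src, vals in by_source.items()},
    }
```

Suppose one outlet publishes four +0.5 headlines and another publishes a single −0.5. The daily `mean` comes out at 0.3. The per-source means are +0.5 and −0.5, which average to zero. That difference is the volume effect described above.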

02 · Model choice

Why an LLM instead of a lexicon

The index did not start with Claude. The first scorer was VADER, a lexicon-based sentiment analyzer, patched with hand-built dictionaries of domain adjustments — boosts for words like “breakthrough” and “funding”, penalties for “lawsuit” and “existential risk”.

It kept failing in ways no dictionary could patch. Tested against Claude on the same headlines, VADER agreed on direction only 62% of the time. Three failure modes accounted for most of the gap:

Substring matching. “ban” matched “bank” and “banking”, so headlines about banks were docked for prohibitions that were never there.
Context blindness. A lexicon sees “wins court order pausing ban” as a pile of negative words. It cannot model who won, or what winning means for AI.
News and legal language. Coverage of regulation and litigation uses charged vocabulary in neutral or even positive constructions — and a lexicon scores the vocabulary.
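The first failure mode is easy to reproduce. The sketch below contrasts a plain substring test with a word-boundary regex of the kind the fallback now uses. The exact pattern, including the `bans`/`banned` variants, is an illustrative assumption.

```python
import re

headline = "Bank earnings beat forecasts as banking stocks rally"

# Naive containment test: "ban" is found inside "Bank" and "banking".
naive_hit = "ban" in headline.lower()

# Word-boundary pattern: "ban", "bans" or "banned" only as whole words.
BAN = re.compile(r"\bban(?:s|ned)?\b", re.IGNORECASE)
bounded_hit = BAN.search(headline) is not None

print(naive_hit, bounded_hit)                                       # True False
print(BAN.search("EU moves to ban facial recognition") is not None)  # True
```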

So Claude Haiku became the primary scorer. At the current volume of 100-odd headlines a day, that costs a few cents per day. VADER never left, though: its compound score is still computed for every headline and stored as score_raw, and it remains the fallback — now with word-boundary regexes for terms like “ban” — whenever the API key is missing or a request fails. Every row records which scorer produced it in a scored_by field, so scorer changes can be audited later.
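The fallback order can be sketched as below, under a few assumptions. Both scorers are passed in as plain functions, so nothing here is tied to a real client library. The `scored_by` labels `"claude"` and `"vader"` are hypothetical values. In practice `lexicon_score` would presumably wrap VADER's compound score plus the domain adjustments (for example `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]` from the vaderSentiment package), and `llm_score` would call Claude Haiku. Whether the lexicon also sees the summary is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class ScoredHeadline:
    title: str
    score: float      # the value that feeds the index
    score_raw: float  # lexicon compound score, stored for every headline
    scored_by: str    # which scorer produced `score`

def score_headline(
    title: str,
    summary: str,
    lexicon_score: Callable[[str], float],
    llm_score: Optional[Callable[[str, str], float]],
) -> ScoredHeadline:
    """Prefer the LLM score; fall back to the lexicon if it is unavailable or fails."""
    raw = lexicon_score(f"{title} {summary}")  # always computed
    if llm_score is not None:  # None stands in for a missing API key
        try:
            return ScoredHeadline(title, llm_score(title, summary), raw, "claude")
        except Exception:  # any request failure degrades to the lexicon
            pass
    return ScoredHeadline(title, raw, raw, "vader")
```

Passing `None` models a missing API key. Catching every exception is deliberately blunt: a timeout or error response yields a lexicon score instead of a gap in the data, and `scored_by` makes those rows easy to find later.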

03 · Pipeline

From feed to chart

A GitHub Actions cron runs every six hours. It pulls 14 RSS feeds, keeps headlines that match an AI keyword filter, dedupes against everything already stored, and scores only what is new. Headlines and daily aggregates (mean, counts, per-source breakdowns) are upserted into two Supabase tables, and the Next.js frontend re-renders on the same six-hour cadence via incremental static regeneration.

RSS feeds (14 outlets) → GitHub Actions (cron · every 6h) → Claude scoring (title + summary) → Supabase (headlines · daily_scores) → Next.js ISR (revalidates every 6h)

No queue, no inference servers, nothing to babysit: a cron job, one Python script, a Postgres database, and a static site.
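The "score only what is new" step is what keeps API spend proportional to fresh headlines rather than to feed size. Here is a minimal sketch. It assumes each feed entry is a dict with `link`, `title`, and an optional `summary` (feedparser entries expose similar fields), and it assumes the entry link is the dedupe key, which the real pipeline may not use. The keyword test is passed in as `is_ai`.

```python
from typing import Callable, Iterable

Entry = dict  # assumed shape: {"link": str, "title": str, "summary": str}

def select_new(
    entries: Iterable[Entry],
    stored_links: set[str],
    is_ai: Callable[[str, str], bool],
) -> list[Entry]:
    """Return AI-related entries whose link has not been stored or seen this run."""
    seen = set(stored_links)  # copy, so the caller's set is not mutated
    fresh: list[Entry] = []
    for entry in entries:
        if not is_ai(entry["title"], entry.get("summary", "")):
            continue  # fails the AI keyword filter
        link = entry["link"].strip()
        if link in seen:
            continue  # already in the database, or repeated in this batch
        seen.add(link)
        fresh.append(entry)
    return fresh
```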

04 · Sources

What gets read

14 outlets, spanning general tech press, business desks, and AI-specific verticals. The list lives in a single sources.json, read by both the Python ingester and this site (the grid below included).

Ars Technica
BBC Technology
Bloomberg
CNBC Tech
Fox News Tech
MIT Tech Review
NPR Technology
NYT Technology
TechCrunch
The Guardian
The Verge
VentureBeat AI
Wired
ZDNet AI

Feeds are fetched in full on every run; a headline enters the index only if it matches the AI keyword filter.
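The sketch below covers the two inputs named above: reading the shared `sources.json` and applying an AI keyword filter. Neither the file's schema nor the keyword list is reproduced on this page, so both are assumptions. The filter uses word boundaries so that short terms such as "AI" do not fire inside words like "said".

```python
import json
import re

def load_sources(path: str = "sources.json") -> list[dict]:
    """Read the shared outlet list; a list of {"name": ..., "url": ...} objects is assumed."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

# Hypothetical keyword list; the real filter's terms are not published on this page.
AI_TERMS = ["AI", "artificial intelligence", "machine learning", "LLM",
            "chatbot", "OpenAI", "Anthropic", "ChatGPT"]

# Whole-word match, with an optional plural "s" (LLMs, chatbots).
AI_PATTERN = re.compile(
    r"\b(?:" + "|".join(map(re.escape, AI_TERMS)) + r")s?\b", re.IGNORECASE
)

def is_ai_headline(title: str, summary: str = "") -> bool:
    """True if the title or summary mentions any AI term as a whole word."""
    return AI_PATTERN.search(f"{title} {summary}") is not None
```

A filter like this trades recall for precision. Case-insensitive "AI" will still catch the occasional unrelated "Ai", and it misses stories that never name the technology.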

05 · Data

Take the data

The full dataset — every scored headline and every daily aggregate since January 2025 — is exported weekly to the repository as JSON and CSV, free to use with attribution: data/export on GitHub. daily_scores.json is the index itself; headlines.csv has every row behind it.
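The sketch below shows one way to use the export: load `daily_scores.json` and smooth the series with a trailing 7-day mean. The record shape (`date` as an ISO string, `mean` as a float) is an assumption about the file, not its documented schema.

```python
import json
from datetime import date

def load_daily(path: str) -> list[tuple[date, float]]:
    """Load daily_scores.json, assumed to be a list of {"date": "YYYY-MM-DD", "mean": float}."""
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    return sorted((date.fromisoformat(r["date"]), float(r["mean"])) for r in records)

def rolling_mean(values: list[float], window: int = 7) -> list[float]:
    """Trailing mean over the last `window` values; the first few days use what exists."""
    out: list[float] = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1) : i + 1]
        out.append(sum(chunk) / len(chunk))
    return out
```

Loading would look like `dates, means = zip(*load_daily("daily_scores.json"))`, followed by `rolling_mean(list(means))`. Averaging daily means weights every day equally, however many headlines it had. If the export carries daily counts, weighting by them gives a closer match to the headline-weighted index.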

06 · Limitations

Honest limitations

RSS is a moving window
Feeds retain roughly a week of history, and there is no backfill source. If ingestion stalls for longer than that, the missed headlines are permanently lost.
One model’s judgment
Sentiment is subjective, and every score here is a single model reading a single prompt. The aggregate trend is more trustworthy than any individual score.
The fallback is weaker
When scoring falls back to the lexicon, word-boundary regexes prevent the worst substring errors, but it still cannot read context — the exact failure that motivated the switch.
It costs money
A few cents a day in API calls at current volume. If the key is absent or the API errors, the index degrades to lexicon scoring rather than stopping.
Volume moves the needle
The daily mean weighs every headline equally, so prolific outlets and AI-vertical feeds pull harder than occasional publishers — and any change to the source list shows up in the index itself. Long-range comparisons read best alongside the per-source view.
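The moving-window limitation suggests a cheap guard: compare the time since the last successful ingestion with the feeds' retention window. This is sketched as a hypothetical check, not something this page says the pipeline runs. The 7-day retention and the one-missed-run tolerance are assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

RETENTION = timedelta(days=7)      # assumption: "roughly a week", varies by feed
RUN_INTERVAL = timedelta(hours=6)  # the cron cadence described in the pipeline section

def ingestion_status(last_success: datetime, now: Optional[datetime] = None) -> str:
    """Classify how risky the gap since the last successful run is."""
    now = now or datetime.now(timezone.utc)
    gap = now - last_success
    if gap <= 2 * RUN_INTERVAL:  # tolerate one missed run before alarming
        return "ok"
    if gap < RETENTION:
        return "stalled, still recoverable"
    return "stalled past retention, headlines likely lost"
```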


How positive or negative are major news outlets when they write about AI? A daily score from −1.0 to +1.0 across 14 sources.
