The FiveThirtyEight Index and the Recovery of Data Journalism Archives
The FiveThirtyEight Index uses a Python-based crawler to surface 21,350 unique URLs from the Wayback Machine's CDX API, bypassing the site-wide redirect implemented by ABC News in early 2025 (Source: GitHub, Editor & Publisher).
The Pitch
Ben Welsh, a News Applications Editor at Reuters, has built a searchable directory for the 16-year history of FiveThirtyEight after Disney effectively erased the publication's legacy. It provides a clean Svelte-based interface for content that corporate owners attempted to bury behind an ABC News Politics redirect. Data journalists and backend engineers are currently using it to recover historical datasets that were previously considered lost to the "link rot" of 2025.
Under the Hood
The technical architecture is straightforward: a Python crawler queries the Internet Archive’s CDX API to map the publication's history from 2008 to 2024 (Source: GitHub). The resulting index acts as a specialized pointer, mapping 21,350 unique article URLs to their most stable archived snapshots (Source: Reddit r/fivethirtyeight). By using a Svelte frontend, Welsh avoids the overhead of the standard Wayback Machine UI, making the archive searchable for the first time since the original site’s demise.
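To make the crawl step concrete, here is a minimal sketch of how a script might query the CDX API and reduce the results to one snapshot per URL. This is not Welsh's code. The function names are hypothetical, and "most stable snapshot" is assumed here to mean the latest capture that returned HTTP 200, which may not match the project's actual selection rule.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_url(domain="fivethirtyeight.com", start="2008", end="2024"):
    """Build a CDX query for successful captures of a domain and its subdomains."""
    params = {
        "url": domain,
        "matchType": "domain",      # include subdomains
        "output": "json",           # first row of the response is a header
        "fl": "urlkey,timestamp,original,statuscode",
        "filter": "statuscode:200",
        "from": start,
        "to": end,
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def parse_cdx_rows(rows):
    """Turn the CDX JSON array-of-arrays into a list of dicts keyed by header."""
    if not rows:
        return []
    header, *data = rows
    return [dict(zip(header, row)) for row in data]


def latest_snapshot_per_url(captures):
    """Assumption: the 'best' snapshot is the most recent capture per urlkey."""
    latest = {}
    for cap in captures:
        key = cap["urlkey"]
        # Timestamps are fixed-width digit strings, so string comparison works.
        if key not in latest or cap["timestamp"] > latest[key]["timestamp"]:
            latest[key] = cap
    return {
        key: f"https://web.archive.org/web/{cap['timestamp']}/{cap['original']}"
        for key, cap in latest.items()
    }


def fetch_index():
    """Network call, shown for completeness; a real crawl would need paging."""
    with urlopen(build_cdx_url(), timeout=30) as resp:
        return latest_snapshot_per_url(parse_cdx_rows(json.load(resp)))
```

Keeping the parsing and selection logic separate from the network call means the snapshot rule can be changed or tested without touching the Internet Archive at all.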
However, the tool is a portal, not a mirror. Complex JavaScript-heavy interactives, such as the famous 2016 election models or the "P-hacking" interactive, remain partially or fully broken (Source: HN). These legacy assets frequently fail because they request backend scripts or JSON data files that were never captured during the original crawls.
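One way to see why a given interactive breaks is to check whether each asset it depends on has any archived capture. The sketch below uses the Wayback Machine's availability endpoint for that. The helper names are hypothetical, and the list of asset URLs would have to come from inspecting the page yourself; nothing here is taken from the index's own code.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_url(asset_url, timestamp=None):
    """Build an availability query; timestamp (YYYYMMDD...) is optional."""
    params = {"url": asset_url}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{AVAILABILITY_ENDPOINT}?{urlencode(params)}"


def closest_snapshot(payload):
    """Return the closest archived URL from an availability response, or None."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


def find_missing_assets(asset_urls, lookup):
    """Return the assets with no archived capture.

    `lookup` is any callable mapping an asset URL to a parsed availability
    response, which keeps this function testable without network access.
    """
    return [u for u in asset_urls if closest_snapshot(lookup(u)) is None]


def live_lookup(asset_url):
    """Network-backed lookup against the real endpoint."""
    with urlopen(availability_url(asset_url), timeout=30) as resp:
        return json.load(resp)
```

Calling `find_missing_assets(urls, live_lookup)` on the JSON and script URLs of a broken interactive would list the dependencies that were never captured.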
There are significant gaps in the current documentation. We don't know yet if the index fully covers the "projects.fivethirtyeight.com" subdomains, which housed the most computationally heavy election models (Source: UsedBy Dossier). Furthermore, the legal status of this "content rehydration" remains unconfirmed, as it is unclear if Disney has issued updated robots.txt instructions or legal challenges to the Internet Archive regarding these specific assets.
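Until the documentation settles the subdomain question, a reader with a copy of the index could check coverage directly by tallying indexed URLs per hostname. This is a rough sketch with a hypothetical helper name. It assumes the index can be exported as a flat list of original URLs, which the source does not confirm.

```python
from collections import Counter
from urllib.parse import urlsplit


def count_by_host(urls):
    """Count indexed URLs per hostname, skipping entries without a parseable host."""
    hosts = (urlsplit(u).hostname for u in urls)
    return Counter(h for h in hosts if h)
```

A zero or near-zero count for `projects.fivethirtyeight.com` in the result would suggest the election models sit outside the index.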
The tool’s survival is entirely dependent on the Internet Archive’s infrastructure. If the Wayback Machine faces further legal pressure or technical outages, this index becomes a directory of dead links. It is a fragile layer of discovery built on top of a volatile storage medium.
Marcus's Take
The FiveThirtyEight Index is a necessary piece of digital archaeology, but it highlights the precarious nature of modern web engineering. As a backend analyst, I find the dependency on the Wayback Machine’s CDX API both elegant and dangerously thin. Use this for retrieving text-based historical data or verifying past reporting, but do not expect it to serve as a reliable environment for running legacy data visualizations. It is a library catalog, not the library itself.
Ship clean code,
Marcus.

Marcus Webb - Senior Backend Analyst at UsedBy.ai