Millions of copyrighted songs, including chart-topping hits from Taylor Swift, Bad Bunny, and virtually every major artist in modern popular music, were scraped and used to train AI music generators, and The Atlantic just made the track lists searchable. Staff writer Alex Reisner published four catalogs documenting exactly which tracks fed these models: roughly 12 million tracks in the largest database, about 9 million in the second, and two smaller sets of around 100,000 each. This isn't a smoking gun; it's a full forensic audit of an industry built on stolen art.

The AI Companies Knew

Suno, one of the most prominent AI music generators, already acknowledged in court filings that it trained on 'tens of millions' of recordings and later admitted unlicensed copyrighted material was included, according to Heavy Lifting. These weren't vague admissions either — Suno essentially confessed to doing exactly what labels have alleged: swallowing entire catalogs without asking permission or cutting licensing deals first. The company's defense? Fair use. They argue models learn abstract patterns rather than memorizing specific songs. Labels call it piracy with a pitch deck.

The Legal War Escalates

Sony, Universal Music Group, and Warner Music Group filed lawsuits against Suno and Udio seeking up to $150,000 per song in statutory damages, a figure that could reach astronomical totals given the scale of training data involved. A parallel book-industry case framed mass scraping as piracy rather than simple copyright infringement and reached an initial $1.5 billion settlement figure. A settlement is not binding precedent, but it is a benchmark that rights holders are taking seriously. Meanwhile, the U.S. Copyright Office stated in January 2025 that AI-generated music often cannot be copyrighted without sufficient human authorship, meaning these tools can potentially infringe existing works while producing outputs that carry zero protection of their own.
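To see how quickly the per-song ceiling compounds, here is a rough back-of-the-envelope sketch in Python. The only number taken from the reporting is the $150,000 maximum per song; the work counts are hypothetical placeholders chosen to show scale, not figures from any filing, and the function names are illustrative.

```python
# Theoretical ceiling on statutory damages at the per-song maximum cited above.
# The work counts used below are HYPOTHETICAL and exist only to show how the total scales.
MAX_STATUTORY_PER_WORK = 150_000  # USD, the "up to" figure sought in the lawsuits


def max_exposure(works: int, per_work: int = MAX_STATUTORY_PER_WORK) -> int:
    """Return the ceiling in USD if every work drew the maximum award."""
    if works < 0:
        raise ValueError("works must be non-negative")
    return works * per_work


def format_usd(amount: int) -> str:
    """Format a whole-dollar amount with thousands separators."""
    return f"${amount:,}"


if __name__ == "__main__":
    for works in (1_000, 100_000, 10_000_000):  # hypothetical counts
        print(f"{works:>12,} works -> {format_usd(max_exposure(works))}")
```

The $150,000 figure is a ceiling, so totals like these are an upper bound on exposure, not a prediction of what a court would award.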

Technical Defenses Emerge

Researchers at the University of Tennessee developed HarmonyCloak, a tool that adds inaudible audio perturbations to recordings, making songs effectively unlearnable by AI models while sounding identical to human ears. This represents one of the few artist-controlled technical options in a landscape where most protections remain theoretical and reactive rather than preventive. It's not a silver bullet, but it's a glimpse at what defensive tooling could look like when creators have agency over their own data.
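For a feel of what "a small perturbation added to audio" means in practice, here is a toy sketch in Python. It is not HarmonyCloak's method, which is not described here: the random noise below is a stand-in, every function name is hypothetical, and noise like this would not by itself make a song unlearnable. The sketch only shows the general shape of the idea, which is to change each sample by a tightly bounded amount and then measure how small that change is relative to the signal.

```python
import math
import random


def sine_wave(freq_hz, seconds, sample_rate=8_000):
    """Generate a pure tone as a list of samples in [-1, 1]."""
    n = int(seconds * sample_rate)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


def add_bounded_noise(samples, epsilon, seed=0):
    """Shift every sample by at most +/- epsilon (a stand-in perturbation)."""
    rng = random.Random(seed)
    return [s + rng.uniform(-epsilon, epsilon) for s in samples]


def snr_db(clean, perturbed):
    """Signal-to-noise ratio in decibels; higher means a quieter perturbation."""
    signal = sum(s * s for s in clean)
    noise = sum((p - s) ** 2 for s, p in zip(clean, perturbed))
    if noise == 0:
        return math.inf
    return 10 * math.log10(signal / noise)


if __name__ == "__main__":
    clean = sine_wave(440, 0.5)
    perturbed = add_bounded_noise(clean, epsilon=0.001)
    print(f"SNR: {snr_db(clean, perturbed):.1f} dB")
```

A real protective tool would have to choose its perturbation far more carefully than this, both to stay inaudible and to actually interfere with training.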

The Industry Shifts

The scrape-everything era may be ending as labels, lawmakers, and researchers build workable alternatives. Universal Music Group and Warner Music Group have reportedly struck deals with Udio and Suno respectively, moving toward licensed AI music models that actually compensate rights holders. Tennessee passed a law protecting musical artists' voices from unauthorized AI cloning, the kind of legislative protection that's been desperately needed. Streaming platforms are deploying AI-detection tools to flag and limit generative imitations, though results have been mixed, with AI-generated copycats continuing to slip through and monetize.

Key Takeaways

  • The Atlantic's databases contain roughly 21 million tracks used to train Suno, Udio, and similar services without licensing agreements
  • Major labels are seeking up to $150,000 per song in statutory damages from AI music companies in ongoing litigation
  • Suno acknowledged in court filings that it trained on 'tens of millions' of recordings, later admitted unlicensed copyrighted material was included, and argues the training is fair use
  • Licensing deals between major labels and AI companies suggest the industry may shift toward compensated training practices

The Bottom Line

The Napster debate is back — this time with better lawyers, searchable databases, and a pitch deck. These catalogs aren't just journalism; they're evidence that could reshape how AI companies approach training data forever. If you're building anything in the generative space, the lesson here is simple: eventually someone counts what you took.