Publishers and authors have moved from warning to action as generative AI systems continue to rely on large swathes of copyrighted journalism, books and other premium content for model training. What started as ad hoc scraping and uneasy public appeals has become a structured push for pay-for-training terms, collective licensing and regulatory remedies.
The debate is now driven by legal losses, high-profile settlements, and an emerging technical ecosystem designed to enforce paid access. From early licensing deals with news organizations to a $1.5 billion settlement tied to pirated books, the last two years have crystallized publisher demands into concrete commercial and policy asks.
Why publishers are demanding payment
Publishers argue that their content underpins the quality of modern large language models and that unlicensed use undermines the economic model that supports journalism and books. Executives and trade groups have warned that unchecked scraping for training data diverts traffic, erodes subscription and advertising revenues, and breaks the 'symbiosis' that once bound publishers to search and aggregation platforms.
That commercial argument has been paired with a rights-based claim: authors and newsrooms say they must control licensing of their work and be compensated when it is used to train commercial AI systems. Groups such as the Authors Guild and the News/Media Alliance (NMA) have made transparency and collective bargaining central to their demands.
Those demands coalesce into three clear asks from publishers: mandatory transparency about which copyrighted works feed model training, support for collective licensing frameworks that can set standard terms, and statutory or regulatory measures to rebalance negotiations with deep-pocketed AI firms.
Landmark cases and the Anthropic settlement
Legal action has been a major accelerant. Courts have issued mixed signals on whether using copyrighted material for training constitutes fair use, but some rulings and factual findings have favored publishers' claims or revealed risky practices by AI developers.
A pivotal development was the Anthropic class-action resolution that was preliminarily approved on Sept. 25, 2025, under which Anthropic agreed to pay roughly $1.5 billion to authors and publishers. Reporting around that approval estimated the payout would average about $3,000 per included book and that roughly 465,000 titles were covered for payout calculations.
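The reported figures are internally consistent, as a quick back-of-the-envelope check shows (using only the numbers cited above; the rounding to "about $3,000" is the reporting's, not an exact term of the settlement):

```python
# Sanity-check the reported Anthropic settlement figures:
# roughly $1.5 billion spread over roughly 465,000 covered titles.
total_settlement = 1_500_000_000
covered_titles = 465_000

per_book = total_settlement / covered_titles
print(f"Average payout per book: ${per_book:,.0f}")  # roughly $3,226
```

That works out to about $3,226 per title before fees and administration costs, in line with the "about $3,000 per book" figure in coverage of the preliminary approval.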
The settlement followed a June 23, 2025 summary-judgment ruling in which a U.S. judge concluded that training on lawfully acquired books could qualify as fair use, but also found that the company had assembled a central repository of millions of pirated library copies, a factual vulnerability that materially influenced the push to settle. Judge Alsup's record noted more than seven million pirated copies at issue in the litigation, underscoring the scale problem that publishers highlight.
Early licensing deals and market terms
Not all publishers chose litigation. A patchwork of deals has emerged as an alternative route: the Associated Press licensed its news archive to OpenAI in July 2023, while Germany's Axel Springer signed a global partnership with OpenAI on Dec. 13, 2023 that included payments, attribution and permitted use of selected paywalled content.
Trade publishers have experimented with per-title, one-off fees and author opt-ins. HarperCollins, reported in late 2024 to be the first of the Big Five to strike an AI-training license for a selection of its nonfiction backlist, offered participating authors an opt-in and one-time payments of roughly $2,500 per chosen book, a market signal that many plaintiffs later reflected in their damages demands.
Market trackers and reporting note wide variance in deal terms and many undisclosed financials, but a recurring structure emerged: per-title payments (often presented as one-time fees), author controls where possible, and negotiated uses that ranged from direct ingestion to summarized or attributed access.
Evidence, leverage and publisher claims about dataset composition
Publishers have leaned on independent studies and internal analyses to argue that curated LLM datasets overweight premium publisher content. An NMA white paper from October 2023 and related technical analysis asserted that publishers' material was overrepresented by a factor of 'over 5 to almost 100' compared with generic web crawls like Common Crawl.
That research fed the narrative that publishers possess leverage: if models disproportionately rely on their reporting and books, publishers can credibly demand payment or collective licensing to capture some of the downstream value created. NMA CEO Danielle Coffey summarized the stance bluntly, saying the analysis shows AI developers 'are using it pervasively'.
Publishers also used litigation filings to quantify their expectations. Several lawsuits and complaints have asked for minimum damages per work (filings have cited figures such as $2,500 per infringement), signaling how plaintiffs calibrate monetary expectations for 'pay-for-training' approaches.
Collective licensing and technical enforcement: the RSL initiative
To move beyond ad hoc deals and litigation, publishers and platform partners launched technical and collective solutions. A prominent example is the Really Simple Licensing (RSL) protocol and the RSL Collective, announced on Sept. 10, 2025, as a machine-readable licensing/robots.txt standard to let rights-holders set pay-per-crawl, pay-per-inference or subscription terms for automated crawlers.
RSL's backers reportedly included major web platforms such as Reddit, Yahoo, Medium and Quora, and the initiative proposed cooperation with CDNs like Fastly to enforce pay-to-crawl access. The idea is to harden publishers' bargaining position by making it technically straightforward to demand payment at scale, rather than forcing each publisher into one-off negotiations or endless litigation.
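The mechanism is easiest to see from the crawler's side: discover a site's declared terms before fetching, and only proceed where payment has been arranged. The sketch below is illustrative only; it does not follow the actual RSL schema, and the field names, prices and domains are hypothetical assumptions.

```python
# Illustrative sketch of a pay-to-crawl policy check, in the spirit of
# RSL-style machine-readable licensing. NOTE: this is NOT the real RSL
# schema; all field names, prices and domains here are hypothetical.
from dataclasses import dataclass

@dataclass
class LicenseTerms:
    model: str        # e.g. "pay-per-crawl", "pay-per-inference", "subscription"
    price_usd: float  # price per unit under that model
    contact: str      # where a crawler operator negotiates payment

# A publisher might publish machine-readable terms alongside robots.txt.
SITE_TERMS = {
    "example-news.com": LicenseTerms("pay-per-crawl", 0.01, "licensing@example-news.com"),
    "example-blog.org": LicenseTerms("subscription", 500.0, "rights@example-blog.org"),
}

def may_crawl(domain: str, paid_domains: set[str]) -> bool:
    """A compliant crawler fetches only where terms are absent or paid."""
    terms = SITE_TERMS.get(domain)
    if terms is None:
        return True  # no declared terms: fall back to ordinary robots.txt rules
    return domain in paid_domains

print(may_crawl("example-news.com", paid_domains=set()))    # False: terms declared, no payment
print(may_crawl("example-news.com", {"example-news.com"}))  # True: license settled
```

The point of a standard like this is that the check is cheap and uniform: enforcement can then move to infrastructure (CDNs such as Fastly, in RSL's proposal) rather than to per-publisher lawyers.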
Combined with collective-rights organizations, RSL represents a non-litigation path to extract payment and standardize terms. It also creates a commercial calculus for AI firms: either negotiate publisher licenses, build training pipelines that avoid protected content, or accept the costs and uncertainty of protracted legal battles.
What this means for AI firms and the wider market
The cumulative effect of settlements, lawsuits, deals and new technical standards is to raise the commercial and compliance bar for AI developers. The Anthropic resolution in particular, together with the factual finding about large-scale storage of pirated books, has signaled that such risks can turn into very large liabilities.
AI firms now face three broad options. They can scale negotiated licenses and transparency measures, as exemplar deals with AP and Axel Springer have shown. They can redesign training pipelines to rely less on premium copyrighted content or to use properly licensed or synthetic alternatives. Or they can continue to contest legal claims, accepting the mixed posture of the courts and the potential for further settlements and damages.
For publishers and authors, the combination of legal pressure and technical standards like RSL creates more leverage than a scattershot litigation strategy alone. For policymakers, this moment raises questions about disclosure rules, collective licensing frameworks and whether new statutory remedies are needed to balance innovation with creators' rights.
As the market adjusts, expect more standardized licensing terms, greater transparency around training corpora, and continued experimentation with per-title fees, author opt-ins and machine-readable access controls. The next 12 to 24 months will likely determine whether pay-for-training becomes an industry norm or remains a patchwork of agreements and court outcomes.
Publishers' demand for pay-for-training has shifted from rhetorical to institutional: litigation established liability risk, early deals illustrated commercial pathways, and new collective tools promise scale. Whether AI firms and publishers converge on a sustainable equilibrium depends on law, technology and the commercial appetite for negotiated licenses.