Anthropic Accused of Building Claude AI with 7 Million Pirated Books

:::tip
ANDREA BARTZ, CHARLES GRAEBER, and KIRK WALLACE JOHNSON v. ANTHROPIC PBC, retrieved on June 25, 2025, is part of HackerNoon’s Legal PDF Series. This is part 6 of 10.
:::

(ii) The Pirated Library Copies

Before buying books for its central library, Anthropic downloaded over seven million pirated copies of books, paid nothing, and kept these pirated copies in its library even after deciding it would not use them to train its AI (at all or ever again). Authors argue Anthropic should have paid for these pirated library copies (e.g., Tr. 24–25, 65; Opp. 7, 12–13). This order agrees. The basic problem here was well-stated by Anthropic at oral argument: “You can’t just bless yourself by saying I have a research purpose and, therefore, go and take any textbook you want. That would destroy the academic publishing market if that were the case” (Tr. 53). Of course, the person who purchases the textbook owes no further accounting for keeping the copy. But the person who copies the textbook from a pirate site has infringed already, full stop.

This order further rejects Anthropic’s assumption that the use of the copies for a central library can be excused as fair use merely because some will eventually be used to train LLMs. This order doubts that any accused infringer could ever meet its burden of explaining why downloading source copies from pirate sites that it could have purchased or otherwise accessed lawfully was itself reasonably necessary to any subsequent fair use. There is no decision holding or requiring that pirating a book that could have been bought at a bookstore was reasonably necessary to writing a book review, conducting research on facts in the book, or creating an LLM. Such piracy of otherwise available copies is inherently, irredeemably infringing even if the pirated copies are immediately used for the transformative use and immediately discarded. But this order need not decide this case on that rule.
Anthropic did not use these copies only for training its LLM. Indeed, it retained pirated copies even after deciding it would not use them or copies from them for training its LLMs ever again. They were acquired and retained, as a central library of all the books in the world. Building a central library of works to be available for any number of further uses was itself the use for which Anthropic acquired these copies. One further use was making further copies for training LLMs. But not every book Anthropic pirated was used to train LLMs. And, every pirated library copy was retained even if it was determined it would not be so used. Pirating copies to build a research library without paying for it, and to retain copies should they prove useful for one thing or another, was its own use — and not a transformative one (see Tr. 24–25, 35, 65; Opp. 4–10, 12 n.6; CC Br. Exh. 12 at -0144509 (“everything forever”)). Napster, 239 F.3d at 1015; BMG Music v. Gonzalez, 430 F.3d 888, 890 (7th Cir. 2005).

Anthropic’s briefing contains other reasons why it believes its pirated library copies are irrelevant to our fair use analysis, notwithstanding its own statements at our oral argument.

First, Anthropic accepts in this posture that it acted in bad faith but argues that its bad faith in pirating copies cannot “somehow short-circuit[ ]” the fair use analysis (Reply 6 (downplaying Atari Games Corp. v. Nintendo of Am., Inc., 975 F.2d 832, 843 (Fed. Cir. 1992) (applying law of Ninth Circuit))). But its bad faith is not the basis for this decision. Each use of a work must be analyzed objectively. Warhol, 598 U.S. at 544–45. The objective analysis here shows the initial copies were pirated to create a central, general-purpose library, as a substitute for paid copies to do the same thing. (Of course, if infringement is found, bad faith would matter for determining willfulness. 17 U.S.C. § 504(c)(2).)
Second, Anthropic argues that its goal to put the copies eventually “to a highly transformative use” requires that each copy and use along the way be justified as having a transformative use, too (Reply 14). But now Anthropic seeks to take the shortcut Anthropic just said cannot be taken. Again, the Supreme Court tasks us with looking past the “subjective intent of the user” to the objective use made of each copy. Warhol, 598 U.S. at 544–45 (emphasis added). Put another way, what a copyist says or thinks or feels matters only to the extent it shows what a copyist in fact does with the work. Indeed, the same copy can be used one way, then another, each with a different result. Id. at 533. Here, what Anthropic said about its acquisitions at the time — that they were made to “build[ ] a research library” while avoiding a “huge legal/practice/business slog” — are relevant in this regard. And, Anthropic’s actual use of these pirated copies was to create its central library of texts that, like any university or corporate library, stored the works’ well-organized facts, analyses, and expressive examples for various contingent uses, one being training. (5)

Third, Anthropic argues that Texaco — the case involving copies used in a central library, copies used in desk libraries, and copies used in the laboratory — is inapposite. Anthropic argues that the disputed copies in Texaco were never used in the laboratory but instead in personal desk libraries for a use “identical to the original purpose and use” of the central library copies, and so not for a transformative use (Reply 8 (summarizing 60 F.3d at 922–23)). By contrast, says Anthropic, here it did use copies in the laboratory to train LLMs — a very transformative use. But this is a fast glide over thin ice. Like Texaco, Anthropic possessed copies it did not put into use in the laboratory and it kept those copies in a central library even after its transformative use had been completed.
But, unlike Texaco, which bought those copies, Anthropic never paid for the central library copies stolen off the internet. Texaco also shows why Anthropic is wrong to suppose that so long as you create an exciting end product, every “back-end step, invisible to the public,” is excused (Br. 10).

Notably, this is not a case where source copies were unavailable for separate purchase or loan. See, e.g., NXIVM Corp. v. Ross Inst., 364 F.3d 471, 475–76, 478–79 (2d Cir. 2004) (using selections of training manual — otherwise available only to cult’s trainees subject to NDAs — to expose cult in critical review); Time Inc. v. Bernard Geis Assocs., 293 F. Supp. 130, 135–36, 138, 146 (S.D.N.Y. 1968) (Judge Inzer Bass Wyatt) (making charcoal drawings of photographs taken of originals otherwise not on sale or loan out to illustrate a history book). (6) Nor were the copies made only incidentally and necessarily from pirated copies. See, e.g., Perfect 10, 508 F.3d at 1164 n.8 (copies of images that had been pirated by third-party websites were used to index those same websites while indexing the entire web). Here, piracy was the point: To build a central library that one could have paid for, just as Anthropic later did, but without paying for it.

Nor were the initial copies made immediately transformed into a significantly altered form. In Perfect 10, images were copied by the search engine in thumbnail form only and deployed immediately into the transformative use of identifying the full-sized images and the pages from which they came. 508 F.3d at 1160, 1165, 1167. And, in Kelly v. Arriba Software Corp., images were copied at full size and then into thumbnails for immediate use in building a search engine, after which the full-sized copies were immediately deleted. 336 F.3d 811, 815 (9th Cir. 2003). Not here. The full-text copies of books were downloaded and maintained “forever.”

Nor does the initial copying here even resemble the full-text copying in the Google Books cases.
There, libraries of authorized copies already had been assembled, and all copies therefrom were made for direct employment in a one-to-one further fair use — whether the transformative use of pointing to the works themselves, the use of providing the works in formats for print-disabled patrons, or the use of insuring against going out of print, getting lost, and becoming otherwise unavailable. HathiTrust, 755 F.3d at 97, 101, 103; Google, 804 F.3d at 206, 216–18, 228 (further distinguishing search and snippet uses, which “test[ed] the boundaries of fair use”). Not so here concerning the pirated copies. No authorized copies existed from which Anthropic made its first copies. No full-text copy therefrom was put immediately into use training LLMs. Not every copy was even necessary nor used for training LLMs. No initial copy was ever deleted, even if never used or no longer used. (7)

The university libraries and Google went to exceedingly great lengths to ensure that all copies were secured against unauthorized uses — both through technical measures and through legal agreements among all participants. Not so here. The library copies lacked internal controls limiting access and use.

Nor do the decisions on intermediate copying require anything less than the analysis applied here. Anthropic argues that our court of appeals in Sega Enterprises Ltd. v. Accolade, Inc. looked only at the “ultimate use” and “did not analyze a series of atomized acts of ‘infringement’ distinct from that overall purpose” (Reply 3). To the contrary, the appeals court examined the initial, intermediate, and ultimate copies used by the copyist. The court explained that the copyist initially purchased commercially available copies of game cartridges and then made further copies necessarily and “solely in order to discover the functional requirements for compatibility.” 977 F.2d 1510, 1522 (9th Cir. 1992).
Thus, it reached only one result because on those facts there was only one “overall purpose” for the unauthorized copies. Indeed, the court reaffirmed prior caselaw holding that “intermediate copying of [a work] may infringe the exclusive rights granted to the copyright owner in [S]ection 106 of the Copyright Act regardless of whether the end product of the copying also infringes those rights.” Id. at 1518–19 (reaffirming Walker v. Univ. Books, 602 F.2d 859, 864 (9th Cir. 1979)). Similarly, in Sony Computer Entertainment, Inc. v. Connectix Corp., our appeals court applied the same law to similarly focused conduct. Another copyist allegedly had purchased an authorized copy and then made further copies solely and necessarily to reverse-engineer compatibility requirements. 203 F.3d 596, 601, 602–03 (9th Cir. 2000). Both Sega and Sony avoided imposing an “artificial hurdle” to fair use by generously construing the intermediate copying necessary to the fair use. As one example, Sega stated that an engineer should be permitted to reboot her computer while undertaking to reverse-engineer software loaded onto it — even if doing so creates another digital copy of the software and is not strictly necessary to reverse-engineering. Id. at 605. But neither Sega nor Sony fathomed gifting an “artificial head start” to a fair user, either, by treating even the initial copy as an intermediate one.

And, yes, some courts have “not inquire[d]” into intermediate or initial copying at all (Reply 2 (citing Campbell as not inquiring into surplus copies in the studio)). But if a “close reading of those cases [ ] reveals that in none of them was the legality of the [initial or] intermediate copying at issue,” then it was not raised and not necessarily decided. Sega, 977 F.2d at 1519; see Webster v. Fall, 266 U.S. 507, 511 (1925). It was expressly decided elsewhere: Our analysis must attend to different uses of different copies, and even to different uses of the same copies. Warhol, 598 U.S. at 533.

Finally, Anthropic argues that even if the initial copies served a different use than the intermediate and ultimate copies, it was not a use for which Anthropic necessarily would have needed to pay Authors for a copy. In theory, argues Anthropic, it could have done as Google did in Google Books — find an existing reference library willing to loan its copies for free as source copies. Or, in theory, it could have done as Anthropic did later — go buy used copies without having to pay Authors at all. See 17 U.S.C. § 109(a). But Anthropic did not do those things — instead it stole the works for its central library by downloading them from pirated libraries.

In sum, the first factor points against fair use for the central library copies made from pirated sources — and no damages from pirating copies could be undone by later paying for copies of the same works.

(5) Our court of appeals has not yet reappraised how bad faith (or good faith) figures in fair use after Warhol. Its prior appraisal applied the Supreme Court’s statement that “[f]air use presupposes good faith and fair dealing,” Harper & Row, 471 U.S. at 562 (cleaned up). See Perfect 10, 508 F.3d at 1164 n.8. Since then, the Supreme Court has renewed its “skepticism about whether bad faith has any role.” Oracle, 593 U.S. at 32–33 (reiterating doubts of Campbell, 510 U.S. at 585 n.18). And, recently, the Supreme Court has held squarely that it is not the “subjective intent” of a copyist that counts, but the “objective . . . use” of the copy. Warhol, 598 U.S. at 544–45. This order applies this most recent analysis. Miller v. Gammie, 335 F.3d 889, 900 (9th Cir. 2003) (en banc).

(6) Anthropic repeats the misleading characterization of the copyright holder in Oracle that the initial copies were there purloined (Reply 5). Not so. “All agree[d] that Google was and remain[ed] free to use the Java language itself. All agree[d] that Google’s virtual machine [wa]s free of any copyright issues.
All agree[d] that the six-thousand-plus method implementations by Google [we]re free of copyright issues. The copyright issue, rather,” was the use of Java for purposes of creating competing software having the same familiar, functional schema. Oracle Am., Inc. v. Google Inc., 872 F. Supp. 2d 974, 978 (N.D. Cal. 2012), aff’d and rev’d in part, 750 F.3d 1339 (Fed. Cir. 2014).

(7) Training LLMs was not a use where perpetually maintaining a library copy was intrinsic to the proffered fair use (e.g., for a plagiarism-checker service). Nor is this an instance where retaining at least one copy was authorized by contract with the copyright owners (e.g., by agreement to express terms upon submission to a plagiarism-checker service, notwithstanding proposed terms scrawled on a paper prior to submission). A.V. ex rel. Vanderhye v. iParadigms, LLC, 562 F.3d 630, 635–36 & n.5, 645 n.8 (4th Cir. 2009), aff’g in relevant parts 544 F. Supp. 2d 473, 480 (E.D. Va. 2008) (Judge Claude Hilton). Anthropic mischaracterizes this case.

:::info
About HackerNoon Legal PDF Series: We bring you the most important technical and insightful public domain court case filings.

This court case retrieved on June 25, 2025, from storage.courtlistener.com, is part of the public domain. The court-created documents are works of the federal government, and under copyright law, are automatically placed in the public domain and may be shared without legal restriction.
:::