A Brief Exploration of Torrenting Copyrighted Books

What Meta’s Theft of 80 Terabytes of Data Can Tell Us About IPR

Just a few weeks ago, unsealed court documents revealed that Meta torrented roughly 80 terabytes of data from illegal sites like LibGen and Z-Library while training AI models (Belanger 2025). Meta used seemingly democratic peer-to-peer (P2P) protocols to torrent copyrighted data, and questions about AI model training and the role of intellectual property rights (IPR) remain unresolved. Together, these make the episode a compelling case study. Meta’s alleged mass data acquisition recalls earlier cases of contested data access, such as Aaron Swartz’s JSTOR downloads. However, Swartz sought to make knowledge publicly available, while Meta’s actions, if proven, would serve corporate AI development. That difference raises new questions about power, intent, and the role of intellectual property in AI governance. One Meta employee put it succinctly: “torrenting from a corporate laptop doesn’t feel right 😂” (Kadrey v. Meta 2025, Appendix A). Their intuition may be insightful, but why exactly does this case make us feel uneasy?

One debate raised by AI is how copyright should treat the use of training data. Training may not so much involve making copies as extracting an “accumulated representation of items” (Guadamuz 2023, 5). If so, it is not clear that any infringement is taking place in Meta’s case. Courts will settle these legal questions, but they aren’t what I find interesting here. Let’s step back and consider what purpose IPR serve for our society. Should they foster innovation or protect rights holders? Examining historical IPR frameworks might offer some insight into this case.

Suppose we took an approach in line with the “frontier” principle, under which “only inventions – and not discoveries” are considered ripe for patenting (Orsi & Coriat 2006, 165). We could then prioritize model improvement over fair payment to the rights holders whose data is used in training. Would we accept Meta’s actions if we re-situated their models as “discoveries” and treated the use of unauthorized data as acceptable collateral damage? I doubt Meta would take this angle, since they hope to monetize their models, but it would shrink their compensation obligations to rights holders. Alternatively, frameworks in the vein of the Bayh-Dole Act may contribute to the “commodification of scientific knowledge.” They would increase monetization opportunities for rights holders but could slow collaboration (Orsi & Coriat 2006, 167). In that case, whoever can pay for the most data would win, consolidating power in larger organizations.

It is possible that neither of these frameworks, however useful elsewhere, is suited to AI models. One novel challenge is that a “trained model does not need to be updated, it can be run independently and … subsist forever” (Guadamuz 2023, 7). In other words, once a model is trained, it no longer needs its training data. So what should we do with models trained on illegally acquired data? These “artifacts” have the capacity to replace human creators in perpetuity, and any new policy must be informed by their inherent politics (Winner 1980). In our current tech ecosystem, a policy that addresses harms only through one-off reparational payments to rights holders will further entrench the hegemonic power of American tech companies. Even large fines are hardly a deterrent to an organization like Meta. Copyright may well be a useful tool for staving off consolidation in this realm, if we find it an appropriate way to recognize these politics. Doing so, however, will require re-imagining the rights we grant to creators and how we enforce them globally.

References

Belanger, Ashley. 2025. “‘Torrenting from a Corporate Laptop Doesn’t Feel Right’: Meta Emails Unsealed.” Ars Technica. February 6, 2025. https://arstechnica.com/tech-policy/2025/02/meta-torrented-over-81-7tb-of-pirated-books-to-train-ai-authors-say/.

Guadamuz, Andres. 2023. “A Scanner Darkly: Copyright Liability and Exceptions in Artificial Intelligence Inputs and Outputs.” SSRN Scholarly Paper 4371204. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4371204.

Kadrey v. Meta. 2025. Plaintiffs’ Motion for Relief, No. 3:23-cv-03417-VC (N.D. Cal., February 5). https://cdn.arstechnica.net/wp-content/uploads/2025/02/Kadrey-v-Meta-Motion-for-Relief-Appendix-A-2-5-25.pdf.

Orsi, Fabienne, and Benjamin Coriat. 2006. “The New Role and Status of Intellectual Property Rights in Contemporary Capitalism.” Competition & Change 10 (2): 162–79. https://doi.org/10.1179/102452906X104222.

Winner, Langdon. 1980. “Do Artifacts Have Politics?” Daedalus 109 (1): 121–36.