A federal judge has signaled that Meta's torrenting of books used to train AI models may still matter for copyright assessments, even though the company has prevailed on most of the claims in a high-profile lawsuit brought by a group of authors. The ruling leaves open the possibility that Meta's use of shadow libraries and peer-to-peer sharing could bear on the fair-use analysis, on questions of bad faith, and on the relationship between downloading and training a large language model. The court has scheduled a further discussion of a separate, still-unproven question: whether Meta unlawfully distributed the authors' protected works during the torrenting process. While discovery remains incomplete, the court warned that the torrenting activity, carried out via BitTorrent and sourced from the shadow library LibGen, could be relevant in several ways that may shape the case's trajectory and licensing developments in the AI training ecosystem.
Background and procedural posture of the case
The litigation centers on claims that Meta Platforms, Inc. used works without permission to train its Llama family of large language models, potentially infringing the rights of authors whose books were copied and assembled into datasets for model training. The plaintiff group comprises thirteen authors, including prominent figures such as a well-known comedian and a Pulitzer-winning author. The central legal question has been whether Meta’s training practices constituted fair use or, alternatively, actionable infringement.
Across the earlier stages of the litigation, Meta achieved substantial success on the core infringement claims, with a significant portion of the case resolved in its favor on summary judgment. That left a narrower set of questions tied to Meta's handling of the torrenting episode and its broader implications for legitimate AI development and licensing strategies. The court's recent order confirms that discovery has not yet fully illuminated how Meta's torrenting bore on distribution or whether it had distinct, material effects on the overall infringement analysis. The court has indicated that the parties will meet to discuss the plaintiffs' separate claim that Meta unlawfully distributed the protected works during the torrenting process, signaling its intent to examine responsibility for distribution in addition to the initial copying.
In this case, the torrenting episode is tied to LibGen, a shadow library that served as the source of a large volume of downloaded content. The court acknowledged that the torrenting could have been extensive, potentially exceeding 80.6 terabytes of data, and that it occurred after Meta had moved away from pursuing conventional licensing deals for the books in question. Meta's choice to obtain copies from pirate sources after approaching publishers and attempting to negotiate licenses has created a complex factual record that the court must weigh when assessing the character and consequences of Meta's use, particularly in the transformative context of training an AI model.
As the case advances, the judge underscored the importance of a complete evidentiary record on the distribution question. Discovery on that issue has lagged because it was raised relatively late in the litigation, and the lag has constrained the court's ability to draw definitive conclusions about the scope and mechanics of any distribution of copyrighted works by Meta. The court's stance is not a rejection of the authors' distribution allegations; rather, it recognizes the need for a fuller record to determine how distribution might interact with the fair-use analysis and with the overall assessment of Meta's conduct.
The court’s order and its potential implications
In a formal ruling that partially grants Meta’s motion for summary judgment, the court laid out several avenues by which the torrenting activity could be relevant to the case, even while the broader copyright-infringement claims have been resolved in Meta’s favor. The judge made clear that the torrenting episode is not categorically irrelevant to the inquiry into whether Meta’s copying was fair use. Instead, the court suggested that the act of downloading from shadow libraries could influence the fair-use calculus in nuanced ways and, possibly, the assessment of Meta’s state of mind at the time of the copying.
One of the most salient points the court raised concerns "bad faith" as a potential factor in the fair-use analysis. The judge noted that the law is in flux about whether bad-faith conduct remains a relevant consideration for fair use. This matters because fair use weighs four factors: the purpose and character of the use (including whether it is transformative), the nature of the copyrighted work, the amount and substantiality of the portion used, and the effect of the use on the market for the original work. The court observed that Meta's decision to turn to piracy after attempting to license the works could plausibly bear on the first factor by coloring the character of the use and by informing questions about intent and market harm.
The opinion also examined whether Meta's torrenting might color the second and third factors, the nature of the works and the amount used, by highlighting characteristics that make the copying more or less defensible. On the first factor, the court reiterated that bad faith could serve as a qualitative measure of how the copying occurred, while cautioning that the legal rules governing bad faith's role in fair use remain unsettled. Torrenting from shadow libraries could cast Meta's conduct in a less favorable light if such actions were deemed to reflect a disregard for licensing norms and fair dealing.
A separate line of analysis in the court’s discussion focused on whether Meta’s torrenting is interconnected with the transformative purpose of its use. The judge stated that there is a central, inseparable link between Meta’s downloading of the books and Meta’s ultimate use of the works to train its Llama models. The court concluded that because the end-use—transformative AI training—was the primary objective, the downloading activity could be deemed part and parcel of the same transformative process. This reasoning aligns with a common interpretation in fair-use jurisprudence that transformative uses can tip the balance toward fair use, provided other factors are not overwhelmingly adverse.
At the same time, the court flagged a potential risk for Meta: in some circumstances, torrenting might be viewed as a practice that benefits pirate libraries and thus undercuts the moral and legal justifications for using such materials. The court warned that this consideration could weigh against the use if it were proven that Meta's actions contributed to a system that perpetuates unauthorized copying and distribution. The judge also observed, however, that the record so far contains no conclusive evidence that Meta's downloading directly sustained or financially benefited the networks from which the books were sourced.
In weighing the distribution issue, the court emphasized that the vast majority of peer-to-peer file-sharing cases have resulted in copyright infringement findings. This contextual factor could loom over the case as a baseline expectation, even as the court recognizes the narrow, distribution-specific question presented. Notably, the court pointed to the possibility that some of the libraries Meta used have themselves been found liable for infringement, which could reinforce concerns about distribution and the broader ecosystem in which AI data is assembled. Yet, the judge acknowledged that the authors have not yet presented specific evidence demonstrating that Meta’s downloading directly propped up or financially supported these pirate libraries in a way that would alter the fair-use calculus.
The court also addressed the relationship between download activity and the training process more directly. The authors argued for a conceptual separation between the act of downloading and the internal use of the downloaded material in training. The judge resisted a strict separation, affirming that the ultimate, transformative use of the books—in the form of training Llama—cannot be detached from the underlying act of obtaining those books. This reasoning reinforces the idea that the method of data acquisition can be part of understanding the nature of the use, particularly in the modern AI-age context where the data pipeline and the training objective are deeply interwoven.
Finally, the judge noted a practical issue: the record on Meta's alleged distribution remains incomplete because the torrenting activity came to light in discovery only after significant portions of the litigation had unfolded. The court left room for further evidence that could alter conclusions about distribution. The judge suggested a potential path forward in which the plaintiffs might present evidence that Meta contributed significant computing power to the BitTorrent network or otherwise facilitated shadow libraries in ways material to the overall analysis.
Fair use, bad faith, and the legal flux
A core element of the court’s discussion centers on the fair-use doctrine and the evolving role of bad faith in its application. In traditional fair-use analysis, the court weighs four factors to determine whether a use of a copyrighted work is permissible without permission. The first factor—purpose and character of the use—often tilts toward fair use when the use is transformative and adds new meaning, value, or function beyond the original work. In this case, Meta’s use of the books was for the transformative purpose of training an AI model, which, on the surface, aligns with transformative-use arguments.
However, the first factor is not a free pass. The court stressed that bad faith could be an indicator shaping the character of the use, potentially undermining the transformative claim if the record demonstrates a deliberate disregard for licensing and authors’ rights. The challenge here is that the law on the relevance of bad faith is unsettled and evolving. Some courts have treated bad-faith behavior as a separate, standalone consideration that can influence the outcome under the fair-use rubric; others have treated it as less decisive or even irrelevant in certain circumstances. The judge’s observations reflect this unsettled landscape, signaling that future decisions in similar cases may hinge on how courts interpret and apply bad faith in the context of AI data collection.
The second factor, the nature of the copyrighted work, gives less protection to factual or informational works and more protection to creative works. Where a training dataset includes novels, the balance tends toward copyright protection because the works are creative artifacts. The court recognized that this factor might not strongly favor Meta, given the creative nature of the source materials. Yet it also acknowledged that the nature factor often weighs less heavily in AI training cases than in other contexts, particularly where the use is transformative and the goal is not to supplant the market for the original work but to enable new capabilities.
The third factor, the quantity and substantiality of the portion used, could become an area of concern if a court views the downloading as capturing the heart of the works. The court's current posture indicates that this issue remains open to nuanced assessment, particularly in the context of a large-scale dataset assembled for training. The fourth factor, the effect on the market for the original work, remains a central concern for authors and publishers, given fears that AI training could erode potential licensing revenues or create new competition for the original works. The court's analysis suggests that, while the transformative purpose may mitigate this factor, the risk of market disruption remains a live question.
In this case, the judge underscored that the role of bad faith is not settled as a categorical rule and could become a pivotal determinant in future rulings. This has implications beyond the current matter: if higher courts decide that bad faith can meaningfully affect fair-use outcomes, companies assembling AI training data would face a more complex risk calculus. Conversely, if courts adopt a more permissive view of bad-faith conduct, the door could remain open to broader data-mining practices in AI training, which would in turn reshape licensing strategies across the publishing and technology ecosystems.
The transformative-use argument remains central. The court found that Meta’s downloading activities were inextricably linked to the training objective because the end product—Llama—was the highly transformative use that Meta sought. This connection reinforces the argument that transformative uses can support fair use when the data acquisition itself is integrated into the transformation process. Yet the court also warned that a robust fair-use defense requires careful attention to all four factors and to evolving case law, which may continue to shift in this rapidly changing legal landscape.
Torrenting, LibGen, and evidentiary hurdles
The torrenting episode's factual landscape is unusual among AI-copyright disputes. The data originated from a shadow library, LibGen, and was transferred via BitTorrent, a protocol that fragments files into pieces and shares them across a network of peers. Because BitTorrent clients typically upload pieces to other peers while they download, the choice of protocol itself raises a distribution question and not merely a copying question. The sheer volume of data alleged to have been downloaded and the decentralized nature of the transfer create a complex evidentiary record. Discovery has not yet fully illuminated how Meta's use of LibGen and BitTorrent intersected with the company's broader training pipeline, which in turn complicates the assessment of distribution-related harms, licensing feasibility, and potential monetization implications for both the plaintiffs and the defendants.
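For readers unfamiliar with the mechanics, the minimal Python sketch below illustrates the piece-splitting and hash-verification model on which BitTorrent is built. The piece size, function names, and sample payload are illustrative assumptions, not details drawn from the case record or from any particular client implementation.

```python
# Conceptual sketch of BitTorrent-style piece handling (illustration only;
# piece size and payload are arbitrary, not drawn from the case record).
import hashlib

PIECE_SIZE = 256 * 1024  # clients commonly use power-of-two piece sizes


def split_into_pieces(data: bytes, piece_size: int = PIECE_SIZE) -> list[bytes]:
    """Split a payload into fixed-size pieces, as a torrent's metadata describes."""
    return [data[i:i + piece_size] for i in range(0, len(data), piece_size)]


def piece_hashes(pieces: list[bytes]) -> list[str]:
    """Compute per-piece SHA-1 digests that peers use to verify received data."""
    return [hashlib.sha1(p).hexdigest() for p in pieces]


def verify_piece(piece: bytes, expected_hash: str) -> bool:
    """A downloader checks each piece against the published hash before accepting it."""
    return hashlib.sha1(piece).hexdigest() == expected_hash


if __name__ == "__main__":
    payload = b"example book contents " * 50_000  # stand-in for a downloaded file
    pieces = split_into_pieces(payload)
    hashes = piece_hashes(pieces)
    ok = all(verify_piece(p, h) for p, h in zip(pieces, hashes))
    print(f"{len(pieces)} pieces, all verified: {ok}")
```

Per-piece hashing is what allows a swarm to assemble a single file from many untrusted peers without corruption, which is why the protocol scales to very large transfers drawn from decentralized sources.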
One of the crucial themes in the court's order is the possibility that the torrenting activity could be "relevant in a few different ways" beyond the direct fact of copying. First, there is the bad-faith dimension discussed above. Second, the court suggested that distribution of copyrighted materials through torrenting could influence the character of the use and, by extension, the fair-use calculus. If Meta downloaded the materials in a manner that materially benefited or perpetuated pirate libraries, that could weigh the analysis toward infringement or toward a more negative assessment of Meta's conduct. Third, the court recognized the interplay between the download activity and the training use, emphasizing that the downloading serves the same transformative purpose as the training itself.
The 80.6-terabyte figure cited for the LibGen-based torrenting operation underscores the scale of the data involved. While the court did not issue a verdict on distribution based on this figure alone, the magnitude of the data transfer adds weight to the consideration that the data-collection mechanism matters in evaluating fair use, the scope of copying, and potential market effects. It also raises questions about the practicalities of licensing at such a scale. If publishers or authors could negotiate group licenses or data-use licenses that cover AI training at scale, the court’s discussion suggests that licensing markets could evolve more rapidly in the near term to accommodate demands from developers of large language models.
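As a rough sense of that scale, the back-of-the-envelope arithmetic below converts the reported 80.6 TB figure into an approximate file count. The average per-title file size is an assumption chosen purely for illustration, not a number from the court record, and the torrented data may well have included duplicates and non-book material.

```python
# Rough, order-of-magnitude arithmetic only. The 80.6 TB total is the figure
# reported in the case; the per-file size below is an assumption for illustration.
TOTAL_BYTES = 80.6e12      # 80.6 terabytes (decimal)
AVG_BOOK_BYTES = 2.5e6     # assumed ~2.5 MB per e-book file; real sizes vary widely

approx_files = TOTAL_BYTES / AVG_BOOK_BYTES
print(f"~{approx_files / 1e6:.0f} million files of that size would fit in 80.6 TB")
# => roughly 32 million files under these assumptions; actual title counts depend
#    on formats, duplicates, and any non-book content in the download.
```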
Another dimension concerns whether Meta’s downloading activity contributed to the functioning of pirate networks. The judge noted that it is possible evidence could emerge showing Meta supplied significant computing power or resources that aided the shadow libraries in distributing material. This line of inquiry could further complicate the analysis of the fairness of the use, particularly if it could be shown that Meta’s participation indirectly amplified unauthorized distribution.
The court also criticized the authors for anchoring their position to an outdated article that suggested torrent usage for book piracy was rare. The court highlighted that trends in e-book piracy have shifted substantially in recent years, with newer data indicating rising volumes of pirated content and suggesting that accessing pirated books online may influence sales patterns in ways that differ from earlier eras. This caveat underscores the dynamic nature of the digital-piracy ecosystem and the corresponding need for up-to-date evidence when evaluating fair-use implications.
Evidence gaps and potential future findings
Discovery in this case has revealed certain pivotal facts, but the record remains incomplete on several fronts. The authors’ ability to demonstrate that Meta’s downloading activities had a specific, measurable impact on pirate libraries or on the broader distribution network is constrained by the current state of evidence. The court acknowledged that additional discoveries could reveal that Meta’s participation in the BitTorrent ecosystem was nontrivial in ways that shift the fair-use calculus or raise questions about the distribution claim.
There is also a possibility that future evidence could demonstrate Meta’s role in providing substantial compute resources to shadow libraries or to the BitTorrent network more broadly. If it were shown that Meta’s actions materially aided these networks, it could indicate a pattern of conduct that undermines the fairness of the use, particularly in relation to the third and fourth fair-use factors. Conversely, if the additional evidence remains inconclusive, Meta’s existing fair-use defense may still stand on solid ground, given the transformative nature of the training and the broad discretion courts have historically afforded to transformative uses.
This evidentiary tension also has practical implications for how authors and publishers approach AI-data licensing in the future. The court’s reflection on discovery timelines suggests that, as the case proceeds, new pieces of information could shift the assessment of what constitutes fair use versus infringement, particularly in a landscape where AI models depend on vast swaths of copyrighted text. If publishers choose to engage more proactively in licensing discussions, it could accelerate the development of standardized data-use agreements and licensing frameworks that accommodate AI training at scale, reducing reliance on contentious fair-use arguments in future disputes.
Licensing implications for AI training and the publishing ecosystem
A recurring thread in the court’s analysis is the potential for licensing markets to emerge or expand as AI training becomes an increasingly central business activity for technology companies. The judge observed that if a broader ecosystem of licensing opportunities emerges, publishers could find themselves in a better position to monetize their rights in a way that supports innovative AI development while preserving authorial integrity and compensation. The notion of “subsidiary rights”—rights that would enable licensing for data use in AI training—appears to be an area ripe for negotiation between authors, publishers, and AI developers.
The court speculated that publishers may not currently hold all the necessary subsidiary rights to implement comprehensive group licensing arrangements, but this barrier could be resolved as market dynamics evolve. The prospect of large-scale licensing arrangements would tilt the economics of AI training toward a model where training data is assembled under clear, contractually defined terms rather than through stalemates or disputes. If licensing markets become feasible, the negotiation dynamics could favor authors who can secure favorable terms that reflect the value of their works in a data-driven training context. The decision outlines a potential pathway by which publishers and authors could attain more predictable revenue streams from AI training, reducing the transactional friction associated with obtaining rights for large-language-model development.
Additionally, the court’s commentary implies that a licensing-centric approach could influence the competitive landscape for AI developers. If license-based access to copyrighted books becomes standard, developers could pursue a strategy that emphasizes cooperative licensing, transparent data-use disclosures, and compliance with a spectrum of rights tied to the content. This would contrast with a model driven primarily by de facto data harvesting or reliance on fair-use arguments to justify broad data collection. The implication for future AI training is that licensing regimes could become a central, expected pillar in the data supply chain, with a corresponding effect on how authors’ works are valued and compensated in the digital age.
The broader industry impact could extend beyond publishers and developers to include libraries, educational institutions, and standards bodies seeking to articulate best practices for data use in AI. As the case moves toward resolution, the potential licensing guidance that emerges could inform new contractual practices, ex ante consent mechanisms, and sector-wide norms around data stewardship, privacy, and intellectual-property protection in the training of AI systems.
Discovery gaps and evidence that could shift the balance
The court's approach emphasizes the importance of continued discovery to build a complete evidentiary record. Its observation that the record on Meta's distribution is incomplete signals that the parties will need additional data, depositions, and perhaps expert analyses to determine the precise role of downloading in the fair-use assessment. If the authors can present credible evidence that Meta contributed to the BitTorrent network in meaningful ways, beyond the simple act of downloading, that could substantively affect the analysis of bad faith, the distribution claim, and the fourth fair-use factor concerning market harm.
The potential for new evidence to emerge that demonstrates a financial or strategic motive behind the choice to source materials from LibGen is of particular interest. If such evidence indicates that Meta benefited, whether directly or indirectly, from pirate distribution networks, it could bolster arguments about the potential for harmful impact on the market for the original works or their licensing pathways. Conversely, if future evidence remains inconclusive or shows that the distribution did not meaningfully affect the market, the court may give greater weight to the transformative nature of the training and the overall fairness of the use.
The court criticized the authors for relying on older analyses of torrent usage, underscoring the need for contemporaneous, context-appropriate data when evaluating piracy trends. With the digital landscape evolving rapidly, as e-books, streaming platforms, and other digital formats reshape how content is accessed, the evidentiary standards for piracy-related arguments must adapt accordingly. The court's admonition against outdated sources suggests that any future arguments about the prevalence of piracy should be anchored in current research and recent industry dynamics to ensure credibility and relevance.
Broader policy considerations and industry-wide effects
Beyond the specifics of this case, the judicial reasoning touches on wider policy questions about how society should balance the rights of content creators with the needs of innovators training AI systems. If licensing pathways become robust and scalable, publishers' incentive to license works for AI training could intensify, increasing the availability of rights for use in model development. That, in turn, could reduce developers' reliance on fair-use arguments premised on the absence of a workable licensing market, especially while the economic implications of AI training remain imperfectly understood.
From a policy perspective, the case raises questions about how best to regulate data use in AI without stifling innovation. If fair-use defenses become less viable in the face of strong evidence of bad faith or if distribution concerns undermine the defensibility of large-scale data collection, policymakers may consider additional safeguards or clarifications. Potential options could include standardized licensing collaboratives for AI training data, clearer definitions of the permissible scope of data reuse, and explicit protections for authors’ rights in an era of automated content creation.
For authors and litigants, the evolving legal framework implies that courts may increasingly focus on the ethics and economics of data access in AI, not solely on the mechanical application of fair-use tests. The possibility of a more formalized licensing infrastructure could align incentives toward transparent data practices, better author compensation, and a more predictable regulatory environment for AI developers who rely on licensed content to power state-of-the-art models.
The strategic implications for publishers are also noteworthy. If the courts signal that licensing markets will be a central feature of AI data ecosystems, publishers could accelerate their outreach to AI developers and platforms to negotiate license terms that capture the value their works contribute to model performance. Such strategic moves could lead to a broader ecosystem where licensing becomes a common, routine part of AI development, reducing the friction and uncertainty that currently accompany data acquisitions for training purposes.
Practical outlook: next steps and potential outcomes
As the case advances toward further fact-finding, several plausible trajectories could unfold. If the plaintiffs succeed in demonstrating that Meta’s torrenting contributed to a broader distribution network or was undertaken in bad faith that meaningfully shaped the fair-use analysis, the court could revisit some aspects of the case that Meta had previously won on summary judgment. Conversely, if the new evidence remains insufficient to alter the core calculus, the court could reaffirm the summary-judgment position on infringement while providing more explicit guidance on the distribution issue, potentially reducing the risk of an adverse ruling on that score.
The most consequential near-term development may be a shift in licensing dynamics within the AI training arena. If publishers recognize that group licenses or other scalable licensing mechanisms are becoming practical and profitable, they may accelerate negotiations with AI developers. This could lead to standardized terms for dataset use, clearer compensation frameworks for authors, and improved rights management across AI platforms. The court’s observations about potential licensing markets suggest a future in which licensing agreements become not just a remedy in courtroom disputes but a foundational element of AI development.
As the litigation proceeds, observers will be watching how the court evaluates the evidence surrounding LibGen-based torrenting and how it integrates the findings into fair-use analysis. The interplay between transformative use, bad faith, and distribution will likely shape how courts address AI training data challenges in the years ahead. The outcome could set important precedents for both the technology sector and the publishing industry, clarifying the conditions under which AI developers must secure permissions and how authors can protect their rights while enabling innovation.
Conclusion
The ruling marks a significant milestone in a landmark AI copyright dispute, underscoring that the torrenting episode—while not the centerpiece of the infringement claims—could bear on the fair-use analysis, bad-faith considerations, and the overall assessment of how data used to train AI models is obtained and employed. The court’s decision to allow continued exploration of the distribution issue, despite a broad victory for Meta on other counts, signals a nuanced approach that recognizes the complexity of modern data acquisition practices and their implications for authors’ rights.
The case also highlights a broader existential question for the tech and publishing industries: as AI systems become increasingly adept at learning from vast swaths of copyrighted material, how will rights holders be compensated, and what licensing structures will govern data use at scale? The judge’s comments about potential licensing markets and the likelihood that publishers may seek group licensing arrangements reflect a future in which data rights are negotiated more openly and systematically, rather than contested through ad hoc fair-use arguments. If such licensing ecosystems mature, they could provide a clearer, more stable path for AI developers to access the diverse corpus required to train next-generation models while ensuring authors’ rights and incentives remain protected.
In the weeks ahead, the case is expected to illuminate additional facts about distribution and the operational realities of AI training data pipelines. The outcome will likely influence how courts view the relationship between the mechanics of data extraction and the ultimate transformation for model training. Regardless of the immediate verdict, the decision sets the stage for ongoing debates over fair-use boundaries, the role of bad faith in copyright analyses, and the emergence of licensing practices that could redefine the economics of AI training in the publishing ecosystem. The evolving landscape promises to reshape licensing negotiations, data governance approaches, and policy considerations for years to come.