US Copyright Office Generative AI Inquiry: Where are the Thresholds?

by Dennis Crouch

Generative artificial intelligence (GenAI) systems like MidJourney and ChatGPT that can generate creative works have brought a wave of new questions and complexities to copyright law. On the heels of a recent court decision denying registrability of AI-created work, the U.S. Copyright Office recently issued a formal notice of inquiry seeking public comments to help analyze AI’s copyright implications and form policy recommendations for both the Office and for Congress. The notice is quite extensive and raises fundamental questions that many have been discussing for several years about copyrightability of AI outputs, use of copyrighted material to train AI systems, infringement liability, labeling AI content, and more. The Copyright Office’s inquiry is an attempt to respond to AI’s rapidly growing impact on creative industries. [Link to the Notice]

The following is a rough overview of three core inquiries that I identified in the notice. It is also easy to just read it yourself by clicking on the notice above.

A core inquiry is whether original works that would ordinarily be copyrightable should be denied unless a human author is identified. Generative AI models produce outputs like text, art, music, and video that appear highly creative and would certainly meet copyright’s originality standard if created by natural persons. Further, if human contribution is required, the questions shift to the level of human contribution necessary and the procedural requirements to claim and prove human authorship. As the notice states, “Although we believe the law is clear that copyright protection in the United States is limited to works of human authorship, questions remain about where and how to draw the line between human creation and AI-generated content.” Factors could be the relative or absolute level of human input, creative control by the human, or even a word count. With copyright it is helpful to have some bright lines to streamline the process of registration without substantial case-by-case lawyer input for each copyrighted work, but any hard rule might skip over the nuances. Although the notice focuses on copyrightability, ownership questions will also come into play.

A second important core inquiry focuses on training data that is fundamental to today’s generative AI models. The Copyright Office seeks input on the legality of training generative models on copyrighted works obtained via the open internet, but without an express license. In particular, the Office seeks information about “the collection and curation of AI datasets, how those datasets are used to train AI models, the sources of materials ingested into training, and whether permission by and/or compensation for copyright owners is or should be required when their works are included.” Presumably different training models could have different copyright implications. In particular, an approach that does not store or actually copy the underlying works would be less likely to be infringing.

In building the training model, we often have copying of works without license, and so the key inquiry under current law appears to be the extent that fair use applies to protect the AI system generators. In other areas, Congress and the Copyright Office have stepped in with compulsory licensing models that could possibly work here, such as a system of providing a few pennies for each web page. Our system also supports approaches to voluntary collective licensing via joint management organizations, perhaps supported by a minimum royalty rate. An issue here is that many of the folks creating training data are doing so secretly and would like to maintain their data, and how the model is using the data, as trade secret information. That lack of transparency will raise technical challenges and costs for the underlying copyright holders.

A third core area focuses on infringement liability associated with AI outputs that result in a copy or improper derivative work. Who is liable: the AI system developers, model creators, and/or end users? A traditional approach would allow for joint liability. Again though, the lack of transparency makes copying potentially difficult to prove, but perhaps availability and likelihood are enough. On this point, the notice also asks about the idea of labeling or watermarking AI content, as suggested by a recent White House / Industry agreement. Although I see this issue as outside of copyright law, the inquiry suggests some penalty for failure to label.

Everyone is floundering a bit in terms of how to incorporate generative AI into our world view. I see the Copyright Office AI inquiry as a real attempt to seek creative and potentially transformative solutions. The public is invited to provide input by submitting comments by the October 18, 2023 deadline. There will also be a short response period for reply comments responding to initial submissions that closes on November 15, 2023.