IP Risks, Benefits And Ideal Use-Cases For AI: Best Practices When Drafting Generative AI Usage Policies - Trade Secrets
Published: December 1, 2023
Updated: July 19, 2025

As generative AI tools such as OpenAI's ChatGPT and DALL-E, Meta's Llama, and Anthropic's Claude become ever more capable and more mainstream, companies seeking to benefit from this technology are likely to find it necessary to adopt policies on its use and to revise existing policies in the face of developments in AI-related litigation and determinations made or pending by the U.S. Copyright Office.

This article provides an overview of several classes of existing intellectual property risks in employing generative AI — particularly those relating to confidential/trade secret information and copyright — and proposes a set of best practices (and lower-risk use cases) for companies that seek to benefit from generative AI while reducing their own risk exposure.

Classes of Intellectual Property Risks From AI: Confidentiality Risks and Copyright Liability

Immediate and direct intellectual property risks posed by the use of generative AI include:

  1. The potential compromise or unlicensed disclosure of proprietary information (e.g., confidential trade secrets) when provided in prompts to remotely-hosted AIs;
  2. Copyright infringement liability related to the training, outputs, or creation or copying of an AI model; and
  3. The potential for competitors or others to use AI outputs that are publicly disseminated (without the company having recourse to copyright protection to limit that use).

Each of these is elaborated on below. See the bottom of this article for a list of potential "Do's" and "Don'ts" with respect to risk mitigation for these IP concerns implicated by generative AI.

While these classes of risk are not exhaustive — for example, AI outputs may also implicate trademark risk, as reflected in the suit filed by Getty Images against Stability AI (D.Del. 1:23-cv-00135) — they represent the major classes of risk at issue in ongoing litigation at the time of writing.

Proprietary Information and Confidentiality Risks for Information Provided in Prompts to Externally-Hosted AI Models

The licensing terms of several mainstream AI models grant the vendor a license to information provided in prompts (e.g., for use in training the model). Information provided to remotely-hosted AI models in prompts can therefore pose a risk to a company's control of its internal IP.

For example, information provided in a prompt may be used in training later model iterations, and this information may then be incidentally replicated in response to prompts by others.

Alternatively, information provided in a prompt might be directly viewed by potentially-competitive human personnel at the AI vendor, or providing the information may itself reflect a per se violation of contractual or ethical confidentiality obligations (e.g., providing information that must be kept legally or ethically confidential to a remotely hosted AI model without a confidentiality guarantee from the vendor).

Disclosure to third parties without adequate safeguards may also compromise the capacity of a company to either seek patent protections (at least on the company's intended timeline) or to retain trade secret protections.

Companies drafting generative AI guidelines relating to the use of internal, confidential information (for example, an employee might wish to use a generative AI to generate a three-page summary of a forty-page internal analysis) should therefore emphasize that such information should:

  1. Be marked and treated in accordance with existing confidentiality protocols to avoid inadvertent disclosure of sensitive information to AI models that lack confidentiality guarantees, and
  2. Not be shared with externally-hosted generative AI services that do not provide clear confidentiality guarantees, absent either express clearance or clear guidance as to which classes of information may be shared and under what circumstances.

From a practical business perspective, the tangible risks to organizations may not warrant precluding the usage of nonconfidential, externally-hosted AI models (like ChatGPT) under all circumstances, even for nominally confidential but low-sensitivity business information. The likelihood of specific human review of information sent to the AI vendor and such information's capacity to cause competitive harms will often — perhaps usually — be low, and the risks posed by using such data for model training are often ambiguous and time-sensitive.

However, between the risk of uncontrolled dissemination potentially weakening trade secret protections (information must generally be kept confidential to retain trade secret protections) and the parlous state of copyright protection for AI-generated works (see below), it is essential that organizations and general counsel contend with the potential that information shared with ChatGPT or similar generative AI systems — absent a guarantee of confidentiality — may become effectively public domain if it appears in an AI output. Establishing clear guidelines and clearance mechanisms ahead of time can significantly mitigate the potential for disclosure of information in prompts to create crises later on.

Copyright Infringement Liability Risks From the Use or Deployment of AI

In the wake of a putative class action recently filed against OpenAI by the Authors Guild, the copyright liability risks of generative AI have again been brought to the fore, following similar suits by entities such as Getty Images (against Stability AI), Sarah Silverman (against OpenAI and Meta), and a group of visual artists led by Sarah Andersen (against Stability AI and others).

Against this backdrop, companies should be aware of the specific copyright liability risks posed by generative AI when crafting internal usage policies.

Three Classes of Liability Risk: Training Data, Models Themselves, and Model Outputs

Training Data and Ingestion of Infringing Content

The allegations of the recent suit against OpenAI by the Authors Guild (a putative class action including named plaintiff authors such as Jonathan Franzen and George R.R. Martin) revolve primarily around the ingestion of datasets — referred to as "Books1" and "Books2" in the complaint — used to train the GPT model. These datasets allegedly included pirated copies of copyrighted works (of which ChatGPT was able to provide accurate summaries and, for a time period before the filing of the complaint, verbatim or near-verbatim excerpts) and thus, the complaint alleges, by making copies of these datasets for training purposes, OpenAI committed acts of infringement.

For most companies — that is, those not training their own AI models — the specific liability attributable to copying and ingesting allegedly-infringing training data may not be an acute concern. But companies that are performing model training, or that are "fine-tuning" open-source models to get better domain-specific performance, should ensure that they permit the use only of materials that are public domain or for which such use is authorized.

Models Themselves

While the Authors Guild complaint focused primarily on the allegedly infringing copying of data for model training, the Jan. 13, 2023, putative class action complaint by Andersen et al. against Stability AI, Midjourney, and DeviantArt argued that AI model weights themselves were infringing, on the grounds that they stored encoded or compressed versions (or encoded or compressed derivative works) of the works used to train them. (Andersen Compl. ¶¶ 65-100, 95, 160). This is also partially suggested by the facts of the Getty Images Complaint, which notes that the Stable Diffusion AI was outputting the "Getty Images" watermark — sometimes distorted — on AI-generated sports pictures (see, e.g., Getty Complaint ¶ 52). Similar risks are reflected in statements by AI companies that their models may have a capacity for near-verbatim "recall" of certain copyrighted works.

While the Andersen Complaint's allegations that model weights contained "compressed" versions of training data were recently dismissed (see Order at 8-14) with leave to amend, and while the nature of the pleadings implicates issues such as fair use that pose thorny legal as well as factual determinations (e.g., the degree of transformativeness of converting training data into model weights and the requirement of substantial similarity to establish infringement), companies that seek to run local instances of AI models (e.g., open-source models that may have been trained using infringing works and that may be adjudicated to themselves be infringing derivative works of that training data) should be aware of potential risks in the event that those models themselves are found to be infringing works — in which case, copying them locally might itself be an act of infringement.

Pending full resolution of the Andersen complaint and issues such as fair use, counsel drafting generative AI guidelines may wish to advise or require the use of remotely-hosted AI models rather than locally-run ones, and in turn mandate that only nonconfidential/nonsensitive information be provided to such models and/or that the models used provide a contractual confidentiality guarantee (as appears to be in the works from Microsoft).

Model Outputs

The Authors Guild Complaint against OpenAI averred that summaries produced by ChatGPT were infringing derivative works of the allegedly pirated works used to train it – for example, ChatGPT was alleged to have "generated an infringing, unauthorized, and detailed outline for a prequel book to 'A Game of Thrones,' one of the Martin Infringed Works." (Compl. ¶¶ 238-248) Similar allegations are made in the Andersen complaint [¶ 95 ("Every output image from the system is derived exclusively from the latent images, which are copies of copyrighted images. For these reasons, every hybrid image is necessarily a derivative work")].

Pending resolution of these complaints, companies crafting generative AI policies should, at minimum, caution employees about using prompts that are likely to generate derivative works of information that is copyrighted (for example, employees should be advised not to ask for outlines for sequels or prequels to George R.R. Martin's A Song of Ice and Fire novels).

A separate issue is whether every output from a model that is allegedly itself infringing is an infringing derivative work of that model — while this is an allegation of the Andersen complaint, it is less explicitly alleged in other suits. The Authors Guild complaint, for example, points to an outline for a prequel work as an infringement of George R.R. Martin's copyrights, but not as an infringement of the copyrights of other members of the putative plaintiff class (e.g., Jonathan Franzen).

At present, there is some reason to believe that not every output of an AI — even one trained on allegedly infringing data — is necessarily infringing or derivative of the inputs used to train it. In particular, copyright infringement generally requires establishing substantial similarity between the accused and original works. Likewise, the Copyright Office's recent Request for Comments on AI-related regulation suggests that not every output is necessarily derivative of training data, noting, for example, that copying an artist's "style" but not their specific works is (at present) generally not a form of copyright infringement, even though "style" is presumably learned through exposure to the artist's works. See, e.g., Notice of Inquiry and request for comments re: Artificial Intelligence and Copyright, Docket No. 2023-6, 10 ("the Office heard from artists and performers concerned about generative AI systems' ability to mimic their voices, likenesses, or styles. Although these personal attributes are not generally protected by copyright law...."). (Note that the Office sought comment on potential protections for artistic style in the same RFC.) The recent dismissal (with leave to amend) of various counts of the Andersen Complaint also suggests that substantial similarity to a copyrighted training work is still required to establish infringement (Order at 10-13) ("Even if that clarity is provided and even if plaintiffs narrow their allegations to limit them to Output Images that draw upon Training Images based upon copyrighted images, I am not convinced that copyright claims based a derivative theory can survive absent 'substantial similarity' type allegations.").

While the issue remains legally unsettled, certain AI vendors may provide indemnification and liability protection for the use of generative AI outputs for users who sign contracts — for example, via Microsoft's "CoPilot Protection Program" or Getty Images' own just-announced generative AI tool. Similar indemnification may be available from providers such as Google and Adobe.

Companies seeking to minimize their liability risk should therefore consider, and possibly mandate, the use of generative AI models that provide such liability protections.

A remotely-hosted, liability-protection-providing model that is prompted only with nonsensitive information that employees have a right to use (e.g., public-domain information, licensed information, or internally-sourced non-sensitive information) likely presents the lowest cross-section of liability risk and confidentiality risk for companies seeking to use generative AI at this time.

AI Outputs May Not Be Copyrightable — There May Be No Right to Exclude Competitors From Copying Them

Companies should also be aware that the output of generative AI may not be eligible for copyright protection, and thus any such outputs made public (for example, used on a public-facing website) risk being freely available for competitors, analysts, and the public at large to reproduce.

The Copyright Office has determined that works of AI-generated visual art are not eligible for copyright, based on reasoning suggesting that the output of generative AI systems does not reflect human authorship, even if generated in response to human-authored prompts.

Accordingly, the output of generative AI (particularly if not obviously a derivative work of a copyright-eligible work, such as an abridged version or summary of a memorandum) may not represent a protectible company asset if publicly disseminated.

For information that is not confidential or sensitive and may be publicly disseminated, but that reflects generically-useful output that the company would prefer to prevent others from using (for example, certain types of ad copy or generic product descriptions that do not implicate company-proprietary trademarks), companies are best advised to stick to human authorship to retain the capacity to limit others' right to copy this work, which otherwise would be at risk of falling into the public domain.

If generative AI output reflects company-confidential information, companies should continue to preserve confidentiality but also should be aware that trade secret protections and the prevention of public dissemination are now potentially the only legal avenues available for protection of such information, rather than merely (as they have always been) the best practical ones.

This also suggests potential new risks for information that is publicly disseminated by a judgment-proof entity following a confidentiality breach: if trade secret protections are lost following widespread public disclosure, then preventing subsequent copying and dissemination will likely be more difficult for works not independently protectible by copyright.

Takeaways and "Do's and Don'ts"

Accordingly, companies developing or revising generative AI policies should take the following best practices into account:

Do

  • Have clear policies in place for what information may and may not be used in prompts and under what circumstances; maintain clear guidelines about confidentiality expectations and document sensitivity; and appropriately mark documents as confidential where necessary.
  • Use remotely-hosted AI instances that provide confidentiality guarantees akin to those used by existing discovery vendors, where possible. Be aware that many publicly-hosted AI models require prompts to be licensed for use in training.
  • Consider using privately-hosted instances (e.g., based on the open-source tunable "Llama" model from Meta) as an alternative to publicly-hosted services that don't provide confidentiality guarantees.
    • Be aware, however, of potential liability risks stemming from allegations that the model weights (or potentially even any outputs they produce) are a derivative work of copyrighted training inputs.

Don't

  • Publish AI-generated materials that would be potentially useful to competitors (generic product descriptions, ad copy, business plans), whether or not based on internal human-generated descriptions.
    • These materials may not be eligible for copyright protection, and so, without an element of human authorship, you will have limited recourse to stop competitors from taking advantage of them.
  • Provide internal, confidential information to publicly available, remotely hosted AI that may use it for training or other purposes (such as currently-available ChatGPT).
    • The compromise of this information may be a per se breach of confidentiality obligations, or the information may be replicated in response to future prompts (and/or viewed by humans who may have competitive interests).
    • The disclosure and/or dissemination of this information may also compromise the capacity to seek patents and/or preserve trade secret rights.
  • Deliberately request and/or duplicate information that may result in an output that is colorably a derivative work of a copyrighted creative work — for example, requesting that an AI author a work of fan-fiction or propose a sequel to an unlicensed copyrighted work.
    • Because facts are generally not copyrightable, this is less of a concern when asking purely factual questions, although the accuracy of the model outputs should be double-checked where possible — AIs may "hallucinate" and produce answers that sound convincing but are factually false.

A remotely hosted, liability-protection-providing model that is prompted only with nonsensitive information that employees have a right to use (e.g., public-domain information, licensed information, or internally sourced nonsensitive information) likely presents the lowest cross-section of liability risk and confidentiality risk for companies seeking to use generative AI at this time.

The content of this article is intended to provide a general guide to the subject matter. Specialist advice should be sought about your specific circumstances.



Source: http://www.mondaq.com/Article/1397146