IP Risks, Benefits And Ideal Use-Cases For AI: Best Practices When Drafting Generative AI Usage Policies – Trade Secrets


As generative AI tools such as OpenAI’s ChatGPT and Dall-E,
Meta’s Llama, and Anthropic’s Claude become ever more
capable and more mainstream, companies seeking to benefit from this
technology are likely to find it necessary to adopt policies on its
use and to revise existing policies in the face of developments in
AI-related litigation and determinations made or pending by the
U.S. Copyright Office.

This article provides an overview of several classes of existing
intellectual property risks in employing generative AI —
particularly those relating to confidential/trade secret
information and copyright — and proposes a set of best
practices (and lower-risk use cases) for companies that seek to
benefit from generative AI while reducing their own risk
exposure.

Classes of Intellectual Property Risks From AI: Confidentiality
Risks and Copyright Liability

Immediate and direct intellectual property risks posed by the
use of generative AI include:

  1. The potential compromise or unlicensed disclosure of
    proprietary information (e.g., confidential trade secrets) when
    provided in prompts to remotely-hosted AIs;

  2. Copyright infringement liability related to the training of an
    AI model, the creation or copying of the model itself, or its
    outputs; and

  3. The potential for competitors or others to use AI outputs that
    are publicly disseminated (without the company having recourse to
    copyright protection to limit that use).

Each of these is elaborated on below. See the bottom of this
article for a list of potential “Do’s” and
“Don’ts” with respect to risk mitigation for these IP
concerns implicated by generative AI.

While these classes of risk are not exhaustive — for
example, AI outputs may also implicate trademark risk, as reflected
in the suit filed by Getty Images against Stability AI (D.Del.
1:23-cv-00135) — they represent the major classes of risk at
issue in ongoing litigation at the time of writing.

Proprietary Information and Confidentiality Risks for
Information Provided in Prompts to Externally-Hosted AI Models

The licensing terms of several mainstream AI models include
provisions that license prompt information to the AI vendor
(e.g., for use in training the model). Information provided in
prompts to remotely-hosted AI models can therefore pose a risk to a
company’s control of its internal IP.

For example, information provided in a prompt may be used in
training later model iterations, and this information may then be
incidentally replicated in response to prompts by others.

Alternatively, information provided in a prompt might be
directly viewed by potentially-competitive human personnel at the
AI vendor, or providing the information may itself constitute a
per se violation of contractual or ethical confidentiality
obligations (e.g., providing information that must be kept legally
or ethically confidential to a remotely hosted AI model without a
confidentiality guarantee from the vendor).

Disclosure to third parties without adequate safeguards may also
compromise the capacity of a company to either seek patent
protections (at least on the company’s intended timeline) or to
retain trade secret protections.

Companies drafting generative AI guidelines relating to the use
of internal, confidential information (for
example, an employee might wish to use a generative AI to generate
a three-page summary of a forty-page internal analysis) should
therefore emphasize that such information should:

  1. Be marked and treated in accordance with existing
    confidentiality protocols to avoid inadvertent disclosure of
    sensitive information to AI models that lack confidentiality
    guarantees, and

  2. Not be shared with externally-hosted generative AI services
    that do not provide clear confidentiality guarantees, absent either
    express clearance or clear guidance that a particular class of
    information may be shared and under what circumstances (a minimal
    sketch of such a screening gate appears after this list).
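By way of illustration, the two points above could be operationalized as a simple screening gate that sits between employees and any external AI service. The sketch below is hypothetical: the marker strings, the AIService record, and the may_submit_prompt helper are illustrative assumptions, not any real product’s API.

```python
# Hypothetical prompt-screening gate; all names here are illustrative.
from dataclasses import dataclass

# Markings an existing confidentiality protocol might apply (point 1).
CONFIDENTIALITY_MARKERS = ("CONFIDENTIAL", "ATTORNEY-CLIENT PRIVILEGED", "TRADE SECRET")

@dataclass
class AIService:
    name: str
    externally_hosted: bool
    contractual_confidentiality: bool  # e.g., a written no-training/no-review guarantee

def may_submit_prompt(prompt: str, service: AIService,
                      has_express_clearance: bool = False) -> bool:
    """Return True only if policy permits sending this prompt to this service."""
    marked = any(m in prompt.upper() for m in CONFIDENTIALITY_MARKERS)
    if not marked:
        return True  # unmarked, nonsensitive material may be shared
    # Marked material stays off externally hosted services that lack a
    # confidentiality guarantee, absent express clearance (point 2).
    if service.externally_hosted and not service.contractual_confidentiality:
        return has_express_clearance
    return True

# Example: a public chatbot with no confidentiality guarantee is blocked.
public_bot = AIService("public-chatbot", externally_hosted=True,
                       contractual_confidentiality=False)
assert not may_submit_prompt("CONFIDENTIAL: Q3 forecast ...", public_bot)
```

In practice, such a gate would feed into the express-clearance workflow described above, so that marked material reaches an external service only after review.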

From a practical business perspective, the tangible risks to
organizations may not warrant precluding the usage of
nonconfidential, externally-hosted AI models (like ChatGPT) under
all circumstances, even for nominally confidential but
low-sensitivity business information. The likelihood of specific
human review of information sent to the AI vendor and such
information’s capacity to cause competitive harms will often
— perhaps usually — be low, and the risks posed by
using such data for model training are often ambiguous and
time-sensitive.

However, between the risk of uncontrolled dissemination
potentially weakening trade secret protections (information must
generally be kept confidential to retain trade secret protections)
and the parlous state of copyright protection for AI-generated
works (see below), it is essential that organizations and general
counsel contend with the potential that information shared with
ChatGPT or similar generative AI systems — absent a guarantee
of confidentiality — may become effectively public domain if
it appears in an AI output. Establishing clear guidelines and
clearance mechanisms ahead of time can significantly mitigate the
potential for disclosure of information in prompts to create crises
later on.

Copyright Infringement Liability Risks From the Use or
Deployment of AI

In the wake of a putative class action recently filed against
OpenAI by the Authors Guild, the copyright liability risks of
generative AI have again been brought to the fore, following
similar suits by entities such as Getty Images (against Stability
AI), Sarah Silverman (against OpenAI and Meta), and a group of
visual artists led by Sarah Andersen (against Stability AI and
others).

Against this backdrop, companies should be aware of the specific
copyright liability risks posed by generative AI when crafting
internal usage policies.

Three Classes of Liability Risk: Training Data, Models
Themselves, and Model Outputs

Ingestion of Training Data Containing Infringing Content

The allegations of the recent suit against OpenAI by the Authors
Guild (a putative class action including named plaintiff authors
such as Jonathan Franzen and George R.R. Martin) revolve primarily
around the ingestion of datasets — referred to as
“Books1” and “Books2” in the complaint —
used to train the GPT model. These datasets allegedly included
pirated copies of copyrighted works (of which ChatGPT was able to
provide accurate summaries and, for a time period before the filing
of the complaint, verbatim or near-verbatim excerpts) and thus, the
complaint alleges, by making copies of these datasets for training
purposes, OpenAI committed acts of infringement.

For most companies — that is, those not training their own
AI models — the specific liability attributable to copying
and ingesting allegedly-infringing training data may not be an
acute concern. But companies that are performing model training,
or that are “fine-tuning” open-source models to get better
domain-specific performance, should ensure that they permit the use
only of materials that are in the public domain or for which such
use is authorized.

Models Themselves

While the Authors Guild complaint focused primarily on the
allegedly infringing copying of data for model training, the Jan.
13, 2023, putative class action complaint by Andersen et al.
against Stability AI, MidJourney, and DeviantArt argued that AI
model weights themselves were infringing, on the grounds that they
stored encoded or compressed versions (or encoded or compressed
derivative works) of the works used to train them (Andersen Compl.
¶¶ 65-100, 95, 160). This is also partially suggested by the facts
of the Getty Images Complaint, which notes that the Stable
Diffusion AI was outputting the “Getty Images” watermark
— sometimes distorted — on AI-generated sports pictures
(see, e.g., Getty Compl. ¶ 52). Similar risks are
reflected in statements by AI companies that their models may have
a capacity for near-verbatim “recall” of certain
copyrighted works.

While the Andersen Complaint’s allegations that model
weights contained “compressed” versions of training data
were recently dismissed (see Order at 8-14) with
leave to amend, and the nature of the pleadings implicates issues
such as fair use that pose thorny legal as well as factual
determinations (e.g., the degree of transformativeness of
converting training data into model weights and the requirement of
substantial similarity to establish infringement), companies that
seek to run local instances of AI models (e.g., open-source models
that may have been trained using infringing works and that may be
adjudicated to themselves be infringing derivative works of that
training data) should be aware of the potential risks in the event
that those models themselves are found to be infringing works — in
which case, copying them locally might itself be an act of
infringement.

Pending full resolution of the Andersen complaint and issues
such as fair use, counsel drafting generative AI guidelines may
wish to advise or require the use of remotely-hosted AI models
rather than locally-run ones, and in turn mandate that only
nonconfidential/nonsensitive information be provided to such models
and/or that the models used provide a contractual confidentiality
guarantee (as appears to be in the works from Microsoft).

Model Outputs

The Authors Guild Complaint against OpenAI averred that
summaries produced by ChatGPT were infringing derivative works of
the allegedly pirated works used to train it — for example,
ChatGPT was alleged to have “generated an infringing,
unauthorized, and detailed outline for a prequel book to ‘A
Game of Thrones,’ one of the Martin Infringed
Works.” (Compl. ¶¶ 238-248). Similar
allegations are made in the Andersen complaint [¶ 95 (“Every
output image from the system is derived exclusively from the latent
images, which are copies of copyrighted images. For these reasons,
every hybrid image is necessarily a derivative work”)].

Pending resolution of these complaints, companies crafting
generative AI policies should, at minimum, caution employees about
using prompts that are likely to generate derivative works of
information that is copyrighted (for example, employees should be
advised not to ask for outlines for sequels or prequels to George
R.R. Martin’s A Song of Ice and Fire novels).

A separate issue is whether every output from a model that is
allegedly itself infringing is an infringing derivative work of
that model — while this is an allegation of the Andersen
complaint, it is less explicitly alleged in other suits. The
Authors Guild complaint, for example, points to an outline for a
prequel work as an infringement of George R.R. Martin’s
copyrights, but not as an infringement of the copyrights of other
members of the putative plaintiff class (e.g., Jonathan
Franzen).

At present, there is some reason to believe that not every output
of an AI — even one trained on allegedly infringing data
— is necessarily infringing or derivative of the inputs used
to train it. In particular, copyright infringement generally
requires establishing substantial similarity between the accused
and original works. Likewise, the Copyright Office’s recent
Request for Comments on AI-related regulation suggests that not
every output is necessarily derivative of training data, noting,
for example, that copying an artist’s “style” but not
their specific works is (at present) generally not a form of
copyright infringement, even though “style” is presumably
learned through exposure to the artists’ works. See, e.g.,
Notice of Inquiry and request for comments re: Artificial
Intelligence and Copyright, Docket No. 2023-6, at 10
(“the Office heard from artists and performers
concerned about generative AI systems’ ability to mimic their
voices, likenesses, or styles. Although these personal attributes
are not generally protected by copyright law….”). (Note that
the Office sought comment on potential protections for artistic
style in the same RFC.) The recent dismissal (with leave to amend)
of various counts of the Andersen Complaint also suggests that
substantial similarity to a copyrighted training work is still
required to establish infringement (Order at 10-13) (“Even if
that clarity is provided and even if plaintiffs narrow their
allegations to limit them to Output Images that draw upon Training
Images based upon copyrighted images, I am not convinced that
copyright claims based on a derivative theory can survive absent
‘substantial similarity’ type allegations.”).

While the issue remains legally unsettled, certain AI vendors
may provide indemnification and liability protection for the use of
generative AI outputs for users who sign contracts — for
example, via Microsoft’s “CoPilot Protection
Program” or Getty Images’ own just-announced generative
AI tool. Similar indemnification may be available from providers
such as Google and Adobe.

Companies seeking to minimize their liability risk should
therefore consider, and possibly mandate, the use of generative AI
models that provide such liability protections.

A remotely-hosted, liability-protection-providing model that is
prompted only with nonsensitive information that employees have a
right to use (e.g., public-domain information, licensed
information, or internally-sourced nonsensitive information)
likely presents the lowest combined liability and confidentiality
risk for companies seeking to use generative AI at this time.

AI Outputs May Not Be Copyrightable — There May Be No
Right to Exclude Competitors From Copying Them

Companies should also be aware that the output of generative AI
may not be eligible for copyright protection, and thus any such
outputs made public (for example, used on a public-facing website)
risk being freely available for competitors, analysts, and the
public at large to reproduce.

The Copyright Office has determined that works of AI-generated
visual art are not eligible for copyright, based on reasoning
that suggests that the outputs of generative AI systems do not
reflect human authorship, even if generated in response to
human-authored prompts.

Accordingly, the output of generative AI (particularly if not
obviously a derivative work of a copyright-eligible work, such as
an abridged version or summary of a memorandum) may not
represent a protectible company asset if publicly disseminated.

For information that is not confidential or sensitive and may be
publicly disseminated, but that reflects generically-useful output
that the company would prefer to prevent others from using (for
example, certain types of ad copy or generic product descriptions
that do not implicate company-proprietary trademarks), companies
are best advised to stick to human authorship to retain the
capacity to limit others’ right to copy this work, which
otherwise would be at risk of falling into the public domain.

If generative AI output reflects company-confidential
information, companies should continue to preserve confidentiality,
but also should be aware that trade secret protections and the
prevention of public dissemination are now potentially the
only legal avenues available for protection of such
information, rather than merely (as they have always been) the
best practical ones.

This also suggests potential new risks to information that is
publicly disseminated by a judgment-proof entity following a
confidentiality breach: if trade secret protections are lost
following widespread public disclosure, then preventing subsequent
copying and dissemination will likely be more difficult for
works not independently protectible by copyright.

Takeaways and “Do’s and Don’ts”

Accordingly, companies developing or revising generative AI
policies should take the following best practices into account:

Do

  • Have clear policies in place for what information may and may
    not be used in prompts and under what circumstances, maintain
    clear guidelines about confidentiality expectations and document
    sensitivity, and appropriately mark documents as confidential
    where necessary.

  • Use remotely-hosted AI instances that provide confidentiality
    guarantees akin to those used by existing discovery vendors, where
    possible. Be aware that many publicly-hosted AI models require
    prompts to be licensed for use in training.

  • Consider using privately-hosted instances (e.g., based on the
    open-source tunable “Llama” model from Meta) as an
    alternative to publicly-hosted services that don’t provide
    confidentiality guarantees (a minimal local-inference sketch
    appears after this list).

    • Be aware, however, of potential liability risks stemming from
      allegations that the model weights (or potentially even any
      outputs they produce) are a derivative work of copyrighted
      training inputs.
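
As a concrete illustration of the privately-hosted approach above, the following sketch runs a Llama-family model entirely on local hardware using the open-source llama-cpp-python bindings; the model file path and prompt are illustrative assumptions, and the weights themselves must be obtained under Meta’s own license terms.

```python
# Minimal sketch of local inference with llama-cpp-python
# (pip install llama-cpp-python). The weights path is a hypothetical
# placeholder; obtain Llama weights under Meta's license.
from llama_cpp import Llama

# Load a locally downloaded, quantized model file. Nothing is sent to
# an external vendor, so prompt contents stay in-house (though the
# model-weight copyright questions noted above still apply).
llm = Llama(model_path="./models/llama-2-7b-chat.Q4_K_M.gguf")

output = llm(
    "Summarize the following internal analysis in three paragraphs:\n...",
    max_tokens=512,
)
print(output["choices"][0]["text"])
```

Because inference happens locally, this pattern addresses the prompt-confidentiality concerns discussed earlier in this article, though not the derivative-work questions about the weights themselves.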

Don’t

  • Publish AI-generated materials that would be potentially useful
    to competitors (generic product descriptions, ad copy, business
    plans), whether or not based on internal human-generated
    descriptions.

    • Without an element of human authorship, these materials may
      not be eligible for copyright protection, so you will have
      limited recourse to stop competitors from taking advantage of
      them.


  • Provide internal, confidential information to publicly
    available, remotely hosted AI that may use it for training or other
    purposes (such as currently-available ChatGPT).

    • The compromise of this information may be a per se breach of
      confidentiality obligations, or else the information may be
      replicated in response to future prompts (and/or viewed by
      humans who may have competitive interests).

    • The disclosure and/or dissemination of this information may
      also compromise the capacity to seek patents and/or preserve trade
      secret rights.


  • Deliberately request and/or duplicate information in a way that
    may result in an output that is colorably a derivative work of a
    copyrighted creative work — for example, requesting that an
    AI author a work of fan-fiction or propose a sequel to an
    unlicensed copyrighted work.

    • Because facts are generally not copyrightable, this is less of
      a concern when asking purely factual questions, although the
      accuracy of the model outputs should be double-checked where
      possible — AIs may “hallucinate” and produce
      answers that sound convincing but are factually false.

In sum, a remotely hosted, liability-protection-providing model
that is prompted only with nonsensitive information that employees
have a right to use (e.g., public-domain information, licensed
information, or internally sourced nonsensitive information) likely
presents the lowest combined liability and confidentiality risk for
companies seeking to use generative AI at this time.

The content of this article is intended to provide a general
guide to the subject matter. Specialist advice should be sought
about your specific circumstances.


Source: http://www.mondaq.com/Article/1397146