43 | AI Risk Primer - Copyright

Dec 05, 2023

This series on AI risks has so far explored AI “hallucinations” and data privacy. Next, I’m turning to issues related to copyright.

Reflecting on the rapid evolution of AI, it's clear that even a month ago, the landscape looked different. I realize now the importance of including copyright as a notable risk area for AI use in nonprofit organizations, particularly as these tools become more integral to our operations. In particular, at a series of talks I’ve given to nonprofit audiences, I was questioned repeatedly about this topic. So for this article, I’m going to use a Q&A approach to share the notable risks you might face if you turn to AI tools in your work.

Is it possible that I could violate some author’s copyright by using AI to help with my writing?

In a word, yes.

In two words, yes but…

Let’s take this scenario. You’ve written a paragraph in a grant application describing the outcomes you anticipate from the program to be funded. You ask an AI tool (ChatGPT, Bard, Jasper, Google Duet, Microsoft Copilot, Notion AI, etc.; there are lots!) to rewrite the paragraph, perhaps in a more conversational tone. It is technically possible but extremely, extremely unlikely that the AI tool will include phrases or sentences pulled directly from copyrighted text that was used to train the AI.

Or this scenario. You’re preparing an article for your organization’s annual report and you ask an AI tool (again, any of ‘em) to write a few paragraphs about a fictional family experiencing some notable circumstance. Again, it is technically possible but extremely unlikely that the text generated would violate copyright.

Questions you should ask yourself in similar circumstances:

Is the written material going to be distributed/copied/posted? If so, both legal and ethical considerations apply. If not, then there may not be legal implications, but reusing copyrighted material without citation may still violate your organization’s ethics. The first scenario may not carry legal exposure but certainly should be considered from an ethical standpoint. In the second scenario, it’s clearer that there may be legal implications.
Are you going to use the AI-generated text directly, or will you rewrite it substantially to make it your own? This should place you in more familiar territory. If you treat the AI-generated text as you would published content that you use for inspiration, writing the narrative in your own words, you can avoid violating the letter and spirit of copyright law. In both scenarios, treating the AI-generated text this way, whether it was ‘rewording’ your original writing or generating something fresh, should place you on safe ground.
Can I check whether what the AI generated already exists? Unlikely. You can copy the text and search for it (e.g., using Google), but this is generally not fruitful, as search engines are likely to return various articles with merely similar text.

Can we be sued if we accidentally publish something that’s copyrighted?

In a word, maybe.

An emerging trend among some AI tool vendors is the offer of legal indemnification for users. This means if a copyright infringement claim arises from the use of their tool, the vendor may cover legal expenses. However, it's important to note that these policies vary and are subject to change, so I view this as largely irrelevant: If you don’t publish anything that’s copyrighted, you don’t need to be concerned about being sued.

Do the AI tool models store the original content they used in training?

No, not really. AI tools don't store the original content used in their training. Instead, they use complex algorithms and mathematical models, built from patterns in the training data, to generate new content. The outputs are original in composition, though influenced by a vast array of sources.

That said, concerning data privacy related to the content you upload, it is still extremely unlikely that the exact content will be disclosed, but not impossible. (Refer to my article about AI Risk and Data Privacy for a slightly longer description.)

How do we use the image generators without violating artists’ copyright?

Image courtesy DALL-E 3. Prompt: image of a woman on the beach walking with a young girl *in open impressionistic style*

Generating images with AI tools like DALL-E, Midjourney, and Stable Diffusion is an area of great controversy regarding copyright. (Music is a second charged topic with similar considerations.) Here, it is almost impossible that the AI tool will generate an image that is identical to an artist’s work. However, the copyright questions in this space involve not only the exact work but also the style in which it was created.

The best approach to avoid encroaching on copyright is not to reference an individual artist in your prompt. For example, “image of a woman on the beach walking with a young girl in the style of Erin Hanson” cites a living artist and may generate an image that’s a bit too close to her work. But “image of a woman on the beach walking with a young girl in open impressionistic style” might give you similar results without taking inspiration from an artist whose works are copyrighted.

The Bottom Line

As with many aspects of using AI, simply being sure to use common sense—which may involve slowing down a bit before just pasting and submitting/sending!—can keep you out of either legal or ethical trouble.

Using new technology like AI can be a great reminder to pay attention to some of the basics that software can allow you to forget about, like citing sources and avoiding the reuse of someone’s work.