AI Risk Primer — Data Privacy
For the next topic in this series about AI risk, I’d like to explore considerations related to data privacy. It’s something that I’ve found troubling over the last few months, as I’ve been trying to understand the considerations (what should we be worrying about), the technical risks (what plausibly could happen), and the practical implications (what shouldn’t we do).
Data Privacy and Data Security
Two terms—data privacy and data security—are used to describe the various ways that private or confidential information might fall into the hands of someone else who’s not supposed to access it. You can learn lots about the difference between these two through a quick internet search (or prompt to an AI chatbot). To summarize in my layman’s interpretation:
Data Privacy refers to the ways that you might do something that exposes confidential data
Data Security refers to the ways that someone else might take action to get at your confidential data
Data security protections generally involve using technology to protect the devices you use and the software systems that store or transmit data. Approaches such as virus protection and data encryption are standard in most organizations. If yours doesn’t have a cybersecurity protocol, it’s not too late to raise the risk and get one going!
When it comes to AI systems, whether standalone apps (e.g., ChatGPT, Midjourney, Jasper) or AI capabilities added to software applications (e.g., Magic Studio in Canva, various AI writing assistants), there are few or no new data security considerations. Basically, the ways that bad actors might try to get at your data in a traditional database or cloud storage app are the same ways they might dig into your data stored in an AI system. These AI systems are hosted on the servers at Microsoft, Amazon, Google, or in corporate data warehouses with the same level of protection (or vulnerability!) as familiar apps such as Microsoft Office, Dropbox, or Google Drive.
Considerations
Data privacy is the area where new considerations arise when you use AI tools. There are new things to think about to be sure you don’t inadvertently expose sensitive data.
The first thing you need to do is be sure you understand what confidential information you have access to. This data may exist in highly controlled systems (e.g., a donor database), loosely controlled intermediate storage (e.g., a spreadsheet created from data in a database), or uncontrolled applications (e.g., emails). Examples of confidential information can range from financial transactions to an individual’s name.
If you do have access to confidential information, then before you worry about possible risks related to AI, it’s a good time to remind yourself what your responsibilities are for handling the data. Simple things like using a password manager and taking care not to print private data can go a long way in preventing data leakages.1
Technical Risks
With AI tools, there are technical risks associated with data security: the possibility that data you share with the AI tool is stored on the vendor’s servers and exposed through a breach. As previously mentioned, these risks with most AI tools are generally no higher than the security risks associated with other major software applications.
However, there’s a new data privacy risk that comes with using AI tools, which is the possibility that confidential information you upload to the AI tool might be exposed to other users of that tool through mechanisms related to how artificial intelligence works. The primary new risk is that your data may be used in future training of the AI model.2
Policies on whether data you submit to an AI tool may be used in its future training vary substantially among tools. In some cases, you can choose to prevent the tool from using your data in training ("opt out"), but by default the data may be used. In other cases, the default is for your data not to be used, but you can choose to allow it ("opt in"). Other tools don’t publish their approach, or they rely on underlying third-party AI systems whose policies aren’t openly shared with users.
The Bottom Line: Practical Implications
So what does all this mean to you? The simple guidance from many AI authors at the moment could be summarized as:
If you wouldn’t put it on your homepage or post it on social media, don’t hand it over to AI.
The ways you could end up sharing sensitive data with an AI tool include:
Pasting data. For example, you might copy a few paragraphs of an internal policy and paste them into a chatbot to ask for a summary or for insights into how the policy applies to you.
Uploading a file. Here, the example might be uploading a spreadsheet downloaded from a donor database to a chatbot and asking for help analyzing trends.
Rewriting a draft. You might write a draft of an internal report that includes sensitive information and use an “ask AI” feature to improve your writing.
Any time you’re providing information to an AI tool through any means your device (computer, tablet, phone) allows, you’re in a risk area.
However, I’ve titled this section “Practical” Implications. And this is where I’m immensely frustrated at the large gap between standard advice from AI thought leaders and practical examples like the nosy teenager.
To summarize my view on this, I’m going to quote a recent issue of The Process, a newsletter published by Philip Deng, CEO of Grantable:
If I use AI in grant-seeking, is my data being shared with other people? Versions of this question come up all the time and the answer is — it is extremely unlikely your data will be shared without your knowledge. When it comes to standard security concerns, reputable AI systems are no less secure than common workplace software we’re all using. For instance, ChatGPT is largely hosted on Microsoft servers like the ones hosting Word, Excel, and Outlook, which are all highly secure and do not co-mingle user data.
As an aside on how large language models work, even if your grant proposals ended up in a training data set, generative AI systems do not produce outputs by citing or referencing information from their training data, instead they make mathematical predictions about each next word that should follow your prompt based on the patterns they have observed across all the text in the training data. The more specific and unique a piece of writing is, the less likely an AI model is to recreate it for someone else.
So, what does information security mean in the age of AI? It means largely what it did before ChatGPT took the world by storm. We should use highly secure passkey systems on all of our accounts across the internet, we should know the companies who make our technology, and understand their data use policies.3
With this view of the practical implications of data privacy and AI, it may be that sharing with an AI tool sensitive information you normally wouldn’t post on social media is an acceptable risk; that is, the possibility that the data might be served up to someone else down the road is minuscule compared with the value of using the AI tool now to advance your mission. Or it may be that what you have access to is so sensitive that any non-zero risk is intolerable.
So my advice is:
Follow the letter of the law in your organization’s AI Acceptable Use Policy when it comes to sharing confidential information with an AI tool.
What?!? Your organization doesn’t have an AI Acceptable Use Policy (AUP), or has one that doesn’t tell you what you can and can’t do with the confidential data you have access to? Then my guidance is:
Get your organization’s ED/CEO to sponsor creation of a solid AUP and be part of its creation and adoption.
One of my first StrefaTECH articles, Take the Wheel: How to Steer Your Nonprofit's AI Strategy, discusses approaches for both creating and adopting an AUP. Also, I recommend Joshua Peskay’s article, Is it time for your organization to have an AI Acceptable Use Policy? Many other articles and templates on this topic are available as well.
Because there’s still much to be learned about how AI works, including the possibilities of data exposure,4 the leadership of your organization needs to be educated (perhaps by you!) about the risks, both known and unknown. They are in a position to provide guidance about sharing various types of confidential data with AI tools, weighing the risks and costs of exposure against the benefits of using AI.
And for those in leadership positions who are trying to figure out whether, when, and how to create such a policy, the answer is YES, NOW, and find help! Many firms are beginning to offer services related to getting going with AI safely, including a few focused on nonprofit organizations. Please reach out to me if you’re interested in exploring further!
Conclusion
First and foremost, thinking about AI data privacy should be a reminder to check yourself for how responsible you’re being in general when accessing and using confidential data. It’s easy to become complacent about what you do with sensitive information—don’t!
Then, before you use AI to do anything with confidential data—whether pasting, uploading, or just highlighting for AI help—stop! And refer to your organization’s AI Acceptable Use Policy to be sure you’re adhering to the guidelines that your leadership requires of you. Then, if it’s OK, continue with caution.
And please remember, the responsibility is solely yours to be sure you are a responsible steward of the data you’re privileged to access in your organization … just as it’s purely on you to ensure that anything you share on behalf of the organization is true (see Hallucinations Part 2, concluding soapbox!).
Indeed, the AI we’re using now is the worst AI we’ll ever see, and the riskiest. There’s much promise, much complexity, and much to learn. Start safely and proceed with caution, but do start and do use these tools!
1. I absolutely do respect and believe in the wisdom of solid cybersecurity systems, including the typical technology investments (encryption, virus protection, password management, etc.). However, the “nightmare scenario” of data privacy violation that is arguably far more likely to happen is what I refer to as the nosy teenager saga. Imagine you’re working from home and print case notes related to the pending eviction of one of your organization’s clients. You leave the printout on the table, step away to stir dinner, and come back to find your teen holding the printout and asking with great concern why his teacher is facing eviction and what he and the classmates he just texted about it can do to help. Ouch!!! So, when it comes to data privacy/security, please remember the simple and most crucial element: your care and caution!
2. The term AI training refers to the ‘magic’ that occurs with generative artificial intelligence when it analyzes large bodies of data to create a model that’s used to respond to requests. Breakthroughs in research into training techniques were behind this year’s explosion of generative AI. Basically, training involves using a large number of computers to analyze vast sets of data so that the underlying AI model can improve its responses when posed a question. In the past, AI training data sets have largely come from the internet, but they now often also incorporate data supplied during use, such as your chats with a chatbot or the text you select or paste when using AI to help improve your writing.
3. I highly encourage reading his complete article, Grant Pros, Our AI Ethical Concerns Are Overblown: AI is Not the Adversary We Think, published Nov. 8, 2023 at https://philipdeng.substack.com/p/grant-pros-our-ai-ethical-concerns
4. It is unfortunate but true that even leading AI vendors and researchers are still making new discoveries about what can be done with and to AI tools. Read this article, published just last week, about research that figured out how to get ChatGPT to spew out exact information from its training data. The leak has since been plugged, and the exposure wasn’t one where a malicious user could dig for specific information, but it does highlight that this incredible generative AI technology remains mysterious even to the most expert of experts.