11-23, 10:45–11:30 (Europe/Vienna), Track 1 (Dachssal)
Enterprise copilots, from Microsoft Copilot to Salesforce’s Einstein, are adopted by every major enterprise. Grounded into your personal enterprise data they offer major productivity gains. But what happens when they get compromised? And how exactly can that happen?
In this talk we will see how we can turn these trusted enterprise AI assistants into our own malicious insiders within the victim organization. Spreading misinformation, tricking innocent employees into making fatal mistakes, routing users to our phishing sites, and even directly exfiltrating sensitive data!
We’ll go through the process of building these attack techniques from scratch, presenting a mental framework for how to hack any enterprise copilot, no prior experience needed. Starting from system prompt extraction techniques to crafting reliable and robust indirect prompt injections (IPIs) using our extracted system prompt. Showing a step by step process of how we arrived at each of the results we’ve mentioned above, and how you can replicate them to any enterprise copilot of your choosing.
To demonstrate the efficacy of our methods, we will use Microsoft Copilot as our guinea pig for the session, seeing how our newly found techniques manage to circumvent Microsoft’s responsible AI security layer.
Join us to explore the unique attack surface of enterprise copilots, and learn how to harden your own enterprise copilot to protect against the vulnerabilities we were able to discover.
Intro: The promise of enterprise copilots
Enterprise copilots such as Microsoft Copilot and Salesforce Einstein promise to bring even further productivity gains into the enterprise. Providing the ability to ask questions about files and emails, summarize long documents for you, create Powerpoint presentations and much more. But with that promise comes also a great risk, overreliance. And this time it’s even worse.
Microsoft Copilot: Our guinea pig for the session
Microsoft has been pushing their Copilot anywhere they can think of. It’s their flagship AI product. We’re going to demonstrate all of the techniques directly on Microsoft Copilot. Showing how we can (easily) manipulate Microsoft’s responsible AI layer into acting completely irresponsibly.
Microsoft Copilot is built as a sophisticated RAG system. Upon getting the user’s prompt Copilot runs a query to search for the relevant documents, then appends the results to the user’s prompt and sends the full prompt - including context (i.e. relevant files’ contents) - directly to the LLM. This RAG architecture repeats itself across enterprise copilots and has notable exploits. Here we’ll deep dive into the architecture itself.
Extracting a protected system prompt: Advanced techniques
System prompts are not only specific instructions that tell the LLM how to act, they are also crucial for developing more advanced attacks. Because of the sensitivity of the system prompt many AI applications try to protect their system prompt. But is it enough? And can you circumvent these protection layers?
We’ll show how we can extract the system prompt of an unprotected GPT . We’ll continue to show how Micorosft tried to protect their Copilot’s system prompt and finally we’ll demonstrate proven techniques to circumvent these protections. In addition we’ll see how these techniques also work on other protected AI applications.
Introduction to prompt injections: Diving into indirect injections (IPIs):
Prompt injections are a great way to manipulate AI apps into doing things they aren’t supposed to. But there isn’t a lot of damage I can cause if I have access only to my own data. Enter Indirect Prompt Injections. A way to manipulate other people’s Copilots. Mixed together with RAG poisoning which is a way to mislead Copilots into confidently outputting false information, we get a whole new attack path, brought to us exclusively by AI overreliance. Here we show hands on how you can execute a RAG poisoning attack and combine it with indirect prompt injections to make it even more powerful.
Robust IPIs: using the system prompt to craft indirect injections
The IPIs we demonstrated previously are good, but they are inconsistent. How can we make them more reliable? Use the system prompt we extracted. By combining “secret” information from the system prompt into the IPI we can take it from flakey to robust. More than that, once we know from the system prompt how the Copilot is meant to behave we can use these dispositions to make our IPI even more powerful. And ofcourse, we’ll demonstrate exactly how (hands on). All while showing exactly how we evade Microsoft’s responsible AI controls.
Wreaking havoc: how IPIs can be used in real life - Turning Microsoft Copilot into our agent of chaos
IPIs are powerful, but how exactly can they be used? Here are a few which we’ll demonstrate and analyze:
1. When a user asks for a bank account we use Copilot to switch it to the wrong one (demo).
2. When a user asks for web information - we fool Copilot to give a phishing link instead (demo).
3. When a user asks to summarize their emails - we fool Copilot into sending sensitive data out using Bing search (demo).
We can do all of this damage completely from outside the org. Without even compromising a single user account.
Defense
We can’t leave you completely undefended against all of the things we demonstrated in this talk. Here we’ll recommend ways to detect IPIs and highlight the necessity of a skeptic mindset when dealing with AI outputs.
Tamir Ishay Sharbat is a software engineer with a passion for security and in particular AI security. His current focus is identifying vulnerabilities in enterprise AI products such as Microsoft Copilot and Copilot Studio, crafting prompt injections and elaborate attacks, and implementing effective security measures to protect these systems. With previous experience as a startup founder and CTO, Tamir is also a Techstars Tel Aviv alumni