The Microsoft Azure CTO pointed out that by changing just 1% of the data set – for example, using a backdoor – an attacker can cause the model to classify objects incorrectly or generate malware. Some of these attempts at data poisoning are easily seen, as the result of adding a small amount of digital noise to an image by adding data to the end of a JPEG file, which can cause models to incorrectly classify images. He showed one example of a panda image where, when enough digital noise was added to the file, it was classified as a monkey.
Not all backdoors are bad, Russinovich took pains to point out. They can be used to fingerprint a model that can be checked by software to verify its authenticity and integrity. These can be weird questions that are added to the code and are unlikely to be asked by real users.
Perhaps the most infamous productivity AI attack concerns rapid injection techniques. These are “really tricky because one can influence more than just the current conversation with a single user,” he said.
Russinovich demonstrated how this works, with a piece of encrypted text embedded in a conversation that can lead to the leak of private data, and what he calls a “quick injection attack,” which goes back to the processes used to create web site text. exploitation. This means that users, sessions, and content all need to be separated from each other.
Top of the threat stack, according to Microsoft
The top of the threat stack and various threats related to users, according to Russinovich, include exposing sensitive data, using jailbreaking techniques to control AI models, and having third-party applications and model plug-ins forced to leak data or move around. restrictions on offensive or inappropriate content.
One of these attacks he wrote about last month, calling it Crescendo. This attack can bypass content security filters and actually open the model itself to generate malicious content through a series of carefully crafted instructions. He demonstrated how ChatGPT could be used to reveal the ingredients of the Molotov Cocktail, although his initial response was to deny this information.
Source link