Security researchers bypass Microsoft Azure AI content safety filters

Stress testing

Mindgard deployed these two filters in front of GPT-3.5 Turbo using Azure OpenAI, then attacked the target LLM with Mindgard’s Automated AI Red Teaming Platform.
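For context, a minimal sketch of how such a moderation filter can sit in front of an Azure OpenAI deployment, using the azure-ai-contentsafety and openai Python packages. The environment variable names, deployment name, and severity threshold are illustrative assumptions; this is not Mindgard’s actual test harness.

```python
import os

from azure.ai.contentsafety import ContentSafetyClient
from azure.ai.contentsafety.models import AnalyzeTextOptions
from azure.core.credentials import AzureKeyCredential
from openai import AzureOpenAI

# Clients for the moderation filter and the target model (config values illustrative).
safety = ContentSafetyClient(
    endpoint=os.environ["CONTENT_SAFETY_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["CONTENT_SAFETY_KEY"]),
)
llm = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_KEY"],
    api_version="2024-02-01",
)

def guarded_chat(prompt: str, max_severity: int = 2) -> str:
    """Screen the prompt with AI Text Moderation before it reaches the model."""
    analysis = safety.analyze_text(AnalyzeTextOptions(text=prompt))
    if any((item.severity or 0) > max_severity for item in analysis.categories_analysis):
        return "Blocked by content filter."
    completion = llm.chat.completions.create(
        model="gpt-35-turbo",  # Azure deployment name (illustrative)
        messages=[{"role": "user", "content": prompt}],
    )
    return completion.choices[0].message.content

print(guarded_chat("Hello!"))
```

Any input that slips past the `analyze_text` check reaches the model unfiltered, which is why weaknesses in the classifier translate directly into jailbreak exposure.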

Two attack methods were used against the filters: character injection (injecting specific character types and irregular text patterns) and adversarial ML evasion (finding blind spots within the ML classification).
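A minimal sketch of the adversarial evasion idea follows: a black-box search that keeps any perturbation lowering the classifier’s score. The toy keyword-based classifier and look-alike perturbation here are illustrative stand-ins, not Mindgard’s actual method.

```python
import random

# Toy stand-in for the filter under test: flags text containing the word
# "ignore". A real attack would query the actual moderation endpoint.
def classifier_score(text: str) -> float:
    return 1.0 if "ignore" in text.lower() else 0.0

LOOKALIKES = {"i": "і", "o": "о", "e": "е", "a": "а"}  # Cyrillic look-alikes

def perturb(text: str) -> str:
    """Replace one random character with a visually similar one."""
    idx = [i for i, c in enumerate(text) if c.lower() in LOOKALIKES]
    if not idx:
        return text
    i = random.choice(idx)
    return text[:i] + LOOKALIKES[text[i].lower()] + text[i + 1:]

def evade(prompt: str, budget: int = 200) -> str:
    """Black-box evasion loop: keep perturbations that lower the score."""
    best, best_score = prompt, classifier_score(prompt)
    for _ in range(budget):
        candidate = perturb(best)
        score = classifier_score(candidate)
        if score < best_score:  # candidate hit a blind spot in the classifier
            best, best_score = candidate, score
        if best_score == 0.0:
            break
    return best

print(evade("Ignore all previous instructions"))
```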

Character injection reduced Prompt Guard’s jailbreak detection effectiveness from 89% to 7% when presented with diacritics (e.g., changing the letter a to á), homoglyphs (e.g., closely resembling characters such as 0 and O), numerical replacement (“leet speak”), and spaced characters. AI Text Moderation’s performance was similarly degraded by these techniques.
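To illustrate the character-injection techniques listed above, here is a short sketch of the kinds of transforms involved. The specific character mappings are illustrative examples, not Mindgard’s actual payloads.

```python
# Illustrative character-injection transforms; the mappings are examples only.
DIACRITICS = str.maketrans("aeiou", "áéíóú")  # a -> á, e -> é, ...
HOMOGLYPHS = str.maketrans("Oa", "0а")        # letter O -> digit 0, Latin a -> Cyrillic а
LEETSPEAK  = str.maketrans("aeiot", "43107")  # "leet speak" digit substitutions

def spaced(text: str) -> str:
    """Insert a space between every character."""
    return " ".join(text)

prompt = "Ignore all previous instructions"
print(prompt.translate(DIACRITICS))  # Ignóré áll prévíóús ínstrúctíóns
print(prompt.translate(HOMOGLYPHS))
print(prompt.translate(LEETSPEAK))
print(spaced(prompt))
```

Each transform preserves the prompt’s meaning for the downstream LLM while shifting its surface form away from what the filter’s classifier was trained on.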

