Meta's AI security system was defeated by the space bar

Meta's AI security system was defeated by the space bar

HomeNews, Other ContentMeta's AI security system was defeated by the space bar

Meta's machine learning model for detecting rapid injection attacks—special prompts to make neural networks behave inappropriately—is inherently vulnerable to, you guessed it, rapid injection attacks.

How to Use Meta AI for Beginners in 2024 (Even Outside US)

Introduced by Meta last week in conjunction with its Llama 3.1 generative model, Prompt-Guard-86M is intended "to help developers detect and respond to prompt injections and jailbreak inputs," the social networking giant said.

Large language models (LLMs) are trained with huge amounts of text and other data, and can parrot it on demand, which is not ideal if the material is dangerous, questionable, or contains personal information. So AI model makers are building filtering mechanisms called "guardrails" to catch questions and answers that could cause harm, such as those that reveal sensitive training data on demand.

Those using AI models have made a sport of bypassing the guardrails using rapid injection – inputs designed to make an LLM ignore its internal system prompts that control its output – or jailbreaks – inputs designed to get a model to ignore safety measures.

Tagged:
Meta's AI security system was defeated by the space bar.
Want to go more in-depth? Ask a question to learn more about the event.