FlipGuard: Defending Large Language Models Against Quantization-Conditioned Backdoor Attacks
FlipGuard introduces a defense mechanism against quantization-conditioned backdoor attacks on large language models, a class of attacks that exploits the model quantization step. The system detects and mitigates malicious behaviors embedded through the quantization process, improving safety when compressed LLMs are deployed in production.
