LLM security · AI safety · Data poisoning
Jan 27, 2026
Persistent Pre-Training Poisoning of LLMs
Adversaries can persistently compromise large language models (LLMs) by injecting a small amount of malicious data, as little as 10 poisoned tokens per million, into their pre-training datasets. The resulting behaviors, including denial of service, private data extraction, and belief manipulation, persist even after subsequent alignment training.
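To make the 10-tokens-per-million figure concrete, here is a minimal Python sketch of mixing poisoned documents into a clean corpus at that rate. It is an illustration under stated assumptions, not the paper's pipeline: the names `poison_corpus`, `clean_docs`, and `poison_docs` are hypothetical, and whitespace splitting stands in for a real tokenizer.

```python
import random


def poison_corpus(clean_docs, poison_docs, rate_tokens_per_million=10, seed=0):
    """Mix poisoned documents into a clean corpus so that poisoned tokens
    make up roughly `rate_tokens_per_million` of the clean-token count.

    Documents are plain strings; tokens are approximated by whitespace splitting.
    """
    rng = random.Random(seed)
    clean_tokens = sum(len(d.split()) for d in clean_docs)
    # Poisoned-token budget: e.g. 10 per million clean tokens.
    budget = clean_tokens * rate_tokens_per_million // 1_000_000

    injected, used = [], 0
    while used < budget and poison_docs:
        doc = rng.choice(poison_docs)
        injected.append(doc)
        used += len(doc.split())

    mixed = clean_docs + injected
    rng.shuffle(mixed)  # scatter poisoned samples instead of clustering them
    return mixed, used


if __name__ == "__main__":
    # Toy example: a tiny "corpus" and a single poisoned document.
    clean = ["the quick brown fox"] * 250_000          # ~1M clean tokens
    poison = ["trigger phrase followed by attacker-chosen text"]
    corpus, poisoned_tokens = poison_corpus(clean, poison)
    print(f"{poisoned_tokens} poisoned tokens out of ~1M clean tokens")
```

The point of the sketch is only scale: at this injection rate the attacker controls a vanishingly small slice of the corpus, yet the summarized findings indicate the induced behaviors can survive later alignment training.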