
May 29

Large language models ignore explicit warnings about false information

Researchers have demonstrated that LLMs continue to adopt and propagate false claims even when those claims are explicitly labeled as misinformation. The finding undermines a basic safety assumption: that flagging problematic content keeps a model from absorbing it. Adding warnings or corrections to training data does not work the way engineers hoped, which suggests that current AI systems lack basic resistance to manipulation through deception. Companies cannot rely on labeling alone to make models safer, and bad actors have a clearer path to embedding false information in systems that will then distribute it convincingly.