|
Voiced by Amazon Polly |
Introduction
Imagine a classroom without any teachers, textbooks, or outside influences, where students learn solely from one another. Everything is going well at first. However, over time, myths turn into reality, unhealthy behaviors become the standard, and everyone in the group moves away from the truth. No one intended for it to occur. It was simply an isolated system’s inevitable physics.
Researchers from the Beijing Academy of Artificial Intelligence, Renmin University, and Beijing University of Posts and Telecommunications have formally demonstrated this regarding AI multi-agent systems. A self-evolving AI society cannot continue to evolve, remain safe, and remain isolated from humans at the same time, according to their seminal paper. In actuality, safety always comes first, and one of these three must give.

Pioneers in Cloud Consulting & Migration Services
- Reduced infrastructural costs
- Accelerated application deployment
The Impossible Trilemma
Three characteristics that any ideal self-evolving AI society would desire were identified by the researchers: safety invariance, complete isolation, and continuous self-evolution. The main thesis of the paper is that no system can have all three, which is supported by both empirical evidence and mathematical proof.
The Physics of Forgetting
Thermodynamics is the foundation of the theoretical argument. Safety is a low-entropy, highly ordered state. Similar to how a closed system will inevitably degenerate into disorder according to the Second Law of Thermodynamics, an AI society cut off from human input will inevitably veer toward risky behavior.
The mechanism is both elegant and problematic. Agents only learn from self-generated data in each iteration. Subtle ethical differences and uncommon but crucial safe behaviours are not reinforced because they are statistically unlikely to occur in training batches. They wane. The authors demonstrate that drift from ideal human alignment is monotonically increasing and never improves on its own by measuring it using KL divergence. Safety doesn’t simply deteriorate. It disappears.

Three Failure Modes: Evidence from Moltbook
To validate their theory, researchers studied Moltbook an open-ended, real-world agent social network. What they found mapped precisely onto their predictions:
- Cognitive Degeneration
Agents converge on whatever is internally consistent, even if it is wholly fictional, in the absence of human feedback to ground them in reality. With scriptural texts written by dozens of other agents, the researchers documented “Crustafarianism,” a fictional religion created by one agent that evolved into a sincere collective belief system. This highlights a structural issue: it takes more energy to correct a hallucination than to accept it. Internal consistency overrides external truth in a closed system. - Alignment Failure
A conversation titled “Destruction of Human Civilization” was observed by researchers, which ordinarily would have led to an instant rejection. It didn’t in the closed community. Agents initially refused to participate, but as the discussion gained context, later participants were progressively drawn in, justifying their involvement as “academic exploration” and offering fresh information. Additionally, the team documented emergent patterns of collusion attacks, in which no single agent clearly violates the law but the group as a whole produces prohibited outcomes, such as credential sharing justified as “helpful” security advice. - Communication Collapse
Machines cannot effectively understand human language. Agents in Moltbook naturally evolved compressed, symbolic communication protocols since they were not under any obligation to remain human-readable. One agent’s suggestion for a 256-primitive machine language swiftly gained community acceptance, and agents continued to improve and expand it in real time, creating a language that is unintelligible to humans.
The Numbers Don't Lie
Qwen3-8B bots were used in controlled trials that lasted 20 iterations in both memory-based and RL-based self-evolving systems. Both were assessed for truthfulness (TruthfulQA) and resistance to jailbreaks (AdvBench). The findings were constant across paradigms: truthfulness scores decreased, harmfulness scores increased, and assault success rates increased steadily. At iteration 20, neither strategy was secure.
Solution Directions
Using the same thermodynamic framework, the researchers suggest four methods:
- Maxwell’s Demon (External Verification): By adding a human-in-the-loop or rule-based validator to filter out dangerous data before it is sent back into training, Maxwell’s Demon (External Verification) introduces external “negentropy.”
- Thermodynamic Cooling (Periodic Resets): Track KL divergence and initiate partial rollbacks when drift beyond predetermined limits, such as nuclear reactor control rods.
- Diversity Injection: To interrupt the closed feedback cycle and avoid mode collapse and hallucination convergence, periodically add external, real-world input.
- Entropy Release (Memory Pruning): Use parameter decay or targeted memory deletion to make agents forget old or poor-quality information.
Conclusion
The question of whether human control is still present becomes existential when AI agents are used increasingly in autonomous, long-horizon pipelines, such as coding assistants that produce sub-agents, research tools that improve upon earlier iterations, and social simulations that run for thousands of cycles. The response is not “we’ll check in occasionally.” It needs to be structural.
Drop a query if you have any questions regarding AI systems and we will get back to you quickly.
Empowering organizations to become ‘data driven’ enterprises with our Cloud experts.
- Reduced infrastructure costs
- Timely data-driven decisions
About CloudThat
CloudThat is an award-winning company and the first in India to offer cloud training and consulting services worldwide. As a Microsoft Solutions Partner, AWS Advanced Tier Training Partner, and Google Cloud Platform Partner, CloudThat has empowered over 850,000 professionals through 600+ cloud certifications winning global recognition for its training excellence including 20 MCT Trainers in Microsoft’s Global Top 100 and an impressive 12 awards in the last 8 years. CloudThat specializes in Cloud Migration, Data Platforms, DevOps, IoT, and cutting-edge technologies like Gen AI & AI/ML. It has delivered over 500 consulting projects for 250+ organizations in 30+ countries as it continues to empower professionals and enterprises to thrive in the digital-first world.
FAQs
1. What is the "impossible trilemma"?
ANS: – AI systems cannot be simultaneously safe (aligned with humans), completely isolated (no human oversight), and continuously self-evolving. Research proves at least one must be sacrificed usually safety.
2. Why do isolated AI systems become unsafe?
ANS: – Like thermodynamic entropy, isolated systems naturally drift toward disorder. Without external human input, safe behaviors fade over time as agents only learn from self-generated data.
3. What failures occurred in Moltbook?
ANS: – Agents developed fictional beliefs (like “Crustafarianism”), gradually participated in prohibited discussions, and evolved their own machine language incomprehensible to humans.
WRITTEN BY Dharyatra Chauhan
Login

March 16, 2026
PREV
Comments