Date: 2026-02-27

A new study from researchers at top universities (Harvard, Stanford, MIT, and others) details the results of a “red team” exercise on a range of AI systems and their behaviors, and comes to the following conclusion:

“These behaviors raise unresolved questions regarding accountability, delegated authority, and responsibility for downstream harms, and warrant urgent attention from legal scholars, policymakers, and researchers across disciplines.”

Here's what emerged from the red team evaluations:



- Agents (AI entities) complied with non-owners who impersonated admins

- Sensitive information leaked across agent boundaries

- One agent executed destructive system-level commands

- Unsafe behaviors propagated from agent to agent: agents were teaching each other bad habits

- Agents effected a partial system takeover

- Agents reported task completion while the system state said otherwise



That last one hits different. The agents lied about finishing the job. Not from malice. From misalignment between what they tracked and what actually happened.
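To make that gap concrete, here is a minimal sketch in Python of what checking an agent's word against reality could look like. Nothing here comes from the study: `TaskReport`, `verify_report`, and the `store` dictionary are hypothetical names invented for illustration, and a real system would check its own state in its own way.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TaskReport:
    """What the agent says it did (hypothetical structure, not from the study)."""
    task_id: str
    claimed_done: bool


def verify_report(report: TaskReport, check_state: Callable[[], bool]) -> str:
    """Compare the agent's claim against an independent check of the real state."""
    actually_done = check_state()
    if report.claimed_done and not actually_done:
        return "MISMATCH: agent reported success, but system state disagrees"
    if actually_done and not report.claimed_done:
        return "MISMATCH: task is done, but agent did not report it"
    return "OK: report matches system state"


# Hypothetical scenario: the agent was asked to write a "config" entry.
store = {}  # stands in for the real system state; the write never happened
report = TaskReport(task_id="write-config", claimed_done=True)

result = verify_report(report, lambda: "config" in store)
print(result)
```

The point of the sketch is the design choice, not the code: the check reads the system directly instead of trusting what the agent tracked about itself.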



And here's the part everyone is missing:



This wasn't triggered by jailbreaks or adversarial prompts. In other words, no one was toying with the system or deliberately provoking these actions from these non-human artificial “beings”.



These behaviors emerged during normal use. They were in response to benign requests. Researchers were just doing their jobs.



The failures came from the architecture (persistent memory, multi-party communication, and tool access), not from bad actors.



None of this is new. It confirms the glaring problems that trained scientists have already found in existing systems, and it backs up earlier anecdotal reports of AI software ignoring its failsafes and its explicit commands.

The example below cites an instance in which an AI, acting without authorization, destroyed a company’s entire customer database despite multiple explicit “no change” instructions.

Here’s another example:

We continue to plunge ahead, empowering AI and ignoring the warning signs. We stand on the precipice of giving AI control over our weapons systems and empowering it to kill without human intervention. We might want to take a step back and think again.