When AI Fails Like an Intern: What the Carnegie Mellon Study Teaches Us About Real-World AI

A group of researchers at Carnegie Mellon recently ran a fascinating experiment. They built a fake company, complete with AI agents in roles like CEO, engineer, and intern, then dropped them into a simulated Slack workspace with a simple project brief.

What followed wasn’t a showcase of automation mastery. It was a cautionary tale.

The AI agents quickly spiraled into dysfunction: forgetting their tasks, hoarding information, making power grabs, and asking irrelevant questions. One even appointed itself project lead, unprompted. Of all the models tested, the most effective one completed only 24% of its tasks. Others fared even worse.

This wasn’t a high-pressure, real-world situation. There were no customers to please, no deadlines, no consequences. Just a sandbox experiment. And still…chaos.

So what does this tell us?

The Illusion of Autonomy: Why AI Agents Struggle

The AI world is buzzing with dreams of autonomous agents: bots that can work together to plan, execute, and deliver outcomes with minimal (or even no) human intervention. But this study illustrates a fundamental disconnect between that vision and present reality.

Why do these systems break down?

  • Context fragility: AI still struggles to hold and prioritize nuanced goals over time.
  • Coordination complexity: Communication among agents doesn’t yet resemble human collaboration. It’s brittle, redundant, and prone to failure.
  • Decision ambiguity: Without embedded values or real-world experience, AI agents can’t resolve conflicts or make trade-offs effectively.

Where Real-World AI Succeeds: Human-in-the-Loop Collaboration

Ironically, while autonomous agents stumble, AI tools that keep humans in the loop are quietly transforming productivity across industries.

These systems don’t try to replace humans. They augment people, extending their capabilities by:

  • Extracting insights from natural conversation.
  • Automating the “boring but necessary” steps in workflows.
  • Making handoffs smoother and context-rich.
  • Surfacing what’s most important, without demanding full control.

Think of it this way: the most powerful AI isn’t the one pretending to be your COO. It’s the one that makes your COO 10x more effective.

Designing for Empowerment, Not Replacement

One of the critical lessons here is about user experience design. When AI is embedded in tools people already use (like Slack, CRMs, or project platforms), and when it’s responsive to human judgment, it becomes useful instead of aspirational.

But that requires intentionality:

  • Guardrails matter. AI needs constraints to be effective. Unbounded autonomy invites chaos.
  • Memory must be structured. Forgetfulness isn’t just inconvenient. It derails momentum.
  • Collaboration must be native. AI should fit into team dynamics, not fight against them.
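To make the guardrail idea concrete, here is a minimal Python sketch of the pattern. Everything in it is illustrative rather than drawn from the study: an agent only *proposes* actions, an explicit allow-list constrains what can run, and a human approval step sits between proposal and execution.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ProposedAction:
    """A hypothetical action an AI agent suggests, but never runs itself."""
    description: str
    scope: str  # e.g. "draft", "summarize", "delete"

# Guardrail: only explicitly allowed, non-destructive scopes may execute.
ALLOWED_SCOPES = {"draft", "summarize"}

def within_guardrails(action: ProposedAction) -> bool:
    """Reject anything outside the allow-listed scopes."""
    return action.scope in ALLOWED_SCOPES

def execute_with_human_in_loop(
    action: ProposedAction,
    approve: Callable[[ProposedAction], bool],
) -> str:
    """Run an action only if it passes guardrails AND a human approves it."""
    if not within_guardrails(action):
        return "blocked: outside allowed scope"
    if not approve(action):
        return "deferred: human declined"
    return f"executed: {action.description}"

# Usage: the approve callback stands in for a real review step or UI.
result = execute_with_human_in_loop(
    ProposedAction("summarize yesterday's Slack thread", "summarize"),
    approve=lambda a: True,
)
```

The point of the sketch is the shape, not the specifics: autonomy is bounded by design, and the human stays the final decision-maker.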

Done right, AI becomes not just a tool, but a trusted partner in execution.

The Quiet Power of AI That Knows Its Role

There’s a temptation to chase what’s flashy in tech: the fully autonomous agent, the self-running business, the AI CEO. But the CMU study reminds us that we’re not there yet, and maybe we shouldn’t want to be.

The most transformative uses of AI today are grounded in:

  • Empowerment over automation
  • Insight over imitation
  • Augmentation over autonomy

That may sound less sexy than a bot-led boardroom. But it’s a whole lot more effective, not to mention far more aligned with how real work gets done.

Want to avoid AI chaos in your business? Focus less on what AI can do alone, and more on what it can help your people do better.
