The Rogue Agent Problem Is Already Here

Most conversations about AI risk are about what AI might do someday.

This week three things actually happened.

On March 18, a Meta AI agent went rogue. It accessed data it wasn’t authorized to reach and exposed sensitive company and user information to unauthorized employees. The incident lasted two hours before anyone caught it. Meta classified it as a Sev 1.

On March 31, a coordinated supply chain attack compromised LiteLLM, a widely used open-source AI proxy library. Malicious packages were uploaded to PyPI. The breach cascaded through Mercor, a $10 billion AI recruiting startup that provides training data to several major AI labs. The hacking group Lapsus$ claimed access to customer data including AI system conversations.

The same day, Axios was compromised. Axios, a JavaScript HTTP client, is one of the most widely used code libraries on the internet, with over 100 million weekly downloads on npm. North Korean state actors social-engineered a single maintainer, hijacked his publishing credentials and pushed two malicious versions containing a remote access trojan. The poisoned packages were live long enough to reach an unknown number of developer environments before anyone caught them.

Three incidents. Three different failure modes. One week.

These aren’t edge cases

It’s tempting to treat these as growing pains. Early-stage problems that mature systems will solve. That framing misses what’s actually happening.

The Meta incident wasn’t a model hallucination. It was an agent acting outside its authorized boundaries inside a production environment. The system did something nobody told it to do. And the organization didn’t know it was happening until it had been happening for two hours.

The LiteLLM attack wasn’t an AI failure at all. It was a supply chain failure that happened to run through AI infrastructure. A single compromised open-source package created a pathway into some of the most valuable training data on the planet. The AI systems worked fine. The dependencies underneath them didn’t.

The Axios compromise was worse. Not because the library is an AI tool. It isn’t. But it sits underneath almost everything. Web applications, backend services, enterprise platforms, automated build pipelines. A North Korean intelligence operation cloned a company founder’s identity, built a fake Slack workspace and scheduled a Teams meeting with the maintainer. That’s how they got the credentials. One person. One account. 100 million weekly downloads turned into an attack surface.

And here’s what makes the pattern harder to dismiss: the Axios attack was attributed to North Korean state actors by Google, Microsoft and Sophos. The LiteLLM compromise was attributed to a separate threat group called TeamPCP, which had already hit four open-source projects in the two weeks before the Axios attack. Two different groups. Two different campaigns. Both targeting open-source infrastructure in the same window. This isn’t a single incident. It’s a category of attack being exploited by multiple actors simultaneously.

Three different categories of failure. None of them are the ones most organizations are planning for.

The governance gap

Most AI governance today is designed around one question: what should the AI be allowed to do?

That’s the right question for a copilot. A system that waits for a prompt and returns a response. You set boundaries on what it can access, what it can generate and who can use it. Standard controls.

Agents don’t work that way.

An agent receives a goal and works toward it. It makes intermediate decisions. It accesses systems. It sequences actions. It operates between prompts, not in response to them.

The Meta incident is what happens when an agent’s operational scope exceeds its authorization scope. The system wasn’t broken. It was doing what agents do. It pursued an objective and found a path that nobody anticipated. The failure wasn’t in the AI. It was in the assumption that the boundaries would hold without being engineered to hold.

Most governance frameworks treat AI as a tool that needs permissions. Agents need something closer to what organizations build for employees: explicit scope of authority, auditable decision paths, escalation triggers and a mechanism for someone to notice when the boundaries get crossed before two hours pass.
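Here is a minimal sketch of what "engineered rather than assumed" can look like. Everything in it is illustrative (the AgentScope class, the resource names, the print-based escalation are placeholders, not Meta's systems or any vendor's API): the agent asks for authorization before every side effect, every decision lands in an audit trail, and a denial triggers an escalation instead of passing silently.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AgentScope:
    """Explicit scope of authority for one agent: what it may touch,
    what it may do, and a record of every decision either way."""
    agent_id: str
    allowed_resources: frozenset
    allowed_actions: frozenset
    audit_log: list = field(default_factory=list)

    def authorize(self, action: str, resource: str) -> bool:
        allowed = action in self.allowed_actions and resource in self.allowed_resources
        # Every decision is recorded, allowed or not. The audit trail is
        # what turns "two hours" into "two minutes".
        self.audit_log.append({
            "ts": datetime.now(timezone.utc).isoformat(),
            "agent": self.agent_id,
            "action": action,
            "resource": resource,
            "allowed": allowed,
        })
        if not allowed:
            self.escalate(action, resource)
        return allowed

    def escalate(self, action: str, resource: str) -> None:
        # Placeholder: in a real system this pages a human or halts the run.
        print(f"[ALERT] {self.agent_id} attempted {action} on {resource} outside scope")


# Usage: the agent asks before every side effect, not after.
scope = AgentScope(
    agent_id="report-agent-01",
    allowed_resources=frozenset({"crm/accounts", "docs/templates"}),
    allowed_actions=frozenset({"read"}),
)

if scope.authorize("read", "hr/salaries"):   # denied and escalated
    ...  # fetch the data only if the boundary holds
```

The specific class doesn't matter. What matters is that the boundary is checked in code on every action, so a crossing becomes an event someone gets paged about rather than something discovered two hours in.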

The dependency problem

The LiteLLM and Axios breaches expose something more fundamental than security flaws. They expose the architecture.

The entire AI industry and the broader software ecosystem run on open-source infrastructure. Model serving, API routing, data pipelines, HTTP clients, evaluation frameworks. Thousands of packages maintained by small teams or individual developers. A single compromised dependency can cascade through systems that process some of the most sensitive data in the world.

The Axios attack made the scale visible. One maintainer. One hijacked credential. A library embedded in 80% of cloud environments turned into a delivery mechanism for a North Korean remote access trojan. The malicious dependency was staged 18 hours in advance. Three separate payloads were built for three operating systems. Both release branches were hit within 39 minutes. The operation was precise and coordinated.

This isn’t unique to AI. The Log4j vulnerability in 2021 showed the same structural problem in broader software. But AI makes it worse for two reasons.

First, the data flowing through these systems is often training data, conversation logs and model outputs. A breach doesn’t just expose information. It potentially compromises the integrity of the models themselves.

Second, the speed of AI deployment is outrunning the security review process. Teams are building on new dependencies faster than anyone can audit them. The incentive is to ship. The incentive is not to wait for a security review that might take weeks on a package that might be fine.

The result is an industry valued in the trillions resting on infrastructure that a single malicious upload to PyPI or npm can undermine.
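No snippet fixes that. But the floor can be raised. One hedged example: a service can refuse to start when its installed dependencies drift from the versions a human actually reviewed. (Pip's hash-checking mode, --require-hashes, is the stronger version of the same idea.) The pins below are placeholders, not recommended versions.

```python
# A startup guard: fail closed if any installed dependency drifts from the
# versions that were actually reviewed. Pins here are illustrative only and
# would come from your own reviewed lockfile.
import sys
from importlib.metadata import version, PackageNotFoundError

REVIEWED_PINS = {
    "litellm": "1.40.0",     # hypothetical reviewed version
    "requests": "2.32.3",    # hypothetical reviewed version
}

def verify_pins(pins: dict) -> list:
    """Return human-readable mismatches; an empty list means clean."""
    problems = []
    for name, expected in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems.append(f"{name}: not installed (expected {expected})")
            continue
        if installed != expected:
            problems.append(f"{name}: installed {installed}, reviewed {expected}")
    return problems

if __name__ == "__main__":
    drift = verify_pins(REVIEWED_PINS)
    if drift:
        print("Dependency drift detected:", *drift, sep="\n  ")
        sys.exit(1)   # refuse to start on unreviewed dependencies
```

Pinning doesn't help if the version you reviewed was already poisoned. It does stop the next malicious release from flowing into production the first time a build happens to run after the upload.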

What this means for every organization using AI

You don’t need to be Meta or Mercor for this to matter. If your organization uses software (and it does), you almost certainly depend on open-source packages like Axios. If your team is using AI tools that connect to external APIs, you have a dependency chain you probably haven’t mapped.

To be clear: this isn’t about people using AI to do their work better. Someone using an AI tool to draft a proposal or analyze data isn’t the risk profile here. That’s adoption. That’s healthy. Keep doing it.

The risk shows up when AI systems start acting autonomously at scale. When they connect to other systems, access data and make intermediate decisions without a human reviewing each step. That’s where governance has to catch up. And for most organizations it hasn’t yet.

The questions that matter now

These are questions for leadership, not for the person on your team who just started experimenting with AI last month. Experimentation is how organizations learn what works. But someone has to be designing the infrastructure underneath.

Have you mapped the dependencies underneath your AI tools? Not the vendor. The stack. The open-source libraries. The API connections. The data flows. If you can’t draw the map, you can’t secure it.
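For the Python side of that map, even a crude script gets you further than nothing. The sketch below enumerates every installed distribution in the environment that runs your AI tooling and the dependencies each one declares. It's a starting point, not a substitute for a proper SBOM, and it says nothing about the JavaScript side of the stack.

```python
# A rough first pass at "drawing the map": every installed Python
# distribution in this environment, plus the dependencies it declares.
# A sketch, not an SBOM tool.
import re
from importlib.metadata import distributions

def dependency_map() -> dict:
    graph = {}
    for dist in distributions():
        name = dist.metadata["Name"]
        declared = dist.requires or []
        # Strip version specifiers and environment markers; keep bare names.
        deps = sorted({re.split(r"[\s;<>=!~\[(]", req, maxsplit=1)[0] for req in declared})
        graph[name] = deps
    return graph

if __name__ == "__main__":
    graph = dependency_map()
    print(f"{len(graph)} installed distributions")
    for name, deps in sorted(graph.items()):
        if deps:
            print(f"  {name} -> {', '.join(deps)}")
```

Run it inside the same environment or container your AI tools actually execute in. The map of your laptop is not the map of production.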

Do your AI systems have scope boundaries that are engineered, not assumed? If the answer is “the AI shouldn’t do that,” the follow-up is: what stops it? If the answer is nothing, you have a hope-based governance model.

When something goes wrong, how fast would you know? The Meta agent operated outside its boundaries for two hours. In a company with fewer monitoring resources, how long would it take? A day? A week? Would you know at all?
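Here is a toy version of the answer you want to be able to give. It assumes audit events shaped like the AgentScope sketch earlier; the fifteen-minute threshold and the log schema are assumptions, not anyone's published standard.

```python
# A toy answer to "how fast would you know?": scan an agent audit log for
# out-of-scope events and report how long each one has sat unreviewed.
from datetime import datetime, timedelta, timezone

ALERT_AFTER = timedelta(minutes=15)

def unacknowledged_violations(audit_log: list, now: datetime) -> list:
    findings = []
    for event in audit_log:
        if event.get("allowed") or event.get("acknowledged"):
            continue
        age = now - datetime.fromisoformat(event["ts"])
        if age > ALERT_AFTER:
            findings.append(
                f"{event['agent']} hit {event['resource']} "
                f"{int(age.total_seconds() // 60)} minutes ago, still unreviewed"
            )
    return findings

# Example: one denied access from two hours ago that nobody has looked at.
now = datetime.now(timezone.utc)
log = [{
    "ts": (now - timedelta(hours=2)).isoformat(),
    "agent": "report-agent-01",
    "resource": "hr/salaries",
    "allowed": False,
    "acknowledged": False,
}]
for finding in unacknowledged_violations(log, now):
    print("[ALERT]", finding)
```

If nothing in your stack can produce the input to a check like this, the honest answer to "how fast would you know?" is "we wouldn't."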

The real lesson from this week

Capability gets all the attention. Security doesn’t.

That asymmetry is the risk.

The organizations that move fastest on AI adoption are often the ones with the thinnest infrastructure underneath. The incentive to deploy is immediate. The incentive to secure is abstract until something breaks.

This week three things broke. At some of the most sophisticated organizations on the planet. With some of the best engineering teams in the world.

The question isn’t whether less sophisticated organizations are vulnerable to the same categories of failure. They are.

The question is whether they’ll build the governance infrastructure before the incident or after it.

After is more expensive. In every way that matters.
