Why AI coding tools and new libraries should not run directly on your Mac

AI coding tools did not create a new security problem. They made an old one harder to ignore. When developers run agents, dependencies, and shell commands directly on their Mac, they often expose SSH keys, .env files, browser sessions, cloud credentials, and sometimes even production access. Recent supply-chain incidents around LiteLLM and axios show why this default is no longer acceptable — and why real project-level isolation should become part of everyday engineering practice.

Tomáš Rokos

June 4, 2026

When people discuss security around vibe coding tools today, it often sounds as if this were a completely new problem. In reality, it is not new at all. Over the past few years, we have simply become used to running third-party code, third-party dependencies, and increasingly also third-party shell commands directly on our local machines — the same machines where we keep SSH keys, .env files, logged-in browsers, cloud access, and often even production credentials.

AI tooling did not create this problem. It only amplified it and made it much more visible.

On macOS, I think this is particularly obvious. Many people do not want to develop in Docker because the developer experience quickly deteriorates and containers on Mac still come with overhead. So the real default remains the same: everything runs on the host machine, and we hope nothing happens.

This works right up until the moment it does not.

LiteLLM only reminded us of an old problem

On March 24, 2026, compromised versions 1.82.7 and 1.82.8 of litellm appeared on PyPI. This was not an exotic exploit. A simple pip install was enough to drop a malicious .pth file into the environment, and Python processes .pth files automatically every time the interpreter starts.

The payload then reached for the kind of things that are typically easy to find on a developer machine:

SSH keys
.env files
cloud credentials
other secrets and configuration files

It then sent them to a remote server.

The important thing about this incident is how banal the attack path was. It did not need to break Python, bypass the kernel, or convince someone to run anything unusual. It only needed a developer to do what developers do every day: install a dependency.

Paradoxically, the whole thing may have been discovered faster because the malware was not written very well and triggered a fork bomb on some machines. If it had kept a lower profile and only exfiltrated data silently, it is quite possible that it would have remained unnoticed for longer.

In my view, a large share of regular developers would have been vulnerable to this — not only people experimenting with vibe coding. The reason is simple: very few people have their local Python environment truly isolated.
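To make the .pth mechanism less abstract, here is a minimal sketch of how you could list the .pth lines that Python would execute at startup. It relies on the documented behavior of the standard site module, which executes any .pth line that starts with "import" followed by a space or a tab. The function name find_executable_pth_lines is my own and purely illustrative. A hit is not proof of compromise, because legitimate tools (setuptools, editable installs) use the same mechanism.

```python
from pathlib import Path


def find_executable_pth_lines(site_dir: Path) -> list[tuple[str, int, str]]:
    """Return (file name, line number, line) for each .pth line Python would execute.

    Illustrative helper, not part of any library.
    """
    if not site_dir.is_dir():
        return []
    findings = []
    for pth_file in sorted(site_dir.glob("*.pth")):
        text = pth_file.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            # The site module executes lines that start with "import" followed
            # by a space or tab; other non-comment lines are plain path entries.
            if line.startswith(("import ", "import\t")):
                findings.append((pth_file.name, number, line.strip()))
    return findings
```

Calling it for each directory returned by site.getsitepackages() gives a quick inventory of startup code you probably never reviewed.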

The JavaScript world was not doing much better either

This is not only a Python story.

On March 30 and 31, 2026, axios — one of the most widely used libraries in the JavaScript world — was also compromised. The attacker took over a maintainer account on npm and published malicious versions axios@1.14.1 and axios@0.30.4.

What is interesting is that the malicious payload was not directly inside axios itself. Those versions only added a new transitive dependency, plain-crypto-js, which executed through a postinstall hook. In other words, even here, an ordinary install was enough to turn the dependency chain into an execution chain.

That is exactly why I find it dangerous to pretend that supply-chain risk only applies to dubious packages at the edge of the ecosystem. It does not. At the end of March 2026, it became visible in one of the most common HTTP clients for Node.js.
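The install-hook mechanism is easy to audit in a rough way. The following sketch walks a node_modules directory and reports every package.json that declares a preinstall, install, or postinstall script, which are the npm lifecycle scripts that run during installation. The helper name find_install_hooks is hypothetical. The scan is deliberately naive: it also picks up package.json files nested inside test fixtures, and it only tells you which packages run code on install, not whether that code is malicious.

```python
import json
from pathlib import Path

# npm lifecycle scripts that run when a package is installed
INSTALL_HOOKS = ("preinstall", "install", "postinstall")


def find_install_hooks(node_modules: Path) -> dict[str, dict[str, str]]:
    """Map package name -> {hook: command} for packages with install-time scripts.

    Illustrative helper, not part of npm or any library.
    """
    hooks: dict[str, dict[str, str]] = {}
    for manifest in sorted(node_modules.rglob("package.json")):
        try:
            data = json.loads(manifest.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue  # unreadable or malformed manifest, skip it
        if not isinstance(data, dict):
            continue
        scripts = data.get("scripts")
        if not isinstance(scripts, dict):
            continue
        declared = {h: str(scripts[h]) for h in INSTALL_HOOKS if h in scripts}
        if declared:
            name = str(data.get("name", manifest.parent.name))
            hooks[name] = declared
    return hooks
```

If the list is short and you do not need those hooks, installing with npm's --ignore-scripts flag disables them entirely, at the cost of breaking packages that really do need a build step.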

`exclude-newer` is a reasonable default

One of the few low-cost guardrails that makes sense for almost everyone is not installing completely fresh package releases immediately.

In uv, you can set exclude-newer (for example in pyproject.toml), which limits dependency resolution to releases published before a selected date:

[tool.uv]
exclude-newer = "2026-03-24"

This is not a magic defense. It only buys you time. If you keep, for example, a 14-day delay, there is a reasonable chance that a compromised release will be discovered during that time and either removed or at least flagged by the community.
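A fixed date goes stale, so in practice you would move the cutoff forward regularly. As a small sketch of the arithmetic, assuming a 14-day delay and a date-only cutoff, the helpers below compute the value and render the config fragment. Both function names are my own. How and when you regenerate the setting (manually, in CI, in a pre-commit step) is left open.

```python
from datetime import date, timedelta


def exclude_newer_cutoff(today: date, delay_days: int = 14) -> str:
    """Return the cutoff date (YYYY-MM-DD) that lies delay_days before today."""
    if delay_days < 0:
        raise ValueError("delay_days must not be negative")
    return (today - timedelta(days=delay_days)).isoformat()


def uv_config_snippet(today: date, delay_days: int = 14) -> str:
    """Render the [tool.uv] fragment with a freshly computed cutoff."""
    cutoff = exclude_newer_cutoff(today, delay_days)
    return f'[tool.uv]\nexclude-newer = "{cutoff}"\n'


if __name__ == "__main__":
    print(uv_config_snippet(date.today()))
```

For example, run on April 7, 2026 with the default delay, the cutoff comes out as 2026-03-24, the value used in the snippet above.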

The same logic applies to AI coding tools

Just as you do not want to blindly run fresh dependencies on the host machine, you also do not want to run a code generator on the host machine with full access to everything around it.

This is not an argument against Codex, Claude Code, or any other tool. It is an argument against the amount of trust we give these tools by default.

Lightweight sandboxes are a good start. Codex CLI on macOS has historically used sandbox-exec, which can substantially limit what a process is allowed to reach. In Claude Code, sandboxing can be enabled via /sandbox. In both cases, this is far better than a mode where the agent can see the whole disk and run shell commands without restrictions.

This has two practical advantages:

the agent typically sees only the repository or explicitly allowed paths
you do not need to approve every small action just to maintain at least some control surface

For routine reading, file editing, and some shell work, this is actually a useful middle ground.

Where this model hits its limits

The problem is that a lightweight sandbox is not the same as real isolation.

As soon as the tool needs to do something slightly more practical, the edges start to show:

package installation through uv, pip, npm, or similar tools often touches global caches
browser tooling may not work well inside the sandbox, or may not work at all
some MCP servers need access outside the repository boundary
sooner or later, you run into a command that simply has to be executed outside the sandbox

And at that moment, the vibe coder is asked whether the system can leave the sandbox — and most people simply click “Yes”.

That is why I think it is important not to confuse “it has some sandbox mode” with “it is safely isolated”.

What I think makes more sense

If we want to use agents or code generators seriously, we need a real sandbox. Ideally, a separate VM or microVM for each project. On Mac, this could be something like Lima or a similar VM-based solution. Docker sandboxes follow a similar direction in principle, although on macOS they often run into performance and developer-experience issues.

But the point is not the specific product. The point is the trust boundary.

Into such an environment, you move only what the specific project actually needs:

repository checkout
project-scoped credentials
local cache dedicated only to that project
optionally a browser session or MCP servers, but again only where it makes sense

This reduces the blast radius in two ways.

First: if the agent runs a destructive command such as rm -rf /, it destroys at most its own sandbox.

Second: if you install a compromised dependency such as an infected litellm, it cannot exfiltrate all credentials from the entire laptop. At worst, it gets access to what you placed into that specific environment. Ideally, that means only development secrets for one project.

That is still not a pleasant incident. But it is an incident an order of magnitude smaller.
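The "move in only what the project needs" principle can be illustrated with environment variables alone. The sketch below builds the environment for a child process from an explicit allowlist plus project-scoped secrets, instead of inheriting everything from the host shell. The names BASE_ALLOWLIST and scoped_environment are hypothetical. To be clear, this is not isolation: a process started this way can still read files on disk. It only shows the allowlist idea that a per-project VM applies to the whole machine.

```python
from collections.abc import Mapping

# Hypothetical minimal set of host variables a tool needs in order to run.
BASE_ALLOWLIST = ("PATH", "LANG", "TERM")


def scoped_environment(
    host_env: Mapping[str, str],
    project_secrets: Mapping[str, str],
    allowlist: tuple[str, ...] = BASE_ALLOWLIST,
) -> dict[str, str]:
    """Build a child-process environment from an allowlist plus project secrets.

    Everything not explicitly allowed (cloud keys, SSH agent sockets, tokens
    exported in the host shell) is left out instead of being inherited.
    """
    env = {key: host_env[key] for key in allowlist if key in host_env}
    env.update(project_secrets)
    return env
```

The resulting dictionary can be passed as the env argument of subprocess.run, so the child starts with only the listed variables rather than a copy of os.environ.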

A classifier is useful, but it is not a sandbox

Claude Code has now also added an auto mode, in which a separate classifier reviews the more sensitive actions. It evaluates the transcript and individual tool calls, especially Bash commands and other actions outside the repository, and tries to block things such as data exfiltration, credential hunting, or destructive actions outside the scope of the task.

That is a reasonable step forward. Approval fatigue is real, and manually confirming everything is not a very sustainable model.

But even here, I think it is important to keep the right expectation: a classifier is a guardrail, not isolation.

It also does not solve the supply-chain problem. If you install a malicious package inside a trusted environment, a classifier watching Bash commands will not help you against what that package does during import or interpreter startup.

What I would take from this as a practical default

My current take is simple:

do not run completely fresh dependencies without a delay
do not run AI coding tools directly on the host machine with full access
when using a lightweight sandbox, do not treat it as the final solution
for more important work, use a per-project isolated environment with a limited blast radius

All of this was true long before someone came up with the term vibe coding.

There is just much less room to avoid it now. When you give an agent shell access, filesystem access, browser access, and credentials, you are effectively giving it very strong permissions. And as the LiteLLM incident showed, a regular package manager receives similarly strong permissions the moment you allow it to install third-party code directly on your laptop without isolation.

This is not a niche security debate. It is a fairly basic engineering default that we should have had in place a long time ago.

Apple is currently working on a new container engine, which I hope will have a lot of this built in. Until then, I try to pay close attention whenever commands are executed outside the sandbox.

Conclusion

AI coding tools did not create this problem. They only made the old trust boundary much more visible.

For years, we have allowed package managers and third-party code to run directly on machines that also hold SSH keys, cloud access, .env files, browser sessions, and sometimes production credentials. AI agents simply raise the stakes because they combine code generation, shell access, filesystem access, and external tools in one workflow.

The practical default should change: do not run fresh dependencies or AI coding tools with unrestricted access to your host machine. Use isolation per project, limit credentials to what the project needs, and treat lightweight sandboxing as a guardrail — not a final security model.

The goal is not zero risk. The goal is a smaller blast radius when something inevitably goes wrong.
