Dory Zidon / Sandboxes Are the New Dev Environment

Last week our CEO shipped a PR from his phone. So did our PM. So did our designer.

That's not the headline. The headline is that they're not doing it once. They're active members of our dev team now, building features, building flows, shipping real reviewed code every week, alongside the engineers.

This is how we write code in 2026.

What changed isn't the AI. It's where the work happens. We pulled it off the laptop and into a sandbox, an ephemeral cloud environment anyone in the company can spawn with a Slack message.

I built this at Reddy with Jake, our CTO. I now maintain it. In the next few minutes I'll show you what we built, why it works, and what to do Monday morning if you want your org to ship the same way.

The video below is the canonical version. Watch the demo run live. The blog goes deeper into why sandboxes are the new dev environment, and why you should build your own version of this now, even though the infra underneath will keep changing.

▶ Watch the demo: Sandboxes: when CEOs Ship code (5:14)

One message, many sandboxes — a single beam splits through a prism into a grid of glowing, self-contained workspaces.

The Shift: Where the Work Lives Now

For 30 years, the dev environment was a laptop. You cloned a repo, ran a setup script, installed dependencies, fought your local Postgres, and that machine was where the work lived. If your CEO wanted to ship code, you had to give him a laptop, a setup guide, and three days of patience.

That model is over.

A sandbox is what replaces it. Strip it to first principles: a sandbox is an ephemeral cloud environment with all your tooling, your repo, your secrets, and your agents baked in, and it's addressable by a message. You send a Slack message describing the task. A bot spawns the environment, opens a channel for the work, runs Claude Code inside, and you watch the agent build the feature in real time. When it's done, a PR opens. When the PR merges, the sandbox dies.

Three properties make this work, and all three matter:

Ephemeral. Each task gets a fresh environment. No "works on my machine," no half-stale dependencies, no leaked state from yesterday's experiment.
Shareable. The sandbox runs in a Slack channel. Anyone on the team can watch it work, jump in to debug, or hand it off. The environment is a participant in the conversation, not something locked behind one person's screen.
Programmable. You can spawn one with a message. That's why our CEO can ship: the friction of getting to a working dev environment has dropped to zero for everyone.
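The lifecycle described above (spawn on a message, work, open a PR, die on merge) can be sketched as a small state machine. This is a minimal illustration under my own assumptions: the `State` names, the `TRANSITIONS` table, and the `Sandbox` class are hypothetical, not the code we run.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    SPAWNED = "spawned"      # fresh environment, agent booted
    WORKING = "working"      # agent is building the feature
    PR_OPEN = "pr_open"      # agent finished and opened a PR
    DESTROYED = "destroyed"  # PR merged, sandbox torn down


# Allowed transitions (hypothetical); anything else is rejected.
TRANSITIONS = {
    State.SPAWNED: {State.WORKING},
    State.WORKING: {State.PR_OPEN},
    State.PR_OPEN: {State.WORKING, State.DESTROYED},  # failed gate sends it back to work
    State.DESTROYED: set(),
}


@dataclass
class Sandbox:
    task: str
    state: State = State.SPAWNED
    history: list = field(default_factory=list)

    def advance(self, new_state: State) -> None:
        """Move to new_state, or raise if the lifecycle does not allow it."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"cannot go from {self.state.name} to {new_state.name}")
        self.history.append(self.state)
        self.state = new_state


sandbox = Sandbox(task="add feedback button")
for step in (State.WORKING, State.PR_OPEN, State.DESTROYED):
    sandbox.advance(step)
```

Once a sandbox reaches `DESTROYED` there is no way back, which is the point of "ephemeral": the next task starts from a new object, not a revived one.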

The IDE didn't die. The laptop didn't die either. But the place where the actual work happens, the agents, the tests, the builds, the diffs, moved into the cloud, and into a shape the whole org can address.

The Demo: What Shipping a Feature Looks Like Now

Let me walk you through one. Pretend you're our PM. You want to add a "send feedback" button to the dashboard.

You DM the bot: "Spin up a new machine, I want to add a feedback button to the customer dashboard."

In about ten seconds, three things happen. The bot spawns a fresh VM in the cloud, with our repo cloned, our secrets loaded, and Claude Code already booted up inside. It creates a new Slack channel for the task. And it posts a single message in that channel: the agent is working.

You click into the channel. You see Claude Code thinking, reading files, writing diffs, running tests. It's all live, in Slack, formatted as messages. You can watch, you can interrupt with a follow-up message ("actually make the button red"), or you can close the channel and come back in twenty minutes.

When the agent finishes, it opens a PR. The PR runs through our automated gates: tests, lint, a few LLM-judged review passes. If something fails, the channel pings you with the failure and the agent tries to fix it.

This is also where engineering shows up. We're watching too. Our engineers review the PRs that come out of these sandboxes the same way we review each other's work, and when something looks off we jump into the channel and steer the agent, or push a fix ourselves. The PM didn't replace the engineer. The engineer just got leverage: instead of being the bottleneck for "can you build this feature," they're the reviewer and the safety net for ten of them running in parallel.

When the PR passes and a reviewer approves, you click merge from your phone. The sandbox dies. The channel stays as a record.

That's the whole loop. Slack message in, PR out, no laptop, no setup, no "is the dev environment ready yet."
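To make the first ten seconds of that loop concrete, here is a rough sketch of what a bot handler might do with one incoming request. Everything in it is a stand-in: `FakeChat` and `FakeCloud` are in-memory placeholders I made up for illustration, not the Slack API or any sandbox vendor's SDK.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class FakeChat:
    """In-memory stand-in for a chat workspace (hypothetical, not the Slack API)."""
    channels: dict = field(default_factory=dict)

    def create_channel(self, name: str) -> str:
        self.channels[name] = []
        return name

    def post(self, channel: str, text: str) -> None:
        self.channels[channel].append(text)


@dataclass
class FakeCloud:
    """In-memory stand-in for a sandbox provider (hypothetical)."""
    vms: dict = field(default_factory=dict)
    _ids: itertools.count = field(default_factory=lambda: itertools.count(1))

    def spawn(self, task: str) -> str:
        vm_id = f"sbx-{next(self._ids)}"
        self.vms[vm_id] = {"task": task, "agent": "running"}
        return vm_id


def handle_request(chat: FakeChat, cloud: FakeCloud, requester: str, task: str) -> str:
    """One DM in: spawn a sandbox, open a channel, announce that the agent is working."""
    vm_id = cloud.spawn(task)
    channel = chat.create_channel(f"task-{vm_id}")
    chat.post(channel, f"{requester} asked: {task}. The agent is working.")
    return channel


chat, cloud = FakeChat(), FakeCloud()
channel = handle_request(chat, cloud, "pm", "add a feedback button to the customer dashboard")
```

The handler is deliberately tiny. The requester only ever sees the channel it returns; the VM id is an implementation detail.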

The video shows this running live in about ninety seconds. The thing to notice is that the PM in this story never opened a terminal, never cloned a repo, never asked an engineer "can you set me up." The dev environment came to them, did the work in plain sight, and went away.

This is why our CEO ships code. Not because he learned to code. Because the friction of getting to a working dev environment fell to zero, and getting to the agent fell to one Slack message.

Under the Hood: The Part the Video Skips

The demo is the magic trick. Under the hood are the engineering choices that make the magic trick repeatable. Here's the shape that's worked for us at Reddy.

One task, one sandbox, no shared state. Every feature request gets its own VM. We don't reuse environments across tasks, we don't share state between agents, we don't let two sandboxes touch the same branch. This isolation is what makes the system safe enough to let non-engineers drive: an agent can do something weird inside its sandbox and the blast radius stops at the sandbox boundary.
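One way to enforce the "no two sandboxes touch the same branch" rule is a registry that refuses a second claim on a branch. The sketch below assumes a single in-process registry; the `SandboxRegistry` and `BranchInUse` names are hypothetical and only illustrate the rule.

```python
import itertools


class BranchInUse(Exception):
    """Raised when a second sandbox tries to claim a branch that is already taken."""


class SandboxRegistry:
    def __init__(self) -> None:
        self._branch_owner = {}          # branch name -> sandbox id
        self._ids = itertools.count(1)

    def spawn(self, task: str, branch: str) -> str:
        if branch in self._branch_owner:
            raise BranchInUse(f"{branch} belongs to {self._branch_owner[branch]}")
        sandbox_id = f"sbx-{next(self._ids)}"  # always a fresh id, never a reused sandbox
        self._branch_owner[branch] = sandbox_id
        return sandbox_id

    def destroy(self, sandbox_id: str) -> None:
        # Dropping the sandbox releases its branch for future tasks.
        self._branch_owner = {
            b: s for b, s in self._branch_owner.items() if s != sandbox_id
        }


registry = SandboxRegistry()
first = registry.spawn("add feedback button", branch="feat/feedback-button")
```

A second `registry.spawn(..., branch="feat/feedback-button")` raises `BranchInUse` until the first sandbox is destroyed, and the replacement gets a new id rather than inheriting the old environment.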

Every sandbox is observable from one place. Each one is labeled with the task, the requester, and the originating Slack channel. We have a centralized dashboard that shows every running sandbox across the org: who spawned it, what it's working on, what state it's in, how long it's been running. From there we can jump into any sandbox, see its logs, kill it, hibernate it, or fork it. Sandboxes are first-class objects in our infrastructure, not anonymous compute.

A centralized control plane over disposable sandboxes — every running sandbox shown with task, requester, state, and runtime, with controls to jump in, kill, hibernate, or fork.
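Treating sandboxes as labeled, first-class objects is mostly a data-modeling decision. A minimal sketch, assuming each sandbox carries the labels listed above; the `SandboxRecord` fields and the row format are my own illustration, not our dashboard's schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class SandboxRecord:
    sandbox_id: str
    task: str
    requester: str
    channel: str
    state: str            # e.g. "running" or "hibernated" (hypothetical values)
    started_at: datetime


def dashboard_rows(records, now):
    """Render one line per sandbox, longest-running first."""
    rows = []
    for rec in sorted(records, key=lambda r: r.started_at):
        minutes = int((now - rec.started_at).total_seconds() // 60)
        rows.append(
            f"{rec.sandbox_id}  {rec.state:<10}  {minutes:>4}m  "
            f"{rec.requester} in {rec.channel}: {rec.task}"
        )
    return rows


now = datetime(2026, 1, 5, 12, 0, tzinfo=timezone.utc)
records = [
    SandboxRecord("sbx-1", "fix copy bug", "pm", "#task-sbx-1", "running",
                  datetime(2026, 1, 5, 11, 30, tzinfo=timezone.utc)),
    SandboxRecord("sbx-2", "refactor billing", "engineer", "#task-sbx-2", "hibernated",
                  datetime(2026, 1, 5, 9, 0, tzinfo=timezone.utc)),
]
rows = dashboard_rows(records, now)
```

Because every record names its requester and channel, "who spawned this and why" is a lookup, not an investigation.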

Parallelism is the default, not the upgrade. A single engineer at Reddy regularly has ten, fifteen, twenty sandboxes running concurrently. One is fixing a copy bug. Three are exploring different approaches to the same refactor. Two are background-rebuilding test fixtures. Five are running long-form research tasks. You stop thinking "what should I work on next" and start thinking "what should I queue next." The bottleneck shifts from your hands to your taste.
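"Queue next" instead of "work on next" maps naturally onto a bounded pool of concurrent tasks. Here's a small sketch using Python's asyncio; `run_in_sandbox` is a hypothetical placeholder that just sleeps where a real system would spawn a sandbox and wait for its PR.

```python
import asyncio


async def run_in_sandbox(task: str) -> str:
    """Stand-in for spawning a sandbox and waiting for its PR (hypothetical)."""
    await asyncio.sleep(0.01)  # pretend the agent is working
    return f"PR for: {task}"


async def run_queue(tasks, max_parallel=20):
    """Run every queued task, with at most max_parallel sandboxes alive at once."""
    limit = asyncio.Semaphore(max_parallel)

    async def guarded(task):
        async with limit:
            return await run_in_sandbox(task)

    # gather returns results in the same order the tasks were queued.
    return await asyncio.gather(*(guarded(t) for t in tasks))


queue = [
    "fix copy bug",
    "refactor: approach A",
    "refactor: approach B",
    "refactor: approach C",
    "rebuild test fixtures",
]
results = asyncio.run(run_queue(queue, max_parallel=3))
```

The cap is the only knob. Raising `max_parallel` costs compute, not attention, which is why the bottleneck moves to deciding what goes in the queue.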

Hibernation, not just spawn-and-kill. Sandboxes are cheap to spin up, but some work is worth keeping warm. A long-running research task, or a sandbox you've half-debugged and want to revisit tomorrow, can be hibernated and brought back later with all its state intact. The default is ephemeral, but you have the lever when you need it.

Review is automated by default, human-escalated by exception. Every PR runs the same gauntlet: unit tests, lint, type checks, and a series of LLM-judged review passes that catch obvious quality and security issues. Most PRs from sandboxes go through cleanly. The ones that don't get routed back to either the agent (for a fix-up loop) or to a human reviewer in the channel. We're tuning this constantly: when an LLM judge misses something a human catches, that becomes a new rule.

The quality gauntlet: every PR rides a conveyor through automated inspection stations — tests, lint, review — before it ships.
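The gauntlet itself can be sketched as an ordered list of gates plus a routing decision on the first failure. The gate functions below are toy placeholders, and the rule for choosing between the agent and a human (`fixable_by_agent`) is an assumption made for illustration; the post only says failures go to one or the other.

```python
from dataclasses import dataclass


@dataclass
class GateResult:
    gate: str
    passed: bool
    fixable_by_agent: bool = True  # hypothetical routing hint


def unit_tests(pr):
    return GateResult("tests", pr["tests_pass"])


def lint(pr):
    return GateResult("lint", pr["lint_clean"])


def type_check(pr):
    return GateResult("types", pr["types_ok"])


def llm_review(pr):
    # Placeholder for an LLM-judged pass: here it only flags a hard-coded secret.
    return GateResult("llm_review", "SECRET=" not in pr["diff"], fixable_by_agent=False)


GATES = [unit_tests, lint, type_check, llm_review]


def run_gauntlet(pr, gates=GATES):
    """Return (route, failed_gate). Stops at the first gate that fails."""
    for gate in gates:
        result = gate(pr)
        if not result.passed:
            route = "agent" if result.fixable_by_agent else "human"
            return route, result.gate
    return "ready_for_approval", None


clean_pr = {"tests_pass": True, "lint_clean": True, "types_ok": True, "diff": "+ <button>"}
lint_fail = dict(clean_pr, lint_clean=False)
leaky_pr = dict(clean_pr, diff="+ SECRET=abc123")
```

With this shape, "a human caught something the judge missed" turns into appending one more function to `GATES`.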

Failure is its own workflow. When something blows up inside a sandbox, we don't try to debug the broken sandbox itself. We spin up a debug sandbox: a separate VM with a debug agent that has access to the failing sandbox's logs, state snapshot, and history. The debug agent investigates, reports back to the channel, and often proposes a fix. Sometimes we kill the original and respawn from a clean state. Sometimes we let the debug agent push a patch and resume. Either way, debugging is a parallel agent task, not a human-driven postmortem.

The thread running through all of this: centralized tooling, isolated execution. The dashboard, the debug agents, the hibernation layer, the review gates: all of it lives in one place and operates on sandboxes as objects. The sandboxes themselves stay disposable. That's the design, and it's the part of the architecture I want you to take with you, whoever ends up providing the VMs and the models underneath.

Why This Is the New Dev Environment (Not a Fad)

AI and agents have fundamentally changed how we work. Since the end of 2025, writing code by hand in an IDE has stopped being the default. This is the biggest shift programming has ever seen, and it's going to cascade and redefine how engineering gets done everywhere.

Sandboxes are what that shift looks like in practice. Three forces are driving it, and none of them are going away.

First: the agent writes the code now, and that changes everything about what a dev environment is for. A developer in 2026 isn't sitting there reading each line as it's typed. The agent writes the code. We write the workflow. Our job shifted from "produce the diff" to "design the system the agents work inside": the boundaries, the review gates, the failure-handling, the standards every PR has to pass. That's a fundamentally different job, and it needs a fundamentally different environment. The IDE optimized for one human reading and writing code by hand. The sandbox optimizes for orchestrating agents, watching them as a team, and codifying how your org wants software built. It's also what lets everyone in the company become a dev: when the workflow is uniform and the review gates are sharp, the question stops being "do you know how to code" and starts being "do you know what you want built." That's a much bigger pool of people.

Second: ephemeral environments are cheaper than persistent ones at agent scale. When a single developer is running fifteen agents in parallel, you cannot afford to keep fifteen warm dev environments running per developer. The economics only work if environments spin up on demand, do the work, and disappear. The persistent dev box, the one we've all kept alive on our laptops for years, is fine for one human. It's a non-starter for fifteen agents. Ephemerality stops being a nice property and starts being a requirement.
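A back-of-envelope calculation shows the shape of that argument. Every number below except the fifteen agents is invented for illustration: the hourly price, the task count, and the task length are hypothetical, so plug in your own.

```python
VM_CENTS_PER_HOUR = 10        # hypothetical price of one sandbox VM
AGENTS_PER_DEV = 15           # from the paragraph above
HOURS_PER_MONTH = 24 * 30

TASKS_PER_MONTH = 600         # hypothetical: tasks one developer queues per month
MINUTES_PER_TASK = 20         # hypothetical: average sandbox lifetime

# Persistent: fifteen environments kept warm around the clock.
persistent_cents = AGENTS_PER_DEV * HOURS_PER_MONTH * VM_CENTS_PER_HOUR

# Ephemeral: pay only for the minutes each task's sandbox is alive.
ephemeral_cents = TASKS_PER_MONTH * MINUTES_PER_TASK * VM_CENTS_PER_HOUR // 60

print(persistent_cents / 100)  # 1080.0 dollars per developer per month
print(ephemeral_cents / 100)   # 20.0 dollars per developer per month
```

The exact ratio depends entirely on the made-up inputs. The structural point does not: persistent cost scales with agents times wall-clock hours, while ephemeral cost scales with work actually done.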

Third: collaboration is now native. This is the one most people miss. The IDE was a single-player tool. You opened it, you worked, you eventually pushed and someone reviewed your diff after the fact. Sandboxes are multi-player by construction. The work lives in a channel. Anyone can watch it in real time, comment, jump in, hand off, search later. The PM can see what the engineer is doing. The CEO can see what the PM is doing. The new hire can scroll back through old channels and see how decisions actually got made. The dev environment isn't just where the code is written anymore. It's where the team thinks together about the code.

You don't need to predict the future to see this. You can already feel which projects move at the speed of "everyone sees everything" and which ones move at the speed of "someone is heads-down on their laptop." Sandboxes are betting on the first speed.

Build It Now, Even As The Infra Changes

The VMs, the orchestration, the agent models, the sandbox primitives, all of it will keep changing. We've swapped pieces of our stack three times in the last year. So why build this now?

Because the infra isn't the thing. The flow is your new DNA. It's what makes your org go faster or slower. Invest in it today, because it survives any infra change underneath.

The flow is your DNA — a helix of workflow elements (chat message, sandbox, review gate, CLAUDE.md, merged PR) above swappable, fading infrastructure.

The flow encodes your tooling, your standards, your review gates, your failure-handling. It also makes how people work reviewable. You can see how every PR got built, where agents got stuck, what kinds of tasks fail. That visibility turns into group learnings: failure modes you catch once and automate away forever. The flow gets sharper the more it runs. You release more, you do more, and your whole team levels up at the same time because everyone sees the same patterns.

This is the unlock, and it's going to be the key differentiator. Humans were the bottleneck. Now anyone with a customer touchpoint can become a programmer: the PM who feels the support tickets, the designer who knows where the UI breaks, the CEO who hears it from the customers directly. They all ship now. And their work is visible to engineering, so we learn from it too.

I cannot stress enough how important that is. Org-wide learning, encoded in a flow that compounds. That's the moat. The infra you choose today will be different in a year. Your DNA stays yours.

Monday Morning: What To Do

Forget the perfect setup. Use the prompt I'll publish on doryzidon.com to build your own.

Pick your pieces. Daytona for sandboxes. Cloudflare sandboxes. Whatever sandbox primitive you like. Build an image with your tooling baked in: your repo, your test runner, your linter, your secrets. Put a chat interface on top. Slack. Discord. Teams. Whatever your team already lives in.

Start small. Just get one agent running in the cloud. Claude Code. Codex. It doesn't matter which. Send it a task, watch it work, get a PR out. The model and the sandbox vendor will both change in six months anyway. The point is to feel the loop, not pick the winner.

You'll learn a lot from that first run. What breaks, what surprises you, what your team actually wants to see in the channel. Take those lessons and build the next version.

Then bring in five engineers and a PM. See how they use it. Watch where they get stuck. Watch what they ship that surprises you. This is where you learn what your flow actually needs to look like, because every org's flow is different.

Add a simple workflow. A code-change task type. Review gates that match your standards. A CLAUDE.md or AGENTS.md that codifies how your org wants software written. Use the pieces I've already covered on the channel: these are all the building blocks coming together.

That's it. Don't overthink it. The sandbox is the primitive. The agent is the worker. The flow is what you build on top, and you build it by running it.

Takeaways

Sandboxes are the new dev environment. Not the IDE. Not the laptop. The ephemeral, shareable, programmable cloud sandbox.
Everyone ships. Designers, PMs, CEOs. Anyone with a customer touchpoint becomes a programmer when the workflow is good enough.
10 to 20 parallel agents per developer. The bottleneck shifts from your hands to your taste.
Collaboration is native. Every task is a channel. Every channel is reviewable. Every PR is a group learning.
Your sandboxing flow is your DNA. Start building it now. The infra will change. The flow stays yours.

If you want to see the whole thing run live, the video is the canonical version: Sandboxes: when CEOs Ship code (5:14).

If you want to compare notes on what you're building, hit me up. I'd genuinely love to swap implementations with a few more engineering teams that are running sandbox-native workflows. The space is moving fast and we all get sharper by trading what's working and what isn't.

I'll publish the build-your-own prompt on doryzidon.com shortly.

