Sandboxed vs Filesystem Agents: Why Lovable, Bolt, and Rork Can't Follow You to Production — And What Can
A sandbox isn't a security feature — it's a deployment constraint, and it's the architectural reason your Lovable, Bolt, or Rork project stops following you the moment real production begins.
There's a moment that almost every builder who started in Lovable, Bolt, or Rork eventually reaches. The MVP came together fast — a real URL, a working screen, something you could demo by the end of an afternoon. Then you asked for the next thing. A second auth role. A webhook that has to be idempotent. An App Store build. A feature that needed to know about three files you'd already written. And the tool stalled, or regenerated something you didn't ask it to touch, or quietly broke a flow that had been working.
The instinct is to blame yourself — wrong prompt, wrong plan, not enough patience. It usually isn't you. It's architecture. The tools that get you to a live URL fastest run inside a sandbox, and the same property that makes them fast to start is the property that stops them from following you to production. This guide explains what "sandboxed" actually means for an AI coding tool, the specific walls it produces, and what a different architecture — a filesystem agent reading a pre-wired codebase — does instead.
This distinction is central to how OTF thinks about the whole AI-building landscape, so it's worth getting exactly right.
The architecture behind the limit
"Sandboxed" is a deployment constraint, not a security feature
When people hear "sandbox," they think safety — an isolated space where nothing can hurt anything else. That framing is misleading here. For an AI app builder, the sandbox is where your project lives and runs — an ephemeral container the vendor spins up, owns, and controls. Your code exists inside that container. The agent that writes it also lives inside that container. The preview URL points at that container.
That's a real engineering convenience. It's why you don't install anything, don't configure a runtime, don't think about a build. The vendor pre-provisioned everything, and the agent operates against a known, uniform environment every single time. Uniformity is exactly what makes the first hour feel frictionless.
But uniformity has a price, and the price is isolation. The sandbox is a closed box by design. What's outside the box — your machine, your existing repo, your secret manager, your other services — is, structurally, not addressable from inside it. Not because the vendor is careless, but because the whole model depends on the box being self-contained.
Why a sandbox agent can't read your repo, follow your conventions, or use your secrets
Three concrete consequences fall directly out of that isolation:
- It can't read your local repo. The agent sees the files it generated inside its own container. It does not see lib/payments/plans.ts on your laptop, or the 40 components your team already standardized on, or the conventions encoded across a codebase that lives outside the box. Every prompt is, effectively, a fresh start with no memory of the project as you hold it.
- It can't follow your conventions; it enforces its own. Folder layout, naming, state patterns, the shape of an API route: the sandbox imposes the vendor's defaults because those defaults are what make its environment uniform. You adapt to the box; the box does not adapt to you.
- It can't reach your secrets or your infrastructure. A production payment flow needs real keys, real webhook endpoints, a real database, and error monitoring. Those live in your accounts. A container the vendor owns is a poor place to keep long-lived production credentials and usually has no route into your private infrastructure, so the moment you need them, you're outside the only world the agent knows.
None of this is a bug to be patched in the next release. It's the load-bearing trade-off of the sandbox model. The convenience and the ceiling are the same wall, seen from two sides.
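To make the third consequence concrete, here is a minimal sketch of a startup check a production payment flow might run. Every name in it (PaymentConfig, REQUIRED_KEYS, loadPaymentConfig, and the three variable names) is a hypothetical chosen for illustration, not taken from any real SDK or from a specific tool.

```typescript
// Hypothetical names throughout; a real project would use its own.
type Env = Record<string, string | undefined>;

interface PaymentConfig {
  paymentSecretKey: string;
  webhookSigningSecret: string;
  databaseUrl: string;
}

// Assumed environment variable names, for illustration only.
const REQUIRED_KEYS = [
  "PAYMENT_SECRET_KEY",
  "WEBHOOK_SIGNING_SECRET",
  "DATABASE_URL",
] as const;

// Returns the names of required secrets that are absent or empty in `env`.
function missingSecrets(env: Env): string[] {
  return REQUIRED_KEYS.filter((key) => !env[key]);
}

// Builds the config, or fails loudly and names every missing secret.
function loadPaymentConfig(env: Env): PaymentConfig {
  const missing = missingSecrets(env);
  if (missing.length > 0) {
    throw new Error(`Missing secrets: ${missing.join(", ")}`);
  }
  return {
    paymentSecretKey: env.PAYMENT_SECRET_KEY as string,
    webhookSigningSecret: env.WEBHOOK_SIGNING_SECRET as string,
    databaseUrl: env.DATABASE_URL as string,
  };
}
```

In an environment you control, you would pass in something like Node's process.env and the call would succeed. In a container that never received those values, the same call throws, which is roughly what "outside the only world the agent knows" looks like in code.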
Filesystem agents and what they actually have access to
Now the other architecture. A filesystem agent — Cursor, Claude Code, Codex — does not own a container. It runs against your filesystem, on your machine or your CI. It reads the real file tree. It sees lib/payments/plans.ts and every other file, follows the conventions written into them, and can use the secrets and tooling your environment already has, because it's standing inside your environment, not a vendor's copy of one.
That's the dividing line. Sandbox agents generate from scratch in an isolated box. Filesystem agents act against your actual project. Everything downstream — whether the agent reuses or regenerates, whether it ships or thrashes — follows from which side of that line the tool sits on.
One caveat that matters, and we'll return to it: a filesystem agent reading an empty or messy repo is only marginally better than a sandbox. The access is necessary but not sufficient. What turns access into real working power is the context the codebase gives it — and that's the part most people skip.
The walls builders hit — case by case
The architecture is abstract until you watch it produce the same four failures over and over. Here they are, with the real-world write-ups.
Lovable's production ceiling: complex auth, custom logic, the 3-month cliff
Lovable is genuinely strong at what it's for. It's the fastest credible path from idea to a live, full-stack URL, and it deserves the praise — for prototypes and early validation, it's hard to beat. The ceiling shows up when the project needs the last 20%: a non-trivial auth model, business logic that spans features the sandbox can't all hold in context, a deployment story that touches your own infrastructure. That last 20% is also the part that decides whether the thing ships or rots. We walked through exactly where that ceiling sits in Lovable's production ceiling — the structural reasons it can spin up your MVP but can't ship it to production with you.
Rork's mobile wall: from prototype to App Store submission
The mobile version of the same wall is sharper because the finish line is harder. A web prototype just needs a URL. A mobile app needs a signed build, a provisioning profile, store metadata, review submission — a chain of steps that all touch your developer accounts and your credentials. A sandbox that owns the build environment can preview an app, but it can't carry it across the App Store submission line, because that line is outside the box. We documented that specific dead end in Rork's App Store submission wall — the exact point where a Rork mobile prototype stops being shippable on its own.
The "just export it" trap — what the code looks like after export
When the wall appears, the standard advice is "just export the code and finish it elsewhere." It sounds free. It isn't. Exported sandbox output arrives without the context the sandbox held in its head — no conventions you'd recognize, no project memory, often a structure optimized for the box rather than for a human team to extend. You inherit a pile of generated code with none of the scaffolding that would let an agent (or you) keep building on it cleanly. We made the full case in why "just export it" is the most expensive advice you'll get from a sandbox tool.
UI bridge failures — where the sandbox agent drops the handoff
A subtler wall: the handoff between surfaces. A design exists, a component exists somewhere, and the agent has to bridge them — connect the generated UI to the rest of the app's reality. Sandbox tools frequently drop exactly this seam, because the "rest of the app's reality" lives in files the agent can't see. The result is UI that looks right in the preview and disconnects the moment it meets the real project. We dug into that gap in where AI coding agents drop UI handoffs — the bridge that breaks precisely at the sandbox boundary.
What sandbox output looks like in practice
Walls are the visible failures. There's a quieter symptom that shows up even when nothing obviously breaks: the output itself carries the marks of the architecture that produced it.
Why the generated code feels disposable after 90 days
Builders describe a recurring feeling — the project that felt great in week one feels like a stranger by month three. That's not nostalgia; it's a real property of code generated without project memory. Each session starts fresh, so the codebase accretes as a series of one-off generations rather than a coherent system you'd choose to maintain. We unpacked that feeling in why sandbox output feels disposable — the link between regenerate-every-session architecture and code you can't bring yourself to own.
The throwaway-vs-owned codebase decision
This sets up a binary builders eventually have to face. A codebase is either throwaway — something you'll rebuild rather than extend — or owned, a system you and your agent keep building on for years. The architecture you started in pushes hard toward one or the other. We framed that fork directly in the throwaway versus owned codebase decision, and why the second is the only one that compounds.
Dependency bloat from AI-generated projects
Sandbox output also tends to over-include. Without a clear picture of what the project already has, the agent reaches for new packages to solve problems your codebase may already solve, and the dependency tree grows heavier with each generation. That bloat becomes its own production tax — slower builds, fragile upgrades, a wider attack surface. We traced the pattern in dependency bloat from AI-generated React Native code and what an AI upgrade pass actually has to clean up.
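One rough way to spot this kind of bloat is to compare the packages a project declares against the packages its source files import. The sketch below assumes you have already collected both lists (for example from package.json and from a scan of import statements). The function names are hypothetical, and the result is a list of candidates to review, not a verdict.

```typescript
// Maps an import specifier to the npm package it belongs to.
// "lodash/debounce" -> "lodash"; "@scope/pkg/sub" -> "@scope/pkg".
// Relative and absolute paths are not packages, so they return null.
function packageNameOf(specifier: string): string | null {
  if (specifier.startsWith(".") || specifier.startsWith("/")) return null;
  const parts = specifier.split("/");
  if (specifier.startsWith("@")) {
    return parts.length >= 2 ? `${parts[0]}/${parts[1]}` : null;
  }
  return parts[0] ?? null;
}

// Returns declared dependencies that no import specifier refers to.
function findUnusedDependencies(
  declared: string[],
  importSpecifiers: string[],
): string[] {
  const used = new Set<string>();
  for (const specifier of importSpecifiers) {
    const name = packageNameOf(specifier);
    if (name !== null) used.add(name);
  }
  return declared.filter((dep) => !used.has(dep));
}
```

Packages that are only referenced from config files, CLI scripts, or build plugins will show up here as unused even though they are needed, so treat the output as a starting point for a manual pass.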
Code that can't be reused — the structural reason
Tie all of it together and you get the throughline: sandbox tools produce code the next session regenerates instead of reuses, because the next session can't see what the last one wrote in any durable way. Non-reusable output isn't a quality problem you can prompt your way out of — it's the structural shadow of an architecture with no persistent memory of your project. We made that argument in full in code the agent regenerates instead of reuses, which is the same root cause behind nearly every wall above.
The native mobile handoff — what it actually takes
Mobile deserves its own section because the gap between "preview" and "production" is widest there, and it's where the sandbox-to-filesystem migration gets concrete.
Moving from a sandbox prototype to a real native build
The handoff from a web-shaped prototype to a true native app is not a one-click export. It's a re-platforming: native navigation, platform-specific behavior, real device APIs, a build pipeline that produces signed binaries. The work isn't optional and it isn't small, and pretending a sandbox can skip it is how teams lose weeks. We mapped that path in the move from a sandbox app to native mobile — what actually has to happen between the prototype and the store.
What a production mobile app needs that sandboxes don't provide
What you hand off to matters just as much. Production mobile means components that behave correctly across iOS, Android, and web from one source of truth, not a web layout wedged onto a phone. The universal-component layer is what makes the handoff survivable, because it gives the new codebase a shared vocabulary your agent can extend on every platform at once. We covered what that layer requires in what production mobile actually needs: the universal-component foundation a real app is built on.
The filesystem-agent advantage — what changes when the agent reads your repo
Here's the part that flips the whole picture. The fix for the sandbox ceiling is not a better sandbox. It's a filesystem agent — and, critically, a codebase that gives that agent something to read.
Your context files are the moat
Recall the caveat from earlier: filesystem access is necessary but not sufficient. A filesystem agent pointed at an empty repo still guesses. What turns access into real working power is the layer of context a codebase carries for the agent — a project-memory file that states conventions and constraints, a design-token manifest the agent uses instead of inventing hex values, a library of tested prompts that encode how this project wants things built. With that layer present, the agent reads it once and then reuses your patterns instead of regenerating from zero. Without it, you're back to thrash. We laid out that whole system — the files, the tokens, the prompt library — in the stay-in-IDE workflow for production apps, which is the operational companion to this architectural argument.
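As an illustration of the design-token idea, here is a minimal sketch of a manifest plus a resolver that refuses raw hex values. The token names, the values, and the resolveColor helper are all assumptions made up for this example; a real kit's manifest may look quite different.

```typescript
// Hypothetical token manifest; names and values are illustrative only.
const tokens = {
  "color.primary": "#1d4ed8",
  "color.surface": "#ffffff",
  "color.danger": "#b91c1c",
} as const;

type TokenName = keyof typeof tokens;

// Matches #rgb, #rgba, #rrggbb, and #rrggbbaa.
const HEX_COLOR = /^#(?:[0-9a-fA-F]{3,4}|[0-9a-fA-F]{6}|[0-9a-fA-F]{8})$/;

// True when `value` names a token that exists in the manifest.
function isTokenName(value: string): value is TokenName {
  return Object.prototype.hasOwnProperty.call(tokens, value);
}

// Resolves a token name to its value; rejects raw hex and unknown names.
function resolveColor(value: string): string {
  if (HEX_COLOR.test(value)) {
    throw new Error(`Raw hex "${value}" is not allowed; use a token name`);
  }
  if (!isTokenName(value)) {
    throw new Error(`Unknown token "${value}"`);
  }
  return tokens[value];
}
```

The point of the guard is that an agent (or a teammate) who reaches for an invented hex value gets an error that names the convention, instead of quietly adding a new shade to the codebase.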
The pre-wired codebase pattern
So the real unit that decides outcomes isn't the agent and isn't the model — it's a pre-wired codebase: a full-stack, production-shaped project that already contains the working features and the context files an agent needs to extend them correctly. The agent reads it in one pass and follows you, because the project hands it conventions, secrets-wiring patterns, a deploy story, and a component vocabulary up front. That's the difference between an agent that ships and one that thrashes — not the tool, the context the tool is reading.
Cross-platform from day one — the same component, web and mobile, one API
The pre-wired codebase pays off most where sandboxes fail hardest: cross-platform. When a component has the same name, the same props, and the same visual behavior on web, iOS, and Android — one API the agent already understands — the native handoff stops being a re-platforming crisis and becomes a continuation. The agent extends the same vocabulary on every surface instead of learning a new one per platform. That universal layer is its own subject, and we go deep on it in one component, web and mobile, that your agent can read and ship.
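The "one API" idea can be sketched without any UI framework: a single props type shared by every platform, with the platform difference confined to a small adapter. Everything below is a hypothetical stand-in. RenderedButton is a plain object rather than a real element tree, and the mapping to "button" on web and "Pressable" on native is an assumption about how such a layer might be wired.

```typescript
// Hypothetical shared contract: one props shape for every platform.
type Platform = "web" | "ios" | "android";

interface ButtonProps {
  label: string;
  variant: "primary" | "secondary";
  disabled?: boolean;
}

// A stand-in for whatever each platform really renders.
interface RenderedButton {
  platform: Platform;
  element: string;
  text: string;
  variant: ButtonProps["variant"];
  interactive: boolean;
}

// The only place where platforms differ in this sketch.
const ELEMENT_BY_PLATFORM: Record<Platform, string> = {
  web: "button",
  ios: "Pressable",
  android: "Pressable",
};

// Same props in, platform-specific element out.
function renderButton(platform: Platform, props: ButtonProps): RenderedButton {
  return {
    platform,
    element: ELEMENT_BY_PLATFORM[platform],
    text: props.label,
    variant: props.variant,
    interactive: !props.disabled,
  };
}
```

Because callers only ever see ButtonProps, an agent that has learned the component on web can use it on iOS and Android without learning a second vocabulary.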
Decision tree — which tool, which stage, which project
The honest framing isn't "sandbox bad, filesystem good." It's: match the architecture to the stage.
| Stage / need | Reach for | Why |
|---|---|---|
| Validate an idea in an afternoon | Sandbox tool (Lovable / Bolt / Rork) | Fastest path to a live URL; isolation is a feature here |
| Demo to get feedback, no real users yet | Sandbox tool | Still inside the box, still fine |
| Real auth, payments, custom business logic | Filesystem agent + pre-wired codebase | Needs your secrets, conventions, infrastructure |
| App Store / Play Store submission | Filesystem agent + native build pipeline | The finish line is outside any sandbox |
| A project you'll maintain for years | Filesystem agent + owned codebase | Reuse compounds; regeneration doesn't |
| Web today, mobile next quarter | Filesystem agent + universal components | One API the agent extends across platforms |
A quick gut-check: if the thing you're about to ask for touches a file the tool can't see, a secret it can't hold, or a store it can't reach — you've found the sandbox wall, and no prompt is going to talk you past it. That's the signal to move from a box you rent to a codebase you own.
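The gut-check above can be written down as a small function, if only to make the three questions explicit. This is a sketch under stated assumptions: TaskNeeds, ToolReach, and checkSandboxWall are hypothetical names, and in practice you would fill in the lists by hand rather than detect them automatically.

```typescript
// What the next request requires.
interface TaskNeeds {
  filesTouched: string[];
  secretsNeeded: string[];
  externalTargets: string[];
}

// What the tool can actually see, hold, and reach.
interface ToolReach {
  visibleFiles: string[];
  availableSecrets: string[];
  reachableTargets: string[];
}

interface WallReport {
  unseenFiles: string[];
  unheldSecrets: string[];
  unreachableTargets: string[];
  hitsWall: boolean;
}

// Items in `needed` that are not present in `available`.
function difference(needed: string[], available: string[]): string[] {
  const have = new Set(available);
  return needed.filter((item) => !have.has(item));
}

// Any unseen file, unheld secret, or unreachable target counts as the wall.
function checkSandboxWall(task: TaskNeeds, tool: ToolReach): WallReport {
  const unseenFiles = difference(task.filesTouched, tool.visibleFiles);
  const unheldSecrets = difference(task.secretsNeeded, tool.availableSecrets);
  const unreachableTargets = difference(task.externalTargets, tool.reachableTargets);
  return {
    unseenFiles,
    unheldSecrets,
    unreachableTargets,
    hitsWall:
      unseenFiles.length + unheldSecrets.length + unreachableTargets.length > 0,
  };
}
```

A report with hitsWall set to true also tells you which of the three kinds of wall you hit, which is usually the more useful piece of information.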
Where OTF fits
OTF kits are the pre-wired filesystem codebase this whole argument points at. Each kit is a full-stack, production-shaped project — real auth, real payments, a real deploy story, cross-platform components with one API across web, iOS, and Android — shipped with the context files a filesystem agent reads in one pass: a project-memory file, design tokens, and a library of tested prompts. You build with the agent you already use (Cursor, Claude Code, Codex), the agent reads the codebase instead of regenerating it, and it follows you all the way to production because nothing it needs lives in a box it can't open.
Start your MVP wherever it comes together fastest. When you hit the wall — and you will — own the code instead of renting the sandbox. Browse the kits and see what a codebase your agent can actually read looks like.