Ramp: Why We Built Our Own Background Agent
A tutorial by Zach Bruggeman et al. Featured in the OTF curated resource library.
Why Build, Not Buy?
Ramp evaluated the major AI coding tools, including Claude Code, Cursor, Copilot, and Devin, and used them extensively. The engineering team was already productive with these tools for interactive coding. But they saw an opportunity the tools did not cover: unattended, overnight agent work on their own codebase.
The off-the-shelf tools excelled at interactive use but were not wired into the systems Ramp needed for autonomous operation: its CI/CD pipeline, its testing framework, its code review standards, and its security policies.
The decision: build a custom background agent that calls Claude's API with Ramp-specific tooling. Interactive coding stays on Cursor and Claude Code. Unattended tasks go to the custom agent.
The Agent Architecture
Ramp's background agent is built from four parts: a task queue, a sandbox, a set of custom tools, and a pull request step.
Task queue from GitHub Issues
Engineers tag GitHub issues with 'agent-ready' when they have clear acceptance criteria. The agent picks up tagged issues in priority order.
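The selection step can be sketched in a few lines. This is a minimal illustration, not Ramp's implementation: the article names the 'agent-ready' tag but does not say how priority is encoded, so the `P0`/`P1`/`P2` labels, the `Issue` class, and the tie-break on issue number are all assumptions made for the example.

```python
from dataclasses import dataclass, field

AGENT_LABEL = "agent-ready"  # tag named in the article

# Hypothetical priority labels; the article does not say how priority is encoded.
PRIORITY_RANK = {"P0": 0, "P1": 1, "P2": 2}


@dataclass
class Issue:
    number: int
    title: str
    labels: list[str] = field(default_factory=list)


def priority_of(issue: Issue) -> int:
    """Return the best (lowest) rank among the issue's priority labels."""
    ranks = [PRIORITY_RANK[label] for label in issue.labels if label in PRIORITY_RANK]
    return min(ranks, default=len(PRIORITY_RANK))  # issues without a priority go last


def next_tasks(issues: list[Issue]) -> list[Issue]:
    """Keep only agent-ready issues, highest priority first, lower issue number on ties."""
    ready = [i for i in issues if AGENT_LABEL in i.labels]
    return sorted(ready, key=lambda i: (priority_of(i), i.number))


issues = [
    Issue(101, "Fix rounding in report export", ["agent-ready", "P1"]),
    Issue(102, "Redesign settings page", ["P0"]),
    Issue(103, "Add tests for parser", ["agent-ready", "P0"]),
    Issue(104, "Update API docs", ["agent-ready"]),
]
print([i.number for i in next_tasks(issues)])  # [103, 101, 104]
```

Issue 102 is skipped even though it is high priority, because nobody has marked it as ready for the agent. The tag, not the priority, is what admits work to the queue.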
Sandboxed execution environment
Each task runs in an isolated container with a fresh checkout of the codebase. The agent can read files, write files, run tests, and execute shell commands — all within the sandbox.
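One way to picture the isolation is as a container command built per task. The article says only "isolated container", so Docker, the image name, the mount path, the memory cap, and the decision to disable networking are all assumptions here. The function only builds the command; it does not run anything.

```python
import shlex


def sandbox_command(image: str, checkout_dir: str, task_cmd: list[str]) -> list[str]:
    """Build a `docker run` invocation for one task; nothing is executed here."""
    return [
        "docker", "run",
        "--rm",                               # discard the container when the task ends
        "--network", "none",                  # assumption: no network inside the sandbox
        "--memory", "4g",                     # assumption: arbitrary resource cap
        "-v", f"{checkout_dir}:/workspace",   # fresh checkout mounted as the working tree
        "-w", "/workspace",                   # commands run from the repository root
        image,
        *task_cmd,
    ]


# Hypothetical image name and checkout path.
cmd = sandbox_command("agent-sandbox:latest", "/tmp/task-103/repo", ["pytest", "-q"])
print(shlex.join(cmd))
```

A real setup would need to relax the network restriction for some work, dependency updates for example, so treat that flag as a starting point rather than a rule.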
Claude API with custom tools
The agent uses Claude's API with custom tool definitions specific to Ramp: database migration tools, test runners, linting, and security scanning. These tools enforce Ramp's engineering standards.
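As a rough sketch of what a custom tool looks like: the Anthropic Messages API accepts tool definitions as a name, a description, and a JSON Schema under `input_schema`, and the model replies with `tool_use` blocks that the caller answers with `tool_result` blocks. The tool names, parameters, and stub handlers below are hypothetical, since the article does not publish Ramp's definitions, and no API call is made.

```python
# Tool definitions in the shape the Messages API expects.
# Names and parameters are hypothetical, not Ramp's actual tools.
TOOLS = [
    {
        "name": "run_tests",
        "description": "Run the test suite for one path inside the sandbox.",
        "input_schema": {
            "type": "object",
            "properties": {"path": {"type": "string", "description": "Test file or directory"}},
            "required": ["path"],
        },
    },
    {
        "name": "run_linter",
        "description": "Lint one file using the project's lint configuration.",
        "input_schema": {
            "type": "object",
            "properties": {"path": {"type": "string", "description": "File to lint"}},
            "required": ["path"],
        },
    },
]

# Stub handlers; real ones would shell out inside the sandbox.
HANDLERS = {
    "run_tests": lambda path: f"(stub) would run tests under {path}",
    "run_linter": lambda path: f"(stub) would lint {path}",
}


def handle_tool_use(block: dict, handlers: dict) -> dict:
    """Turn one tool_use block from the model into a tool_result block."""
    handler = handlers.get(block["name"])
    if handler is None:
        return {"type": "tool_result", "tool_use_id": block["id"],
                "content": f"unknown tool: {block['name']}", "is_error": True}
    return {"type": "tool_result", "tool_use_id": block["id"],
            "content": handler(**block["input"])}


# Example tool_use block of the kind the model returns (id is made up).
request = {"type": "tool_use", "id": "toolu_example", "name": "run_tests",
           "input": {"path": "tests/billing"}}
result = handle_tool_use(request, HANDLERS)
print(result["content"])  # (stub) would run tests under tests/billing
```

Because the agent can only act through the handlers it is given, the tool list doubles as a policy: anything not exposed as a tool is something the agent cannot do.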
PR creation and automated checks
Completed work is submitted as a pull request with a detailed description. Ramp's CI pipeline runs automatically: type checking, tests, security scanning, and performance benchmarks.
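The article does not show what a "detailed description" contains, so the following is only a guess at a reasonable shape. The section headings and sample values are hypothetical. `Closes #N` is GitHub's standard keyword for linking a pull request to the issue it resolves.

```python
def render_pr_body(issue_number: int, summary: str,
                   changes: list[str], tests_run: list[str]) -> str:
    """Assemble a PR description; the section headings are hypothetical."""
    lines = [
        f"Closes #{issue_number}",   # links the PR back to the originating issue
        "",
        "Summary",
        summary,
        "",
        "Changes",
        *[f"- {change}" for change in changes],
        "",
        "Tests run in sandbox",
        *[f"- {test}" for test in tests_run],
    ]
    return "\n".join(lines)


body = render_pr_body(
    103,
    "Round export totals half up so they match the ledger.",
    ["billing/export.py: round half up", "tests/billing/test_export.py: new cases"],
    ["pytest tests/billing -q"],
)
print(body)
```

Listing what the agent ran in the sandbox gives the reviewer something concrete to compare against the CI results.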
What the Agent Does Daily
Bug Fixes (~40% of tasks)
Among bug-fix tasks, well-documented bugs with clear reproduction steps succeed most often. The agent reads the issue, traces the code, implements a fix, and writes tests. Success rate: ~75%.
Test Generation (~25%)
Given a list of untested files, the agent generates comprehensive test suites. It reads the implementation, identifies edge cases, and writes tests using Ramp's testing patterns. Success rate: ~85%.
Documentation Updates (~20%)
When API changes are merged, the agent updates affected documentation to match. It reads the code changes, identifies discrepancies, and generates updated docs. Success rate: ~90%.
Dependency Updates (~15%)
The agent updates dependencies, runs the test suite, and creates PRs for successful updates. Failed updates are flagged with the specific error. Success rate: ~60%.
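Taking the four shares and success rates above at face value, a blended success rate falls out of simple weighted arithmetic. The figure is derived here, not reported in the article. The inputs are approximate, and the article does not define what counts as a success.

```python
# (share of tasks, success rate) as given for each task type above
TASK_MIX = {
    "bug_fixes": (0.40, 0.75),
    "test_generation": (0.25, 0.85),
    "documentation": (0.20, 0.90),
    "dependency_updates": (0.15, 0.60),
}

total_share = sum(share for share, _ in TASK_MIX.values())
blended = sum(share * rate for share, rate in TASK_MIX.values())

# 0.40*0.75 + 0.25*0.85 + 0.20*0.90 + 0.15*0.60 = 0.30 + 0.2125 + 0.18 + 0.09
print(f"shares sum to {total_share:.2f}")
print(f"blended success rate: {blended:.0%}")  # about 78%
```

Under those assumptions, roughly one task in five fails. That is consistent with the emphasis on human review later in the article.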
Lessons from Production
After six months of production use, Ramp's key insights were:
Task quality in = code quality out. Vague issues produce vague PRs. Issues with clear acceptance criteria, specific file references, and example inputs/outputs produce dramatically better agent output. The team now invests more time writing clear issues.
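A team could nudge authors toward clearer issues with a simple pre-flight check before the 'agent-ready' tag is applied. The sketch below is a hypothetical heuristic, not something the article describes Ramp running. It looks for the three ingredients named above using deliberately crude patterns.

```python
import re

# Hypothetical heuristics; the article names the ingredients, not how to detect them.
CHECKS = {
    "acceptance criteria": re.compile(r"acceptance criteria", re.IGNORECASE),
    "file reference": re.compile(r"\b[\w./-]+\.(?:py|ts|tsx|go|sql|md)\b"),
    "example input/output": re.compile(r"\b(?:input|output|expected|example)\b", re.IGNORECASE),
}


def missing_ingredients(issue_body: str) -> list[str]:
    """Return the names of the checks that the issue body fails."""
    return [name for name, pattern in CHECKS.items() if not pattern.search(issue_body)]


vague = "Export is broken sometimes, please fix."
clear = """Rounding error in billing/export.py.
Acceptance criteria: totals match the ledger to the cent.
Example: input 10.005 gives expected output 10.01."""

print(missing_ingredients(vague))  # ['acceptance criteria', 'file reference', 'example input/output']
print(missing_ingredients(clear))  # []
```

Pattern matching cannot judge whether the acceptance criteria are any good. At best it catches issues that omit them entirely.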
Start narrow, expand slowly. They launched with documentation tasks only (lowest risk). After two weeks of consistent quality, they added test generation. After another month, bug fixes. This gradual trust-building prevented costly mistakes.
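The rollout order can be written down as a small schedule. The offsets below follow the timeline in the paragraph above, with "another month" taken as 30 days. The launch date is made up, and dependency updates are left out because the article does not say when they were added. Ramp gated each step on observed quality, so a calendar alone is a simplification.

```python
from datetime import date, timedelta

# Days after launch at which each task type is enabled.
ROLLOUT = [
    ("documentation", timedelta(days=0)),
    ("test_generation", timedelta(days=14)),   # after two weeks
    ("bug_fix", timedelta(days=14 + 30)),      # after another month (assumed 30 days)
]


def enabled_task_types(launch: date, today: date) -> list[str]:
    """Task types the agent may pick up on a given day."""
    elapsed = today - launch
    return [task for task, offset in ROLLOUT if elapsed >= offset]


launch = date(2025, 1, 6)  # hypothetical launch date
print(enabled_task_types(launch, date(2025, 1, 10)))  # ['documentation']
print(enabled_task_types(launch, date(2025, 1, 20)))  # ['documentation', 'test_generation']
print(enabled_task_types(launch, date(2025, 3, 1)))   # ['documentation', 'test_generation', 'bug_fix']
```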
The agent handles 15-20% of their PR volume. Not a majority, but a significant chunk — freeing engineers to focus on complex features and architecture work that agents can't handle.
Morning review rituals matter. The first 30 minutes of each engineer's day is spent reviewing agent-generated PRs. This is the quality gate. When reviews are delayed, unreviewed PRs pile up and quality issues compound.
Some task types aren't worth it. The agent struggled with UI work (too subjective), performance optimization (requires profiling), and cross-team coordination tasks. These stay human.