
The 'One-Click' Deployment Myth: Why We Built Our Own Orchestrator

Indraneel Sinare
March 23, 2026 · Updated March 24, 2026 · 3 min read

Managed deployment platforms are great until you hit 50 concurrent microservices. Suddenly, 'one-click' becomes a 'ten-minute-wait' and your monthly bills start to look like the GDP of a small island nation. We realized that the "seamless" abstractions we were paying for were actually the very things slowing us down.


Why 'Seamless' is a Performance Tax

The problem with most managed platforms isn't that they're bad; it's that they're general-purpose. They have to handle everything from a static site to a complex monolith. When you're running a highly specific distributed system, you're paying for abstraction layers you don't need.

For us, this manifested as a "cold start" problem for our entire CI/CD pipeline. Every time we pushed code, the platform would:

  1. Provision a fresh build environment.
  2. Re-download 1.2GB of node_modules.
  3. Run container builds without shared cache.
  4. Deploy with a generic load-balancing strategy that caused 503s during rollout.

We were seeing p99 deployment times of 12 minutes. For a team that pushes code 40 times a day, that's nearly 8 hours of collective engineering time wasted waiting for green circles.
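The arithmetic behind that figure is straightforward:

```javascript
// 40 pushes per day, each waiting on a 12-minute p99 deploy.
const pushesPerDay = 40;
const p99DeployMinutes = 12;
const wastedHoursPerDay = (pushesPerDay * p99DeployMinutes) / 60; // 8 hours
```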

Building the Custom Orchestrator

We decided to move our core deployment logic to a custom orchestration layer running on our own Kubernetes clusters. We didn't build a new CI system; we just built a better way to coordinate the pieces.

The Warm-Cache Strategy

By maintaining persistent build nodes with shared SSD caches, we eliminated the overhead of steps 1 and 2 entirely. Instead of downloading the world, we only fetch what changed.

// Our orchestration hook for cache hydration
async function hydrateCache(nodeId, workspaceId) {
  const cacheExists = await checkS3Cache(workspaceId);
  
  if (cacheExists) {
    // Parallelize extraction to maximize IOPS
    return await parallelExtract(nodeId, workspaceId);
  }
  
  return await fullRebuild(workspaceId);
}

The Scatter-Gather Rollout

Instead of a generic rolling update, we implemented a "scatter-gather" strategy. We deploy to 5% of nodes, monitor error rates for 60 seconds, and then surge the remaining 95% in parallel across zones.
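A minimal sketch of that flow, with the deploy and metrics calls injected as plain functions. The helper names, the 1% error threshold, and the overall structure are illustrative; only the 5% canary fraction and the 60-second watch window come from the strategy described above.

```javascript
// Split a node list into a 5% canary set and the remaining surge set.
function splitForCanary(nodes, fraction = 0.05) {
  const count = Math.max(1, Math.ceil(nodes.length * fraction));
  return { canaries: nodes.slice(0, count), rest: nodes.slice(count) };
}

// Scatter-gather rollout: deploy canaries, watch error rates, then surge.
async function scatterGatherRollout(nodes, { deploy, errorRate, rollback, watchMs = 60_000 }) {
  const { canaries, rest } = splitForCanary(nodes);
  await deploy(canaries);

  // Let the canaries take live traffic before deciding.
  await new Promise((resolve) => setTimeout(resolve, watchMs));

  if ((await errorRate(canaries)) > 0.01) {
    await rollback(canaries);
    throw new Error("canary error rate exceeded threshold, rollout aborted");
  }

  // Surge the remaining 95% in parallel across zones.
  await Promise.all(rest.map((node) => deploy([node])));
}
```

Injecting the deploy and metrics functions keeps the sketch independent of any particular cluster API, which also makes the rollout logic trivially testable.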

The Results: 80% Faster Deploys

The numbers spoke for themselves. After migrating the first 10 microservices, we saw:

  • p99 Build Time: Dropped from 12m to 2.4m.
  • Deployment Success Rate: Increased from 94% to 99.8% (due to better health checks).
  • Monthly Infrastructure Cost: Reduced by 45% because we stopped paying for ephemeral build compute markups.

The Cardinality Wall: Where This Breaks Down

This wasn't a free lunch. We now have to maintain our own "orchestration of the orchestrator." If our Kubernetes control plane acts up, we have to fix it ourselves; there's no support ticket to file.

If you have fewer than 10 microservices or your team is under 5 engineers, don't do this. The maintenance overhead will outweigh the speed gains. But once you're fighting the platform more than the code, it's time to peek under the hood.

Getting Started

If you're feeling the pinch of managed platform limits, start by auditing your build times. Look for the "dead time" between a git push and the first line of your build script. That's your opportunity.
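One rough way to measure that dead time is to diff the push timestamp from your Git host's webhook against the timestamp of the first line in your build log. The field names below are hypothetical, not any particular CI's API:

```javascript
// Seconds of "dead time" between a push and the first build log line.
// Both arguments are ISO 8601 timestamps, e.g. from a webhook payload
// and a build log entry respectively.
function deadTimeSeconds(pushedAtIso, firstLogAtIso) {
  return (new Date(firstLogAtIso) - new Date(pushedAtIso)) / 1000;
}

// A push at 10:00:00 whose build logs start at 10:03:30 spent
// 210 seconds queued and provisioning before doing any real work.
```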

You can check out our orchestration-samples repo for some of the primitives we used.