
Engineering for Efficiency: How We Sunset Gen 1 and Built a Scalable Future

By Alexandre Malucelli, April 11, 2025

Technical debt adds friction, complexity, and cost. Clearing it out isn’t glamorous, but it’s necessary. This post breaks down the steps we took to shut down Storytell’s Gen 1 infrastructure—what we removed, how we cleaned up our systems, and how we rebuilt with efficiency and scale in mind.

Performance demands were rising, and costs rose with them. Gen 1 couldn’t keep up. As Jensen Huang said in his 2025 GTC keynote: "Inference is token generation by a factory... it has to be done with extreme efficiency." Our infra had to reflect that. Here’s how we made it happen.

Shutting down Gen 1: A necessary move

In March 2025, we wrapped up months of work decommissioning our legacy Gen 1 system. It had done its job, but it had become outdated and expensive to run, and it was holding us back.

Shutting down the Google Cloud infra that powered Gen 1 reduced our spend by roughly one-third, and offboarding vendors and removing other services added further savings on top of that. The savings were a major win, but the bigger benefit was a simpler environment with fewer failure points.

Breaking it down: What we removed

We took a methodical approach to dismantling Gen 1. Here’s what we did:

Vendor and Service Offboarding

  • Super.so: Canceled subscription, retired Gen 1 docs site.
  • Stripe: Moved billing to a standalone Stripe link—no dependencies on legacy infra.
  • Metabase and Zilliz: Removed both to cut vendor bloat and reduce maintenance overhead.

Building Gen 2: Cleaner, faster, easier to scale

While tearing down Gen 1, we built Gen 2 to be leaner and more maintainable.

Ephemeral Environments

Now, every pull request spins up its own environment:

  • No shared staging bottlenecks.
  • Fast feedback cycles.
  • Infra cost stays low—envs are short-lived.

CI/CD pipelines (Vercel, Encore) handle this automatically. We iterate faster, test earlier, and ship with fewer surprises.
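To make the per-PR lifecycle concrete, here is a minimal TypeScript sketch under stated assumptions: each pull request maps to one preview environment named after its PR number and branch, and a scheduled cleanup job tears down environments whose PR has closed or that have sat idle past a time-to-live. The names PreviewEnv, envNameFor, shouldTearDown, and envsToTearDown, and the 48-hour TTL, are hypothetical. The post says the Vercel and Encore pipelines handle this automatically, so this is not their API or Storytell's implementation, only a model of the lifecycle.

```typescript
// Hypothetical model of a per-pull-request preview environment.
// Not the Vercel or Encore API: an illustration of the lifecycle only.
interface PreviewEnv {
  prNumber: number;
  branch: string;
  prOpen: boolean;
  lastDeployAt: Date;
}

// Assumed time-to-live: environments idle for more than 48 hours are removed.
const TTL_MS = 48 * 60 * 60 * 1000;

// Derive a stable, DNS-safe environment name from the PR number and branch.
function envNameFor(prNumber: number, branch: string): string {
  const slug = branch
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse runs of non-alphanumerics into "-"
    .slice(0, 30)                // keep the name short enough for a subdomain
    .replace(/^-+|-+$/g, "");    // trim leading/trailing dashes after slicing
  return slug ? `pr-${prNumber}-${slug}` : `pr-${prNumber}`;
}

// An environment is torn down once its PR closes or it has been idle past the TTL.
function shouldTearDown(env: PreviewEnv, now: Date, ttlMs: number = TTL_MS): boolean {
  if (!env.prOpen) return true;
  return now.getTime() - env.lastDeployAt.getTime() > ttlMs;
}

// Names of the environments a scheduled cleanup job would delete.
function envsToTearDown(envs: PreviewEnv[], now: Date): string[] {
  return envs
    .filter((env) => shouldTearDown(env, now))
    .map((env) => envNameFor(env.prNumber, env.branch));
}
```

The TTL check is the piece that keeps "infra cost stays low" true in practice: even a forgotten, still-open PR stops consuming resources once it goes quiet, and the next push simply recreates its environment.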

Automated Testing with Playwright

We added Playwright to catch visual regressions. The tests run in each pull request’s ephemeral environment, so regressions surface early in the review cycle.
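Playwright Test supports screenshot assertions such as expect(page).toHaveScreenshot(), which compare a fresh capture against a stored baseline and fail when the difference exceeds a tolerance configurable through options like maxDiffPixels or maxDiffPixelRatio. The self-contained TypeScript sketch below illustrates that idea on raw RGBA buffers. It is a simplified stand-in, not Playwright's actual comparison algorithm, and the function name compareScreenshots and its default thresholds are assumptions.

```typescript
// Simplified visual-regression check on raw RGBA pixel buffers (4 bytes per pixel).
// Illustrative only: Playwright's own comparator is more sophisticated.
interface DiffResult {
  diffPixels: number;
  diffRatio: number;
  passed: boolean;
}

function compareScreenshots(
  baseline: Uint8Array,
  actual: Uint8Array,
  width: number,
  height: number,
  channelTolerance = 16,   // assumed per-channel tolerance (0-255) to absorb anti-aliasing noise
  maxDiffPixelRatio = 0.01 // assumed: fail if more than 1% of pixels differ
): DiffResult {
  const total = width * height;
  if (baseline.length !== total * 4 || actual.length !== total * 4) {
    // Buffers that don't match the declared dimensions can't be compared pixel
    // by pixel, so treat the mismatch as a full failure.
    return { diffPixels: total, diffRatio: 1, passed: false };
  }
  let diffPixels = 0;
  for (let i = 0; i < total * 4; i += 4) {
    // A pixel counts as changed if any R, G, B, or A channel moved beyond the tolerance.
    for (let c = 0; c < 4; c++) {
      if (Math.abs(baseline[i + c] - actual[i + c]) > channelTolerance) {
        diffPixels++;
        break;
      }
    }
  }
  const diffRatio = total === 0 ? 0 : diffPixels / total;
  return { diffPixels, diffRatio, passed: diffRatio <= maxDiffPixelRatio };
}
```

In a real suite, the Playwright assertion handles capture, baseline storage, and diffing; its tolerance options play the same role as the thresholds here, trading sensitivity against flakiness from font rendering and anti-aliasing.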

Better Observability

We improved observability in Gen 2 by leaning into metrics, traces, and performance profiling instead of relying solely on logs. Using Encore’s built-in features and Grafana dashboards, we now monitor:

  • API latency and throughput
  • Background job performance
  • Token generation and usage patterns
  • Error rates and retry behavior over time

These metrics give us a clearer picture of how our systems behave in real time, helping us debug faster and scale with confidence.
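As a concrete illustration of what these numbers mean, here is a minimal TypeScript sketch that summarizes a window of request samples into throughput, p50/p95 latency, error rate, retry rate, and token throughput. The RequestSample shape, the nearest-rank percentile method, and the function names are assumptions. The post describes using Encore's built-in features and Grafana dashboards for this, so treat the code as an explanation of the metrics, not as Storytell's implementation.

```typescript
// Hypothetical shape of one request sample collected during a monitoring window.
interface RequestSample {
  latencyMs: number;
  ok: boolean;            // false if the request ultimately failed
  retries: number;        // retry attempts made before the final outcome
  tokensGenerated: number;
}

interface WindowSummary {
  throughputRps: number;
  p50LatencyMs: number;
  p95LatencyMs: number;
  errorRate: number;
  retryRate: number;      // fraction of requests that needed at least one retry
  tokensPerSecond: number;
}

// Nearest-rank percentile: the smallest value such that at least p% of samples are at or below it.
function percentile(sortedValues: number[], p: number): number {
  if (sortedValues.length === 0) return 0;
  const rank = Math.ceil((p / 100) * sortedValues.length);
  return sortedValues[Math.max(0, rank - 1)];
}

function summarizeWindow(samples: RequestSample[], windowSeconds: number): WindowSummary {
  const n = samples.length;
  const latencies = samples.map((s) => s.latencyMs).sort((a, b) => a - b);
  const failures = samples.filter((s) => !s.ok).length;
  const retried = samples.filter((s) => s.retries > 0).length;
  const tokens = samples.reduce((sum, s) => sum + s.tokensGenerated, 0);
  return {
    throughputRps: n / windowSeconds,
    p50LatencyMs: percentile(latencies, 50),
    p95LatencyMs: percentile(latencies, 95),
    errorRate: n === 0 ? 0 : failures / n,
    retryRate: n === 0 ? 0 : retried / n,
    tokensPerSecond: tokens / windowSeconds,
  };
}
```

Sorting raw samples keeps the sketch exact and easy to follow; many metrics backends instead estimate percentiles from histogram buckets, trading some precision for bounded memory.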

Efficiency as a design principle

Internally, we started calling this our "message of ecology"—a shorthand for resource awareness, cost discipline, and smarter infra decisions.

Huang’s keynote emphasized that we live in an energy-limited world. Scaling isn’t just about growth; it’s about efficiency. We’re building systems to do more with less, and Gen 2 reflects that.

What’s next

Gen 1 is gone. Gen 2 is operational and scalable. We’re moving faster, with less overhead and more control.

Every decommissioned service and automated test contributes to one goal: better performance with fewer resources.

As Huang said, "Your token rate matters." At Storytell, we’re making every token—and every engineering hour—count.
