Engineering for Efficiency: How We Sunset Gen 1 and Built a Scalable Future

Alexandre Malucelli

April 11, 2025

Technical debt adds friction, complexity, and cost. Clearing it out isn’t glamorous, but it’s necessary. This post breaks down the steps we took to shut down Storytell’s Gen 1 infrastructure—what we removed, how we cleaned up our systems, and how we rebuilt with efficiency and scale in mind.

Performance demands were rising, and costs were following. Gen 1 couldn’t keep up. As Jensen Huang said in his 2025 GTC keynote: "Inference is token generation by a factory... it has to be done with extreme efficiency." Our infra had to reflect that. Here’s how we made it happen.

Shutting down Gen 1: A necessary move

In March 2025, we wrapped up months of work decommissioning our legacy Gen 1 system. It had done its job, but by then it was holding us back: it was outdated and expensive to run.

By shutting down the Google Cloud infra powering Gen 1, we reduced our spend by roughly one-third. With vendor offboarding and other service removals, the overall infra savings were significant. Cost savings were a major win, but the bigger benefit was simplifying our environment and eliminating failure points.

Breaking it down: What we removed

We took a methodical approach to dismantling Gen 1. Here’s what we did:

Vendor and Service Offboarding

Super.so: Canceled subscription, retired Gen 1 docs site.
Stripe: Moved billing to a standalone Stripe link—no dependencies on legacy infra.
Metabase and Zilliz: Removed both to cut vendor bloat and reduce maintenance overhead.

Building Gen 2: Cleaner, faster, easier to scale

While tearing down Gen 1, we built Gen 2 to be leaner and more maintainable.

Ephemeral Environments

Now, every pull request spins up its own environment:

No shared staging bottlenecks.
Fast feedback cycles.
Infra cost stays low—envs are short-lived.

CI/CD pipelines (Vercel, Encore) handle this automatically. We iterate faster, test earlier, and ship with fewer surprises.
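As a rough illustration of the per-PR lifecycle, here is a minimal sketch. Vercel and Encore manage this for us, so nothing below is their API: the `PreviewEnv` type, the `previewEnvName` and `shouldTearDown` helpers, and the 48-hour limit are all hypothetical names and values chosen for the example.

```typescript
// Hypothetical sketch of a per-PR environment lifecycle.
// None of these names come from Vercel or Encore.
interface PreviewEnv {
  name: string;
  createdAt: Date;
}

// Assumed maximum lifetime for a preview environment.
const MAX_AGE_HOURS = 48;

// Derive a predictable, DNS-safe name from the repo and PR number.
function previewEnvName(repo: string, prNumber: number): string {
  const slug = repo
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse anything non-alphanumeric
    .replace(/^-+|-+$/g, ""); // trim leading and trailing hyphens
  return `pr-${prNumber}-${slug}`.slice(0, 63); // DNS labels max out at 63 chars
}

// Tear down when the PR is no longer open or the env outlives the limit.
function shouldTearDown(env: PreviewEnv, prIsOpen: boolean, now: Date): boolean {
  const ageHours = (now.getTime() - env.createdAt.getTime()) / (60 * 60 * 1000);
  return !prIsOpen || ageHours > MAX_AGE_HOURS;
}
```

Tying the environment's lifetime to the pull request is what keeps cost low: nothing stays up once the review is done.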

Automated Testing with Playwright

We added Playwright to catch visual regressions. The tests run in the per-PR ephemeral environments, so regressions surface early in the review cycle.
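A configuration along these lines is one way to point Playwright at a per-PR environment. It is a sketch, not our actual config: the `PREVIEW_URL` variable, the `./tests` directory, the retry count, and the 1% pixel tolerance are assumptions made for the example.

```typescript
// playwright.config.ts (illustrative sketch; values are assumptions)
import { defineConfig } from "@playwright/test";

export default defineConfig({
  testDir: "./tests", // assumed location of the test specs
  retries: process.env.CI ? 1 : 0, // retry once in CI to absorb flakiness
  use: {
    // PREVIEW_URL is a hypothetical variable set by the pipeline to the
    // URL of the pull request's ephemeral environment.
    baseURL: process.env.PREVIEW_URL ?? "http://localhost:3000",
    trace: "on-first-retry", // keep a trace when a test needed a retry
  },
  expect: {
    // Tolerance applied to every `expect(page).toHaveScreenshot()` call.
    toHaveScreenshot: { maxDiffPixelRatio: 0.01 },
  },
});
```

With `baseURL` set this way, a spec can call `page.goto("/")` followed by `expect(page).toHaveScreenshot()` and compare the preview against stored baseline images.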

Better Observability

We improved observability in Gen 2 by leaning into metrics, traces, and performance profiling instead of relying solely on logs. Using built-in features from Encore and Grafana dashboards, we now monitor:

API latency and throughput
Background job performance
Token generation and usage patterns
Error rates and retry behavior over time

These metrics give us a clearer picture of how our systems behave in real time, helping us debug faster and scale with confidence.
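To make the latency and error-rate metrics concrete, here is a small sketch of how such figures can be derived from raw request samples. Encore and Grafana compute these for us, so this is not their implementation; the `RequestSample` shape, the `summarize` helper, and the nearest-rank percentile method are assumptions chosen for illustration.

```typescript
// Hypothetical shape of one recorded API request.
interface RequestSample {
  latencyMs: number;
  ok: boolean;
  retries: number;
}

interface Summary {
  count: number;
  p50Ms: number;
  p95Ms: number;
  errorRate: number; // share of requests that failed
  retryRate: number; // share of requests that needed at least one retry
}

// Nearest-rank percentile over an ascending-sorted array.
function percentile(sorted: number[], p: number): number {
  if (sorted.length === 0) return NaN;
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

function summarize(samples: RequestSample[]): Summary {
  const latencies = samples.map((s) => s.latencyMs).sort((a, b) => a - b);
  const n = samples.length;
  const errors = samples.filter((s) => !s.ok).length;
  const retried = samples.filter((s) => s.retries > 0).length;
  return {
    count: n,
    p50Ms: percentile(latencies, 50),
    p95Ms: percentile(latencies, 95),
    errorRate: n === 0 ? 0 : errors / n,
    retryRate: n === 0 ? 0 : retried / n,
  };
}
```

Tracking a high percentile alongside the median matters because tail latency tends to degrade first under load, well before the average moves.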

Efficiency as a design principle

Internally, we started calling this our "message of ecology"—a shorthand for resource awareness, cost discipline, and smarter infra decisions.

Huang’s keynote emphasized that we live in an energy-limited world. Scaling isn’t just about growth; it’s about efficiency. We’re building systems to do more with less, and Gen 2 reflects that.