
Secrets Management Early: Designing for Production Before You Think You Need It

Published February 6, 2026


I used to treat secrets management as a “later” problem until I got burned by it in a multi-tenant AI system.

TL;DR

  • I built a multi-tenant API platform used by custom GPTs and realized secrets sprawl becomes a real risk almost immediately.
  • I learned that early secrets discipline is cheaper than late-stage cleanup.
  • This is for product engineers and founders building SaaS or AI-backed systems that will touch real customer data.

Goal

My goal with this lab was to rethink how I handle secrets from day one when building products that are meant to go to production, even if they start as scrappy experiments.

In my case, I was building a multi-tenant API layer on Vercel that exposes OpenAPI endpoints for custom GPTs. Those GPTs could trigger actions against data lakes, warehouses, and third-party services. That meant API keys, database credentials, signing secrets, and tenant-specific tokens were all in play much earlier than in a typical CRUD app.

Success for me meant three things. First, no secrets in source control, ever. Second, a system where rotating a secret would not require a full redeploy or code change. Third, a mental model and workflow that scaled from “solo builder” to “team with real compliance requirements.”

Context

I came into this with a bias toward speed. Like many builders, I had historically thrown secrets into .env files, wired them into Vercel or another host, and moved on. That works for a while. It even works in production for small projects. But the moment you add multi-tenancy and AI agents that can take actions, the blast radius changes.

In this system, each tenant could have:

  • Their own API credentials for third-party services
  • Their own data destinations
  • Their own GPTs configured to call my APIs

That means secrets are not just “my secrets,” but also “their secrets that I store.” That’s a fundamentally different responsibility.

I also had constraints. I didn’t want to introduce massive operational overhead. I wasn’t ready to run HashiCorp Vault clusters or build a full zero-trust internal platform. I was using Vercel for deployment and managed databases for storage. I needed pragmatic solutions that fit a lean stack.

Prior art I leaned on included:

  • Cloud provider secret managers
  • 12-factor app methodology
  • Security postmortems from public breaches
  • Docs from AWS, GCP, and Vercel on secret handling

Approach

My approach evolved from “where do I store secrets?” to “how do secrets flow through the system?” That shift in thinking changed everything.

Instead of focusing only on storage, I started mapping:

  • Where secrets originate
  • Where they are stored
  • Where they are used
  • Where they might leak

I decided early on that I would separate three categories of secrets (see the sketch after this list):

  1. Platform-level secrets (my infrastructure)
  2. Tenant-level secrets (customer credentials)
  3. Ephemeral secrets (tokens, short-lived keys)
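
Roughly, the separation looks like this. These are not my production types, just an illustration of the three categories and how differently each one behaves:

// Illustrative only: three kinds of secrets, each with its own lifecycle.
type PlatformSecret = {
  kind: "platform";       // my infrastructure: DB URLs, signing keys
  name: string;
  rotationDays: number;   // rotated on a schedule I control
};

type TenantSecret = {
  kind: "tenant";         // customer credentials I store on their behalf
  tenantId: string;
  name: string;
  encryptedValue: string; // never stored or logged in plaintext
};

type EphemeralSecret = {
  kind: "ephemeral";      // short-lived tokens minted per request or session
  name: string;
  expiresAt: Date;
};

type ManagedSecret = PlatformSecret | TenantSecret | EphemeralSecret;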

I also decided what I would NOT do. I would not build my own encryption scheme. I would not invent a homegrown vault. And I would not rely on developers to “just remember” good practices.

My strategy was to combine:

  • Managed secret stores
  • Strict environment separation
  • Minimal secret surface area in code
  • Aggressive rotation policies

Steps

1) Setup

I started by auditing what secrets even existed. That sounds trivial, but it’s surprisingly clarifying.

I listed:

  • Database URLs
  • JWT signing secrets
  • OpenAI or LLM provider keys
  • Email/SMS provider tokens
  • Internal service-to-service tokens
  • Tenant-provided API keys

Then I categorized them by sensitivity and rotation frequency.

For infrastructure, I used the hosting provider’s environment variable system as a baseline. On Vercel, that meant encrypted env vars scoped per environment (dev, preview, prod).

Checklist I followed:

  • No secrets in repo
  • .env in .gitignore
  • Separate dev/prod credentials
  • Principle of least privilege for API keys

2) Implementation

The first real change I made was to remove secrets from local config files that were shared across the team. Instead, each developer had their own local .env.local that never left their machine.

Then I centralized secret access behind small utility modules. Instead of calling process.env.X everywhere, I created a config layer. That let me validate required secrets at startup and fail fast.

I also introduced runtime validation. If a required secret was missing, the app refused to start. This prevented half-configured deployments from limping along in unsafe states.
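
Here is a minimal sketch of that config layer, assuming a Node/TypeScript runtime. The variable names are illustrative, not my exact list:

// config.ts - the single place that reads and validates secrets at startup.
const REQUIRED = ["DATABASE_URL", "JWT_SIGNING_SECRET", "OPENAI_API_KEY"] as const;

type RequiredKey = (typeof REQUIRED)[number];

function loadConfig(): Record<RequiredKey, string> {
  const missing = REQUIRED.filter((key) => !process.env[key]);
  if (missing.length > 0) {
    // Fail fast: refuse to start instead of limping along half-configured.
    throw new Error(`Missing required secrets: ${missing.join(", ")}`);
  }
  const config = {} as Record<RequiredKey, string>;
  for (const key of REQUIRED) {
    config[key] = process.env[key] as string;
  }
  return config;
}

export const config = loadConfig();

Everything else imports config from this module instead of touching process.env directly, so a missing secret surfaces at boot rather than deep inside a request handler.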

For tenant secrets, I stored them encrypted at rest in the database using managed encryption features and strict access controls. I made sure they were never logged, never returned to clients, and only decrypted in narrow execution paths.
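
The storage details depend on your database, but the shape of the access path matters more than the mechanism. A sketch of what a narrow execution path can look like; db and decryptTenantSecret are hypothetical stand-ins for your DB client and whatever managed encryption you rely on:

// tenantSecrets.ts - the only module that ever sees decrypted tenant credentials.
import { db } from "./db";
import { decryptTenantSecret } from "./crypto";

export async function withTenantApiKey<T>(
  tenantId: string,
  fn: (apiKey: string) => Promise<T>
): Promise<T> {
  const row = await db.tenantSecrets.findOne({ tenantId, name: "api_key" });
  if (!row) throw new Error(`No API key configured for tenant ${tenantId}`);

  // The plaintext only exists inside this callback; it is never logged,
  // never returned to the client, and never written anywhere else.
  const apiKey = decryptTenantSecret(row.encryptedValue);
  return fn(apiKey);
}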

One big lesson: logging is a major leak vector. I added redaction logic so known secret fields were masked automatically.
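
The redaction itself does not need to be clever; what matters is that it runs on every log call. A simplified sketch, with illustrative field names:

// Mask known secret-bearing fields before anything reaches the log sink.
const SECRET_FIELDS = new Set(["apiKey", "token", "password", "authorization", "databaseUrl"]);

function redact(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(redact);
  if (value && typeof value === "object") {
    return Object.fromEntries(
      Object.entries(value as Record<string, unknown>).map(([key, v]) =>
        SECRET_FIELDS.has(key) ? [key, "[REDACTED]"] : [key, redact(v)]
      )
    );
  }
  return value;
}

// Usage: logger.info(redact(payload)) instead of logger.info(payload).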

3) Validation

I validated my setup by simulating failure and compromise scenarios.

For example, I would:

  • Rotate a key and confirm the system still worked
  • Remove a secret and ensure startup failed loudly
  • Scan logs for accidental exposures

Example command to check env presence:

node -e "console.log(!!process.env.DATABASE_URL)"

Expected output:

true

I also used secret scanning tools in CI to catch accidental commits containing tokens or keys.

Results

What worked:

  • Centralized config modules reduced mistakes
  • Managed secret stores removed operational burden
  • Early discipline saved refactor time later

What didn’t:

  • Over-abstracting secrets too early slowed iteration
  • Some dev friction from stricter rules

Gotchas / Notes

One edge case I hit: background jobs and serverless functions see slightly different environment variable availability, so I had to standardize how secrets were injected into each runtime.

Tradeoff-wise, secret managers can add latency or cost. For many paths, caching secrets in memory after retrieval was a good balance.
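
A sketch of that caching pattern, assuming a generic fetchSecret callback to whatever secret store you use; the TTL is arbitrary and should be tuned against your rotation window:

// Cache secrets in memory with a TTL so hot paths skip the secret-store round trip.
type CacheEntry = { value: string; expiresAt: number };

const cache = new Map<string, CacheEntry>();
const TTL_MS = 5 * 60 * 1000; // 5 minutes, illustrative

async function getSecret(
  name: string,
  fetchSecret: (name: string) => Promise<string> // e.g. a call to your secret manager
): Promise<string> {
  const hit = cache.get(name);
  if (hit && hit.expiresAt > Date.now()) return hit.value;

  const value = await fetchSecret(name);
  cache.set(name, { value, expiresAt: Date.now() + TTL_MS });
  return value;
}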

Another note: humans are the weak link. Tooling helps, but culture matters. I documented the patterns and made them the default.

Next

Next steps for me include:

  • Automated rotation pipelines
  • Short-lived credentials everywhere possible
  • Deeper audit logging on secret access