Secrets Management Early: Designing for Production Before You Think You Need It

I used to treat secrets management as a “later” problem until I got burned by it in a multi-tenant AI system.

TL;DR

I built a multi-tenant API platform used by custom GPTs and realized secrets sprawl becomes a real risk almost immediately.
I learned that early secrets discipline is cheaper than late-stage cleanup.
This is for product engineers and founders building SaaS or AI-backed systems that will touch real customer data.

Goal

My goal with this lab was to rethink how I handle secrets from day one when building products that are meant to go to production, even if they start as scrappy experiments.

In my case, I was building a multi-tenant API layer on Vercel that exposes OpenAPI endpoints for custom GPTs. Those GPTs could trigger actions against data lakes, warehouses, and third-party services. That meant API keys, database credentials, signing secrets, and tenant-specific tokens were all in play much earlier than in a typical CRUD app.

Success for me meant three things. First, no secrets in source control, ever. Second, a system where rotating a secret would not require a full redeploy or code change. Third, a mental model and workflow that scaled from “solo builder” to “team with real compliance requirements.”

Context

I came into this with a bias toward speed. Like many builders, I had historically thrown secrets into .env files, wired them into Vercel or another host, and moved on. That works for a while. It even works in production for small projects. But the moment you add multi-tenancy and AI agents that can take actions, the blast radius changes.

In this system, each tenant could have:

Their own API credentials for third-party services
Their own data destinations
Their own GPTs configured to call my APIs

That means secrets are not just “my secrets,” but also “their secrets that I store.” That’s a fundamentally different responsibility.

I also had constraints. I didn’t want to introduce massive operational overhead. I wasn’t ready to run HashiCorp Vault clusters or build a full zero-trust internal platform. I was using Vercel for deployment and managed databases for storage. I needed pragmatic solutions that fit a lean stack.

Prior art I leaned on included:

Cloud provider secret managers
12-factor app methodology
Security postmortems from public breaches
Docs from AWS, GCP, and Vercel on secret handling

Approach

My approach evolved from “where do I store secrets?” to “how do secrets flow through the system?”

Instead of focusing only on storage, I started mapping:

Where secrets originate
Where they are stored
Where they are used
Where they might leak

I decided early on that I would separate:

Platform-level secrets (my infrastructure)
Tenant-level secrets (customer credentials)
Ephemeral secrets (tokens, short-lived keys)

I also decided what I would NOT do. I would not build my own encryption scheme. I would not invent a homegrown vault. And I would not rely on developers to “just remember” good practices.

My strategy was to combine:

Managed secret stores
Strict environment separation
Minimal secret surface area in code
Aggressive rotation policies

Steps

1) Setup

I started by auditing what secrets even existed. That sounds trivial, but it’s surprisingly clarifying.

I listed:

Database URLs
JWT signing secrets
OpenAI or LLM provider keys
Email/SMS provider tokens
Internal service-to-service tokens
Tenant-provided API keys

Then I categorized them by sensitivity and rotation frequency.
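
One way to make that categorization concrete is a small inventory that records names and metadata only, never values. The sketch below is illustrative: the field names, sensitivity labels, and rotation intervals are placeholders rather than recommendations, and overdue is a hypothetical helper.

```javascript
// Hypothetical inventory: names and metadata only, never secret values.
// Scopes mirror the platform / tenant / ephemeral split described earlier.
const inventory = [
  { name: "DATABASE_URL", scope: "platform", sensitivity: "high", rotateDays: 90 },
  { name: "JWT_SIGNING_SECRET", scope: "platform", sensitivity: "high", rotateDays: 30 },
  { name: "LLM_PROVIDER_KEY", scope: "platform", sensitivity: "medium", rotateDays: 90 },
  { name: "TENANT_API_KEY", scope: "tenant", sensitivity: "high", rotateDays: 180 },
  { name: "SERVICE_TOKEN", scope: "ephemeral", sensitivity: "medium", rotateDays: 1 },
];

// Return the entries whose last rotation is older than their interval.
// `lastRotated` maps a secret name to a timestamp in milliseconds.
function overdue(entries, lastRotated, now = Date.now()) {
  const dayMs = 24 * 60 * 60 * 1000;
  return entries.filter((entry) => {
    const rotatedAt = lastRotated[entry.name];
    // A secret with no recorded rotation is treated as overdue.
    if (rotatedAt === undefined) return true;
    return now - rotatedAt > entry.rotateDays * dayMs;
  });
}
```

Even a list this small gives a rotation check something to run against, and it makes gaps visible when a new secret is added without an entry.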

For infrastructure, I used the hosting provider’s environment variable system as a baseline. On Vercel, that meant encrypted env vars scoped per environment (dev, preview, prod).

Checklist I followed:

No secrets in repo
.env in .gitignore
Separate dev/prod credentials
Principle of least privilege for API keys

2) Implementation

The first real change I made was to remove secrets from local config files that were shared across the team. Instead, each developer had their own local .env.local that never left their machine.

Then I centralized secret access behind small utility modules. Instead of reading process.env.X everywhere, I created a config layer that validates required secrets at startup. If a required secret is missing, the app refuses to start, which keeps half-configured deployments from limping along in unsafe states.
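
A config layer with fail-fast validation might look roughly like this. It is a minimal sketch: loadConfig and the two variable names in REQUIRED are assumptions chosen for illustration, not a required set.

```javascript
// A hypothetical central config module.
// The variable names below are examples, not a required set.
const REQUIRED = ["DATABASE_URL", "JWT_SIGNING_SECRET"];

function loadConfig(env = process.env) {
  // Collect every missing name so one failed start reports all of them.
  const missing = REQUIRED.filter((name) => !env[name]);
  if (missing.length > 0) {
    // Report names only; never echo values into an error message.
    throw new Error(`Missing required secrets: ${missing.join(", ")}`);
  }
  // Freeze the result so the rest of the app cannot mutate it at runtime.
  return Object.freeze({
    databaseUrl: env.DATABASE_URL,
    jwtSigningSecret: env.JWT_SIGNING_SECRET,
  });
}
```

Calling loadConfig() once at startup and passing the returned object around means the rest of the code never touches process.env directly, so a missing variable surfaces as one clear error instead of an undefined value deep in a request handler.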

For tenant secrets, I stored them encrypted at rest in the database using managed encryption features and strict access controls. I made sure they were never logged, never returned to clients, and only decrypted in narrow execution paths.
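
Encryption itself stays with the managed features, but the "never logged, never returned" part can be reinforced in code. One pattern, sketched here under the assumption of a Node.js runtime, is to wrap each decrypted value in a small class that masks itself whenever it is serialized. The Secret class and its reveal method are hypothetical names.

```javascript
const util = require("node:util");

// Hypothetical wrapper: holds a decrypted value and resists accidental exposure.
class Secret {
  #value; // private field, so it is not enumerable and not seen by JSON.stringify

  constructor(value) {
    this.#value = value;
  }

  // The only way to get the plaintext; call it as late as possible.
  reveal() {
    return this.#value;
  }

  // Used by JSON.stringify, for example when an object is sent in a response.
  toJSON() {
    return "[REDACTED]";
  }

  // Used by string concatenation and template literals.
  toString() {
    return "[REDACTED]";
  }

  // Used by console.log and util.inspect in Node.js.
  [util.inspect.custom]() {
    return "[REDACTED]";
  }
}
```

With this in place, the plaintext only appears where reveal() is called explicitly, which makes the narrow execution paths easy to find with a text search.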

One big lesson: logging is a major leak vector. I added redaction logic so known secret fields were masked automatically.
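
Redaction of known secret fields can be sketched as a recursive copy that masks values by key name. The key list, redact, and logEvent below are assumptions for illustration, and the sketch expects plain JSON-like data without circular references.

```javascript
// Hypothetical list of field names to mask; extend it to match your own schema.
const SECRET_KEYS = new Set(["password", "token", "apikey", "authorization", "secret"]);

// Return a deep copy of `value` with known secret fields masked.
function redact(value) {
  if (Array.isArray(value)) return value.map(redact);
  if (value === null || typeof value !== "object") return value;
  const out = {};
  for (const [key, inner] of Object.entries(value)) {
    // Compare case-insensitively so apiKey and APIKEY are both caught.
    out[key] = SECRET_KEYS.has(key.toLowerCase()) ? "[REDACTED]" : redact(inner);
  }
  return out;
}

// Log through a wrapper so redaction is the default rather than a habit.
function logEvent(message, fields = {}) {
  console.log(JSON.stringify({ message, ...redact(fields) }));
}
```

Key-based masking only catches fields it knows about. A secret embedded in a free-text message or a URL query string would still get through, so this complements log scanning rather than replacing it.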

3) Validation

I validated my setup by simulating failure and compromise scenarios.

For example, I would:

Rotate a key and confirm the system still worked
Remove a secret and ensure startup failed loudly
Scan logs for accidental exposures
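
For signing secrets, one common way to make the rotation check pass without downtime is to accept signatures from both the current and the previous key during a short overlap window. The following is a minimal sketch of that general technique, assuming HMAC-SHA256 signatures and Node's built-in crypto module; sign and verify are hypothetical helper names.

```javascript
const crypto = require("node:crypto");

// Sign a payload with HMAC-SHA256 and return a hex digest.
function sign(payload, key) {
  return crypto.createHmac("sha256", key).update(payload).digest("hex");
}

// Accept a signature made with any currently valid key.
// `keys` is ordered newest first, for example [current, previous].
function verify(payload, signature, keys) {
  const given = Buffer.from(signature, "hex");
  return keys.some((key) => {
    const expected = Buffer.from(sign(payload, key), "hex");
    // timingSafeEqual throws on a length mismatch, so check length first.
    return given.length === expected.length && crypto.timingSafeEqual(given, expected);
  });
}
```

New signatures always use the first key in the list. Once everything signed with the previous key has expired, the previous key is dropped and the rotation is complete.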

Example command to check env presence:

node -e "console.log(!!process.env.DATABASE_URL)"

Expected output when DATABASE_URL is set (false otherwise):

true

I also used secret scanning tools in CI to catch accidental commits containing tokens or keys.
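
To show what such a scan does under the hood, here is a toy pattern-based scanner. It is a sketch only: the three patterns are illustrative, scan is a hypothetical function, and dedicated scanning tools ship far larger and better-tuned rule sets.

```javascript
// A few illustrative patterns; real scanners cover many more formats.
const PATTERNS = [
  { name: "AWS access key ID", regex: /\bAKIA[0-9A-Z]{16}\b/ },
  { name: "Private key header", regex: /-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----/ },
  {
    name: "Hardcoded secret assignment",
    regex: /\b(?:api[_-]?key|secret|token)\s*[:=]\s*["'][^"']{16,}["']/i,
  },
];

// Return one finding per matching line and rule, without echoing the matched text.
function scan(text) {
  const findings = [];
  text.split("\n").forEach((line, index) => {
    for (const { name, regex } of PATTERNS) {
      if (regex.test(line)) findings.push({ line: index + 1, rule: name });
    }
  });
  return findings;
}
```

The same function works on a staged diff or on a log file. Reporting only the line number and rule name keeps the scanner's own output from becoming another place where the secret is written down.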

Results

What worked:

Centralized config modules reduced mistakes
Managed secret stores removed operational burden
Early discipline saved refactor time later

What didn’t:

Over-abstracting secrets too early slowed iteration
Some dev friction from stricter rules

Gotchas / Notes

One edge case I hit was background jobs and serverless functions having slightly different env availability. I had to standardize how secrets were injected.

Secret managers can add latency and cost. For many paths, caching secrets in memory after retrieval was a good balance, as long as cached values expire often enough to pick up rotated secrets.
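
In-memory caching with an expiry can be sketched as a thin wrapper around whatever call fetches the secret. Here fetchSecret stands in for a secret manager client and is an assumption, as are createSecretCache and the five-minute default TTL.

```javascript
// Wrap a hypothetical async `fetchSecret(name)` with an in-memory TTL cache.
// `now` is injectable so the expiry logic can be tested without waiting.
function createSecretCache(fetchSecret, ttlMs = 5 * 60 * 1000, now = Date.now) {
  const cache = new Map(); // name -> { value, expiresAt }

  return async function getSecret(name) {
    const hit = cache.get(name);
    if (hit && hit.expiresAt > now()) return hit.value;
    // Miss or expired entry: go back to the secret manager.
    const value = await fetchSecret(name);
    cache.set(name, { value, expiresAt: now() + ttlMs });
    return value;
  };
}
```

The TTL bounds how long a rotated secret can stay stale. On serverless platforms the cache only lives as long as a warm instance, so cold starts still pay the retrieval cost, and concurrent misses in this sketch each trigger their own fetch.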

Another note: humans are the weak link. Tooling helps, but culture matters. I documented patterns and made them default.

Next steps for me include:

Automated rotation pipelines
Short-lived credentials everywhere possible
Deeper audit logging on secret access
