Multi-Tenant OpenAPI Action Gateways for Custom GPTs on Vercel
I wanted one codebase to power many Custom GPTs as “actionful” assistants—each safely acting within its own tenant boundary across data lakes and warehouses.
TL;DR
- I built a single OpenAPI-described “Action Gateway” on Vercel that multiple Custom GPTs can call
- I enforced multi-tenancy using signed tenant tokens, per-tenant policy, and strict query controls
- This is for product engineers and platform teams integrating Custom GPT actions with warehouses/lakes
Goal
I set out to solve a specific scaling problem: I wanted to ship multiple Custom GPTs that can take actions—query a warehouse, run jobs, create tickets, write back to a lakehouse—without building and deploying one bespoke backend per GPT.
My definition of success was: one Vercel deployment, one OpenAPI spec (or a small family of specs), many tenants, many GPT “personalities,” and strong security boundaries. A tenant should be able to use “their” GPT to read and write within approved datasets and schemas, while other tenants cannot even see those resources. Operationally, I wanted easy onboarding: issue a tenant key, set a few configs, and the GPT works.
Another success criterion was developer ergonomics. I didn’t want an architecture that required a heavyweight service mesh or complex secrets distribution. I wanted something I could deploy quickly, put through a security review, and iterate on in small increments.
Context
The motivating pattern is simple: Custom GPTs can call external APIs defined by an OpenAPI spec. That’s powerful, but it invites a mess if each GPT gets its own endpoints, auth, and lifecycle. I’ve watched teams build “one-off” GPT tools that turn into a brittle fleet of microservices with inconsistent auth rules and unclear data access.
So I leaned into a gateway approach: a single API as the entry point for all GPT action calls, plus a tenancy layer to route, authorize, validate, and execute.
I constrained myself to an environment I can ship fast:
- Deployment: Vercel (serverless + edge)
- Spec format: OpenAPI 3.0/3.1
- Tenancy: one codebase serving many GPTs and many customers
- Data targets: warehouses/lakes (Snowflake/BigQuery/Databricks equivalents), but abstracted behind connectors
- Hard constraint: no “free-form SQL” from the model without a policy and validation layer
I also had to respect a real-world fact: GPTs are not deterministic. If I expose “run query” as an action, I must treat it as untrusted input. The platform must assume the model can produce incorrect, malformed, or malicious calls—whether accidentally or through prompt injection.
Approach
I approached the build in four layers, and I tried to be explicit about what each layer is responsible for.
First, I designed the action surface area. Instead of letting the model call “anything,” I defined a small set of actions: preview dataset, run read-only query with constraints, schedule a job, write a curated artifact. I found that reducing the action surface increases safety and makes the OpenAPI spec more stable.
Second, I created a multi-tenant gateway that sits between GPTs and the data infrastructure. The gateway handles authentication, tenant resolution, policy lookup, request validation, and auditing. It returns safe, bounded outputs.
Third, I implemented “connectors” for different backends (warehouse, lake, job runner). The gateway calls connectors with a well-defined internal contract. This lets me add or swap data systems without changing the OpenAPI surface too often.
Fourth, I hardened the system: consistent logging, per-tenant quotas, idempotency keys, deny-by-default policies, and a clear story for secrets.
I also decided what not to do. I did not try to build a universal “SQL copilot” with full write access. I didn’t let the model pass arbitrary SQL unless the tenant policy explicitly allows it and the query passes validation. I didn’t attempt to implement a full IAM system. I used a small, explicit policy model that is auditable and easy to reason about.
Steps
1) Setup
I started by creating a minimal Next.js API on Vercel that can serve an OpenAPI file and implement a couple of endpoints. The key early choice was where tenancy would live. I wanted tenancy determined by a token, not by hostname. Hostnames are convenient (subdomains per tenant), but they tend to get messy with custom domains, staging environments, and “GPT calling from OpenAI.” A token approach works anywhere and still supports optional host-based routing later.
I set up these basics:
- A single base URL for the gateway (e.g., `https://gpt-gateway.example.com`)
- A stable OpenAPI JSON route (e.g., `/openapi.json`)
- A versioned API prefix for endpoints (e.g., `/v1/...`)
- A per-tenant configuration record that includes allowed actions and data scopes
I also defined a “tenant token” concept. This is not a user session token. It’s a service token that identifies which tenant the GPT is acting for, plus which GPT (or “agent”) identity is calling. I treat it like a capability: it grants narrowly scoped access.
The key decision was to ensure that every request the GPT makes includes:
- `Authorization: Bearer <tenant-token>`
- Optional `X-Request-Id` for traceability
Even if I include X-Tenant-Id, I do not trust it alone. The tenant token is the source of truth; headers are hints at best.
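To make that concrete, here is a minimal sketch of the verification step, assuming JWT-style tenant tokens signed with a shared secret and verified with the `jose` library. The claim shape and the `TENANT_TOKEN_SECRET` name are my own placeholders, not a fixed part of the design.

```typescript
import { jwtVerify } from "jose";

// Hypothetical claim shape; it mirrors the token payload shown later in the post.
interface TenantClaims {
  tenantId: string;
  agentId: string;
  scopes: string[];
}

// Shared HMAC secret for this sketch; production should prefer asymmetric keys
// or a key-management service. TENANT_TOKEN_SECRET is a placeholder name.
const secret = new TextEncoder().encode(process.env.TENANT_TOKEN_SECRET!);

export async function resolveTenant(req: Request): Promise<TenantClaims> {
  const auth = req.headers.get("authorization") ?? "";
  const token = auth.startsWith("Bearer ") ? auth.slice(7) : null;
  if (!token) throw new Error("UNAUTHENTICATED");

  // jwtVerify checks the signature and exp claim; it throws on any failure.
  const { payload } = await jwtVerify(token, secret);
  const claims = payload as unknown as TenantClaims;

  // Headers are hints at best: reject if the hint contradicts the token.
  const hint = req.headers.get("x-tenant-id");
  if (hint && hint !== claims.tenantId) throw new Error("TENANT_MISMATCH");

  return claims;
}
```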
My baseline repo layout
I kept the implementation simple and predictable so I could add tenants without rewriting the app.
- `app/api/openapi/route.ts` — returns the OpenAPI JSON (or YAML) document
- `app/api/v1/...` — versioned endpoints
- `lib/auth/*` — token verification
- `lib/policy/*` — tenant policy lookup + enforcement
- `lib/connectors/*` — warehouse/lake/job adapters
- `lib/audit/*` — request logging and security events
Even if you don’t use this exact structure, I found it helpful to separate “validation and policy” from “execution.”
2) Implementation
I implemented the gateway in three parts: the OpenAPI spec, the request pipeline, and the connectors.
OpenAPI: keep it small and explicit
I wrote the spec as if the model is a client that needs guardrails. That means:
- No “generic execute” endpoints without strict schemas
- Clear response types
- Strong validation constraints (enums, max lengths, patterns)
- Distinct endpoints for read vs write actions
I also gave each endpoint a purpose statement. That’s not for humans only; it helps the tool-calling model understand intent.
Here’s a representative OpenAPI excerpt (trimmed) that I used as the backbone. It’s the core “contract” between GPT and the gateway.
```yaml
openapi: 3.1.0
info:
  title: GPT Action Gateway
  version: 1.0.0
servers:
  - url: https://gpt-gateway.example.com
security:
  - bearerAuth: []
paths:
  /v1/tenants/me:
    get:
      summary: Get current tenant context
      operationId: getTenantContext
      responses:
        "200":
          description: Tenant context and allowed capabilities
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/TenantContext"
  /v1/catalog/datasets:
    get:
      summary: List allowed datasets for this tenant
      operationId: listDatasets
      parameters:
        - name: system
          in: query
          required: true
          schema:
            type: string
            enum: ["warehouse", "lakehouse"]
      responses:
        "200":
          description: Allowed datasets
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/DatasetList"
  /v1/query/preview:
    post:
      summary: Run a bounded, read-only preview query
      operationId: previewQuery
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: "#/components/schemas/PreviewQueryRequest"
      responses:
        "200":
          description: Preview results (limited rows)
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/QueryResult"
components:
  securitySchemes:
    bearerAuth:
      type: http
      scheme: bearer
  schemas:
    TenantContext:
      type: object
      properties:
        tenantId: { type: string }
        agentId: { type: string }
        capabilities:
          type: array
          items: { type: string }
    PreviewQueryRequest:
      type: object
      required: [system, dataset, query]
      properties:
        system:
          type: string
          enum: ["warehouse", "lakehouse"]
        dataset:
          type: string
          description: Dataset name, must be in tenant allowlist
          maxLength: 120
        query:
          type: string
          description: Read-only SQL subset
          maxLength: 4000
        maxRows:
          type: integer
          minimum: 1
          maximum: 200
          default: 50
```
Even in that small excerpt, the constraints matter. The model will occasionally attempt to pass large payloads or overly broad queries. Having schema limits and a server-side validation layer reduces those failures and narrows the blast radius.
The request pipeline: tenant resolution, policy, validation, execution
My server pipeline became a checklist I could audit:
- Parse and verify the tenant token
- Resolve tenant configuration (caps, data scopes, rate limits)
- Validate request schema (and additional semantic constraints)
- Enforce action-level policies
- Execute via connector
- Normalize outputs (truncate rows, remove secrets, format)
- Log and audit
- Return response
I built it as middleware-style functions even though Next.js serverless doesn’t have “classic middleware” for every route. The pattern helps me keep a consistent story across endpoints.
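Here’s a sketch of how that composition looks for one endpoint. Every helper name is illustrative (they map one-to-one onto the checklist above), not a prescribed API:

```typescript
type Claims = { tenantId: string; agentId: string; scopes: string[] };
type Policy = Record<string, unknown>;
type Preview = { system: string; dataset: string; query: string; maxRows: number };

// Hypothetical helpers, implemented elsewhere in lib/:
declare function resolveTenant(req: Request): Promise<Claims>;
declare function loadPolicy(tenantId: string): Promise<Policy>;
declare function validatePreviewRequest(body: unknown): Preview;
declare function enforcePolicy(claims: Claims, policy: Policy, body: Preview): void;
declare function runPreview(policy: Policy, body: Preview): Promise<unknown>;
declare function normalizeResult(raw: unknown, policy: Policy): object;
declare function audit(claims: Claims, body: Preview, result: object): Promise<void>;
declare function errorCode(err: unknown): string;
declare function statusFor(err: unknown): number;

// app/api/v1/query/preview/route.ts: the handler is just the checklist in order.
export async function POST(req: Request): Promise<Response> {
  try {
    const claims = await resolveTenant(req);                // parse + verify token
    const policy = await loadPolicy(claims.tenantId);       // tenant config lookup
    const body = validatePreviewRequest(await req.json());  // schema + semantics
    enforcePolicy(claims, policy, body);                    // action-level policy
    const raw = await runPreview(policy, body);             // connector execution
    const result = normalizeResult(raw, policy);            // truncate rows, scrub
    await audit(claims, body, result);                      // log + security events
    return Response.json(result);
  } catch (err) {
    // Short, model-friendly error code; no policy internals in the message.
    return Response.json({ error: errorCode(err) }, { status: statusFor(err) });
  }
}
```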
I’m intentionally explicit about what I validate beyond OpenAPI schema. For queries, “valid JSON schema” is not enough. I also validate semantics: read-only, table allowlist, row limits, no cross-tenant references, no DDL, no external stages. In practice I keep this strict and gradually loosen per tenant only when needed.
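For the query semantics, a deny-by-default check can start small. This is a simplified sketch: the forbidden-token list is illustrative, two of the error codes echo the ones I mention later in Gotchas, and a production version should use a real SQL parser, since regex checks can be evaded via comments, quoting, and dialect quirks.

```typescript
// Simplified read-only SQL check; a real implementation should parse the SQL.
const FORBIDDEN = /\b(insert|update|delete|merge|drop|alter|create|grant|call|copy)\b/i;

export function validateReadOnlySql(query: string, allowedTables: string[]): void {
  if (!/^\s*select\b/i.test(query)) throw new Error("QUERY_NOT_READ_ONLY");
  if (query.includes(";")) throw new Error("QUERY_MULTIPLE_STATEMENTS");
  if (FORBIDDEN.test(query)) throw new Error("QUERY_FORBIDDEN_TOKEN");

  // Crude table extraction from FROM/JOIN clauses; a parser would resolve
  // aliases and subqueries properly. allowedTables is assumed lowercase.
  const referenced = [...query.matchAll(/\b(?:from|join)\s+([\w.]+)/gi)].map((m) => m[1]);
  for (const table of referenced) {
    if (!allowedTables.includes(table.toLowerCase())) {
      throw new Error("DATASET_NOT_ALLOWED");
    }
  }
}
```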
A practical pattern that worked for me was a “policy object” per tenant:
- `allowedSystems`: warehouse / lakehouse / jobrunner
- `datasets`: allowlist patterns (exact names and/or prefix)
- `tables`: allowlist patterns within datasets
- `maxRows`: enforced caps
- `maxBytes`: enforced caps (approximate)
- `queryMode`: “template-only” vs “validated-SQL” vs “disabled”
Then each endpoint checks the relevant policy fields.
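In code, the policy object is just a typed record; a sketch with the same field names (the error codes in the check are hypothetical placeholders):

```typescript
// Per-tenant policy record, mirroring the fields listed above.
interface TenantPolicy {
  allowedSystems: Array<"warehouse" | "lakehouse" | "jobrunner">;
  datasets: string[];   // allowlist: exact names and/or prefixes like "sales_*"
  tables: string[];     // allowlist patterns within datasets
  maxRows: number;      // hard server-side cap
  maxBytes: number;     // approximate cap on result size
  queryMode: "template-only" | "validated-SQL" | "disabled";
}

// Example per-endpoint check for /v1/query/preview.
function assertPreviewAllowed(policy: TenantPolicy, system: string): void {
  if (policy.queryMode === "disabled") throw new Error("QUERY_MODE_DISABLED");
  if (!(policy.allowedSystems as string[]).includes(system)) {
    throw new Error("SYSTEM_NOT_ALLOWED");
  }
}
```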
Connectors: abstract the data targets
I implemented connectors behind a narrow interface. In pseudocode, it looks like:
- `WarehouseConnector.previewQuery(...)`
- `LakehouseConnector.writeArtifact(...)`
- `JobRunnerConnector.scheduleJob(...)`
Each connector is responsible for:
- Translating requests to provider APIs/clients
- Using the correct credentials for the tenant
- Returning a normalized result
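In TypeScript, that internal contract is a handful of narrow interfaces. The exact shapes below are assumptions (the OpenAPI excerpt references `QueryResult` but doesn’t define it), but they show the idea:

```typescript
// Narrow internal contract between the gateway and its backends. QueryResult
// is an assumed shape; the OpenAPI excerpt references it without defining it.
interface QueryResult {
  columns: string[];
  rows: unknown[][];
  truncated: boolean;
}

interface WarehouseConnector {
  previewQuery(opts: {
    tenantId: string;
    dataset: string;
    sql: string;
    maxRows: number;
  }): Promise<QueryResult>;
}

interface LakehouseConnector {
  writeArtifact(opts: {
    tenantId: string;
    path: string; // always under a tenant-scoped staging prefix
    content: Uint8Array;
  }): Promise<{ artifactId: string }>;
}

interface JobRunnerConnector {
  scheduleJob(opts: {
    tenantId: string;
    jobName: string;
    params: Record<string, string>;
    idempotencyKey?: string;
  }): Promise<{ jobId: string }>;
}
```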
The tenant credential story is worth calling out. I avoided embedding per-tenant secrets directly in Vercel environment variables beyond small pilots. Instead, I used one of two patterns:
- A secure secrets store keyed by tenantId
- Short-lived credentials minted by an internal broker
In the early stages, per-tenant secrets can live in an encrypted DB column or a managed secrets service, but the end goal is to avoid “copy-paste env vars per tenant.”
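A minimal sketch of the first pattern, with the store client as a placeholder you would back with Vault, AWS Secrets Manager, or that encrypted DB column; the interface, key layout, and error code are all assumptions:

```typescript
// Placeholder store client; the interface and key layout are assumptions.
interface SecretsStore {
  get(key: string): Promise<string | null>;
}

// Resolve per-tenant warehouse credentials at request time, never from a
// global env var. Prefer short-lived credentials minted by a broker.
async function getWarehouseCredentials(
  store: SecretsStore,
  tenantId: string
): Promise<{ user: string; password: string }> {
  const raw = await store.get(`tenants/${tenantId}/warehouse`);
  if (!raw) throw new Error("TENANT_CREDENTIALS_MISSING");
  return JSON.parse(raw);
}
```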
Multi-tenant routing for many GPTs
There are two dimensions here: tenant identity and GPT identity.
- Tenant identity answers: “which customer’s data is allowed?”
- GPT identity answers: “which assistant is calling and what is it allowed to do?”
I modeled GPT identity as agentId embedded in the token. That matters because I might have one GPT that can only do read-only analytics and another GPT that can also schedule jobs. The tenant is the same, but the capability differs.
Concretely, I use a token payload like:
```json
{
  "tenantId": "ten_acme",
  "agentId": "gpt_finance_ops",
  "scopes": ["read:datasets", "read:query", "write:jobs"],
  "exp": 1730000000
}
```
I verify the token signature on every request and then intersect:
- token scopes
- tenant policy allowlist
- endpoint action requirements
If any check fails, I return a clear error with a minimal message. I avoid leaking policy details like full dataset lists in error messages, because those can be useful to an attacker.
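The intersection itself stays small; a sketch using the scope strings from the token payload above:

```typescript
// Deny unless the required scope is granted by the token AND permitted by
// tenant policy; the endpoint declares what it requires (e.g. "read:query").
function authorize(
  required: string,
  tokenScopes: string[],
  tenantAllowedScopes: string[]
): void {
  const granted =
    tokenScopes.includes(required) && tenantAllowedScopes.includes(required);
  if (!granted) {
    // Minimal error: no hint about which scopes or datasets exist.
    throw new Error("SCOPE_REQUIRED");
  }
}
```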
Why I treat “warehouse write” differently than “lake write”
A surprising design choice I made was to allow more “write-like” operations into a lakehouse than into a warehouse. In my experience, warehouses tend to become the canonical reporting layer, and mistakes there hurt. For lake writes, I often write into a tenant-scoped “staging” or “artifacts” area, and a downstream job promotes those artifacts into curated tables.
So I exposed write operations as “create artifact” or “enqueue job,” not “insert rows into production tables.” That gave me a safer runway and a more auditable pipeline.
3) Validation
I validate at three levels: spec, runtime behavior, and security posture.
Spec validation
I run an OpenAPI validator in CI to ensure the spec is parsable and consistent. Even if I’m not using a generator, I want the contract to stay stable as I add endpoints.
Expected output is essentially “spec valid.” When it fails, it’s usually broken $ref pointers or mismatched response schemas.
Runtime validation
I test endpoints with curl. It’s blunt, but it reveals incorrect auth and policy behavior immediately.
Get tenant context:
```bash
curl -sS https://gpt-gateway.example.com/v1/tenants/me \
  -H "Authorization: Bearer $TENANT_TOKEN" | jq
```
Expected output:
```json
{
  "tenantId": "ten_acme",
  "agentId": "gpt_finance_ops",
  "capabilities": ["read:datasets", "read:query"]
}
```
List datasets (policy-scoped):
curl -sS "https://gpt-gateway.example.com/v1/catalog/datasets?system=warehouse" \
-H "Authorization: Bearer $TENANT_TOKEN" | jq
Preview query (bounded rows):
```bash
curl -sS https://gpt-gateway.example.com/v1/query/preview \
  -H "Authorization: Bearer $TENANT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "system": "warehouse",
    "dataset": "analytics",
    "query": "select date, revenue from daily_kpis order by date desc",
    "maxRows": 20
  }' | jq
```
The behaviors I require:
- The gateway truncates output to `maxRows` regardless of what the model asks for
- The gateway rejects queries with forbidden tokens (e.g., `insert`, `update`, `drop`)
- The gateway rejects datasets not in the tenant allowlist
Security validation
I keep a list of negative tests that I treat as release-blockers:
- Wrong tenant token → 401
- Expired token → 401
- Valid token but insufficient scope → 403
- Valid token and allowed scope, but disallowed dataset → 403
- Query attempting write or DDL → 400/403
- Attempt to override tenant in body/header → header ignored; request rejected if it conflicts with the token
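I script these rather than run them by hand. A sketch of the harness, with the base URL, env var names, and dataset value as placeholders:

```typescript
// Tiny negative-test harness: each case must fail with the expected status.
const BASE = "https://gpt-gateway.example.com";

async function expectStatus(
  status: number,
  path: string,
  init: RequestInit
): Promise<void> {
  const res = await fetch(`${BASE}${path}`, init);
  if (res.status !== status) {
    throw new Error(`${path}: expected ${status}, got ${res.status}`);
  }
}

// Wrong token → 401
await expectStatus(401, "/v1/tenants/me", {
  headers: { Authorization: "Bearer bogus" },
});

// Valid token, allowed scope, disallowed dataset → 403
await expectStatus(403, "/v1/query/preview", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.TENANT_TOKEN}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    system: "warehouse",
    dataset: "someone_elses_dataset",
    query: "select 1",
  }),
});
```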
I also test prompt injection resilience indirectly. I simulate the worst-case: a request tries to exfiltrate another tenant’s data. If my policy model is correct, the request should fail before any connector call happens.
Results
What worked
- One OpenAPI contract served multiple GPTs cleanly
- Tenancy became a predictable layer I could reason about
- Bounded outputs and strict validation reduced surprising behavior from the model
- Adding a new GPT “persona” became mostly token + scope configuration

What didn’t
- It’s easy to overexpose “convenience endpoints” early
- Query validation is always trickier than it looks
- My first pass at logging was not good enough to debug model calls quickly

Metrics
- The best metric was onboarding speed: after the gateway existed, spinning up a new tenant became configuration work instead of backend work
Gotchas / Notes
There are a few gotchas I now treat as architecture rules.
First, I do not allow the model to choose the tenant. Tenant identity comes from the token. Anything else invites cross-tenant leakage.
Second, I avoid any endpoint that returns large result sets. Even if the tenant is trusted, it’s too easy for the model to accidentally request something expensive or sensitive. I always enforce row limits and often enforce column allowlists or “approved views only” for higher-sensitivity tenants.
Third, error messages must be model-friendly. A generic “400 Bad Request” makes the model thrash. I return a short error code plus a short explanation, but I avoid leaking policy internals. Examples: QUERY_FORBIDDEN_TOKEN, DATASET_NOT_ALLOWED, SCOPE_REQUIRED.
Fourth, idempotency matters more than I expected. If the GPT retries (or the user triggers multiple runs), you can accidentally schedule duplicate jobs or create duplicate artifacts. I added Idempotency-Key support on write endpoints and keyed it by (tenantId, agentId, key).
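A sketch of that dedupe, assuming a key-value store with TTL support; the `KV` interface is a placeholder, and a real version also needs an atomic set-if-absent (e.g., Redis SET NX) so concurrent retries can’t both execute:

```typescript
// Placeholder KV with TTL; real deployments need atomic set-if-absent.
interface KV {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
}

// Dedupe writes on (tenantId, agentId, Idempotency-Key): replay the stored
// response for a repeated key instead of re-executing the action.
async function withIdempotency(
  kv: KV,
  tenantId: string,
  agentId: string,
  idempotencyKey: string,
  execute: () => Promise<object>
): Promise<object> {
  const storageKey = `idem:${tenantId}:${agentId}:${idempotencyKey}`;
  const cached = await kv.get(storageKey);
  if (cached) return JSON.parse(cached);

  const result = await execute();
  await kv.set(storageKey, JSON.stringify(result), 24 * 60 * 60); // 24h window
  return result;
}
```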
Finally, secrets management becomes the real scaling bottleneck. If you plan to support dozens of tenants across different warehouses, do not store secrets as ad hoc env vars. Invest early in a secrets store or a broker that mints short-lived credentials.
Next
The next improvements I want are about operational maturity.
I want a tenant onboarding CLI that:
- creates tenant config
- issues an agent token with the right scopes
- validates connectivity to the tenant’s warehouse/lakehouse
- prints “ready” steps for configuring the Custom GPT action
I also want a safer query model. For many tenants, “validated SQL” is still too flexible. I’m leaning toward “query templates” where the model selects from pre-approved templates and fills in parameters, which turns the action into something closer to RPC than free-form execution.
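A sketch of the shape I have in mind, with illustrative names: the SQL lives server-side, and the model only picks a template and supplies typed arguments.

```typescript
// Pre-approved template: the SQL never comes from the model, only typed
// parameters do. Template ids, params, and the binding step are illustrative.
interface QueryTemplate {
  id: string;
  sql: string; // parameterized; values are bound server-side, never interpolated
  params: Record<string, "string" | "number" | "date">;
}

const templates: QueryTemplate[] = [
  {
    id: "daily_revenue",
    sql: "select date, revenue from daily_kpis where date >= :since order by date desc",
    params: { since: "date" },
  },
];

// The action body becomes RPC-like:
// { "templateId": "daily_revenue", "args": { "since": "2024-01-01" } }
function resolveTemplate(templateId: string): QueryTemplate {
  const found = templates.find((t) => t.id === templateId);
  if (!found) throw new Error("TEMPLATE_NOT_FOUND");
  return found;
}
```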
Finally, I want policy and audit logs to be first-class. The gateway should provide a “what happened?” timeline: which actions were called, with what scopes, what was executed, and what data was touched.