Multi-Tenant OpenAPI Action Gateways for Custom GPTs on Vercel
I wanted one codebase to power many Custom GPTs as “actionful” assistants—each safely acting within its own tenant boundary across data lakes and warehouses.
TL;DR
- I built a single OpenAPI-described “Action Gateway” on Vercel that multiple Custom GPTs can call
- I enforced multi-tenancy using signed tenant tokens, per-tenant policy, and strict query controls
- This is for product engineers and platform teams integrating Custom GPT actions with warehouses/lakes
Goal
I set out to solve a specific scaling problem: I wanted to ship multiple Custom GPTs that can take actions—query a warehouse, run jobs, create tickets, write back to a lakehouse—without building and deploying one bespoke backend per GPT.
My definition of success was: one Vercel deployment, one OpenAPI spec (or a small family of specs), many tenants, many GPT “personalities,” and strong security boundaries. A tenant should be able to use “their” GPT to read and write within approved datasets and schemas, while other tenants cannot even see those resources. Operationally, I wanted easy onboarding: issue a tenant key, set a few configs, and the GPT works.
Another success criterion was developer ergonomics. I didn’t want an architecture that required a heavyweight service mesh or complex secrets distribution. I wanted something I could deploy quickly, put through a security review, and iterate on in small increments.
Context
The motivating pattern is simple: Custom GPTs can call external APIs defined by an OpenAPI spec. That’s powerful, but it invites a mess if each GPT gets its own endpoints, auth, and lifecycle. I’ve watched teams build “one-off” GPT tools that turn into a brittle fleet of microservices with inconsistent auth rules and unclear data access.
So I leaned into a gateway approach: a single API as the entry point for all GPT action calls, plus a tenancy layer to route, authorize, validate, and execute.
I constrained myself to an environment I can ship fast:
- Deployment: Vercel (serverless + edge)
- Spec format: OpenAPI 3.0/3.1
- Tenancy: one codebase serving many GPTs and many customers
- Data targets: warehouses/lakes (Snowflake/BigQuery/Databricks equivalents), but abstracted behind connectors
- Hard constraint: no “free-form SQL” from the model without a policy and validation layer
I also had to respect a real-world fact: GPTs are not deterministic. If I expose “run query” as an action, I must treat it as untrusted input. The platform must assume the model can produce incorrect, malformed, or malicious calls—whether accidentally or through prompt injection.
Approach
I approached the build in four layers, and I tried to be explicit about what each layer is responsible for.
First, I designed the action surface area. Instead of letting the model call “anything,” I defined a small set of actions: preview dataset, run read-only query with constraints, schedule a job, write a curated artifact. I found that reducing the action surface increases safety and makes the OpenAPI spec more stable.
Second, I created a multi-tenant gateway that sits between GPTs and the data infrastructure. The gateway handles authentication, tenant resolution, policy lookup, request validation, and auditing. It returns safe, bounded outputs.
Third, I implemented “connectors” for different backends (warehouse, lake, job runner). The gateway calls connectors with a well-defined internal contract. This lets me add or swap data systems without changing the OpenAPI surface too often.
Fourth, I hardened the system: consistent logging, per-tenant quotas, idempotency keys, deny-by-default policies, and a clear story for secrets.
I also decided what not to do. I did not try to build a universal “SQL copilot” with full write access. I didn’t let the model pass arbitrary SQL unless the tenant policy explicitly allows it and the query passes validation. I didn’t attempt to implement a full IAM system. I used a small, explicit policy model that is auditable and easy to reason about.
Steps
1) Setup
I started by creating a minimal Next.js API on Vercel that can serve an OpenAPI file and implement a couple of endpoints. The key early choice was where tenancy would live. I wanted tenancy determined by a token, not by hostname. Hostnames are convenient (subdomains per tenant), but they tend to get messy with custom domains, staging environments, and “GPT calling from OpenAI.” A token approach works anywhere and still supports optional host-based routing later.
I set up these basics:
- A single base URL for the gateway (e.g., `https://gpt-gateway.example.com`)
- A stable OpenAPI JSON route (e.g., `/openapi.json`)
- A versioned API prefix for endpoints (e.g., `/v1/...`)
- A per-tenant configuration record that includes allowed actions and data scopes
I also defined a “tenant token” concept. This is not a user session token. It’s a service token that identifies which tenant the GPT is acting for, plus which GPT (or “agent”) identity is calling. I treat it like a capability: it grants narrowly scoped access.
The key decision was to ensure that every request the GPT makes includes:
- `Authorization: Bearer <tenant-token>`
- Optional `X-Request-Id` for traceability
Even if I include X-Tenant-Id, I do not trust it alone. The tenant token is the source of truth; headers are hints at best.
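To make that concrete, here is a minimal sketch of the verification step, assuming JWT-style tenant tokens signed with a shared secret and verified with the `jose` library. The claim shape and the `TENANT_TOKEN_SECRET` name are my own placeholders, not a fixed part of the design.

```typescript
import { jwtVerify } from "jose";

// Hypothetical claim shape; it mirrors the token payload shown later in the post.
interface TenantClaims {
  tenantId: string;
  agentId: string;
  scopes: string[];
}

// Shared HMAC secret for this sketch; production should prefer asymmetric keys
// or a key-management service. TENANT_TOKEN_SECRET is a placeholder name.
const secret = new TextEncoder().encode(process.env.TENANT_TOKEN_SECRET!);

export async function resolveTenant(req: Request): Promise<TenantClaims> {
  const auth = req.headers.get("authorization") ?? "";
  const token = auth.startsWith("Bearer ") ? auth.slice(7) : null;
  if (!token) throw new Error("UNAUTHENTICATED");

  // jwtVerify checks the signature and exp claim; it throws on any failure.
  const { payload } = await jwtVerify(token, secret);
  const claims = payload as unknown as TenantClaims;

  // Headers are hints at best: reject if the hint contradicts the token.
  const hint = req.headers.get("x-tenant-id");
  if (hint && hint !== claims.tenantId) throw new Error("TENANT_MISMATCH");

  return claims;
}
```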
My baseline repo layout
I kept the implementation simple and predictable so I could add tenants without rewriting the app.
- `app/api/openapi/route.ts` — returns the OpenAPI JSON (or YAML) document
- `app/api/v1/...` — versioned endpoints
- `lib/auth/*` — token verification
- `lib/policy/*` — tenant policy lookup + enforcement
- `lib/connectors/*` — warehouse/lake/job adapters
- `lib/audit/*` — request logging and security events
Even if you don’t use this exact structure, I found it helpful to separate “validation and policy” from “execution.”
2) Implementation
I implemented the gateway in three parts: the OpenAPI spec, the request pipeline, and the connectors.
OpenAPI: keep it small and explicit
I wrote the spec as if the model is a client that needs guardrails. That means:
- No “generic execute” endpoints without strict schemas
- Clear response types
- Strong validation constraints (enums, max lengths, patterns)
- Distinct endpoints for read vs write actions
I also gave each endpoint a purpose statement. That’s not for humans only; it helps the tool-calling model understand intent.
Here’s a representative OpenAPI excerpt (trimmed) that I used as the backbone. It’s the core “contract” between GPT and the gateway.
```yaml
openapi: 3.1.0
info:
  title: GPT Action Gateway
  version: 1.0.0
servers:
  - url: https://gpt-gateway.example.com
security:
  - bearerAuth: []
paths:
  /v1/tenants/me:
    get:
      summary: Get current tenant context
      operationId: getTenantContext
      responses:
        "200":
          description: Tenant context and allowed capabilities
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/TenantContext"
  /v1/catalog/datasets:
    get:
      summary: List allowed datasets for this tenant
      operationId: listDatasets
      parameters:
        - name: system
          in: query
          required: true
          schema:
            type: string
            enum: ["warehouse", "lakehouse"]
      responses:
        "200":
          description: Allowed datasets
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/DatasetList"
  /v1/query/preview:
    post:
      summary: Run a bounded, read-only preview query
      operationId: previewQuery
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: "#/components/schemas/PreviewQueryRequest"
      responses:
        "200":
          description: Preview results (limited rows)
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/QueryResult"
components:
  securitySchemes:
    bearerAuth:
      type: http
      scheme: bearer
  schemas:
    TenantContext:
      type: object
      properties:
        tenantId: { type: string }
        agentId: { type: string }
        capabilities:
          type: array
          items: { type: string }
    PreviewQueryRequest:
      type: object
      required: [system, dataset, query]
      properties:
        system:
          type: string
          enum: ["warehouse", "lakehouse"]
        dataset:
          type: string
          description: Dataset name, must be in tenant allowlist
          maxLength: 120
        query:
          type: string
          description: Read-only SQL subset
          maxLength: 4000
        maxRows:
          type: integer
          minimum: 1
          maximum: 200
          default: 50
```
Even in that small excerpt, the constraints matter. The model will occasionally attempt to pass large payloads or overly broad queries. Having schema limits and a server-side validation layer reduces those failures and narrows the blast radius.
The request pipeline: tenant resolution, policy, validation, execution
My server pipeline became a checklist I could audit:
- Parse and verify the tenant token
- Resolve tenant configuration (caps, data scopes, rate limits)
- Validate request schema (and additional semantic constraints)
- Enforce action-level policies
- Execute via connector
- Normalize outputs (truncate rows, remove secrets, format)
- Log and audit
- Return response
I built it as middleware-style functions even though Next.js serverless doesn’t have “classic middleware” for every route. The pattern helps me keep a consistent story across endpoints.
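Here’s a sketch of how that composition looks for one endpoint. Every helper name is illustrative (they map one-to-one onto the checklist above), not a prescribed API:

```typescript
type Claims = { tenantId: string; agentId: string; scopes: string[] };
type Policy = Record<string, unknown>;
type Preview = { system: string; dataset: string; query: string; maxRows: number };

// Hypothetical helpers, implemented elsewhere in lib/:
declare function resolveTenant(req: Request): Promise<Claims>;
declare function loadPolicy(tenantId: string): Promise<Policy>;
declare function validatePreviewRequest(body: unknown): Preview;
declare function enforcePolicy(claims: Claims, policy: Policy, body: Preview): void;
declare function runPreview(policy: Policy, body: Preview): Promise<unknown>;
declare function normalizeResult(raw: unknown, policy: Policy): object;
declare function audit(claims: Claims, body: Preview, result: object): Promise<void>;
declare function errorCode(err: unknown): string;
declare function statusFor(err: unknown): number;

// app/api/v1/query/preview/route.ts: the handler is just the checklist in order.
export async function POST(req: Request): Promise<Response> {
  try {
    const claims = await resolveTenant(req);                // parse + verify token
    const policy = await loadPolicy(claims.tenantId);       // tenant config lookup
    const body = validatePreviewRequest(await req.json());  // schema + semantics
    enforcePolicy(claims, policy, body);                    // action-level policy
    const raw = await runPreview(policy, body);             // connector execution
    const result = normalizeResult(raw, policy);            // truncate rows, scrub
    await audit(claims, body, result);                      // log + security events
    return Response.json(result);
  } catch (err) {
    // Short, model-friendly error code; no policy internals in the message.
    return Response.json({ error: errorCode(err) }, { status: statusFor(err) });
  }
}
```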
I’m intentionally explicit about what I validate beyond OpenAPI schema. For queries, “valid JSON schema” is not enough. I also validate semantics: read-only, table allowlist, row limits, no cross-tenant references, no DDL, no external stages. In practice I keep this strict and gradually loosen per tenant only when needed.
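For the query semantics, a deny-by-default check can start small. This is a simplified sketch: the forbidden-token list is illustrative, two of the error codes echo the ones I mention later in Gotchas, and a production version should use a real SQL parser, since regex checks can be evaded via comments, quoting, and dialect quirks.

```typescript
// Simplified read-only SQL check; a real implementation should parse the SQL.
const FORBIDDEN = /\b(insert|update|delete|merge|drop|alter|create|grant|call|copy)\b/i;

export function validateReadOnlySql(query: string, allowedTables: string[]): void {
  if (!/^\s*select\b/i.test(query)) throw new Error("QUERY_NOT_READ_ONLY");
  if (query.includes(";")) throw new Error("QUERY_MULTIPLE_STATEMENTS");
  if (FORBIDDEN.test(query)) throw new Error("QUERY_FORBIDDEN_TOKEN");

  // Crude table extraction from FROM/JOIN clauses; a parser would resolve
  // aliases and subqueries properly. allowedTables is assumed lowercase.
  const referenced = [...query.matchAll(/\b(?:from|join)\s+([\w.]+)/gi)].map((m) => m[1]);
  for (const table of referenced) {
    if (!allowedTables.includes(table.toLowerCase())) {
      throw new Error("DATASET_NOT_ALLOWED");
    }
  }
}
```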
A practical pattern that worked for me was a “policy object” per tenant:
- `allowedSystems`: warehouse / lakehouse / jobrunner
- `datasets`: allowlist patterns (exact names and/or prefix)
- `tables`: allowlist patterns within datasets
- `maxRows`: enforced caps
- `maxBytes`: enforced caps (approximate)
- `queryMode`: “template-only” vs “validated-SQL” vs “disabled”
Then each endpoint checks the relevant policy fields.
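In code, the policy object is just a typed record; a sketch with the same field names (the error codes in the check are hypothetical placeholders):

```typescript
// Per-tenant policy record, mirroring the fields listed above.
interface TenantPolicy {
  allowedSystems: Array<"warehouse" | "lakehouse" | "jobrunner">;
  datasets: string[];   // allowlist: exact names and/or prefixes like "sales_*"
  tables: string[];     // allowlist patterns within datasets
  maxRows: number;      // hard server-side cap
  maxBytes: number;     // approximate cap on result size
  queryMode: "template-only" | "validated-SQL" | "disabled";
}

// Example per-endpoint check for /v1/query/preview.
function assertPreviewAllowed(policy: TenantPolicy, system: string): void {
  if (policy.queryMode === "disabled") throw new Error("QUERY_MODE_DISABLED");
  if (!(policy.allowedSystems as string[]).includes(system)) {
    throw new Error("SYSTEM_NOT_ALLOWED");
  }
}
```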
Connectors: abstract the data targets
I implemented connectors behind a narrow interface. In pseudocode, it looks like:
- `WarehouseConnector.previewQuery(...)`
- `LakehouseConnector.writeArtifact(...)`
- `JobRunnerConnector.scheduleJob(...)`
Each connector is responsible for:
- Translating requests to provider APIs/clients
- Using the correct credentials for the tenant
- Returning a normalized result
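In TypeScript, that internal contract is a handful of narrow interfaces. The exact shapes below are assumptions (the OpenAPI excerpt references `QueryResult` but doesn’t define it), but they show the idea:

```typescript
// Narrow internal contract between the gateway and its backends. QueryResult
// is an assumed shape; the OpenAPI excerpt references it without defining it.
interface QueryResult {
  columns: string[];
  rows: unknown[][];
  truncated: boolean;
}

interface WarehouseConnector {
  previewQuery(opts: {
    tenantId: string;
    dataset: string;
    sql: string;
    maxRows: number;
  }): Promise<QueryResult>;
}

interface LakehouseConnector {
  writeArtifact(opts: {
    tenantId: string;
    path: string; // always under a tenant-scoped staging prefix
    content: Uint8Array;
  }): Promise<{ artifactId: string }>;
}

interface JobRunnerConnector {
  scheduleJob(opts: {
    tenantId: string;
    jobName: string;
    params: Record<string, string>;
    idempotencyKey?: string;
  }): Promise<{ jobId: string }>;
}
```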
The tenant credential story is worth calling out. I avoided embedding per-tenant secrets directly in Vercel environment variables beyond small pilots. Instead, I used one of two patterns:
- A secure secrets store keyed by tenantId
- Short-lived credentials minted by an internal broker
In the early stages, per-tenant secrets can live in an encrypted DB column or a managed secrets service, but the end goal is to avoid “copy-paste env vars per tenant.”
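A minimal sketch of the first pattern, with the store client as a placeholder you would back with Vault, AWS Secrets Manager, or that encrypted DB column; the interface, key layout, and error code are all assumptions:

```typescript
// Placeholder store client; the interface and key layout are assumptions.
interface SecretsStore {
  get(key: string): Promise<string | null>;
}

// Resolve per-tenant warehouse credentials at request time, never from a
// global env var. Prefer short-lived credentials minted by a broker.
async function getWarehouseCredentials(
  store: SecretsStore,
  tenantId: string
): Promise<{ user: string; password: string }> {
  const raw = await store.get(`tenants/${tenantId}/warehouse`);
  if (!raw) throw new Error("TENANT_CREDENTIALS_MISSING");
  return JSON.parse(raw);
}
```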
Multi-tenant routing for many GPTs
There are two dimensions here: tenant identity and GPT identity.
- Tenant identity answers: “which customer’s data is allowed?”
- GPT identity answers: “which assistant is calling and what is it allowed to do?”
I modeled GPT identity as agentId embedded in the token. That matters because I might have one GPT that can only do read-only analytics and another GPT that can also schedule jobs. The tenant is the same, but the capability differs.
Concretely, I use a token payload like:
```json
{
  "tenantId": "ten_acme",
  "agentId": "gpt_finance_ops",
  "scopes": ["read:datasets", "read:query", "write:jobs"],
  "exp": 1730000000
}
```
I verify the token signature on every request and then intersect:
- token scopes
- tenant policy allowlist
- endpoint action requirements
If any check fails, I return a clear error with a minimal message. I avoid leaking policy details like full dataset lists in error messages, because those can be useful to an attacker.
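The intersection itself stays small; a sketch using the scope strings from the token payload above:

```typescript
// Deny unless the required scope is granted by the token AND permitted by
// tenant policy; the endpoint declares what it requires (e.g. "read:query").
function authorize(
  required: string,
  tokenScopes: string[],
  tenantAllowedScopes: string[]
): void {
  const granted =
    tokenScopes.includes(required) && tenantAllowedScopes.includes(required);
  if (!granted) {
    // Minimal error: no hint about which scopes or datasets exist.
    throw new Error("SCOPE_REQUIRED");
  }
}
```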
Why I treat “warehouse write” differently than “lake write”
A surprising design choice I made was to allow more “write-like” operations into a lakehouse than into a warehouse. In my experience, warehouses tend to become the canonical reporting layer, and mistakes there hurt. For lake writes, I often write into a tenant-scoped “staging” or “artifacts” area, and a downstream job promotes those artifacts into curated tables.
So I exposed write operations as “create artifact” or “enqueue job,” not “insert rows into production tables.” That gave me a safer runway and a more auditable pipeline.
3) Validation
I validate at three levels: spec, runtime behavior, and security posture.
Spec validation
I run an OpenAPI validator in CI to ensure the spec is parsable and consistent. Even if I’m not using a generator, I want the contract to stay stable as I add endpoints.
Expected output is essentially “spec valid.” When it fails, it’s usually broken $ref pointers or mismatched response schemas.
Runtime validation
I test endpoints with curl. It’s blunt, but it reveals incorrect auth and policy behavior immediately.
Get tenant context:
```bash
curl -sS https://gpt-gateway.example.com/v1/tenants/me \
  -H "Authorization: Bearer $TENANT_TOKEN" | jq
```
Expected output:
```json
{
  "tenantId": "ten_acme",
  "agentId": "gpt_finance_ops",
  "capabilities": ["read:datasets", "read:query"]
}
```
List datasets (policy-scoped):
curl -sS "https://gpt-gateway.example.com/v1/catalog/datasets?system=warehouse" \
-H "Authorization: Bearer $TENANT_TOKEN" | jq
Preview query (bounded rows):
```bash
curl -sS https://gpt-gateway.example.com/v1/query/preview \
  -H "Authorization: Bearer $TENANT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "system": "warehouse",
    "dataset": "analytics",
    "query": "select date, revenue from daily_kpis order by date desc",
    "maxRows": 20
  }' | jq
```
The behaviors I require:
- The gateway truncates output to `maxRows` regardless of what the model asks for
- The gateway rejects queries with forbidden tokens (e.g., `insert`, `update`, `drop`)
- The gateway rejects datasets not in the tenant allowlist
Security validation
I keep a list of negative tests that I treat as release-blockers:
- Wrong tenant token → 401
- Expired token → 401
- Valid token but insufficient scope → 403
- Valid token and allowed scope, but disallowed dataset → 403
- Query attempting write or DDL → 400/403
- Attempt to override tenant in body/header → header ignored; request rejected if it conflicts with the token
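I script these rather than run them by hand. A sketch of the harness, with the base URL, env var names, and dataset value as placeholders:

```typescript
// Tiny negative-test harness: each case must fail with the expected status.
const BASE = "https://gpt-gateway.example.com";

async function expectStatus(
  status: number,
  path: string,
  init: RequestInit
): Promise<void> {
  const res = await fetch(`${BASE}${path}`, init);
  if (res.status !== status) {
    throw new Error(`${path}: expected ${status}, got ${res.status}`);
  }
}

// Wrong token → 401
await expectStatus(401, "/v1/tenants/me", {
  headers: { Authorization: "Bearer bogus" },
});

// Valid token, allowed scope, disallowed dataset → 403
await expectStatus(403, "/v1/query/preview", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.TENANT_TOKEN}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    system: "warehouse",
    dataset: "someone_elses_dataset",
    query: "select 1",
  }),
});
```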
I also test prompt injection resilience indirectly. I simulate the worst-case: a request tries to exfiltrate another tenant’s data. If my policy model is correct, the request should fail before any connector call happens.
Results
What worked
- One OpenAPI contract served multiple GPTs cleanly
- Tenancy became a predictable layer I could reason about
- Bounded outputs and strict validation reduced surprising behavior from the model
- Adding a new GPT “persona” became mostly token + scope configuration

What didn’t
- It’s easy to overexpose “convenience endpoints” early
- Query validation is always trickier than it looks
- My first pass at logging was not good enough to debug model calls quickly

Metrics
- The best metric was onboarding speed: after the gateway existed, spinning up a new tenant became configuration work instead of backend work
Gotchas / Notes
There are a few gotchas I now treat as architecture rules.
First, I do not allow the model to choose the tenant. Tenant identity comes from the token. Anything else invites cross-tenant leakage.
Second, I avoid any endpoint that returns large result sets. Even if the tenant is trusted, it’s too easy for the model to accidentally request something expensive or sensitive. I always enforce row limits and often enforce column allowlists or “approved views only” for higher-sensitivity tenants.
Third, error messages must be model-friendly. A generic “400 Bad Request” makes the model thrash. I return a short error code plus a short explanation, but I avoid leaking policy internals. Examples: QUERY_FORBIDDEN_TOKEN, DATASET_NOT_ALLOWED, SCOPE_REQUIRED.
Fourth, idempotency matters more than I expected. If the GPT retries (or the user triggers multiple runs), you can accidentally schedule duplicate jobs or create duplicate artifacts. I added Idempotency-Key support on write endpoints and keyed it by (tenantId, agentId, key).
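A sketch of that dedupe, assuming a key-value store with TTL support; the `KV` interface is a placeholder, and a real version also needs an atomic set-if-absent (e.g., Redis SET NX) so concurrent retries can’t both execute:

```typescript
// Placeholder KV with TTL; real deployments need atomic set-if-absent.
interface KV {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
}

// Dedupe writes on (tenantId, agentId, Idempotency-Key): replay the stored
// response for a repeated key instead of re-executing the action.
async function withIdempotency(
  kv: KV,
  tenantId: string,
  agentId: string,
  idempotencyKey: string,
  execute: () => Promise<object>
): Promise<object> {
  const storageKey = `idem:${tenantId}:${agentId}:${idempotencyKey}`;
  const cached = await kv.get(storageKey);
  if (cached) return JSON.parse(cached);

  const result = await execute();
  await kv.set(storageKey, JSON.stringify(result), 24 * 60 * 60); // 24h window
  return result;
}
```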
Finally, secrets management becomes the real scaling bottleneck. If you plan to support dozens of tenants across different warehouses, do not store secrets as ad hoc env vars. Invest early in a secrets store or a broker that mints short-lived credentials.
Next
The next improvements I want are about operational maturity.
I want a tenant onboarding CLI that:
- creates tenant config
- issues an agent token with the right scopes
- validates connectivity to the tenant’s warehouse/lakehouse
- prints “ready” steps for configuring the Custom GPT action
I also want a safer query model. For many tenants, “validated SQL” is still too flexible. I’m leaning toward “query templates” where the model selects from pre-approved templates and fills in parameters, which turns the action into something closer to RPC than free-form execution.
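A sketch of the shape I have in mind, with illustrative names: the SQL lives server-side, and the model only picks a template and supplies typed arguments.

```typescript
// Pre-approved template: the SQL never comes from the model, only typed
// parameters do. Template ids, params, and the binding step are illustrative.
interface QueryTemplate {
  id: string;
  sql: string; // parameterized; values are bound server-side, never interpolated
  params: Record<string, "string" | "number" | "date">;
}

const templates: QueryTemplate[] = [
  {
    id: "daily_revenue",
    sql: "select date, revenue from daily_kpis where date >= :since order by date desc",
    params: { since: "date" },
  },
];

// The action body becomes RPC-like:
// { "templateId": "daily_revenue", "args": { "since": "2024-01-01" } }
function resolveTemplate(templateId: string): QueryTemplate {
  const found = templates.find((t) => t.id === templateId);
  if (!found) throw new Error("TEMPLATE_NOT_FOUND");
  return found;
}
```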
Finally, I want policy and audit logs to be first-class. The gateway should provide a “what happened?” timeline: which actions were called, with what scopes, what was executed, and what data was touched.