ADR-029: Internal (service-to-service) auth, distinct from public
Status
Accepted -- 2026-06-21 (design ratified; implementation is a follow-up)
Tracks #171. Extends ADR-024 (auth/authz strategy).
Context
ADR-024 defines two authorization states for an RPC. Both are expressed as proto options consumed by createProtoAuthzInterceptor and by the JWT auth interceptor's skipMethods:
- gated: a default_policy: "allow" / "deny" service plus per-method policies; the JWT auth interceptor must authenticate the caller.
- public: service_auth { public: true } or method_auth { public: true }; authentication is skipped entirely, and anyone reachable on the wire may call it.
A third, real situation has no first-class expression: an internal, service-to-service call that carries no end-user JWT but is not meant to be world-open. It surfaced while building the car-sharing Temporal example (Phase 2):
- The trip saga runs in a Temporal worker (a separate process, not a Connectum Server). Its activities call internal RPCs (TripService.RecordTrip, EndTrip) over the network with no Authorization header, because Connectum does not auto-propagate inbound headers across a worker's client.
- To let those calls through, the methods are annotated method_auth { public: true }. But the real trust boundary is the mesh (Istio mTLS STRICT plus an AuthorizationPolicy that admits only the trips ServiceAccount). The proto now says "public", which over-states exposure and reads wrong to anyone auditing the contract.
So today public is overloaded: it means both "intentionally world-open" (a health probe) and "internal, trusted by the network boundary" (a worker-only RPC). They have different security postures and should be auditable as different things.
The same gap exists for any out-of-process internal caller: schedulers, batch jobs, the new createCatalogClient (#170).
Existing building block: createGatewayAuthInterceptor (@connectum/auth) already authorizes a request from identity injected by a trusted upstream (a gateway/mesh) via a trustSource predicate + header stripping. That trust-source machinery is most of what an internal interceptor needs -- it just is not wired to a proto-level "this method is internal" marker.
Decision (proposed)
Introduce a first-class internal marker, distinct from public, plus an interceptor that authorizes internal calls by a configurable trust source rather than by an end-user token.
1. Proto annotation
Add an internal boolean to the existing auth options (alongside public):
// connectum/auth/v1/options.proto
message ServiceAuth { optional bool public = ...; optional bool internal = ...; /* ... */ }
message MethodAuth { optional bool public = ...; optional bool internal = ...; }

Setting internal: true skips end-user (JWT) authentication but requires an internal trust marker (see §2). public keeps its meaning (no auth at all). A method is at most one of public / internal / gated; resolveMethodAuth (the existing service+method merge in @connectum/auth) is extended to surface internal.
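The "at most one of public / internal / gated" rule can be sketched as a small resolver. This is a hypothetical stand-in for the extended resolveMethodAuth, not its actual signature. It assumes two things: a method-level marker takes precedence over the service-level one, and a method with neither marker is gated.

```typescript
// Mirrors the proto flags; `undefined` means the field was not set.
interface AuthFlags {
  public?: boolean;
  internal?: boolean;
}

type AuthMode = "public" | "internal" | "gated";

// One annotation level (service_auth or method_auth) may set at most one flag.
function modeOf(flags: AuthFlags, where: string): AuthMode | undefined {
  if (flags.public === true && flags.internal === true) {
    throw new Error(`${where}: public and internal are mutually exclusive`);
  }
  if (flags.public === true) return "public";
  if (flags.internal === true) return "internal";
  return undefined;
}

// Hypothetical merge (assumed precedence): a method-level marker wins over the
// service-level one; with neither set, the method falls back to "gated"
// (default_policy plus per-method policies).
function resolveAuthMode(service: AuthFlags = {}, method: AuthFlags = {}): AuthMode {
  return modeOf(method, "method_auth") ?? modeOf(service, "service_auth") ?? "gated";
}
```

In this sketch, for example, resolveAuthMode({ public: true }, { internal: true }) yields "internal". A conflicting pair at a single level is rejected rather than silently resolved.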
2. createInternalAuthInterceptor — a pluggable per-service trust source
createInternalAuthInterceptor is a new interceptor in @connectum/auth. For internal methods, it authorizes the call from a configurable trust source (a predicate returning AuthContext | null, reusing the createGatewayAuthInterceptor pattern) and rejects any call lacking one as Unauthenticated. The credential must be per-service, so that compromising one microservice cannot forge another's identity; a single static shared secret is explicitly not the default. Three trust-source factories:
- (a) meshIdentityTrust (production default; inherently per-service). Verify the mesh-forwarded peer identity against an allow-list. The sidecar terminates mTLS and forwards a header carrying either an Istio short-form ServiceAccount (cluster.local/ns/<ns>/sa/<sa>) or a SPIFFE ID. Allow-list entries carry the caller's roles/scopes. The mesh issues each workload its own mTLS identity, so this is per-service by construction.
- (b) signedTokenTrust (non-mesh containment path; per-service, NOT a shared secret). Each caller signs a short-lived JWT with its own private key, and the interceptor verifies it against that service's public key (JWKS), reusing the existing createJwtAuthInterceptor JWKS machinery. Compromising service A's key forges only A.
  - Hard security requirement (verified empirically with jose): the JWKS lookup MUST be issuer-bound. Select the key from jwksByIssuer[iss], or run N verifiers, each pinned to one jwksUri plus a fixed issuer. A single shared JWKS holding multiple services' keys does not contain compromise. jose resolves the signing key by kid independently of the iss claim, so a token claiming iss: "B" but signed with A's key (header kid: kid_A) is accepted against a shared keyset. Without issuer binding, (b) is weaker than an honest shared secret, because it advertises containment it does not deliver. The framework ships only the verification primitive; key issuance, rotation, and JWKS publication belong to the deployment (SPIRE, the IdP, or the mesh). Connectum adds no key-management subsystem.
- (c) sharedSecretTrust (documented dev-only fallback). A single loaded secret, compared in constant time. It is the simplest option, but it is not per-service: one compromise forges every caller. It is therefore for local/dev use only and is labeled as such.
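Factories (a) and (c) can be sketched as functions that return a trust-source predicate. Everything here is hypothetical and illustrative: the AuthContext / Grant shapes, the header names, and the option objects are not the real @connectum/auth API. The mesh sketch also assumes the forwarded header is set or overwritten by the sidecar, so an external caller cannot supply it.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Simplified stand-ins; not the real @connectum/auth types.
interface AuthContext { subject: string; roles: string[]; scopes: string[]; }
interface Grant { roles: string[]; scopes: string[]; }
type TrustSource = (headers: ReadonlyMap<string, string>) => AuthContext | null;

// (a) Mesh identity: read the peer identity the sidecar forwarded (header name
// is an assumption) and trust it only if it is on the allow-list. Each entry
// carries that caller's roles/scopes, so trust is per-service.
function meshIdentityTrust(opts: { header: string; allow: ReadonlyMap<string, Grant> }): TrustSource {
  return (headers) => {
    const identity = headers.get(opts.header);
    if (identity === undefined) return null;
    const grant = opts.allow.get(identity);
    if (grant === undefined) return null;
    return { subject: identity, roles: [...grant.roles], scopes: [...grant.scopes] };
  };
}

// (c) Shared secret, dev-only. Both values are hashed to fixed-length SHA-256
// digests first, because timingSafeEqual throws on inputs of different
// lengths; comparing the digests is constant-time.
function sharedSecretTrust(opts: { header: string; secret: string; subject: string; grant: Grant }): TrustSource {
  const expected = createHash("sha256").update(opts.secret).digest();
  return (headers) => {
    const presented = headers.get(opts.header);
    if (presented === undefined) return null;
    const actual = createHash("sha256").update(presented).digest();
    if (!timingSafeEqual(actual, expected)) return null;
    return { subject: opts.subject, roles: [...opts.grant.roles], scopes: [...opts.grant.scopes] };
  };
}
```

Both factories return null instead of throwing. That leaves the Unauthenticated decision to the interceptor, which is the contract §2 describes.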
For methods that are not internal (public or gated), the interceptor is a no-op. Chain ordering is load-bearing. The internal interceptor and the JWT interceptor both run before createProtoAuthzInterceptor, because they populate the AuthContext that proto-authz then consumes. The chain is therefore errorHandler → (jwtAuth | internalAuth) → protoAuthz → …, with the auth interceptors strictly ahead of proto-authz rather than alongside it.
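The no-op behavior and the ordering constraint can be shown with simplified, synchronous stand-ins. These are assumptions: real ConnectRPC interceptors are async, use their own request types, and would carry AuthContext by their own mechanism rather than on the request object. internalAuth, protoAuthz, and chain are hypothetical names, and errorHandler is omitted.

```typescript
// Simplified, synchronous stand-ins for ConnectRPC request/interceptor types.
interface AuthContext { subject: string; roles: string[]; scopes: string[]; }
type TrustSource = (headers: ReadonlyMap<string, string>) => AuthContext | null;
interface Req { method: string; headers: ReadonlyMap<string, string>; auth?: AuthContext; }
type Handler = (req: Req) => string;
type Interceptor = (next: Handler) => Handler;

class UnauthenticatedError extends Error {
  readonly code = "unauthenticated";
}

// Internal auth: a no-op unless the method is internal. Otherwise the trust
// source must yield an identity, which is attached for later interceptors.
function internalAuth(isInternal: (method: string) => boolean, trust: TrustSource): Interceptor {
  return (next) => (req) => {
    if (!isInternal(req.method)) return next(req);
    const auth = trust(req.headers);
    if (auth === null) throw new UnauthenticatedError(`${req.method}: no internal trust marker`);
    return next({ ...req, auth });
  };
}

// Stand-in for proto-authz: it can only consume an AuthContext that an
// earlier interceptor populated.
function protoAuthz(isInternal: (method: string) => boolean): Interceptor {
  return (next) => (req) => {
    if (isInternal(req.method) && req.auth === undefined) {
      throw new UnauthenticatedError(`${req.method}: no AuthContext`);
    }
    return next(req);
  };
}

// The first interceptor in the list is the outermost, i.e. it runs first.
function chain(interceptors: Interceptor[], handler: Handler): Handler {
  return interceptors.reduceRight<Handler>((next, interceptor) => interceptor(next), handler);
}
```

With chain([internalAuth(...), protoAuthz(...)], handler), a trusted internal call reaches the handler. If the order is swapped, proto-authz sees no AuthContext and rejects even a trusted caller.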
3. Inclusive composition with the existing authz model
internal is a boolean sibling of public in service_auth/method_auth; roles compose through the existing requires { roles, scopes } option — there is no parallel requires_identity mechanism. The internal interceptor sets a normal AuthContext (subject = the service identity; roles/scopes from the trust source). createProtoAuthzInterceptor gains one rule so internal composes inclusively within its current flow:
- internal + identity present + no requires → allow (an internal method with no role gate is reachable by any trusted internal caller). Without this rule, an internal-only method falls through to the existing default_policy: "deny" and is wrongly rejected.
- internal + requires { roles/scopes } → fall through to the existing roles/scopes check against the AuthContext (one model, inclusive: the internal identity's roles gate the call exactly like a JWT caller's).
- internal + no identity → Unauthenticated.
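The three cases above form a small decision function. This is a sketch of the internal branch only, and authorizeInternal and satisfies are hypothetical names. It assumes the existing roles/scopes check means "at least one listed role and every listed scope"; the real createProtoAuthzInterceptor semantics may differ.

```typescript
interface AuthContext { subject: string; roles: string[]; scopes: string[]; }
interface Requires { roles?: string[]; scopes?: string[]; }
type Decision = "allow" | "deny" | "unauthenticated";

// Assumed semantics of the existing check: any one listed role, all listed scopes.
function satisfies(auth: AuthContext, req: Requires): boolean {
  const roles = req.roles ?? [];
  const scopes = req.scopes ?? [];
  const roleOk = roles.length === 0 || roles.some((r) => auth.roles.includes(r));
  const scopeOk = scopes.every((s) => auth.scopes.includes(s));
  return roleOk && scopeOk;
}

// The one new rule for internal methods; gated and public paths are untouched.
function authorizeInternal(auth: AuthContext | undefined, requires: Requires | undefined): Decision {
  if (auth === undefined) return "unauthenticated"; // internal + no identity
  if (requires === undefined) return "allow"; // internal + identity + no requires
  return satisfies(auth, requires) ? "allow" : "deny"; // same roles/scopes check as a JWT caller
}
```

The order of the checks matters. Identity is tested first, so a missing trust marker always surfaces as Unauthenticated rather than as a role denial.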
resolveMethodAuth is extended to surface internal, and the JWT auth interceptor skips internal methods via a new getInternalMethods (mirroring getPublicMethods). Promoting a method from public to internal therefore removes its world-open exposure: the method now requires the internal trust marker, and an end-user JWT alone does not satisfy it.
Options considered
- New internal annotation + interceptor (recommended). Auditable in the contract, a posture distinct from public, and reuses the proven trust-source machinery. Cost: a proto option + a new interceptor + docs.
- Document createGatewayAuthInterceptor for service-to-service; no new annotation. Cheapest, but it leaves the proto saying public, so the audit/over-exposure problem (the actual motivation) remains.
- mTLS-only (no annotation): authorize every call by peer identity. Strong, but it couples the framework to mTLS termination and does not express intent in the contract; many deployments terminate mTLS at the mesh sidecar, not the app.
Consequences
- Positive: internal calls stop masquerading as public; the contract is auditable; out-of-process callers (worker, scheduler, createCatalogClient) have a sanctioned, non-world-open path; the mesh stays the enforcement layer while the app expresses intent.
- Negative / cost: a new proto option (additive; BSR contract update per the repo rule), a new @connectum/auth interceptor + tests + docs, and a migration note for examples currently using public for worker-internal RPCs (e.g. car-sharing RecordTrip/EndTrip).
- Compatibility: purely additive (a new option defaulting to false, a new opt-in interceptor). Existing public and gated behavior is unchanged.
Ratified decisions (2026-06-21)
- Trust source: both (a) and (b), per-service. Ship meshIdentityTrust (a, production default) + signedTokenTrust (b, non-mesh, per-service JWT/JWKS with mandatory issuer-bound key selection) + sharedSecretTrust (c, dev-only fallback). A single static shared secret is not the recommended mode. (b)'s issuer binding is a hard security requirement (§2), not optional.
- Annotation surface: stay in the model. internal is a boolean sibling of public; roles compose inclusively through the existing requires { roles, scopes } option (§3). The richer requires_identity is rejected.
- mTLS SAN reading (the app reads the peer cert itself): DEFER. ConnectRPC interceptors have no peer-cert access and @connectum/core exposes no peer-cert surface, so app-level SAN reading needs new core plumbing. In mesh deployments the sidecar terminates mTLS and forwards identity as a header (that is trust source (a)). (a)+(b) therefore cover the motivating worker case, and the absence of app-level SAN reading blocks nothing there. App-terminated mTLS without a mesh is the only case SAN reading would serve; it is deferred to a follow-up that designs the core peer-cert surface.
- Example migration (car-sharing RecordTrip/EndTrip, public → internal): a separate follow-up, not part of this ADR. The ADR stays purely additive.
Implementation notes
- connectum/auth/v1/options.proto: add internal to ServiceAuth/MethodAuth. This is additive; the auth proto path is already in release.yml's "Check proto changes" list (confirm at implementation).
- New @connectum/auth exports: createInternalAuthInterceptor, the three trust-source factories, and getInternalMethods; extend resolveMethodAuth and createProtoAuthzInterceptor (the one inclusive rule of §3).
- Tests: per-service containment (the issuer-bound (b) MUST reject A-signed-as-B, the empirical case from §2), the inclusive role composition, and the internal-no-identity → Unauthenticated path.
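The containment test for (b) can be modelled without jose. This is a simplified sketch: it builds EdDSA compact tokens with node:crypto and uses two hypothetical verifiers, one kid-only and one issuer-bound. It checks only the signature, with no exp/aud/alg validation, so it is a model of the key-selection flaw and not a JWT library. With jose, the same binding would presumably be achieved by choosing the issuer's key set before calling jwtVerify and pinning the issuer option.

```typescript
import { generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

interface Claims { iss: string; sub: string; }

const b64url = (value: object): string => Buffer.from(JSON.stringify(value)).toString("base64url");

// Compact token: base64url(header).base64url(claims).base64url(Ed25519 signature).
function issue(kid: string, key: KeyObject, claims: Claims): string {
  const input = `${b64url({ alg: "EdDSA", kid })}.${b64url(claims)}`;
  return `${input}.${sign(null, Buffer.from(input), key).toString("base64url")}`;
}

function parse(token: string): { input: string; kid: string; claims: Claims; sig: Buffer } {
  const parts = token.split(".");
  if (parts.length !== 3) throw new Error("malformed token");
  const [h, p, s] = parts as [string, string, string];
  return {
    input: `${h}.${p}`,
    kid: JSON.parse(Buffer.from(h, "base64url").toString()).kid as string,
    claims: JSON.parse(Buffer.from(p, "base64url").toString()) as Claims,
    sig: Buffer.from(s, "base64url"),
  };
}

// Anti-pattern: one key set for every service, looked up by kid only.
function verifySharedKeyset(token: string, keys: ReadonlyMap<string, KeyObject>): Claims | null {
  const t = parse(token);
  const key = keys.get(t.kid);
  if (key === undefined || !verify(null, Buffer.from(t.input), key, t.sig)) return null;
  return t.claims;
}

// Issuer-bound: the key set is selected by the claimed iss, so the signing key
// must belong to the issuer the token claims to be.
function verifyIssuerBound(
  token: string,
  keysByIssuer: ReadonlyMap<string, ReadonlyMap<string, KeyObject>>,
): Claims | null {
  const t = parse(token);
  const key = keysByIssuer.get(t.claims.iss)?.get(t.kid);
  if (key === undefined || !verify(null, Buffer.from(t.input), key, t.sig)) return null;
  return t.claims;
}
```

A test can then issue a token with kid_A and A's private key but iss: "B". The shared keyset accepts it as B, while the issuer-bound verifier rejects it because kid_A is not in B's key set.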
