API
Summary
The Telemy Control Plane API v1 is a REST API at base path /api/v1 serving the v0.0.5 native C++ OBS plugin. It handles authentication (browser-based plugin login with JWT + refresh token rotation), relay lifecycle management (start, stop, active polling), billing integration via LemonSqueezy webhooks, usage tracking under a Time Bank model, and capacity status for the public website. Transport is HTTPS-only (TLS 1.2+), with Cloudflare terminating TLS upstream and proxying to the Go API server on port 8080. The server itself runs plain HTTP behind Cloudflare.
The relay session state machine has four backend states (provisioning, active, grace, stopped) and five valid transitions, all enforced server-side. An invalid transition returns 409 Conflict. A separate client-side state machine governs scene switching (Phase 5b, not yet implemented). It has six top-level modes (STUDIO, IRL_CONNECTING, IRL_ACTIVE, IRL_GRACE, DEGRADED, FATAL) and orthogonal scene intents (LIVE, BRB, OFFLINE, HOLD). As of v0.0.5, the relay lifecycle uses the always-ready model (AR-0 through AR-3). Relays provision when a managed connection is added, deprovision when it is removed, and auto-provision when OBS loads.
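The server-side check on relay session transitions can be sketched in Go as a lookup table plus a 409 mapping. This summary does not list the five valid transitions, so the five edges below are hypothetical placeholders. The names `SessionState`, `Transition`, and `TransitionError` are also illustrative and do not come from the Go server.

```go
package main

import (
	"fmt"
	"net/http"
)

// SessionState is one of the four backend relay session states.
type SessionState string

const (
	StateProvisioning SessionState = "provisioning"
	StateActive       SessionState = "active"
	StateGrace        SessionState = "grace"
	StateStopped      SessionState = "stopped"
)

// allowedTransitions is a HYPOTHETICAL set of five edges chosen for
// illustration; the authoritative list lives in the spec, not here.
var allowedTransitions = map[SessionState][]SessionState{
	StateProvisioning: {StateActive},
	StateActive:       {StateGrace, StateStopped},
	StateGrace:        {StateActive, StateStopped},
}

// TransitionError describes a rejected state change.
type TransitionError struct{ From, To SessionState }

func (e *TransitionError) Error() string {
	return fmt.Sprintf("invalid relay transition %s -> %s", e.From, e.To)
}

// HTTPStatus maps every invalid transition to 409 Conflict.
func (e *TransitionError) HTTPStatus() int { return http.StatusConflict }

// Transition returns nil if from -> to is allowed, else a *TransitionError.
func Transition(from, to SessionState) error {
	for _, next := range allowedTransitions[from] {
		if next == to {
			return nil
		}
	}
	return &TransitionError{From: from, To: to}
}
```

Keeping the table as data rather than a switch makes it easy to diff the handler against the spec's transition list.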
On 2026-03-20 the control plane was migrated from AWS EC2 (52.13.2.122, ~8/mo) to the Advin VPS (208.84.101.84). Advin runs PostgreSQL 16 and both Go binaries (telemy-api, telemy-jobs) in Docker Compose. The Cloudflare A record for api.telemyapp.com was updated to point at Advin, and UFW rules restrict port 8080 to Cloudflare IP ranges only. Relay EC2 provisioning remains on AWS; only the API and database moved.
Timeline
- 2026-02-21: API Spec v1 authored covering auth, relay lifecycle, idempotency, error contract, and rate limits.
- 2026-03-16: Plugin login flow spec finalized (browser-based attempt + poll model with `login_attempt_id` and `poll_token`).
- 2026-03-20: Migration plan created to move the control plane from EC2 to the Advin VPS. Eight-phase plan: preparation, PostgreSQL Docker setup, DB migration, deploy, UFW, parallel verification, DNS cutover, EC2 teardown.
- 2026-03-22: Always-ready relay model (AR-0 through AR-3) designed, replacing manual start/stop with automatic provision-on-add.
- 2026-03-23: Always-ready relay model deployed. Scene-switching state machine (Phase 5b) deferred.
- Phase 5a (billing): LemonSqueezy billing endpoints (`/billing/checkout`, `/billing/webhook`) deployed and operational. Subscription lifecycle events are mapped to 7 action handlers (activate, update addon, payment recovery, payment failed, cancelled, downgrade, refund).
Current State
The API server runs on the Advin VPS (208.84.101.84) in Docker Compose alongside PostgreSQL 16. Cloudflare proxies api.telemyapp.com (orange cloud, SSL Flexible) with an Origin Rule rewriting to port 8080. UFW on Advin restricts port 8080 to Cloudflare IPv4 ranges.
Ten core endpoints are live under /api/v1 (the billing, usage, and relay-health endpoints are listed separately below):
- `GET /auth/session`: current user, entitlement, usage, active relay
- `POST /auth/plugin/login/start`: initiate browser-based plugin login
- `POST /auth/plugin/login/poll`: poll login attempt status
- `POST /auth/refresh`: rotate refresh token and JWT
- `POST /auth/logout`: revoke auth session
- `POST /relay/start`: start or return active relay (idempotent, requires `Idempotency-Key`)
- `GET /relay/active`: get active/provisioning session
- `POST /relay/stop`: idempotently stop a relay session
- `GET /relay/manifest`: launchable regions and AMI metadata
- `GET /capacity/status`: public unauthenticated capacity check
Billing endpoints: POST /billing/checkout (generate LemonSqueezy checkout URL) and POST /billing/webhook (HMAC-SHA256 verified event ingestion).
Usage endpoint: GET /usage/current (Time Bank cycle data). Relay health: POST /relay/health (relay-to-backend liveness, authenticated via X-Relay-Auth shared secret with session_id+instance_id binding).
Per-link relay telemetry bypasses the control plane entirely. The C++ plugin polls the relay’s srtla stats server directly at `relay_ip:5080/stats` every ~2 seconds. Per-output multi-encode telemetry is collected via the OBS C API, not the control plane.
The client-side scene-switching state machine (6 modes, 10 transitions, scene intent rules, reconnect-first startup) is specified but not yet implemented (Phase 5b).
Key Decisions
- 2026-02-21: JWT-only auth for the control plane: `cp_access_jwt` as a short-lived bearer token and `refresh_token` for rotation, both stored in a DPAPI-encrypted vault. Relay activation entitlement is enforced server-side regardless of UI state.
- 2026-02-21: Idempotency via `Idempotency-Key` (UUIDv4) with 1-hour retention and async replay. Replaying `/relay/start` reconstructs the current DB state rather than returning a stale cached response.
- 2026-02-21: Per-link telemetry architecture decision: direct HTTP polling from the plugin to the relay’s srtla stats server (port 5080), bypassing the control plane entirely. Chosen for latency (2s polling) and simplicity.
- 2026-03-20: Migrate the control plane from EC2 (8/mo) to the Advin VPS, for net savings of ~$15-20/mo. Go binaries and PostgreSQL 16 run in Docker Compose. AWS is still used for relay EC2 provisioning only.
- 2026-03-20: UFW on Advin restricts port 8080 to Cloudflare IP ranges as defense-in-depth. Cloudflare terminates TLS and the origin serves plain HTTP, so only Cloudflare should be able to reach the origin directly.
- 2026-03-23: Always-ready relay model adopted: relays provision on add and deprovision on remove. `/relay/start` and `/relay/stop` are now called internally by the plugin, not via user-initiated buttons.
- 2026-03-23: Scene-switching state machine deferred to Phase 5b. The relay lifecycle is governed by the always-ready model (AR-0 through AR-3), not the state machine.
Experiments & Results
| Experiment | Status | Finding | Source |
|---|---|---|---|
| Per-link relay telemetry via direct HTTP polling (port 5080) | Implemented (v0.0.4) | Bypassing control plane works well; 2s poll interval, ASN-based carrier labels via GeoLite2-ASN.mmdb | API_SPEC_v1.md sec 13 |
| Per-output multi-encode telemetry via OBS C API | Implemented (v0.0.4/v0.0.5) | obs_enum_outputs gives per-encoder stats (bitrate, drop%, FPS, lag), grouped by encoder in dock UI | API_SPEC_v1.md sec 14 |
| EC2-to-Advin migration parallel verification | Planned | Direct IP smoke test (curl -H "Host: api.telemyapp.com" http://208.84.101.84:8080/health) before DNS cutover to validate without risk | 2026-03-20-api-migration-advin.md Phase 5 |
| Docker Compose for control plane (postgres + api + jobs) | Deployed | Single docker-compose.yml with health checks, secrets via file mount, env via .env file | 2026-03-20-api-migration-advin.md Phase 1 |
Gotchas & Known Issues
- Error contract incomplete: `request_id` and structured `details` fields in error responses are not currently populated by the Go server. Only `error.code` and `error.message` are returned.
- Stats endpoint unauthenticated: TCP 5080 on the relay has no authentication. The relay security group must allow access from the OBS machine, but anyone who can reach the port can read per-link stats.
- Idempotency key window: Backend stores key mappings for only 1 hour. Clients replaying after that window get a fresh response, not idempotent replay.
- TLS assumption: The Go API server assumes TLS termination happens upstream and serves plain HTTP itself. Clients must select TLS explicitly (an `https://` URL) rather than infer it from the port number. A custom port such as 8443 still requires an explicit TLS scheme.
- CGO static linking: The Docker Compose setup assumes `CGO_ENABLED=0` for static binaries. If the binaries are dynamically linked, the base image must change from `scratch`/`alpine` to `ubuntu:24.04`.
- Database name legacy: The database and user are both named `aegis` (legacy naming from before the Telemy rebrand). Migration restores with `--no-owner --role=aegis`.
- Plugin login flow is operator-assisted: The `authorize_url` currently lands on a temporary operator-assisted Cloudflare Pages flow at `telemyapp.com/login/plugin?attempt=...`, not a self-service user login page.
- Rate limits are per-user: `/relay/start` is 6/min, `/relay/stop` is 20/min, `/relay/active` is 60/min, `/usage/current` is 30/min. No global rate limits are documented.
- Scene-switching state machine not implemented: The full 6-mode state machine with scene intents, guard conditions, and hysteresis is specified but deferred to Phase 5b.
Open Questions
- When will the plugin login flow transition from operator-assisted to fully self-service?
- Should the relay stats endpoint (port 5080) get authentication, or is security-group restriction sufficient?
- What is the timeline for Phase 5b scene-switching state machine implementation?
- Should `request_id` and `details` be added to the error contract, or is the current minimal contract acceptable for v1?
- Is a database backup cron job running on Advin, or does it still need to be set up (mentioned in the migration plan but not confirmed)?
- Will the EC2 instance (52.13.2.122) be terminated, or is it still kept as a warm standby? The plan specifies 24-hour wait before termination.
- Should rate limits be adjusted for the always-ready model, where `/relay/start` is called automatically rather than by the user?
Sources
- API_SPEC_v1.md
- STATE_MACHINE_v1.md
- 2026-03-20-api-migration-advin.md