Database
Summary
Telemy v0.0.5 uses PostgreSQL 15+ as its sole persistence layer for the cloud control plane. The schema (DB Schema v1) spans 12 tables covering identity and API access (users, auth_sessions, api_keys, plugin_login_attempts), relay infrastructure lifecycle (relay_instances, sessions, relay_pool, relay_assignments), multi-stream support (user_stream_slots), billing and metering (billing_events, idempotency_records), and Twitch EventSub conduit persistence (eventsub_conduit_state). Primary keys use prefixed text IDs (usr_, ses_, rly_, key_, use_, evt_), timestamps are timestamptz in UTC, monetary amounts are integer cents, and durations are integer seconds.
The schema has evolved through at least 26 migrations. Migrations 0016-0018 added always-ready relay features (lease management, DNS slugs, session migration). Migration 0019 added the billing subsystem with LemonSqueezy webhook integration, grace period enforcement, and subscription reconciliation. Migration 0026 added eventsub_conduit_state to persist Twitch EventSub conduit IDs across server restarts, eliminating a manual chatbot toggle requirement.
The persistence layer is implemented in Go using pgx for database access. Store methods follow a straightforward pattern: query functions return typed results or ErrNotFound, and upserts use INSERT ... ON CONFLICT DO UPDATE for idempotency. Five operational jobs run on the control plane: idempotency TTL cleanup, session usage rollup, outage reconciliation, grace period enforcement (every 5 minutes), and subscription reconciliation (every 1 hour, stub pending production products).
Timeline
- Pre-2026-03: Schema v1 established with core tables:
users,auth_sessions,api_keys,plugin_login_attempts,relay_instances,sessions,idempotency_records. - Migration 0016 (AR-0): Added
lease_expires_atandlast_heartbeat_attosessionsfor relay lease management. - Migration 0017 (AR-2): Added
dns_slugtouser_stream_slotsfor DNS-based routing for always-ready relays. - Migration 0018 (AR-3): Added
supersedes_session_idandmigration_statustosessionsfor seamless session migration. - Migration 0019: Added
billing_eventstable and billing columns tousers(lemon_customer_id,lemon_subscription_id,lemon_addon_subscription_id,plan_grace_deadline). - 2026-03-27: Conduit persistence design approved. Migration 0026 added
eventsub_conduit_statetable to persist Twitch EventSub conduit IDs across API restarts.
Current State
The schema has 12 tables across 26+ migrations. The store layer exposes typed Go methods via a Store interface used by the API router, with a mockStore for testing.
Key operational constraints:
- One active session per user, enforced by a partial unique index on
sessions. - Relay pool assignment uses
FOR UPDATE SKIP LOCKEDonrelay_poolto pick the least-loaded healthy server without contention. - Idempotent relay start:
idempotency_recordsstores request hashes and canonical responses; replayed requests while a session isactivereconstruct the response from current state. - Billing idempotency:
billing_events.lemon_event_idusesON CONFLICT DO NOTHINGto deduplicate LemonSqueezy webhooks. - Grace period enforcement runs every 5 minutes, downgrading users with expired
plan_grace_deadlineto free tier and force-disconnecting active relays. - Subscription reconciliation runs hourly but is currently a stub pending production LemonSqueezy products.
- Conduit persistence is implemented and deployed: on startup the control plane loads the persisted conduit ID, seeds it into
EventSubManager, verifies/updates the Twitch shard, and persists back. Failures are non-fatal.
Plan tiers are free, standard, and internal. Plan statuses are active, past_due, canceled, and trial. Session statuses follow the state machine: provisioning -> active -> grace -> stopped (with grace -> active recovery allowed).
Key Decisions
- Schema v1: Prefixed text primary keys (
usr_,ses_, etc.) over auto-increment integers — provides human-readable IDs and cross-system traceability. - Schema v1: Integer cents for money, integer seconds for durations — avoids floating-point precision issues in billing.
- Migration 0019: LemonSqueezy webhook events stored with full
jsonbpayload for audit trail, even unmatched events (user_id=NULL) retained for manual review. - Migration 0019: Grace period set to 48 hours on payment failure (
plan_grace_deadline = NOW() + 48 hours) before automatic downgrade. - 2026-03-27: Conduit persistence via single-row-per-provider table rather than a generic key-value store — keeps schema explicit and queryable. Only
twitchprovider for now. - 2026-03-27: Conduit restoration failures are non-fatal at startup — server starts regardless, chatbot becomes functional on next manual toggle or retry.
- Schema v1: Deprecated
eip_allocation_idandeip_public_ipcolumns retained inusersfor archive reference after migration to PoolProvisioner.
Experiments & Results
| Experiment | Status | Finding | Source |
|---|---|---|---|
| AWS EIP per-user allocation | Deprecated | Replaced by relay_pool + relay_assignments (PoolProvisioner). Legacy columns retained in users table. | DB_SCHEMA_v1.md |
| In-memory conduit ID storage | Replaced | Conduit ID lost on API restart, requiring manual chatbot toggle off/on. Replaced by DB persistence in migration 0026. | conduit-persistence-design.md |
| Subscription reconciliation job | Stub | Hourly job to poll LemonSqueezy API implemented as stub, blocked on production products being set up in LemonSqueezy. | DB_SCHEMA_v1.md |
Gotchas & Known Issues
eip_allocation_id/eip_public_iponusersare deprecated and unused by PoolProvisioner. Do not populate these columns in new code.addon_relay_countis capped at 3 for non-internal tiers. Theinternaltier is exempt from this cap.reconcile_subscriptionsrequiresTELEMY_LEMON_API_KEYenvironment variable and is currently a stub.- Conduit persistence only covers the
twitchprovider. If additional EventSub providers are added, each gets its own row keyed byprovider. plugin_login_attemptscompleted attempts are not reusable — the plugin must stop polling after the first200 completedresponse. A claimed attempt cannot be re-claimed.- Session state transitions are enforced in the service layer, not via database triggers. Invalid transitions (e.g.,
stopped -> active) must be caught in application code. - Outage reconciliation compares backend-calculated duration against relay health event
session_uptime_seconds. A positive gap results in billing adjustments — this relies on relay health events being accurately reported.
Open Questions
- When will LemonSqueezy production products be set up, unblocking the
reconcile_subscriptionsjob? - Should deprecated EIP columns (
eip_allocation_id,eip_public_ip) be dropped in a future migration or kept indefinitely for archive reference? - Will additional EventSub providers beyond
twitchneed conduit persistence, and if so, does the single-row-per-provider model scale appropriately? - Are there plans for a migration to clean up or archive old
billing_eventsandidempotency_recordsrows beyond TTL cleanup?
Sources
- DB_SCHEMA_v1.md
- 2026-03-27-conduit-persistence-design.md
- 2026-03-27-conduit-persistence.md