Architecture
Summary
Telemy is an IRL streaming platform built around a single-DLL OBS plugin (telemy-obs-plugin.dll) that runs entirely inside the OBS process, paired with a Go control plane API and a shared relay pool for bonded SRT delivery. The system evolved from Project Aegis’s original Rust child-process + C++ shim + IPC pipe design (documented in the Aegis Architecture First Pass) through a v0.0.3 hybrid architecture, arriving at the current v0.0.5 all-native C++ plugin with no standalone binary, no IPC layer, and no Rust dependency. The control plane runs on Advin VPS with PostgreSQL, managing relay provisioning, authentication, billing (LemonSqueezy), and DNS (Cloudflare).
The relay infrastructure shifted from per-user ephemeral EC2 instances (AWS provisioner, deprecated 2026-03-20) to a shared always-on VPS pool model (PoolProvisioner, deployed 2026-03-23). Each relay node runs a dual-process Docker stack: a custom srtla_rec fork for bonded UDP proxying and SLS v1.5.0 for SRT session management. The v0.0.6 design (drafted 2026-03-21) extends this into a global relay mesh with smart region-based assignment, GeoIP integration (deferred), and permanent slug-based DNS records. A repo split (designed 2026-04-02) will separate the open-source MIT-licensed OBS plugin from the proprietary backend into two repositories.
The component boundaries are strict: the C++ plugin handles metrics collection (OBS/Win32/NVML at 500ms), config storage (DPAPI-encrypted vault), relay client lifecycle, and CEF dock hosting. The Go control plane handles auth (browser-handoff login, JWT sessions), entitlement enforcement, relay pool assignment, DNS record management, and billing webhooks. The dock UI (React, bundled as minified JS) communicates with the native plugin via a document.title transport encoded as percent-encoded JSON, with secrets never exposed to the JS layer.
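The document.title transport described above can be illustrated with a minimal round-trip sketch. It is written in Python for brevity; the real sender is dock JavaScript and the real receiver is the C++ plugin. The action shape (`type`, `id`) is hypothetical, and treating the trailing colon as part of the `__TELEMY_DOCK_ACTION__` prefix is an assumption.

```python
import json
from urllib.parse import quote, unquote

# Prefix taken from the architecture notes; including the trailing colon
# in the prefix is an assumption of this sketch.
PREFIX = "__TELEMY_DOCK_ACTION__:"


def encode_action(action: dict) -> str:
    """Build the title string the dock would assign to document.title."""
    payload = json.dumps(action, separators=(",", ":"))
    # safe="" percent-encodes every reserved character, including "/" and ":".
    return PREFIX + quote(payload, safe="")


def decode_title(title: str):
    """Return the decoded action, or None for an ordinary page title."""
    if not title.startswith(PREFIX):
        return None
    return json.loads(unquote(title[len(PREFIX):]))


if __name__ == "__main__":
    # Hypothetical action; the real action vocabulary is not shown here.
    title = encode_action({"type": "connect", "id": 3})
    print(title)
    print(decode_title(title))
```

Because the payload is percent-encoded, the title never contains raw quotes or braces, so the native side only needs a prefix check, a percent-decode, and a JSON parse.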
Timeline
- Pre-v0.0.4: Project Aegis architecture designed with C++ OBS shim + Rust child process (aegis-core.exe) + Named Pipe IPC + MessagePack protocol. Local-first Phase 1 (failover), cloud Phase 2+ (Go control plane + AWS relays).
- v0.0.4: Transitioned to all-native C++ single-DLL architecture, eliminating Rust dependency and IPC pipe layer.
- v0.0.5 (current): Introduced browser-handoff plugin login, DPAPI ConfigVault, ConnectionManager with multi-connection stubs, BYOR (Bring Your Own Relay), stream slot system, per-link relay telemetry via custom srtla_rec fork, relay provision progress UI, and billing integration (LemonSqueezy).
- 2026-03-20: AWS provisioner deprecated; PoolProvisioner becomes sole relay provider.
- 2026-03-21: v0.0.6 global relay mesh design drafted — smart region assignment, stable slug DNS, GeoIP (deferred), health scoring.
- 2026-03-23: Always-ready relay model deployed — sessions auto-provision on managed connection add, auto-deprovision on remove.
- 2026-04-02: Repo split design approved: public MIT plugin repo + private proprietary backend repo (telemy-api).
Current State
The production system runs as a single-DLL OBS plugin on Windows 10/11, connecting to a Go control plane on Advin VPS with one relay node (kc1, Kansas City, us-central region). The relay pool infrastructure is fully operational: PoolProvisioner assigns sessions from relay_pool with health monitoring every 60 seconds, and Cloudflare DNS records are created/updated per session via internal/dns/cloudflare.go.
The plugin authenticates via browser-handoff flow (start/poll/complete pattern), stores JWT tokens in DPAPI-encrypted vault.json, and supports both managed relay connections (control-plane provisioned) and BYOR direct connections. Relay telemetry polls at 2-second intervals for both aggregate stats (SLS port 8090) and per-link stats (custom srtla_rec port 5080). The dock UI renders carrier-labeled bitrate bars, relay ingest cards, and provisioning progress (6-step pipeline).
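The start/poll/complete pattern can be sketched as follows. This is a minimal illustration with an in-memory stand-in for the control plane, not the plugin's actual implementation. The class, method names, token format, and the 2-second login poll interval are all assumptions of this sketch.

```python
import secrets
import time


class FakeBackend:
    """In-memory stand-in for the control plane (hypothetical API shape)."""

    def __init__(self):
        self._pending = {}

    def start(self) -> str:
        # Plugin asks for a handoff ID, then opens the browser with it.
        handoff_id = secrets.token_urlsafe(16)
        self._pending[handoff_id] = None
        return handoff_id

    def complete(self, handoff_id: str, jwt: str) -> None:
        # Called once the user finishes logging in in the browser.
        if handoff_id not in self._pending:
            raise KeyError("unknown handoff")
        self._pending[handoff_id] = jwt

    def poll(self, handoff_id: str):
        # Returns the token once login has completed, otherwise None.
        return self._pending.get(handoff_id)


def wait_for_login(backend, handoff_id, interval_s=2.0, timeout_s=120.0,
                   sleep=time.sleep):
    """Poll until the backend reports completion or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        token = backend.poll(handoff_id)
        if token is not None:
            return token  # the plugin would store this in its vault
        sleep(interval_s)
    return None
```

The key property is that credentials are only ever entered in the browser; the plugin sees an opaque handoff ID and, eventually, a session token.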
The region selection dropdown exists in the dock UI (MANAGED_REGIONS in connection-components.jsx) and managed_region is stored per connection config, but RelayClient::Start() hardcodes region_preference: "" — the region is never transmitted to the API. This is the critical bug that v0.0.6 Phase 1 addresses.
The repo split has not yet been executed. The monorepo (telemy-v0.0.5/) still contains both the C++ plugin and Go control plane. The split plan calls for git-filter-repo to purge control-plane/ and dock JSX source from public history, with the private telemy-api repo receiving those contents.
Billing is integrated via LemonSqueezy webhooks with HMAC-SHA256 verification, supporting subscription lifecycle (free → standard at $9.99), grace periods (48h for payment failure), and force-disconnect on downgrade/expiry.
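As a rough illustration of the webhook verification and grace-period logic, the sketch below recomputes an HMAC-SHA256 over the raw request body and compares it in constant time. It assumes the signature arrives as a hex digest string and that the 48-hour boundary is inclusive; neither detail is confirmed here, and the function names are hypothetical.

```python
import hashlib
import hmac
from datetime import datetime, timedelta, timezone

GRACE_PERIOD = timedelta(hours=48)  # payment-failure grace period


def verify_webhook(raw_body: bytes, signature_hex: str, secret: bytes) -> bool:
    """Check an HMAC-SHA256 signature over the unmodified request body."""
    expected = hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(expected.encode(), signature_hex.encode())


def within_grace(payment_failed_at: datetime, now: datetime) -> bool:
    """True while a failed payment is still inside the grace window."""
    return now - payment_failed_at <= GRACE_PERIOD


if __name__ == "__main__":
    secret = b"example-secret"  # placeholder, not a real key
    body = b'{"event":"example"}'
    sig = hmac.new(secret, body, hashlib.sha256).hexdigest()
    print(verify_webhook(body, sig, secret))
    failed = datetime(2026, 1, 1, tzinfo=timezone.utc)
    print(within_grace(failed, failed + timedelta(hours=47)))
```

The signature must be computed over the raw bytes as received; re-serializing parsed JSON before hashing would change the digest.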
Key Decisions
- Pre-v0.0.4: Chose C++ shim + Rust core + Named Pipe IPC for Phase 1 local-first architecture. Local switching logic authoritative for immediate safety; cloud logic authoritative for relay lifecycle and billing.
- v0.0.4: Replaced Rust + IPC with all-native C++ single-DLL. Eliminated inter-process pipe deadlock risk identified in Aegis review. No standalone binary, no Rust dependency.
- v0.0.5: Adopted DPAPI for secret storage instead of plaintext. Vault uses flat JSON format (not nested), base64-encoded DPAPI blobs. Secrets never exposed to dock JS layer.
- v0.0.5: Chose browser-handoff pattern for plugin auth instead of in-plugin credential entry. Plugin opens browser, polls backend for completion. Backend is authoritative for entitlement enforcement.
- 2026-03-20: Deprecated AWS ephemeral EC2 provisioner in favor of shared VPS relay pool. Eliminates EC2 boot delay (~60s), reduces cost, sub-second session assignment.
- 2026-03-21: Deferred GeoIP auto-detection for v0.0.6. Reason: API caller IP is the OBS PC (home connection), not the mobile streaming device, so GeoIP would infer the wrong location. Manual region selection is primary mechanism.
- 2026-03-21: Chose slug-based permanent DNS (
{slug}.relay.telemyapp.com) over session-scoped DNS. Users configure IRL Pro and OBS once; DNS updates transparently between sessions with 60s TTL. - 2026-03-21: Sticky sessions — once assigned for a session, user stays on that server until disconnect. No mid-stream migration (too risky for live viewers).
- 2026-04-02: Approved hybrid open-source model. OBS plugin public under MIT license (builds trust for native DLL from unknown developer). Control plane stays proprietary (competitive moat: auth, billing, provisioning). Dock JSX source stays private; only minified bundle ships. Precedent: BELABOX’s AGPL srtla + closed cloud model.
- 2026-04-02: Chose MIT over AGPL for plugin license. AGPL’s network clause does not apply to a locally-running DLL.
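The slug-based DNS decision above can be sketched as a small helper that builds the permanent hostname and a record payload with the 60-second TTL. The slug validation rule (a standard DNS label), the use of an A record, and the unproxied flag are assumptions of this sketch; the payload keys follow the common Cloudflare DNS record shape, but this is not the code in internal/dns/cloudflare.go.

```python
import re

RELAY_ZONE = "relay.telemyapp.com"
TTL_SECONDS = 60

# A DNS label: 1-63 chars of lowercase letters, digits, or hyphens,
# not starting or ending with a hyphen. Telemy's real slug rules may differ.
_SLUG_RE = re.compile(r"[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?")


def relay_hostname(slug: str) -> str:
    """Return the permanent per-user hostname for a slug."""
    if not _SLUG_RE.fullmatch(slug):
        raise ValueError(f"invalid slug: {slug!r}")
    return f"{slug}.{RELAY_ZONE}"


def dns_record(slug: str, relay_ip: str) -> dict:
    """Build a record payload pointing the slug at the assigned relay."""
    return {
        "type": "A",           # assumption: IPv4 relay address
        "name": relay_hostname(slug),
        "content": relay_ip,
        "ttl": TTL_SECONDS,
        "proxied": False,      # assumption: DNS-only, since SRT runs over UDP
    }


if __name__ == "__main__":
    print(dns_record("alice", "203.0.113.10"))
```

Between sessions only `content` changes; the hostname the user configured in IRL Pro and OBS stays the same.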
Experiments & Results
| Experiment | Status | Finding | Source |
|---|---|---|---|
| Rust child process + Named Pipe IPC (Project Aegis) | Replaced in v0.0.4 | Pipe deadlock risk identified in independent review; overlapped I/O with bounded timeouts required. Eliminated entirely by moving to single-DLL architecture. | Aegis Architecture First Pass, sections 4.1-4.2, 8 |
| AWS ephemeral EC2 relay instances (t4g.small ARM64) | Deprecated 2026-03-20 | Boot delay unacceptable for UX. ARM64 sizing required benchmark validation for sustained ingest. Replaced by always-on VPS pool with sub-second assignment. | Aegis Architecture First Pass section 11; ARCHITECTURE.md relay stack |
| Per-link stats via custom srtla_rec fork | Deployed, working | Atomic per-connection byte/packet counters with relaxed ordering provide real-time per-link visibility. ASN-based carrier identification via libmaxminddb works for mobile carriers. | ARCHITECTURE.md per-link relay telemetry |
| Relay provision progress (6-step pipeline) | Deployed, working | 3-second minimum dwell per step with 2-second client polling provides smooth progress UX. Steps: launching_instance → waiting_for_instance → starting_docker → starting_containers → creating_stream → ready. | ARCHITECTURE.md relay provision progress |
| document.title transport for CEF action dispatch | Deployed, working | Percent-encoded JSON prefixed with __TELEMY_DOCK_ACTION__: reliably transports actions from JS to C++ via CEF titleChanged callback. | ARCHITECTURE.md UI actions |
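The interaction between the 3-second minimum dwell and 2-second polling in the provisioning pipeline can be sketched as a small pacer that advances the displayed step at most once per poll. Whether the dwell is enforced on the server or in the dock is not stated here, so the class below is a hypothetical illustration only.

```python
STEPS = [
    "launching_instance",
    "waiting_for_instance",
    "starting_docker",
    "starting_containers",
    "creating_stream",
    "ready",
]
MIN_DWELL_S = 3.0


class ProgressPacer:
    """Holds each displayed step for at least MIN_DWELL_S seconds."""

    def __init__(self, now: float):
        self._index = 0
        self._entered_at = now

    def displayed_step(self, actual_step: str, now: float) -> str:
        """Called on each poll with the backend's real step."""
        target = STEPS.index(actual_step)
        # Advance one step per call, and only after the dwell has elapsed,
        # so a fast backend never skips steps in the UI.
        if self._index < target and now - self._entered_at >= MIN_DWELL_S:
            self._index += 1
            self._entered_at = now
        return STEPS[self._index]


if __name__ == "__main__":
    pacer = ProgressPacer(now=0.0)
    for t in range(0, 22, 2):  # simulated 2-second polls
        print(t, pacer.displayed_step("ready", float(t)))
```

With 2-second polls, a 3-second dwell rounds up to two polls per step, so a pool assignment that finishes instantly still walks through every step before showing ready.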
Gotchas & Known Issues
- Region preference never transmitted (critical bug): RelayClient::Start() in relay_client.cpp line ~597 hardcodes region_preference: "". The entire dock region dropdown is dead code until v0.0.6 Phase 1 fixes this.
- Region naming mismatch: Dock uses short names (us-east, us-west) while backend config uses AWS-style (us-east-1, eu-west-1). Even if region were transmitted, resolveRegion() would reject it and fall back to default.
- OBS allocator mismatch: Must call bfree() after obs_module_config_path() because OBS uses its own allocator. Failure to do so causes memory leaks.
- Bridge-data sync contamination: Any useEffect syncing bridge-derived data must guard on useBridge to prevent SIM fallback contamination in the dock UI.
- Vault format must be flat JSON: vault.json uses flat (not nested) JSON format with DPAPI-encrypted base64 blobs. Nested structures will break decryption.
- Port 8090 accessibility: Relay stats polling requires port 8090 on the relay instance to be accessible from the OBS machine (Security Group rule).
- Port 5080 accessibility: Per-link stats require port 5080 on the relay instance to be accessible from the OBS machine.
- BYOR stats degradation: BYOR relays may not run SLS + srtla-receiver stack. Plugin handles missing stats endpoints gracefully (shows “Stats unavailable”), but core connection remains functional.
- Platform lock-in: Plugin is Windows-only due to DPAPI, WinHTTP, and Win32 system metrics APIs.
- CEF data URL inlining: All dock JS/HTML is inlined into a data:text/html URL at plugin startup. Changes to dock assets require OBS restart.
- Deferred show pattern: DockHost uses a 1.5s QTimer before deciding to float, respecting OBS DockState layout serialization. Removing this timer can break dock positioning.
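The region naming mismatch in the list above can be made concrete with a short sketch. It mimics the described fallback behaviour of resolveRegion() in Python and shows one possible normalization step. The backend region set, the default region, and the alias table are hypothetical values chosen for illustration; only the name styles come from the notes above.

```python
# AWS-style names as used by the backend config (illustrative subset).
BACKEND_REGIONS = {"us-east-1", "eu-west-1"}
# Hypothetical default, chosen so the silent fallback is visible.
DEFAULT_REGION = "eu-west-1"

# Hypothetical alias table from dock short names to backend names.
# Only the pairing that is unambiguous from the notes is listed.
DOCK_ALIASES = {"us-east": "us-east-1"}


def resolve_region(preference: str) -> str:
    """Mimics the described behaviour: unknown names fall back to default."""
    return preference if preference in BACKEND_REGIONS else DEFAULT_REGION


def normalize_region(dock_value: str) -> str:
    """Translate a dock short name; pass through anything unrecognized."""
    return DOCK_ALIASES.get(dock_value, dock_value)


if __name__ == "__main__":
    # Sending the dock's short name directly is silently ignored.
    print(resolve_region("us-east"))
    # Normalizing first lets the preference take effect.
    print(resolve_region(normalize_region("us-east")))
```

The failure mode is quiet: the request succeeds and a relay is assigned, just not in the region the user picked, which is why fixing transmission alone would not be enough.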
Open Questions
- GeoIP accuracy for mobile streamers: The /relay/start API call comes from the OBS PC, not the mobile device. GeoIP on the caller’s IP infers home location, not streaming location. Accurate auto-detection would require the mobile app calling the API directly or inferring location from the first SRTLA connection’s source IP.
- Multi-region server selection scoring: The v0.0.6 design proposes moving from simple region-preference SQL to Go-side scoring (region match 50%, distance 30%, load 20%, health 10%) when lat/lon are available. The cutoff for when SQL-only scoring is insufficient is undefined.
- Latency-aware selection: Real network latency data (synthetic pings from cloud functions) vs. haversine distance approximation. Out of scope for v0.0.6 but flagged for future.
- Repo split execution timing: The repo split design is approved but not executed. History rewrite with git-filter-repo and force push to both GitHub and GitLab remotes is pending. GitLab branch unprotection is a manual prerequisite.
- Dock UI source in public repo after split: JSX source moves to private telemy-api/dock-src/. Development workflow becomes: edit in private repo, esbuild, copy minified bundle to public repo. Cross-repo friction is a known tradeoff.
- Final billing Time Bank numbers and overage rate constants: Listed as open in the Aegis Architecture First Pass and not yet resolved.
- Rate-limit values for chat command classes: Listed as open in the Aegis Architecture First Pass.
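The proposed weighted scoring with a haversine distance term can be sketched as below. The weights are taken as listed; because they sum to 110%, this sketch divides by their total so scores stay in [0, 1]. The way each factor is mapped to a 0-1 score, the Relay fields, and the example coordinates are assumptions, not the v0.0.6 design.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0
MAX_KM = math.pi * EARTH_RADIUS_KM  # longest possible great-circle distance
WEIGHTS = {"region": 0.50, "distance": 0.30, "load": 0.20, "health": 0.10}


@dataclass
class Relay:
    name: str
    region: str
    lat: float
    lon: float
    load: float    # 0.0 (idle) to 1.0 (full), hypothetical field
    health: float  # 0.0 (down) to 1.0 (healthy), hypothetical field


def haversine_km(lat1, lon1, lat2, lon2) -> float:
    """Great-circle distance between two points in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def score(relay: Relay, lat: float, lon: float, preferred_region: str) -> float:
    """Weighted score in [0, 1]; higher is better."""
    parts = {
        "region": 1.0 if relay.region == preferred_region else 0.0,
        "distance": 1.0 - haversine_km(lat, lon, relay.lat, relay.lon) / MAX_KM,
        "load": 1.0 - relay.load,
        "health": relay.health,
    }
    total = sum(WEIGHTS.values())  # normalize, since listed weights exceed 100%
    return sum(WEIGHTS[k] * parts[k] for k in WEIGHTS) / total


if __name__ == "__main__":
    relays = [
        Relay("kc1", "us-central", 39.10, -94.58, load=0.4, health=1.0),
        Relay("fra1", "eu-central", 50.11, 8.68, load=0.1, health=1.0),  # hypothetical node
    ]
    best = max(relays, key=lambda r: score(r, 41.88, -87.63, "us-central"))
    print(best.name)
```

A latency-aware variant would replace the distance term with measured round-trip times, which is the tradeoff the latency-aware selection question raises.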
Sources
- ARCHITECTURE.md
- 2026-04-02-repo-split-design.md
- 2026-04-02-repo-split-plan.md
- 2026-03-21-v006-global-relay-mesh-design.md
- Project_Aegis_Architecture_First_Pass.md