Security

Summary

The Telemy project has undergone three major security review cycles. The first was a code review of telemy-v0.0.3 on 2026-02-28 that identified 15 findings (3 critical, 7 important, 5 minor) focused on the Rust obs-telemetry-bridge and Go control plane. All 15 findings were resolved by 2026-03-02 through a concentrated backend hardening session, including fixing a NULL DACL on the IPC named pipe, eliminating plaintext token logging, and switching idempotency keys to UUID v4. A separate code review of commit 9f8fa7f on 2026-03-01 approved the aegis_client module extraction with one required action (idempotency key format) and noted systematic Mutex poisoning recovery as a best-practice improvement.

The second major audit was the Codex 5.3 refactor audit of telemy-v0.0.4 on 2026-03-10, which scanned the C++ OBS plugin, Go control plane, and srtla-fork/srtla-receiver-fork repos. It produced 24 findings across 6 categories (security, architecture, code quality, modernization, technical debt, cross-repo consistency). An Opus 4.6 peer review verified each finding against source code, adjusted severities (mostly downward due to deployment context like Cloudflare TLS and EC2 security groups), and added 9 new findings the Codex scan missed — including the relayCache secret re-introduction after hardening, SRT port default mismatch (9000 vs 5000), and zero C++ test coverage.

The combined final plan consolidated all findings into 35 prioritized items (1 critical, 8 high, 19 medium, 7 low) with an estimated total effort of 112.5 hours. The sole CRITICAL finding is unauthenticated SRTLA registration (SEC-01/RF-001), currently mitigated by EC2 security group IP restrictions but lacking application-layer authentication. Six quick wins (effort S, MEDIUM+ severity) were identified for immediate action, including filtering secrets from relayCache, releasing leaked EIPs on persistence failure, and fixing the SRT port default mismatch.

Timeline

  • 2026-02-28: Initial code review of telemy-v0.0.3 produced 15 findings. Triage document created with fix ordering across 4 phases (security/functional, stability, correctness, maintenance).
  • 2026-03-01: Code review of commit 9f8fa7f (aegis_client module extraction). Approved with 1 required action (idempotency key format). 31 tests passing, 0 new warnings.
  • 2026-03-02: All 15 v0.0.3 findings resolved. Critical fixes: IPC pipe DACL restricted, server token logging redacted, non-Windows vault marked unsafe. Key commits: 4d9a9f5 (UUID v4 idempotency), a35e5e3 (MutexExt trait, OTel error handler), 3198766 (delta bitrate, module split).
  • 2026-03-10: Codex 5.3 audit of telemy-v0.0.4 produced 24 findings across telemy-v0.0.4, srtla-fork, and srtla-receiver-fork.
  • 2026-03-10: Opus 4.6 peer review verified all 24 Codex findings against source code, adjusted severities, added 9 new findings. The consolidated plan tracks the result as 35 prioritized items.
  • 2026-03-10: Final refactoring plan created with 4-phase execution order and effort estimates.

Current State

35 findings from the v0.0.4 audit remain open (the final plan was created 2026-03-10 but no resolution tracking document exists yet for v0.0.4). The v0.0.3 findings are all resolved.

CRITICAL (1 open):

  • RF-001 (SEC-01): Unauthenticated SRTLA registration — UDP 5000 accepts REG1/REG2 packets from any source. Mitigated by EC2 security group IP restrictions only. Effort XL (~10h). Requires HMAC-based registration auth, rate limiting, and CIDR allowlist.
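
The HMAC part of the RF-001 requirement could look roughly like the sketch below. It is a minimal illustration, not the SRTLA wire format: the packet layout, the 30-second freshness window, and the names SignRegistration and VerifyRegistration are all assumptions made for this example. Rate limiting and the CIDR allowlist are not shown.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/binary"
	"errors"
)

// Hypothetical layout (NOT the real REG1/REG2 format):
// [8-byte big-endian unix seconds][group ID bytes][32-byte HMAC-SHA256 tag]
const (
	tsLen      = 8
	tagLen     = sha256.Size
	maxSkewSec = 30 // assumed freshness window, limits replay of old packets
)

var errRejected = errors.New("registration rejected")

// SignRegistration builds a packet whose tag covers the timestamp and group ID.
func SignRegistration(key, groupID []byte, nowUnix int64) []byte {
	pkt := make([]byte, tsLen, tsLen+len(groupID)+tagLen)
	binary.BigEndian.PutUint64(pkt, uint64(nowUnix))
	pkt = append(pkt, groupID...)
	mac := hmac.New(sha256.New, key)
	mac.Write(pkt)
	return mac.Sum(pkt) // appends the tag to the packet body
}

// VerifyRegistration returns the group ID only if the tag matches and the
// timestamp is within maxSkewSec of nowUnix.
func VerifyRegistration(key, pkt []byte, nowUnix int64) ([]byte, error) {
	if len(pkt) < tsLen+tagLen {
		return nil, errRejected
	}
	body, tag := pkt[:len(pkt)-tagLen], pkt[len(pkt)-tagLen:]
	mac := hmac.New(sha256.New, key)
	mac.Write(body)
	if !hmac.Equal(tag, mac.Sum(nil)) { // constant-time comparison
		return nil, errRejected
	}
	ts := int64(binary.BigEndian.Uint64(body[:tsLen]))
	if d := nowUnix - ts; d > maxSkewSec || d < -maxSkewSec {
		return nil, errRejected
	}
	return body[tsLen:], nil
}
```

Checking the tag before parsing the timestamp means unauthenticated input is never interpreted, and a single generic error avoids telling a sender which check failed.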

HIGH priority open items (8):

  • RF-002: Stats thread data race (undefined behavior, crash risk) in srtla-fork
  • RF-003: Saved relay config not applied to live RelayClient without OBS restart
  • RF-004: Heartbeat loop permanently stops on transient HTTP 5xx errors
  • RF-005: API key file written without chmod 0600 in relay user-data
  • RF-006: Dead wss://:7443 endpoint pollutes API contract
  • RF-007: Stats HTTP server open to internet with wildcard CORS, exposes client IPs
  • RF-008: Provisioning goroutines detached from shutdown lifecycle (orphan EC2 risk)
  • RF-009: Zero automated test coverage for C++ plugin code
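
For RF-008, one common way to tie background work to the shutdown lifecycle is to track each goroutine and wait for it on exit. The sketch below is an assumption about how that could be structured, not the control plane's actual code; Provisioner, Go, and Shutdown are hypothetical names.

```go
package main

import "sync"

// Provisioner is a hypothetical owner for background provisioning tasks.
type Provisioner struct {
	stop chan struct{}
	once sync.Once
	wg   sync.WaitGroup
}

func NewProvisioner() *Provisioner {
	return &Provisioner{stop: make(chan struct{})}
}

// Go starts task in a tracked goroutine. The task is expected to return
// promptly once stop is closed, after cleaning up anything it created
// (for example an instance that was launched but not yet recorded).
// Go must not be called after Shutdown has started.
func (p *Provisioner) Go(task func(stop <-chan struct{})) {
	p.wg.Add(1)
	go func() {
		defer p.wg.Done()
		task(p.stop)
	}()
}

// Shutdown signals every task to stop and blocks until all have returned.
// It is safe to call more than once.
func (p *Provisioner) Shutdown() {
	p.once.Do(func() { close(p.stop) })
	p.wg.Wait()
}
```

In a real service the stop channel would more likely be a context.Context passed down to the AWS calls, and Shutdown would usually carry a deadline so a hung task cannot block exit forever.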

Quick wins ready for immediate execution (~3 hours total):

  1. RF-010: Filter secrets from relayCache in aegis-dock-bridge.js (3-line change)
  2. RF-011: Release EIP on SetUserEIP failure in handlers.go (5-line fix)
  3. RF-012: Fix SRT port defaults from 9000 to 5000 (relay_client.cpp + store.go)
  4. RF-013: Inject relay domain from config into store (remove hardcoded strings)
  5. RF-014: Join stats_server thread instead of detaching
  6. RF-015: Add aegis-dock-app.js to CMake post-build copy
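
The RF-011 fix amounts to a compensating release when persistence fails. A minimal sketch follows, assuming the three operations can be expressed as plain functions; EIPOps, ProvisionEIP, and ErrEIPNotPersisted are hypothetical names, and Persist stands in for SetUserEIP. The real handlers.go signatures are not shown in the source.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrEIPNotPersisted marks a provisioning attempt whose EIP could not be saved.
var ErrEIPNotPersisted = errors.New("EIP not persisted")

// EIPOps bundles the operations as functions so the flow can be exercised
// without AWS. All fields are hypothetical stand-ins.
type EIPOps struct {
	Allocate func() (allocationID string, err error)
	Persist  func(userID, allocationID string) error // stands in for SetUserEIP
	Release  func(allocationID string) error
}

// ProvisionEIP allocates an address and records it. If recording fails, the
// address is released immediately so it does not linger unattached.
func ProvisionEIP(ops EIPOps, userID string) (string, error) {
	id, err := ops.Allocate()
	if err != nil {
		return "", fmt.Errorf("allocate EIP: %w", err)
	}
	if err := ops.Persist(userID, id); err != nil {
		if relErr := ops.Release(id); relErr != nil {
			// Both steps failed: surface the allocation ID for manual cleanup.
			return "", fmt.Errorf("%w: %v; release of %s also failed: %v",
				ErrEIPNotPersisted, err, id, relErr)
		}
		return "", fmt.Errorf("%w (released %s): %v", ErrEIPNotPersisted, id, err)
	}
	return id, nil
}
```

Callers can test for the failure with errors.Is(err, ErrEIPNotPersisted). The double-failure branch keeps the allocation ID in the message because that is the one case that still needs manual cleanup.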

Key Decisions

  • 2026-02-28: Idempotency key format standardized on UUID v4 — Rust and Go sides must align; prefix-based keys (telemy-{ts}-{random}, dash-{ts}-{random}) dropped in favor of UUID for stronger uniqueness guarantees.
  • 2026-03-02: std::sync::Mutex usage in async context intentionally kept — verified no locks held across .await points; tokio::sync::Mutex overhead avoided for non-blocking locks. Documented as safe.
  • 2026-03-02: Non-Windows vault storage explicitly marked as unsafe/limited rather than implementing cross-platform keychain — future expansion will require platform-native secure storage.
  • 2026-03-10: SEC-02 (WSS :7443) downgraded from CRITICAL to HIGH — the endpoint is dead but not actively exploitable since no component listens on it and the OBS plugin does not use it.
  • 2026-03-10: SEC-04 (API plain HTTP) downgraded to MEDIUM — intentional architecture decision (Cloudflare flexible SSL), not a misconfiguration. Recommendation changed from “fail startup” to “add startup warning.”
  • 2026-03-10: SEC-05 (title/hash transport) classified as designed CEF IPC mechanism, not a leak. Severity reduced to MEDIUM. The relay_shared_key in save_config payloads is the residual concern.
  • 2026-03-10: SEC-06 (CustomEvent broadcast) downgraded to MEDIUM — broadcast is within a single CEF panel origin, not cross-window. Real concern is pair_token in relayCache (covered by RF-010).
  • 2026-03-10: DEBT-01 (idempotency stale expiry) downgraded to LOW — expired records are filtered by expires_at > now() and periodic cleanup makes the collision window extremely narrow.
  • 2026-03-10: Four-phase execution order established: Quick Wins (S effort) → Critical+High Safety → Security Hardening+API Quality → Architecture+Test Infrastructure.
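
To illustrate the UUID v4 idempotency key decision, the sketch below builds a version 4 UUID from 16 random bytes using only the Go standard library. It is an illustration of the format, not the project's code: NewIdempotencyKey is a hypothetical name, and the Rust and Go sides may well use a UUID library instead.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
)

// NewIdempotencyKey returns a random (version 4) UUID string such as
// "xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx", where y is one of 8, 9, a, b.
func NewIdempotencyKey() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // set version to 4
	b[8] = (b[8] & 0x3f) | 0x80 // set RFC 4122 variant

	var out [36]byte
	hex.Encode(out[0:8], b[0:4])
	out[8] = '-'
	hex.Encode(out[9:13], b[4:6])
	out[13] = '-'
	hex.Encode(out[14:18], b[6:8])
	out[18] = '-'
	hex.Encode(out[19:23], b[8:10])
	out[23] = '-'
	hex.Encode(out[24:36], b[10:16])
	return string(out[:]), nil
}
```

A v4 UUID carries 122 random bits, which is where the stronger uniqueness guarantee over the timestamp-plus-random prefix keys comes from.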

Experiments & Results

| Experiment | Status | Finding | Source |
| --- | --- | --- | --- |
| Codex 5.3 automated audit (v0.0.4, 3 repos) | Complete | 24 findings across 6 categories; 2 CRITICAL, 13 HIGH, 6 MEDIUM, 3 LOW (pre-review severity) | codex-report.md |
| Opus 4.6 peer review of Codex findings | Complete | All 24 verified; 8 severity adjustments (mostly downward); 9 new findings added; 0 disagreements | opus-review.md |
| MutexExt trait for poison recovery | Deployed | Eliminated 36+ .lock().unwrap() panic risks; lock_or_recover() clears poison and logs warning | CODE_REVIEW_RESOLUTION_2026_03_02.md |
| Delta-based bitrate calculation | Deployed | Replaced session-average with instantaneous measurement; metrics now reflect real-time network state | CODE_REVIEW_RESOLUTION_2026_03_02.md |
| const_time_cmp in srtla-fork | Evaluated | `diff= *ca - *cb` pattern does not prevent timing side-channels on comparison result; LOW risk since it only compares connection IDs, not secrets | |
| Hand-rolled MsgPack C++ parser expansion | Deployed (interim) | Extended to support signed ints, floats, binary, extended types; full msgpack-c migration planned | CODE_REVIEW_RESOLUTION_2026_03_02.md |
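
The difference between the session-average and delta-based bitrate can be shown with a small calculation. The sketch below is illustrative only: the deployed change is in the Rust bridge, and Sample and DeltaBitrateKbps are hypothetical names chosen for this example.

```go
package main

// Sample is one reading of a cumulative byte counter.
type Sample struct {
	Bytes  uint64 // total bytes sent so far
	Millis int64  // timestamp in milliseconds from a monotonic clock
}

// DeltaBitrateKbps returns the bitrate over the interval between two
// consecutive samples. Bits per millisecond equals kilobits per second.
func DeltaBitrateKbps(prev, cur Sample) float64 {
	dt := cur.Millis - prev.Millis
	if dt <= 0 || cur.Bytes < prev.Bytes {
		return 0 // no elapsed time, or the counter was reset
	}
	bits := float64(cur.Bytes-prev.Bytes) * 8
	return bits / float64(dt)
}
```

Suppose a stream sends 10,000,000 bytes in its first 10 seconds and then stalls for one second. Measured from the session start, the rate at 11 seconds is still about 7,270 kbps, while the delta over the last second is 0 kbps, which is the value that reflects the current network state.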

Gotchas & Known Issues

  • EC2 security group is the only SRTLA auth: The sg-0da8cf50c2fd72518 security group restricts inbound UDP 5000 to encoder source IPs. If the security group is misconfigured or the relay is deployed outside managed infrastructure, any host can register groups and consume resources (SEC-01).
  • Stats server exposes client IPs to the internet: TCP 5080 is open in the relay security group (sgr-01ac088e296ea11c9, 0.0.0.0/0). The unauthenticated stats endpoint with wildcard CORS returns client IP addresses, byte counts, and ASN organization names (SEC-07).
  • Orphaned EIPs cost $3.60/month each: If AllocateElasticIP succeeds but SetUserEIP fails, the EIP is never persisted and a new one is allocated next provision. Requires manual AWS console cleanup (DEBT-02/RF-011).
  • SRT port defaults disagree: C++ client and SQL default to 9000, but the actual relay ingest port is 5000. Only matters if the session response is missing srt_port, but could cause a silent connectivity failure (NEW-09/RF-012).
  • relayCache re-introduces secrets: After security hardening stripped secrets from dock JS surfaces, the notifyDockActionResult path re-introduces pair_token and ws_url into the bridge state (NEW-05/RF-010).
  • Stats thread data race is undefined behavior: build_stats_json() iterates conn_groups while the UDP loop mutates them without locking. This is a potential crash, not just a correctness issue (ARC-02/RF-002).
  • QFile truncate-then-write is not crash-safe: Vault and config writes truncate the file before writing. A crash in between loses the file. QSaveFile (atomic write) is available in the Qt6 dependency but not used (QLT-02/RF-021).
  • receiver.sh is upstream code: Many findings in the srtla-receiver-fork (SEC-08, SEC-09, SEC-10) affect the upstream installer script, not the Aegis production deployment path (which uses aws.go user-data). Fixes may need upstream coordination.
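
The crash-safety problem behind QLT-02/RF-021 is independent of Qt: the usual remedy is to write a temporary file in the same directory and rename it over the target once the data is on disk, which is the pattern an atomic save file is built around. The Go sketch below shows that pattern under stated assumptions; AtomicWriteFile is a hypothetical helper, and the plugin itself would use QSaveFile rather than this code.

```go
package main

import (
	"os"
	"path/filepath"
)

// AtomicWriteFile writes data to a temporary file next to path and renames it
// into place. A crash before the rename leaves the old file untouched.
func AtomicWriteFile(path string, data []byte, perm os.FileMode) (err error) {
	// Same directory as the target, so the rename stays on one filesystem.
	tmp, err := os.CreateTemp(filepath.Dir(path), filepath.Base(path)+".tmp-*")
	if err != nil {
		return err
	}
	defer func() {
		if err != nil {
			tmp.Close() // returns an error if already closed; ignored here
			os.Remove(tmp.Name())
		}
	}()
	if err = tmp.Chmod(perm); err != nil {
		return err
	}
	if _, err = tmp.Write(data); err != nil {
		return err
	}
	if err = tmp.Sync(); err != nil { // flush contents before the rename
		return err
	}
	if err = tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), path)
}
```

On POSIX systems the rename replaces the target atomically, so readers see either the old contents or the new ones. Passing 0o600 as perm also covers the file-permission concern for secrets such as vault data.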

Open Questions

  • Should RF-001 (SRTLA authentication) be implemented as HMAC on REG1/REG2 packets, or would a simpler pre-shared-key/CIDR allowlist approach be sufficient given the managed deployment model?
  • Should the dead wss://:7443 endpoint (RF-006) be removed entirely, or should a WSS listener be implemented for future real-time telemetry features?
  • What is the CI pipeline status? RF-009 (C++ test harness) and RF-015 (packaging smoke test) both reference CI integration, but no CI pipeline appears to exist yet.
  • For RF-018 (container least-privilege), should the Dockerfile changes be upstreamed to the srtla-receiver fork, or maintained as Telemy-specific patches?
  • Is the generatePairToken modulo bias (RF-029) worth fixing given the 8-character token length and the ~0.4% per-character skew, or should it be deferred indefinitely?
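
To make the RF-029 question concrete, the sketch below shows where a skew of this size can come from and what the fix costs. It assumes, purely for illustration, that each character is chosen by reducing one random byte modulo a 36-symbol alphabet; the source does not state the actual alphabet or reduction. Under that assumption 256 = 7 × 36 + 4, so four symbols are produced by 8 byte values and the other 32 by 7, a gap of 1/256 ≈ 0.39 percentage points. ByteHits and UnbiasedToken are hypothetical names.

```go
package main

import "crypto/rand"

// Hypothetical 36-symbol alphabet; the real one is not shown in the source.
const alphabet = "abcdefghijklmnopqrstuvwxyz0123456789"

// ByteHits returns how many of the 256 byte values map to symbol index i
// when a byte is reduced with b % n.
func ByteHits(i, n int) int {
	hits := 256 / n
	if i < 256%n {
		hits++ // the first 256%n symbols get one extra byte value
	}
	return hits
}

// UnbiasedToken draws length symbols with rejection sampling: byte values at
// or above the largest multiple of n are discarded, so every symbol is
// equally likely.
func UnbiasedToken(length int) (string, error) {
	n := len(alphabet)
	limit := 256 - 256%n // 252 for n = 36
	out := make([]byte, 0, length)
	buf := make([]byte, 1)
	for len(out) < length {
		if _, err := rand.Read(buf); err != nil {
			return "", err
		}
		if int(buf[0]) >= limit {
			continue // rejected: would over-represent the first symbols
		}
		out = append(out, alphabet[int(buf[0])%n])
	}
	return string(out), nil
}
```

With these assumed numbers only 4 of 256 byte values are rejected, so the fix costs about 1.6% extra random bytes on average. That small cost is relevant when weighing whether to fix or defer.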

Sources

  • codex-report.md — Codex 5.3 automated audit of telemy-v0.0.4, srtla-fork, srtla-receiver-fork (24 findings, 2026-03-10)
  • opus-review.md — Opus 4.6 peer review with severity adjustments and 9 new findings (2026-03-10)
  • final-plan.md — Consolidated refactoring plan, 35 items prioritized P0-P3 with effort estimates (2026-03-10)
  • CODE_REVIEW_9f8fa7f.md — Code review of aegis_client module extraction, commit 9f8fa7f (2026-03-01)
  • TRIAGE_CODE_REVIEW_2026_02_28.md — Triage of 15 v0.0.3 code review findings (2026-02-28)
  • CODE_REVIEW_RESOLUTION_2026_03_02.md — Resolution summary for all 15 v0.0.3 findings (2026-03-02)