Refactor Audit

Summary

On 2026-03-10, a comprehensive production-readiness audit of the Telemy codebase was designed and executed using a multi-agent pipeline. Codex GPT-5.3/5.4 produced a raw audit report across 6 categories, Claude Opus 4.6 verified each finding against the actual source code, and the results were merged into a prioritized refactoring plan. The audit covered the Go control plane (80 Go files, 16,205 LOC, 29 migrations), the C++ OBS plugin (12 source files, including a 2,817-line god-file), the React dock UI (~10 JS/JSX files), the srtla-fork (6 C/C++ source files), and the srtla-receiver-fork Docker image.

The pipeline produced 35 findings in total: 1 CRITICAL (unauthenticated SRTLA registration), 8 HIGH (data races, heartbeat reliability, API key exposure, dead WSS endpoint, unauthenticated stats, provisioning lifecycle), 19 MEDIUM (secrets in relay cache, EIP leaks, SRT port mismatch, domain source-of-truth, container hardening, race conditions), and 7 LOW (modulo bias, magic numbers, dead event listeners). Estimated total remediation effort is 112.5 hours. A follow-up migration audit on 2026-04-08 identified 2 additional HIGH-severity Go API auth issues (SEC-H1: OAuth token URL leak; SEC-H2: password reset session revocation gap) and mapped the full Telemyapp-to-GoLiveBro migration surface: 211 brand references across 44 Go files, 30+ TELEMY_ env vars, 6 telemy_ Prometheus metrics, and the Go module path rename.

The Opus review adjusted several Codex severity ratings. SEC-02 (WSS endpoint) was downgraded from CRITICAL to HIGH; SEC-04 (API TLS) to MEDIUM, because Cloudflare flexible SSL is an intentional design; SEC-05 and SEC-06 to MEDIUM, because CEF IPC is the designed transport, not a leak; DEBT-01 to LOW, because the cleanup job mitigates the stale expiry risk; and MOD-02 to LOW, because significant splitting has already been done. Opus also identified 9 new findings that Codex missed entirely, including the C++ test coverage gap (HIGH), relay worker thread overlap (MEDIUM), the SRT port default mismatch of 9000 vs. 5000 (MEDIUM), and relayCache re-introducing secrets after the security hardening (MEDIUM).

Timeline

  • 2026-03-10: Refactor audit pipeline designed and approved. Scope defined: obs-plugin, aegis-control-plane, srtla-fork, srtla-receiver-fork. Out of scope: archived versions, docs, scripts, build artifacts.
  • 2026-03-10: Codex GPT-5.4 audit executed (Phase 1). Produced 24 findings across Security, Architecture, Code Quality, Modernization, Technical Debt, and Cross-Repo Consistency.
  • 2026-03-10: Claude Opus review completed (Phase 2). Verified each Codex finding against source code, adjusted 7 severity ratings, added 9 new findings. Total findings: 35.
  • 2026-03-10: Final refactoring plan merged (Phase 3). Findings prioritized into P0-P3 tiers with action items and effort estimates.
  • 2026-03-10: Results published to Confluence (MSS space), Jira (Epic with stories per refactor area), and Slack (#telemy-development).
  • 2026-04-08: GoLiveBro migration audit completed. Three Claude Opus agents and Codex GPT-5.3 audited the Go API specifically. Found 2 new HIGH-severity auth issues and mapped the full migration surface.

Current State

The refactoring plan (RF-001 through RF-035) is the active remediation backlog. Six quick wins (small effort, MEDIUM or higher severity) were identified for immediate execution: filtering secrets from relayCache (a 3-line change), releasing the EIP on SetUserEIP failure (a 5-line fix), fixing the SRT port defaults to 5000, injecting the relay domain from config, joining the stats_server thread, and adding aegis-dock-app.js to the CMake copy.
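
The SRT port quick win comes down to having one canonical default instead of several copies of a literal. The Go sketch below is a minimal illustration, assuming the Go API is the place that fills in the session response; `defaultSRTLAIngestPort`, `SessionInfo`, and `resolveSRTPort` are hypothetical names, not the actual Telemy code.

```go
package main

import "fmt"

// defaultSRTLAIngestPort is the actual SRTLA ingest port (5000).
// Hypothetical name: the point is that every fallback reads this single
// value instead of repeating a literal such as 9000.
const defaultSRTLAIngestPort = 5000

// SessionInfo is a hypothetical stand-in for the session response.
// A zero SRTPort means the field was missing.
type SessionInfo struct {
	SRTPort int
}

// resolveSRTPort returns the port from the session response, falling back
// to the canonical default when the field is absent or out of range.
func resolveSRTPort(s SessionInfo) int {
	if s.SRTPort <= 0 || s.SRTPort > 65535 {
		return defaultSRTLAIngestPort
	}
	return s.SRTPort
}

func main() {
	fmt.Println(resolveSRTPort(SessionInfo{}))              // 5000
	fmt.Println(resolveSRTPort(SessionInfo{SRTPort: 6000})) // 6000
}
```

The same value would still need to be mirrored in the C++ client and the SQL default; the sketch only removes drift on the Go side.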

The April 2026 migration audit added a 5-phase migration order for the Telemyapp-to-GoLiveBro rebrand:

  • Phase 1: pre-migration security fixes (SEC-H1, SEC-H2, TD-D1, TD-D2).
  • Phase 2: module/import path rename.
  • Phase 3: user-facing brand rename (CORS, emails, chat prefix, DNS).
  • Phase 4 (optional): env var rename (TELEMY_ to GLB_/GOLIVEBRO_).
  • Phase 5 (optional): metrics rename (telemy_ to glb_).

The database schema is brand-neutral and requires no migration.

Several March audit findings are obsolete, or likely obsolete, following the AWS provisioner removal: SEC-02 (the WSS endpoint was likely removed), SEC-08 (the admin API key file-permission issue came from the aws.go user-data, which is gone), and DEBT-02 (the EIP persistence failure went away with AWS). The OBS plugin and srtla findings remain relevant in their separate repos.

Key Decisions

  • 2026-03-10: Three-phase pipeline design approved. Codex for raw audit, Opus for verification/augmentation, merged final plan. Rationale: Codex provides breadth, Opus provides depth and accuracy.
  • 2026-03-10: Scope included all active repos but excluded archived versions (v0.0.2, v0.0.3), docs, scripts, and build artifacts. Rationale: Focus engineering effort on production-facing code.
  • 2026-03-10: Six audit categories defined (code quality, architecture, modernization, security, tech debt, cross-repo). Rationale: the categories give a clear grouping for remediation prioritization.
  • 2026-03-10: Quick wins (Phase 1) prioritized over critical items (Phase 2) in execution order. Rationale: quick wins have the highest value-to-effort ratio and build momentum.
  • 2026-04-08: Migration audit decided pre-migration security fixes should precede any rename work. Rationale: SEC-H1 (OAuth token URL leak) and SEC-H2 (password reset session gap) are functional security issues that should not carry into the rebranded codebase.
  • 2026-04-08: Database schema confirmed brand-neutral; no DB migration needed. Env var rename (TELEMY_ to GOLIVEBRO_) marked optional, with a backwards-compatible fallback that reads both prefixes.

Gotchas & Known Issues

  • C++ plugin has zero test coverage. RF-009 (HIGH) identifies this. The JSON parsers are fragile hand-rolled character scanners, and the 2,817-line god-file has 10+ mutexes. Any refactoring carries regression risk without tests.
  • Stats thread data race is undefined behavior. RF-002 (HIGH) documents that build_stats_json() iterates shared vectors while the UDP event loop mutates them without synchronization. This can crash the relay process.
  • SRT port defaults disagree. The C++ client defaults to 9000 and the SQL default is also 9000, but the actual SRTLA ingest port is 5000. This only matters if the session response omits the field, but it indicates copy-paste drift.
  • relayCache contradicts security hardening. The security hardening cycle stripped secrets from dock JS surfaces, but notifyDockActionResult re-introduces pair_token and ws_url into bridge state.
  • AWS provisioner removal invalidates several findings. SEC-02, SEC-08, and DEBT-02 from the March audit are affected. The pool provisioner replaced ephemeral EC2 instances, but ARC-01 (detached goroutine lifecycle) and ARC-05 (concurrent start race) still apply.
  • 1MB body limit was added. QLT-03 from the March audit noted weak request bounds. The April migration audit confirmed a 1MB body limit is now enforced at router.go:214.
  • AGPL licensing on srtla-fork. Running modified AGPL code as a network service requires source disclosure. This is addressed in the next-gen streaming plan (clean-room rewrite), not in the refactor audit remediation.

Open Questions

  • What is the remediation status of the 6 quick wins identified in Phase 1? Have any been completed?
  • Has the Jira epic been updated with completion status for any RF items?
  • Should the April 2026 migration audit findings (SEC-H1, SEC-H2, new tech debt items) be added to the existing Jira epic or tracked separately?
  • The Codex report originally rated SEC-02 (WSS endpoint) as CRITICAL, and Opus downgraded it to HIGH. With the AWS provisioner now removed, is this finding fully obsolete?
  • Should the env var prefix decision (keep TELEMY_ vs rename to GLB_ vs rename to GOLIVEBRO_) be made before or after the module path rename?

Sources