DNS Provisioning Verification

1. Why this verification is needed

The pre-rebrand Cloudflare token was scoped to telemyapp.com. Every CreateOrUpdateRecord call against golivebro.com returned a 4xx, which the pipeline logged as dns_update_failed without failing the user-facing request. The token was rotated on 2026-04-17 (golivebro-api-dns-mgmt) and CLOUDFLARE_ZONE_ID on Advin was repointed to the golivebro.com zone. No automated write has run under the new scope; the two existing records (kc1, lv1) were hand-created.

2. What success looks like

Observable | How to verify
A record appears in golivebro.com zone within 5s of provision | GET /client/v4/zones/{zoneID}/dns_records?type=A&name={slug}.relay.golivebro.com returns one result whose content matches the relay IP
glb-api emits success log | docker logs glb-api -f on Advin shows dns: POST record <slug>.relay.golivebro.com -> <ip> (record_id=) (format from api/internal/dns/cloudflare.go:98)
Public resolution works | dig @1.1.1.1 <slug>.relay.golivebro.com +short returns the IP within 60s (TTL=60)
Deletion cleans up | After DELETE /api/v1/relay/self-hosted/{id}, Cloudflare returns zero results and logs emit dns: deleted record <fqdn>
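
The first two checks can be scripted. What follows is a minimal sketch under stated assumptions, not tooling that exists in the repo: the Cloudflare API host is api.cloudflare.com, the token and zone id are exported as CLOUDFLARE_DNS_TOKEN and CLOUDFLARE_ZONE_ID, and the helper names cf_records_url, cf_list_records, and log_has_create are hypothetical.

```shell
# Hypothetical helpers; none of these names exist in the repo.

# cf_records_url <zone_id> <slug>: prints the list-records URL for the slug's A record.
cf_records_url() {
  printf 'https://api.cloudflare.com/client/v4/zones/%s/dns_records?type=A&name=%s.relay.golivebro.com\n' "$1" "$2"
}

# cf_list_records <slug>: fetches the matching records as JSON.
# Assumes CLOUDFLARE_DNS_TOKEN and CLOUDFLARE_ZONE_ID are exported in this shell.
cf_list_records() {
  curl -s -H "Authorization: Bearer ${CLOUDFLARE_DNS_TOKEN}" \
    "$(cf_records_url "${CLOUDFLARE_ZONE_ID}" "$1")"
}

# log_has_create <fqdn> <ip>: reads log lines on stdin and succeeds if the
# create line for this record is present (fixed-string match on the documented format).
log_has_create() {
  grep -qF -- "dns: POST record $1 -> $2"
}
```

A possible use on Advin is docker logs glb-api 2>&1 | log_has_create <slug>.relay.golivebro.com <ip>, run without -f so the pipeline terminates.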

3. Three test approaches

A. Synthetic admin trigger. Not available. api/internal/api/router.go exposes no admin route that invokes the DNS client. /ops/users/{userID}/force-stop-relay only deprovisions.

B. Full OBS-plugin flow. Dashboard > Add Managed Relay calls POST /api/v1/relay/start with {"region_preference":"","connection_id":"<uuid>"}. The async pipeline at handlers.go:302-385 calls CreateOrUpdateRecord twice. Watch the session response for relay.hostname matching <slug>.relay.golivebro.com. Downside: failures are log-only, not surfaced to caller.

C. Dark-run via self-hosted register (recommended). POST /api/v1/relay/self-hosted/register at handlers_self_hosted.go:44-95 exercises CreateOrUpdateRecord synchronously and returns 500 on DNS failure. Request: {"port":5000,"external_ip":"<public_ip>","connection_id":"<uuid>"}. Response includes dns_slug and fqdn. Clean up with DELETE /api/v1/relay/self-hosted/{id}, which exercises DeleteRecord. Slugs are server-generated base36, so prefix namespacing requires code changes (see Open Questions).

Recommendation: C. Synchronous pass/fail, covers create and delete, no SRT infra required.

4. Alpha tester packet

Requires: a GoLiveBro account, a JWT from devtools (Network tab on any authed request), read-only CF dashboard access to the golivebro.com zone.

  1. Log into golivebro.com and copy your JWT.
  2. curl -sX POST https://api.golivebro.com/api/v1/relay/self-hosted/register -H "Authorization: Bearer <jwt>" -H "Content-Type: application/json" -d '{"port":5000,"external_ip":"203.0.113.10","connection_id":"'"$(uuidgen)"'"}'
  3. Capture response. Must contain "fqdn": "<slug>.relay.golivebro.com" and "dns_slug".
  4. Within 10s: dig @1.1.1.1 <slug>.relay.golivebro.com +short. Expected: 203.0.113.10. If the answer is empty, retry for up to 60s before treating it as a failure.
  5. In Cloudflare dashboard > golivebro.com > DNS, search <slug>. Expected: one A record, content 203.0.113.10, comment starting with "self-hosted | ".
  6. curl -sX DELETE https://api.golivebro.com/api/v1/relay/self-hosted/<relay_id> -H "Authorization: Bearer <jwt>". Expected: {"ok":true,"id":"..."}.
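  7. Within 60s: second dig returns empty; Cloudflare record gone. The dashboard should clear right away, but 1.1.1.1 may keep serving the cached answer until the 60s TTL expires.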
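
Steps 3, 4, and 7 can be sketched as shell helpers for testers who prefer not to eyeball the output. This is a sketch only: json_field, resolve_a, and wait_for_a are hypothetical names, and the sed-based extraction assumes the register response is flat JSON with string values (jq is the sturdier choice where it is installed).

```shell
# Hypothetical helpers for steps 3, 4, and 7; names are illustrative.

# json_field <key> <json>: prints the string value of a key.
# Only handles flat JSON with string values.
json_field() {
  printf '%s\n' "$2" | sed -n 's/.*"'"$1"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# resolve_a <fqdn>: prints the A record as seen through 1.1.1.1.
resolve_a() {
  dig @1.1.1.1 "$1" +short
}

# wait_for_a <fqdn> <expected> <timeout_s>: polls about once per second until
# resolve_a prints exactly <expected>; pass "" to wait for the record to vanish.
wait_for_a() {
  _elapsed=0
  while [ "$_elapsed" -le "$3" ]; do
    [ "$(resolve_a "$1")" = "$2" ] && return 0
    sleep 1
    _elapsed=$((_elapsed + 1))
  done
  return 1
}
```

With the register response stored in a variable named response, fqdn=$(json_field fqdn "$response") followed by wait_for_a "$fqdn" 203.0.113.10 60 covers the first dig, and wait_for_a "$fqdn" "" 60 covers the one after the delete.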

Report back. Paste register response, first dig, CF DNS row screenshot, delete response, final dig. If anything diverged, include the failing call’s HTTP status.
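
Because curl -s does not print the HTTP status, one way to capture it for the report is to append it with curl's --write-out option. The helper names below are hypothetical and the sketch assumes nothing about the API beyond what the steps above already use.

```shell
# Hypothetical helpers; names are illustrative.

# curl_with_status <curl args...>: prints the response body, then the HTTP
# status code on its own final line.
curl_with_status() {
  curl -s -w '\n%{http_code}' "$@"
}

# status_of: reads curl_with_status output on stdin and prints the status code.
status_of() { tail -n 1; }

# body_of: reads curl_with_status output on stdin and prints everything except
# the status line.
body_of() { sed '$d'; }
```

For example, out=$(curl_with_status -X DELETE ... ) keeps both parts in one variable, and printf '%s\n' "$out" | status_of yields the code to paste into the report.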

Failure modes.

  • Register returns 500 "failed to create DNS record": token or zone id is wrong.
  • Register returns 200 but dig still returns NXDOMAIN after 60s: record is in the wrong zone or was never written.
  • Delete returns 500: token lacks delete scope.
  • Register returns 200 and dig works but the dashboard shows no record: wrong zone.

5. Pass/fail decision matrix

Outcome | Meaning | Action
All 7 green | DNS path works end-to-end | Log resolved
Register 200 + CF record + dig NXDOMAIN past TTL | Wrong zone or stale view | Compare Advin CLOUDFLARE_ZONE_ID to the golivebro.com zone id in CF URL
Register 200 + no CF record + no log | DNS client silently disabled | Verify CLOUDFLARE_DNS_TOKEN and CLOUDFLARE_ZONE_ID set in /opt/golivebro/.env, restart glb-api
Register 500 + dns_update_failed 401/403 | Token invalid or wrong scope | Recreate token with Zone.DNS:Edit on golivebro.com only
Register 200 + delete 500 | Token missing delete verb | Recreate with full DNS edit
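
The matrix can be encoded as a small lookup so that tester reports are triaged consistently. The function below is a simplified sketch: classify_run and its output labels are hypothetical, it treats any 2xx delete status as success (the source only specifies the response body), and it does not model the post-delete dig.

```shell
# classify_run <register_status> <cf_record yes|no> <log_seen yes|no> <dig ok|nxdomain> <delete_status>
# Prints a short label for the matching matrix row, or "unclassified".
# Hypothetical helper; the labels are illustrative.
classify_run() {
  if [ "$1" = 500 ]; then
    echo token_scope        # confirm dns_update_failed 401/403 in logs, recreate token
  elif [ "$1" != 200 ]; then
    echo unclassified
  elif [ "$2" = no ] && [ "$3" = no ]; then
    echo client_disabled    # check CLOUDFLARE_DNS_TOKEN and CLOUDFLARE_ZONE_ID, restart glb-api
  elif [ "$2" = yes ] && [ "$4" = nxdomain ]; then
    echo wrong_zone         # compare CLOUDFLARE_ZONE_ID with the zone id in the CF URL
  elif [ "$5" = 500 ]; then
    echo delete_scope       # recreate token with full DNS edit
  else
    case "$2:$4:$5" in
      yes:ok:2??) echo pass ;;   # log resolved
      *) echo unclassified ;;
    esac
  fi
}
```

Anything that comes back unclassified is a combination the matrix does not cover and should be examined by hand.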

6. Rollback

ssh advin "cd /opt/golivebro && cp .env.glb.bak .env && docker compose restart glb-api"

Then in the CF dashboard, revoke golivebro-api-dns-mgmt (Profile > API Tokens > Roll or Delete) and recreate a short-lived token of the same scope. Do not resurrect the pre-rotation telemyapp.com-scoped token; it was known-broken.
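
A guarded variant of the copy step is sketched below for cases where the backup may be missing. restore_env is a hypothetical helper, and .env.pre-rollback is a file name invented for this sketch so the replaced file is not lost.

```shell
# restore_env <dir>: copies <dir>/.env.glb.bak over <dir>/.env, but only if the
# backup exists, and keeps the replaced file as <dir>/.env.pre-rollback
# (a hypothetical name chosen for this sketch).
restore_env() {
  if [ ! -f "$1/.env.glb.bak" ]; then
    echo "missing $1/.env.glb.bak; not touching .env" >&2
    return 1
  fi
  if [ -f "$1/.env" ]; then
    cp "$1/.env" "$1/.env.pre-rollback" || return 1
  fi
  cp "$1/.env.glb.bak" "$1/.env"
}
```

On Advin this would run as restore_env /opt/golivebro before the restart. If glb-api receives these variables through Compose (environment or env_file) rather than reading the file itself, docker compose up -d glb-api may be needed instead of restart, since restart does not recreate the container with new values.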

Open Questions

  • Should slug assignment support a reserved prefix (alpha-, test-) so canary testers cannot collide with real slugs? AssignSlugIfMissing currently returns 8-char base36 from crypto/rand with no namespace.
  • Should CreateOrUpdateRecord failures in the async managed-relay pipeline (handlers.go:305, :378) promote from log-only to a session error so this silent class cannot recur?
  • Should we add an admin /ops/dns/selftest route that issues a no-op PUT against a reserved record to validate token scope at deploy time?

Sources