DNS Provisioning Verification
1. Why this verification is needed
The pre-rebrand Cloudflare token was scoped to telemyapp.com. Every CreateOrUpdateRecord call against golivebro.com returned a 4xx, which the pipeline logged as dns_update_failed without failing the user-facing request. The token was rotated on 2026-04-17 (golivebro-api-dns-mgmt), and CLOUDFLARE_ZONE_ID on Advin was repointed to the golivebro.com zone. No automated write has run under the new scope; the two existing records (kc1, lv1) were hand-created.
2. What success looks like
| Observable | How to verify |
|---|---|
| A record appears in golivebro.com zone within 5s of provision | GET /client/v4/zones/{zoneID}/dns_records?type=A&name={slug}.relay.golivebro.com returns one result whose content matches the relay IP |
| glb-api emits success log | docker logs glb-api -f on Advin shows dns: POST record <slug>.relay.golivebro.com -> <ip> (record_id=) (format from api/internal/dns/cloudflare.go:98) |
| Public resolution works | dig @1.1.1.1 <slug>.relay.golivebro.com +short returns the IP within 60s (TTL=60) |
| Deletion cleans up | After DELETE /api/v1/relay/self-hosted/{id}, Cloudflare returns zero results and logs emit dns: deleted record <fqdn> |
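The first row of the table can be checked mechanically. The following is a minimal sketch, assuming the standard Cloudflare v4 list response envelope (`success` plus a `result` array); the helper names `recordsURL` and `checkRecord` and the sample slug `demo1234` are illustrative and do not come from the glb-api codebase. The actual HTTP call (with an `Authorization: Bearer <token>` header) is left out so the check stays testable offline.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
)

// dnsRecord holds the fields of a Cloudflare DNS record that this check reads.
type dnsRecord struct {
	ID      string `json:"id"`
	Type    string `json:"type"`
	Name    string `json:"name"`
	Content string `json:"content"`
	TTL     int    `json:"ttl"`
}

// listResponse is the envelope returned by the list-DNS-records endpoint.
type listResponse struct {
	Success bool        `json:"success"`
	Result  []dnsRecord `json:"result"`
}

// recordsURL (hypothetical helper) builds the list query from the table above.
func recordsURL(zoneID, fqdn string) string {
	q := url.Values{}
	q.Set("type", "A")
	q.Set("name", fqdn)
	return "https://api.cloudflare.com/client/v4/zones/" +
		url.PathEscape(zoneID) + "/dns_records?" + q.Encode()
}

// checkRecord (hypothetical helper) returns nil only when body contains
// exactly one record whose content equals wantIP.
func checkRecord(body []byte, wantIP string) error {
	var resp listResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return fmt.Errorf("decode response: %w", err)
	}
	if !resp.Success {
		return fmt.Errorf("cloudflare reported success=false")
	}
	if len(resp.Result) != 1 {
		return fmt.Errorf("want 1 record, got %d", len(resp.Result))
	}
	if got := resp.Result[0].Content; got != wantIP {
		return fmt.Errorf("content %q does not match relay IP %q", got, wantIP)
	}
	return nil
}

func main() {
	// Sample payload for illustration only; not captured from a real zone.
	sample := []byte(`{"success":true,"result":[{"id":"abc","type":"A",` +
		`"name":"demo1234.relay.golivebro.com","content":"203.0.113.10","ttl":60}]}`)
	fmt.Println(recordsURL("ZONE_ID", "demo1234.relay.golivebro.com"))
	fmt.Println(checkRecord(sample, "203.0.113.10"))
}
```

For the deletion row, the same response can be inspected for an empty `result` array instead of a single match.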
3. Three test approaches
A. Synthetic admin trigger. Not available. api/internal/api/router.go exposes no admin route that invokes the DNS client. /ops/users/{userID}/force-stop-relay only deprovisions.
B. Full OBS-plugin flow. Dashboard > Add Managed Relay calls POST /api/v1/relay/start with {"region_preference":"","connection_id":"<uuid>"}. The async pipeline at handlers.go:302-385 calls CreateOrUpdateRecord twice. Watch the session response for relay.hostname matching <slug>.relay.golivebro.com. Downside: failures are log-only, not surfaced to caller.
C. Dark-run via self-hosted register (recommended). POST /api/v1/relay/self-hosted/register at handlers_self_hosted.go:44-95 exercises CreateOrUpdateRecord synchronously and returns 500 on DNS failure. Request: {"port":5000,"external_ip":"<public_ip>","connection_id":"<uuid>"}. Response includes dns_slug and fqdn. Clean up with DELETE /api/v1/relay/self-hosted/{id}, which exercises DeleteRecord. Slugs are server-generated base36, so prefix namespacing requires code changes (see Open Questions).
Recommendation: C. Synchronous pass/fail, covers create and delete, no SRT infra required.
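Approach C can be scripted as a small client. The sketch below uses the request fields and the dns_slug/fqdn response fields quoted above; everything else is an assumption made for illustration: the function names `register` and `deregister`, the stub server, and treating any non-2xx status as failure. It runs against a local stub rather than the production API.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/url"
)

type registerRequest struct {
	Port         int    `json:"port"`
	ExternalIP   string `json:"external_ip"`
	ConnectionID string `json:"connection_id"`
}

// registerResponse decodes only the two fields the plan relies on.
type registerResponse struct {
	DNSSlug string `json:"dns_slug"`
	FQDN    string `json:"fqdn"`
}

// do sends an authenticated request and rejects any non-2xx status.
func do(client *http.Client, method, target, jwt string, body []byte) (*http.Response, error) {
	req, err := http.NewRequest(method, target, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+jwt)
	req.Header.Set("Content-Type", "application/json")
	resp, err := client.Do(req)
	if err != nil {
		return nil, err
	}
	if resp.StatusCode/100 != 2 {
		resp.Body.Close()
		return nil, fmt.Errorf("%s %s returned HTTP %d", method, target, resp.StatusCode)
	}
	return resp, nil
}

// register exercises the synchronous create path.
func register(client *http.Client, baseURL, jwt string, in registerRequest) (registerResponse, error) {
	var out registerResponse
	payload, err := json.Marshal(in)
	if err != nil {
		return out, err
	}
	resp, err := do(client, http.MethodPost, baseURL+"/api/v1/relay/self-hosted/register", jwt, payload)
	if err != nil {
		return out, err
	}
	defer resp.Body.Close()
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return out, err
	}
	if out.DNSSlug == "" || out.FQDN == "" {
		return out, fmt.Errorf("response missing dns_slug or fqdn")
	}
	return out, nil
}

// deregister exercises the delete path for the given relay id.
func deregister(client *http.Client, baseURL, jwt, id string) error {
	resp, err := do(client, http.MethodDelete, baseURL+"/api/v1/relay/self-hosted/"+url.PathEscape(id), jwt, nil)
	if err != nil {
		return err
	}
	return resp.Body.Close()
}

// newStubAPI is a local stand-in that answers every request with the given
// status and body. It is not the real glb-api.
func newStubAPI(status int, body string) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(status)
		fmt.Fprint(w, body)
	}))
}

func main() {
	srv := newStubAPI(http.StatusOK, `{"dns_slug":"demo1234","fqdn":"demo1234.relay.golivebro.com"}`)
	defer srv.Close()
	got, err := register(srv.Client(), srv.URL, "stub-jwt", registerRequest{
		Port: 5000, ExternalIP: "203.0.113.10", ConnectionID: "00000000-0000-0000-0000-000000000000",
	})
	fmt.Println(got.FQDN, err)
	fmt.Println(deregister(srv.Client(), srv.URL, "stub-jwt", "relay-id"))
}
```

Because the register handler returns 500 on DNS failure, a non-nil error from `register` is the pass/fail signal that approaches A and B cannot provide.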
4. Alpha tester packet
Requires: a GoLiveBro account, a JWT from devtools (Network tab on any authed request), read-only CF dashboard access to the golivebro.com zone.
- Log into golivebro.com and copy your JWT.
- curl -sX POST https://api.golivebro.com/api/v1/relay/self-hosted/register -H "Authorization: Bearer <jwt>" -H "Content-Type: application/json" -d '{"port":5000,"external_ip":"203.0.113.10","connection_id":"'"$(uuidgen)"'"}'
- Capture response. Must contain "fqdn": "<slug>.relay.golivebro.com" and "dns_slug".
- Within 10s: dig @1.1.1.1 <slug>.relay.golivebro.com +short. Expected: 203.0.113.10.
- In Cloudflare dashboard > golivebro.com > DNS, search <slug>. Expected: one A record, content 203.0.113.10, comment starts with "self-hosted |".
- curl -sX DELETE https://api.golivebro.com/api/v1/relay/self-hosted/<relay_id> -H "Authorization: Bearer <jwt>". Expected: {"ok":true,"id":"..."}.
- Within 10s: second dig returns empty; Cloudflare record gone.
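The two timed dig checks in the steps above can be approximated with a polling loop. This is a sketch, not part of the tester packet: `waitForA`, `cloudflareLookup`, and `fakeLookup` are hypothetical names, and the loop asks 1.1.1.1 through Go's built-in resolver instead of shelling out to dig. The example in `main` uses the fake lookup so it runs without network access.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// lookupFunc resolves a hostname to a list of address strings.
type lookupFunc func(ctx context.Context, host string) ([]string, error)

// cloudflareLookup returns a lookup that sends queries to 1.1.1.1:53,
// roughly what "dig @1.1.1.1" does.
func cloudflareLookup() lookupFunc {
	r := &net.Resolver{
		PreferGo: true,
		Dial: func(ctx context.Context, network, _ string) (net.Conn, error) {
			d := net.Dialer{Timeout: 2 * time.Second}
			return d.DialContext(ctx, network, "1.1.1.1:53")
		},
	}
	return r.LookupHost
}

// fakeLookup fails with a not-found error for the first `failures` calls,
// then returns answer. It exists only to exercise waitForA offline.
func fakeLookup(failures int, answer string) lookupFunc {
	calls := 0
	return func(_ context.Context, host string) ([]string, error) {
		calls++
		if calls <= failures {
			return nil, &net.DNSError{Err: "no such host", Name: host, IsNotFound: true}
		}
		return []string{answer}, nil
	}
}

// waitForA polls lookup every interval until host resolves to want,
// and returns an error once timeout has elapsed without a match.
func waitForA(lookup lookupFunc, host, want string, timeout, interval time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
		addrs, err := lookup(ctx, host)
		cancel()
		if err == nil {
			for _, a := range addrs {
				if a == want {
					return nil
				}
			}
		}
		if !time.Now().Before(deadline) {
			return fmt.Errorf("%s did not resolve to %s within %s (last answer %v, last error %v)",
				host, want, timeout, addrs, err)
		}
		time.Sleep(interval)
	}
}

func main() {
	// Swap fakeLookup for cloudflareLookup() to query 1.1.1.1 for real.
	lookup := fakeLookup(2, "203.0.113.10")
	err := waitForA(lookup, "demo1234.relay.golivebro.com", "203.0.113.10",
		10*time.Second, 10*time.Millisecond)
	fmt.Println(err)
}
```

The post-delete check is the inverse condition (the name stops resolving) and would need its own loop that distinguishes a not-found answer from a resolver timeout.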
Report back. Paste register response, first dig, CF DNS row screenshot, delete response, final dig. If anything diverged, include the failing call’s HTTP status.
Failure modes. Register 500 failed to create DNS record: token or zone id wrong. Register 200 but dig NXDOMAIN past 60s: record in wrong zone or never written. Delete 500: token lacks delete scope. Register 200 and dig works but dashboard empty: wrong zone.
5. Pass/fail decision matrix
| Outcome | Meaning | Action |
|---|---|---|
| All 7 green | DNS path works end-to-end | Log resolved |
| Register 200 + CF record + dig NXDOMAIN past TTL | Wrong zone or stale view | Compare Advin CLOUDFLARE_ZONE_ID to the golivebro.com zone id in CF URL |
| Register 200 + no CF record + no log | DNS client silently disabled | Verify CLOUDFLARE_DNS_TOKEN and CLOUDFLARE_ZONE_ID set in /opt/golivebro/.env, restart glb-api |
| Register 500 + dns_update_failed 401/403 | Token invalid or wrong scope | Recreate token with Zone.DNS:Edit on golivebro.com only |
| Register 200 + delete 500 | Token missing delete verb | Recreate with full DNS edit |
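For triaging tester reports, the matrix can be encoded as a lookup. The following is a sketch only: the `observation` struct, its field names, and the `diagnose` function are invented for illustration, and it assumes a successful delete returns HTTP 200. The returned strings are the Meaning column of the table.

```go
package main

import "fmt"

// observation collects what a tester reported. All field names are hypothetical.
type observation struct {
	RegisterStatus int  // HTTP status of the register call
	DeleteStatus   int  // HTTP status of the delete call (0 if not attempted)
	CFRecord       bool // A record visible in the golivebro.com zone
	DigResolves    bool // dig returned the relay IP within the TTL
	SuccessLog     bool // glb-api emitted a "dns:" success line
	AuthError      bool // logs show dns_update_failed with 401/403
}

// diagnose maps an observation onto the Meaning column of the matrix.
// Cases are ordered so that the most specific failure wins.
func diagnose(o observation) string {
	switch {
	case o.RegisterStatus == 500 && o.AuthError:
		return "Token invalid or wrong scope"
	case o.RegisterStatus == 200 && !o.CFRecord && !o.SuccessLog:
		return "DNS client silently disabled"
	case o.RegisterStatus == 200 && o.CFRecord && !o.DigResolves:
		return "Wrong zone or stale view"
	case o.RegisterStatus == 200 && o.DeleteStatus == 500:
		return "Token missing delete verb"
	case o.RegisterStatus == 200 && o.CFRecord && o.DigResolves && o.DeleteStatus == 200:
		return "DNS path works end-to-end"
	default:
		return "Not covered by the matrix"
	}
}

func main() {
	fmt.Println(diagnose(observation{RegisterStatus: 500, AuthError: true}))
	fmt.Println(diagnose(observation{
		RegisterStatus: 200, DeleteStatus: 200,
		CFRecord: true, DigResolves: true, SuccessLog: true,
	}))
}
```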
6. Rollback
ssh advin "cd /opt/golivebro && cp .env.glb.bak .env && docker compose restart glb-api"
Then, in the CF dashboard, revoke golivebro-api-dns-mgmt (Profile > API Tokens > Roll or Delete) and recreate a short-lived token of the same scope. Do not resurrect the pre-rotation telemyapp.com-scoped token; it was known-broken.
Open Questions
- Should slug assignment support a reserved prefix (alpha-, test-) so canary testers cannot collide with real slugs? AssignSlugIfMissing currently returns 8-char base36 from crypto/rand with no namespace.
- Should CreateOrUpdateRecord failures in the async managed-relay pipeline (handlers.go:305, :378) promote from log-only to a session error so this silent class cannot recur?
- Add an admin /ops/dns/selftest route that issues a no-op PUT against a reserved record to validate token scope at deploy time?
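To make the first question concrete, here is one possible shape for prefixed slugs. It is a sketch under assumptions and not the current AssignSlugIfMissing: the lowercase base36 alphabet, the function names, and the prefix list are all hypothetical. If unprefixed slugs draw only from that alphabet, which contains no hyphen, a hyphenated prefix cannot collide with them.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"math/big"
	"strings"
)

// base36 is an assumed alphabet; the real generator may differ.
const base36 = "0123456789abcdefghijklmnopqrstuvwxyz"

// reservedPrefixes lists the namespaces proposed in the question above.
var reservedPrefixes = []string{"alpha-", "test-"}

// randomBase36 returns n characters drawn uniformly from base36 using crypto/rand.
func randomBase36(n int) (string, error) {
	out := make([]byte, n)
	limit := big.NewInt(int64(len(base36)))
	for i := range out {
		idx, err := rand.Int(rand.Reader, limit)
		if err != nil {
			return "", err
		}
		out[i] = base36[idx.Int64()]
	}
	return string(out), nil
}

// prefixedSlug (hypothetical) prepends prefix to a fresh 8-char base36 slug.
func prefixedSlug(prefix string) (string, error) {
	s, err := randomBase36(8)
	if err != nil {
		return "", err
	}
	return prefix + s, nil
}

// isReserved reports whether slug falls in one of the reserved namespaces.
func isReserved(slug string) bool {
	for _, p := range reservedPrefixes {
		if strings.HasPrefix(slug, p) {
			return true
		}
	}
	return false
}

func main() {
	slug, err := prefixedSlug("alpha-")
	fmt.Println(slug, isReserved(slug), err)
}
```

A cleanup job could then delete any record for which `isReserved` is true without risk of touching a production relay.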