DNS Provisioning Verification

1. Why this verification is needed

The pre-rebrand Cloudflare token was scoped to telemyapp.com. Every CreateOrUpdateRecord call against golivebro.com returned a 4xx, which the pipeline logged as dns_update_failed without failing the user-facing request. The token was rotated on 2026-04-17 (golivebro-api-dns-mgmt) and CLOUDFLARE_ZONE_ID on Advin was repointed to the golivebro.com zone. No automated write has run under the new scope; the two existing records (kc1, lv1) were hand-created.

2. What success looks like

| Observable | How to verify |
| --- | --- |
| A record appears in golivebro.com zone within 5s of provision | `GET /client/v4/zones/{zoneID}/dns_records?type=A&name={slug}.relay.golivebro.com` returns one result whose `content` matches the relay IP |
| glb-api emits success log | `docker logs glb-api -f` on Advin shows `dns: POST record <slug>.relay.golivebro.com -> <ip> (record_id=)` (format from `api/internal/dns/cloudflare.go:98`) |
| Public resolution works | `dig @1.1.1.1 <slug>.relay.golivebro.com +short` returns the IP within 60s (TTL=60) |
| Deletion cleans up | After `DELETE /api/v1/relay/self-hosted/{id}`, Cloudflare returns zero results and logs emit `dns: deleted record <fqdn>` |
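
The Cloudflare check in the first row can be run from a shell. This is a minimal sketch, not part of the glb-api tooling: the function names are hypothetical, the API host is assumed to be the public https://api.cloudflare.com, CF_API_TOKEN is an assumed variable holding a token that can read the zone, and jq is assumed to be installed.

```shell
#!/bin/sh
# Build the list-records URL for one relay slug (pure string formatting).
# $1 = zone id, $2 = slug
cf_record_url() {
  zone_id=$1 slug=$2
  printf 'https://api.cloudflare.com/client/v4/zones/%s/dns_records?type=A&name=%s.relay.golivebro.com\n' \
    "$zone_id" "$slug"
}

# Query Cloudflare and print the content (IP) of each matching A record.
# Assumption: CF_API_TOKEN is exported and jq is available.
cf_record_content() {
  curl -s -H "Authorization: Bearer $CF_API_TOKEN" "$(cf_record_url "$1" "$2")" |
    jq -r '.result[].content'
}
```

Calling `cf_record_content "$CLOUDFLARE_ZONE_ID" <slug>` after a provision should print the relay IP once; after a delete it should print nothing.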

3. Three test approaches

A. Synthetic admin trigger. Not available. api/internal/api/router.go exposes no admin route that invokes the DNS client. /ops/users/{userID}/force-stop-relay only deprovisions.

B. Full OBS-plugin flow. Dashboard > Add Managed Relay calls POST /api/v1/relay/start with {"region_preference":"","connection_id":"<uuid>"}. The async pipeline at handlers.go:302-385 calls CreateOrUpdateRecord twice. Watch the session response for relay.hostname matching <slug>.relay.golivebro.com. Downside: failures are log-only, not surfaced to caller.

C. Dark-run via self-hosted register (recommended). POST /api/v1/relay/self-hosted/register at handlers_self_hosted.go:44-95 exercises CreateOrUpdateRecord synchronously and returns 500 on DNS failure. Request: {"port":5000,"external_ip":"<public_ip>","connection_id":"<uuid>"}. Response includes dns_slug and fqdn. Clean up with DELETE /api/v1/relay/self-hosted/{id}, which exercises DeleteRecord. Slugs are server-generated base36, so prefix namespacing requires code changes (see Open Questions).

Recommendation: C. Synchronous pass/fail, covers create and delete, no SRT infra required.

4. Alpha tester packet

Requires: a GoLiveBro account, a JWT from devtools (Network tab on any authed request), read-only CF dashboard access to the golivebro.com zone.

1. Log into golivebro.com and copy your JWT.
2. Register a relay: `curl -sX POST https://api.golivebro.com/api/v1/relay/self-hosted/register -H "Authorization: Bearer <jwt>" -H "Content-Type: application/json" -d '{"port":5000,"external_ip":"203.0.113.10","connection_id":"'"$(uuidgen)"'"}'`
3. Capture the response. It must contain "fqdn": "<slug>.relay.golivebro.com" and "dns_slug".
4. Within 10s, run `dig @1.1.1.1 <slug>.relay.golivebro.com +short`. Expected: 203.0.113.10.
5. In the Cloudflare dashboard > golivebro.com > DNS, search for <slug>. Expected: one A record with content 203.0.113.10 and a comment starting with "self-hosted | ".
6. Delete the relay: `curl -sX DELETE https://api.golivebro.com/api/v1/relay/self-hosted/<relay_id> -H "Authorization: Bearer <jwt>"`. Expected response: {"ok":true,"id":"..."}.
7. Within 10s, a second dig returns empty and the Cloudflare record is gone.

Report back. Paste register response, first dig, CF DNS row screenshot, delete response, final dig. If anything diverged, include the failing call’s HTTP status.
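
The dig checks and the response capture in the packet can be scripted so the timings are not judged by eye. The sketch below is illustrative only: json_field and wait_for_a_record are hypothetical helper names, the JSON parsing handles only flat "key": "value" pairs, and the 5-second poll interval is an assumption.

```shell
#!/bin/sh
# Extract a string field from a flat JSON object without jq.
# $1 = key, $2 = JSON text. Only handles simple "key": "value" pairs.
json_field() {
  printf '%s' "$2" |
    sed -n 's/.*"'"$1"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Poll public DNS until the name resolves to the expected IP.
# $1 = fqdn, $2 = expected IP, $3 = timeout in seconds (default 60, the record TTL).
# Returns 0 on a match, 1 if the timeout passes first.
wait_for_a_record() {
  fqdn=$1 want=$2 timeout=${3:-60}
  elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    got=$(dig @1.1.1.1 "$fqdn" +short)
    [ "$got" = "$want" ] && return 0
    sleep 5
    elapsed=$((elapsed + 5))
  done
  return 1
}
```

With the register response saved in a variable `resp`, `fqdn=$(json_field fqdn "$resp")` followed by `wait_for_a_record "$fqdn" 203.0.113.10` covers the first dig; the relay id for the delete call still has to be read from the response by hand, since its field name is not specified here.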

Failure modes. Register 500 failed to create DNS record: token or zone id wrong. Register 200 but dig NXDOMAIN past 60s: record in wrong zone or never written. Delete 500: token lacks delete scope. Register 200 and dig works but dashboard empty: wrong zone.

5. Pass/fail decision matrix

| Outcome | Meaning | Action |
| --- | --- | --- |
| All 7 green | DNS path works end-to-end | Log resolved |
| Register 200 + CF record + dig NXDOMAIN past TTL | Wrong zone or stale view | Compare Advin `CLOUDFLARE_ZONE_ID` to the golivebro.com zone id in CF URL |
| Register 200 + no CF record + no log | DNS client silently disabled | Verify `CLOUDFLARE_DNS_TOKEN` and `CLOUDFLARE_ZONE_ID` set in `/opt/golivebro/.env`, restart glb-api |
| Register 500 + `dns_update_failed` 401/403 | Token invalid or wrong scope | Recreate token with `Zone.DNS:Edit` on golivebro.com only |
| Register 200 + delete 500 | Token missing delete verb | Recreate with full DNS edit |
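
For triaging several tester reports, the matrix can be approximated as a lookup. This is a simplified sketch: classify_run is a hypothetical name, and the log check and the 401/403 detail are not modeled, so the operator still has to confirm those by hand.

```shell
#!/bin/sh
# Map one tester report to a matrix row.
# $1 = register HTTP status, $2 = CF record present (yes/no),
# $3 = dig resolved to the expected IP (yes/no), $4 = delete HTTP status.
classify_run() {
  reg=$1 cf=$2 dig_ok=$3 del=$4
  case "$reg:$cf:$dig_ok:$del" in
    200:yes:yes:200) echo "pass: DNS path works end-to-end" ;;
    200:yes:no:*)    echo "wrong zone or stale view: compare CLOUDFLARE_ZONE_ID" ;;
    200:no:*:*)      echo "DNS client silently disabled: check .env, restart glb-api" ;;
    500:*:*:*)       echo "token invalid or wrong scope: recreate with Zone.DNS:Edit" ;;
    200:yes:yes:500) echo "token missing delete verb: recreate with full DNS edit" ;;
    *)               echo "unclassified: collect logs and HTTP statuses" ;;
  esac
}
```

For example, `classify_run 200 yes no 200` prints the wrong-zone row. Anything that falls through to the last branch is a combination the matrix does not cover.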

6. Rollback

ssh advin "cd /opt/golivebro && cp .env.glb.bak .env && docker compose restart glb-api"

Then in the CF dashboard, revoke golivebro-api-dns-mgmt (Profile > API Tokens > Roll or Delete) and recreate a short-lived token of the same scope. Do not resurrect the pre-rotation telemyapp.com-scoped token; it was known-broken.
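
Before wiring a recreated token into .env, it may be worth confirming that Cloudflare accepts it. The sketch below assumes a user-owned token (created under Profile > API Tokens) and uses Cloudflare's token verify endpoint; token_status and verify_cf_token are hypothetical helper names. Verification only shows that the token is valid, not that it carries the right zone scope.

```shell
#!/bin/sh
# Pull the "status" value out of a tokens/verify response body (no jq needed).
# $1 = JSON response text
token_status() {
  printf '%s' "$1" |
    sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Ask Cloudflare about the token passed as $1 and print its reported status.
verify_cf_token() {
  token_status "$(curl -s -H "Authorization: Bearer $1" \
    https://api.cloudflare.com/client/v4/user/tokens/verify)"
}
```

A usable token is expected to report `active`; scope still has to be proven with a real register and delete cycle as described in the tester packet.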

Open Questions

Should slug assignment support a reserved prefix (alpha-, test-) so canary testers cannot collide with real slugs? AssignSlugIfMissing currently returns 8-char base36 from crypto/rand with no namespace.
Should CreateOrUpdateRecord failures in the async managed-relay pipeline (handlers.go:305, :378) promote from log-only to a session error so this silent class cannot recur?
Should we add an admin /ops/dns/selftest route that issues a no-op PUT against a reserved record to validate token scope at deploy time?
