Analysis Architecture
Summary
ToneForge v1 ships an 18-feature spaCy + textstat analyzer covering readability (Flesch-Kincaid), sentence structure, vocabulary, POS distribution, passive voice, pronouns, six punctuation rates, informality markers, and paragraph structure. Five rule-based dimension classifiers (classify_formality(), classify_brevity(), classify_confidence(), classify_warmth(), classify_technical_depth()) map the raw feature vector to human-readable labels using hardcoded thresholds. No ML is involved.
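The rule-based classifiers reduce to simple threshold checks over the feature dict. A minimal sketch of one such classifier follows; the feature names and cutoff values are illustrative assumptions, not ToneForge's actual thresholds:

```python
def classify_formality(features: dict) -> str:
    """Map raw analyzer features to a human-readable formality label.

    Hypothetical thresholds for illustration: heavy contraction use
    reads as casual; long words with almost no contractions read as
    formal; everything else falls back to neutral.
    """
    contraction_rate = features.get("contraction_rate", 0.0)
    avg_word_length = features.get("avg_word_length", 0.0)

    if contraction_rate > 0.04:
        return "casual"
    if avg_word_length > 5.5 and contraction_rate < 0.01:
        return "formal"
    return "neutral"

print(classify_formality({"contraction_rate": 0.06, "avg_word_length": 4.2}))
```

Because the mapping is deterministic and the thresholds are hardcoded, identical input text always yields identical labels, which is exactly the property that makes this layer reverse-engineerable.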
The v2 research plan (2026-04-13) proposes a two-track architecture. Track 1 stays free and local: enhanced spaCy expanding to 30-40 features plus StyleDistance embeddings (NAACL 2025, MIT licensed, RoBERTa 0.1B, ~50ms CPU inference, ~400MB). Track 2 is a sponsor perk running in the cloud: LLM-as-judge using Claude Sonnet, receiving Track 1's output as structured context and evaluating qualitative dimensions that NLP cannot measure. Both tracks feed the voice substrate compiler. Cost for Track 2 is approximately $0.01 per analysis with prompt caching.
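The handoff between tracks can be sketched as a prompt-assembly step: Track 1's deterministic features are serialized and passed to the LLM judge as structured context, with the stable instruction text placed first so prompt caching can reuse it across calls. The function name, dimension list, and prompt wording below are assumptions for illustration; the actual API call to Claude is omitted.

```python
import json

def build_judge_prompt(track1_features: dict, sample_text: str) -> str:
    """Assemble a Track 2 prompt from Track 1 output.

    The fixed instruction block comes first (cache-friendly prefix);
    the per-call features and writing sample follow. The qualitative
    dimensions named here are illustrative, not the real rubric.
    """
    instructions = (
        "You are evaluating a writing sample for voice qualities.\n"
        "Deterministic stylometric features are provided as context.\n"
        "Rate only the dimensions those features cannot capture: "
        "humor, self-deprecation, directness of opinion.\n\n"
    )
    context = "Track 1 features:\n" + json.dumps(track1_features, indent=2)
    return f"{instructions}{context}\n\nSample:\n{sample_text}"

prompt = build_judge_prompt(
    {"flesch_kincaid": 8.2, "passive_rate": 0.03}, "ship it."
)
```

Keeping the instruction prefix byte-identical across analyses is what lets prompt caching bring the per-analysis cost down toward the ~$0.01 estimate.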
The strategic decision to open source the analyzer (made 2026-04-13, reversing the original proprietary plan from 2026-04-11) was driven by competitive research confirming that standard NLP features are not defensible IP: StyloMetrix, an open source library, replicates the same analysis. The real moat is adoption of the .toneprofile format and the voice substrate, which requires persistent infrastructure to operate.
Timeline
- 2026-04-11: Original monetization design kept analyzer proprietary behind a paid API. Package split designed to exclude analyzer.py from sdist. Hardcoded thresholds acknowledged as reverse-engineerable; compiled voice layer described as the defensibility play.
- 2026-04-11: Monetization readiness scores from four-engine audit: Claude 7/10, Codex 4/10, Gemini 2/10 (framing divergence, factual agreement).
- 2026-04-13: v2 research plan drafted. Competitive analysis confirms no commercial writing style API exists (IBM Watson dead, Hume AI does emotion only). StyleDistance identified as state of the art for style similarity. Decision to open source the analyzer.
- 2026-04-13: Two-track architecture proposed. Track 1 (deterministic) free and local. Track 2 (generative) sponsor perk at ~$0.01/analysis. Fine-tuned Qwen 3.5 9B with DPO designated as v3 path pending labeled data.
Current State
v1 analyzer is live with 18 features. v2 architecture is a draft pending Brandon Metcalf review. Nothing in v2 is built yet.
Track 1 expansion requires: extending analyzer.py with 10 new extractors, integrating TextDescriptives (open source, spaCy-based) for dependency distance and semantic coherence, and adding StyleDistance as an optional dependency (pip install toneforge[embeddings]). Track 2 requires a cloud endpoint, Claude API integration, prompt caching setup, and sponsor-tier gating.
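Shipping StyleDistance as an optional extra implies the core analyzer must degrade gracefully when the embeddings dependency is absent. A sketch of that guard follows; the sentence-transformers import path and the model id are assumptions about how StyleDistance would be wired in, not confirmed integration details:

```python
def get_style_embedder():
    """Return a style-embedding function, or None when the optional
    extra (pip install toneforge[embeddings]) is not installed.

    Callers treat None as "skip style-similarity features" so the
    18 core features still work on a bare install.
    """
    try:
        from sentence_transformers import SentenceTransformer
    except ImportError:
        return None
    # Assumed model id for the StyleDistance checkpoint.
    model = SentenceTransformer("StyleDistance/styledistance")
    return model.encode

embed = get_style_embedder()
if embed is None:
    print("embeddings extra not installed; skipping style similarity")
```

The try/except import guard is the standard pattern for extras-gated features: the heavy ~400MB dependency is loaded only when the user has opted into it.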
Classical stylometric features (v1 approach) are academically validated at 90.8% accuracy for authorship tasks. The spaCy foundation is not being discarded; it is being extended.
Key Decisions
- 2026-04-11: Analyzer kept proprietary, excluded from sdist — closed IP leak, enabled API monetization. Reversed by v2 plan.
- 2026-04-13: Open source the analyzer — standard NLP is not defensible IP. StyloMetrix replicates it. Open sourcing builds distribution for the real business (substrate, sync, LLM analysis).
- 2026-04-13: StyleDistance over custom embeddings — MIT licensed, 40 features, works today, state of the art for style similarity. Fine-tuning deferred to v3.
- 2026-04-13: Claude Sonnet as initial Track 2 LLM — strongest instruction following for structured extraction, prompt caching reduces cost, ~$0.01/analysis.
- 2026-04-13: Qwen 3.5 9B with DPO designated as v3 path — premature without labeled data from real users.
Open Questions
- StyleDistance vs. custom embedding model: StyleDistance was trained on synthetic data, not real developer writing. If it underperforms on short-form chat, should we fine-tune it or let the LLM layer compensate?
- Claude Sonnet as analysis engine: selling “your voice, not the AI’s voice” while running analysis through Claude. Does that undermine trust? Would a fine-tuned open model read better as “our model, built for this”?
- Feature expansion scope: ship at launch with 30-40 features, or launch with v1’s 18 and expand based on user feedback?
- Developer-specific dimensions: code marker rate and list frequency assume a developer audience. Stay domain-agnostic for future flexibility, or lean into the developer niche?