Building Real-Time Voice Agents from Scratch - Learning Roadmap | Nemorize
Learning Topics
This roadmap covers the following topics:
✅ Part I: Foundations
- ✅ Shape of a Voice Agent
- ⚪ mic → ASR → LLM → TTS Loop
- ⚪ Trade Matrix
- ✅ Audio Fundamentals
- ⚪ SR_IN vs SR_OUT
- ⚪ float32 ↔ int16 Conversions
- ✅ VAD: Detecting Speech
- ⚪ Threshold Tuning
- ⚪ Pre-roll Buffer
✅ Part II: The Pipeline
- ⚪ ASR with faster-whisper
- ⚪ Model Size Trade-offs
- ⚪ ASR as a Blocking Call
- ⚪ LLM Streaming & State
- ⚪ Speakable System Prompt
- ⚪ The Commit Pattern
- ⚪ TTS & Latency Trick
- ⚪ pop_sentences Deep Dive
- ⚪ Kokoro vs Piper Backends
✅ Part III: The Hard Parts
- ⚪ Barge-in: Interruption
- ⚪ Yield-Point Latency
- ⚪ Cancel Wire Protocol
- ⚪ The Feedback Loop
- ⚪ Browser AEC
- ⚪ Playback State Machine
- ⚪ Three Distinct Moments
✅ Part IV: Engineering It Well
- ⚪ Frontend Audio Scheduling
- ⚪ AudioWorklet for Mic Capture
- ⚪ Gapless playHead Scheduling
- ⚪ Concurrency & Orchestration
- ⚪ run_in_executor Pattern
- ⚪ asyncio vs Threads — Same Shape
✅ Part V: Make It Yours
- ⚪ Capstone Extensions
- ⚪ Measurable Latency Fork
- ⚪ Extension Projects
- ⚪ The Production Bridge
- ⚪ Trade-offs You Now Own
- ⚪ Why Hosted APIs Choose as They Do