Building Real-Time Voice Agents from Scratch - Learning Roadmap | Nemorize

Learning Topics

This roadmap covers the following topics:

✅ Part I: Foundations

✅ Shape of a Voice Agent
- ⚪ mic → ASR → LLM → TTS Loop
- ⚪ Trade Matrix
✅ Audio Fundamentals
- ⚪ SR_IN vs SR_OUT
- ⚪ float32 ↔ int16 Conversions
✅ VAD: Detecting Speech
- ⚪ Threshold Tuning
- ⚪ Pre-roll Buffer

✅ Part II: The Pipeline

⚪ ASR with faster-whisper
- ⚪ Model Size Trade-offs
- ⚪ ASR as a Blocking Call
⚪ LLM Streaming & State
- ⚪ Speakable System Prompt
- ⚪ The Commit Pattern
⚪ TTS & Latency Trick
- ⚪ pop_sentences Deep Dive
- ⚪ Kokoro vs Piper Backends

✅ Part III: The Hard Parts

⚪ Barge-in: Interruption
- ⚪ Yield-Point Latency
- ⚪ Cancel Wire Protocol
⚪ The Feedback Loop
- ⚪ Browser AEC
⚪ Playback State Machine
- ⚪ Three Distinct Moments

✅ Part IV: Engineering It Well

⚪ Frontend Audio Scheduling
- ⚪ AudioWorklet for Mic Capture
- ⚪ Gapless playHead Scheduling
⚪ Concurrency & Orchestration
- ⚪ run_in_executor Pattern
- ⚪ asyncio vs Threads — Same Shape

✅ Part V: Make It Yours

⚪ Capstone Extensions
- ⚪ Measurable Latency Fork
- ⚪ Extension Projects
⚪ The Production Bridge
- ⚪ Trade-offs You Now Own
- ⚪ Why Hosted APIs Choose as They Do
