# UG AI Engine for children

## 🛠️ Architecture Deep Dive
### 🧞 Conversation Manager

This is the primary class you will interact with. It simplifies the complex process of managing a real-time conversation into a few simple methods.

- Key Responsibilities:
  - Orchestrates the entire conversation flow.
  - Manages state transitions (e.g., from `playing` audio to `listening` for the user).
  - Initializes the WebSocket connection and microphone access.
  - Provides a high-level API: `initialize()`, `pause()`, `resume()`, `sendText()`, etc.
- Configuration: It's instantiated with a `ConversationConfig` object, which is crucial for defining its behavior. The `hooks` property within this config is the primary way the SDK communicates back to your application UI.
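The exact fields of `ConversationConfig` are not listed here, so the shape below is an illustrative guess. It sketches how the `hooks` property might surface state transitions (such as `playing` → `listening`) to your UI; the hook names and the `StateNotifier` helper are assumptions, not the SDK's real internals.

```typescript
// Hypothetical sketch — field and hook names are assumptions.
type ConversationState = "idle" | "listening" | "playing";

interface ConversationHooks {
  onStateChange?: (state: ConversationState) => void; // drive UI updates
  onSubtitle?: (text: string) => void;                // render assistant subtitles
  onError?: (err: Error) => void;
}

interface ConversationConfig {
  apiKey: string; // assumed auth field
  hooks: ConversationHooks;
}

// Minimal illustration of how the manager might notify the UI on each transition.
class StateNotifier {
  private state: ConversationState = "idle";
  constructor(private hooks: ConversationHooks) {}

  transition(next: ConversationState): ConversationState {
    this.state = next;
    this.hooks.onStateChange?.(next); // hooks are the SDK → UI channel
    return this.state;
  }
}
```

Your application would pass the config once at construction time and then react to hook callbacks rather than polling the manager for state.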
### 🗣️ Conversation Network

This manager handles all low-level WebSocket communication.

- Key Responsibilities:
  - Establishes, maintains, and closes the WebSocket connection.
  - Handles authentication and initial configuration messages.
  - Sends user input (audio/text) to the server.
  - Receives assistant responses (audio, subtitles, metadata) and forwards them to the appropriate managers via events.
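The wire format is not documented here, so the message shapes below are assumptions. The sketch shows the forwarding pattern the section describes: each incoming frame is parsed and dispatched by type to whichever manager registered for it.

```typescript
// Hypothetical message shapes — the SDK's actual protocol may differ.
type ServerMessage =
  | { type: "audio"; chunk: string }                 // assumed base64-encoded audio
  | { type: "subtitle"; text: string }
  | { type: "metadata"; data: Record<string, unknown> };

class MessageRouter {
  private handlers = new Map<ServerMessage["type"], (msg: ServerMessage) => void>();

  // A manager (e.g. playback) registers for the message types it owns.
  on(type: ServerMessage["type"], fn: (msg: ServerMessage) => void): void {
    this.handlers.set(type, fn);
  }

  // Called once per raw WebSocket frame; forwards to the registered handler.
  dispatch(raw: string): void {
    const msg = JSON.parse(raw) as ServerMessage;
    this.handlers.get(msg.type)?.(msg);
  }
}
```

Keeping routing in one place means the network layer never needs to know what playback or subtitle rendering actually do with a message.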
### 🎩 User Input Manager

This manager is responsible for capturing everything the user says or types.

- Key Responsibilities:
  - Initializes and manages the `AudioRecorder` to get raw audio data from the microphone.
  - Uses a `VADManager`, powered by the industry-standard Silero VAD model, to detect speech with high accuracy, automatically starting and stopping the recording process.
  - Packages audio data and text into the correct format to be sent over the network.
  - Implements a critical "barge-in" feature: when the assistant's audio playback is about to finish (within 1000 ms), it proactively starts buffering the user's audio. This creates a seamless, responsive conversation by minimizing the delay between turns.
- Sub-components:
  - 🎤 `AudioRecorder`: Interfaces with the browser's `MediaRecorder` or an `AudioWorklet` to capture audio chunks.
  - 🤫 `VADManager`: Runs the lightweight Silero VAD model to determine whether the user is speaking (alongside a server-side smart turn-detection model).
### 🎪 Playback Manager

This manager handles the rendering of the assistant's response.

- Key Responsibilities:
  - Receives messages from the `Conversation Network` and directs them to the correct player.
  - Coordinates the synchronized playback of audio, subtitles, and avatar animations.
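Gapless playback of streamed chunks usually comes down to scheduling each chunk to start exactly where the previous one ends on the audio clock (the standard Web Audio pattern). The class below is an illustrative sketch of that bookkeeping, not `AudioPlayer.ts` itself.

```typescript
// Hypothetical scheduler sketch: each chunk starts where the previous one ends,
// so streamed audio plays back without gaps or overlaps.
class ChunkScheduler {
  private nextStartTime = 0;

  /**
   * Schedule a chunk of `durationSec`; returns its start time on the audio clock.
   * If the queue has drained (nextStartTime is in the past), start immediately.
   */
  schedule(durationSec: number, currentTime: number): number {
    const start = Math.max(this.nextStartTime, currentTime);
    this.nextStartTime = start + durationSec;
    return start;
  }
}
```

In a real player, `currentTime` would come from `AudioContext.currentTime` and each chunk would be an `AudioBufferSourceNode` started at the returned time.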
- Sub-components:
  - 🎵 `AudioPlayer.ts`: A robust audio player that handles chunked audio data, ensuring smooth, gapless playback of streamed audio.
  - 📜 `SubtitleManager.ts`: Manages the display and timing of word-by-word or line-by-line subtitles.
  - 🧒 `AvatarManager.ts`: Provides a simple API (`playIdle`, `playTalk`, `playListen`) to control high-level avatar animations. It emits events that a UI component can listen to in order to drive the actual animation system (e.g., Spine, Rive, Three.js).