Transcription API

UG AI Engine for children

🧩 Transcription Real-Time Interaction API (WebSocket)

The UG Labs Interaction API allows you to stream audio and receive real-time transcription results via WebSocket.

💡

IMPORTANT: Each transcribe action supports up to 30 seconds of audio. For longer recordings, split the audio into segments of 30 seconds or less and send a separate transcribe request for each segment.
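As a minimal sketch of planning that split, the helper below turns a total duration into consecutive windows of at most 30 seconds. The name `segment_bounds` is hypothetical and not part of the API; it only computes time ranges, and cutting the audio itself is left to whatever audio tooling you already use.

```python
MAX_SEGMENT_SECONDS = 30.0  # per-transcribe limit stated above


def segment_bounds(total_seconds, max_seconds=MAX_SEGMENT_SECONDS):
    """Return (start, end) pairs covering the recording in windows of at most max_seconds."""
    total_seconds = float(total_seconds)
    if total_seconds < 0 or max_seconds <= 0:
        raise ValueError("total_seconds must be >= 0 and max_seconds must be > 0")
    bounds = []
    start = 0.0
    while start < total_seconds:
        # Clamp the last window so it never runs past the end of the recording.
        end = min(start + max_seconds, total_seconds)
        bounds.append((start, end))
        start = end
    return bounds


# A 75-second recording needs three transcribe requests.
print(segment_bounds(75))
```

Each resulting window would then go through its own add_audio and transcribe cycle.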

🔗 Live API (Staging)

Test and explore the API here →

👉 https://pug.stg.uglabs.app/docs#/

🔐 Authentication

Each connection requires a valid access token.

You can generate one by following our Authentication Guide.

⚙️ Connection Flow

1. Connect to the WebSocket endpoint

2. Send an authenticate message with your access token

3. Stream audio in chunks via add_audio

4. Send a transcribe request

5. Receive the final transcription response
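The ordering of the client messages in this flow can be sketched offline, without opening a connection. The generator below is illustrative only: `build_messages` is a hypothetical helper, and the field names (`access_token`, `audio`, `config`, `language_code`) are taken from the examples on this page rather than from a formal schema.

```python
import base64
import uuid
from datetime import datetime, timezone

CHUNK_SIZE = 32000  # bytes of raw audio per add_audio message, as in the examples


def build_messages(audio, access_token, language_code="en",
                   mime_type="audio/mpeg", sampling_rate=48000):
    """Yield request dicts in protocol order: authenticate, add_audio..., transcribe."""
    uid = str(uuid.uuid4())

    def rpc(kind, **fields):
        # Common envelope shared by every client message.
        return {"type": "request", "uid": uid, "kind": kind,
                "timestamp": datetime.now(timezone.utc).isoformat(), **fields}

    yield rpc("authenticate", access_token=access_token)
    for i in range(0, len(audio), CHUNK_SIZE):
        chunk = audio[i:i + CHUNK_SIZE]
        yield rpc("add_audio",
                  audio=base64.b64encode(chunk).decode(),
                  config={"sampling_rate": sampling_rate, "mime_type": mime_type})
    yield rpc("transcribe", language_code=language_code)


# 70,000 placeholder bytes split into three add_audio messages.
kinds = [m["kind"] for m in build_messages(b"\x00" * 70000, "<YOUR_ACCESS_TOKEN>")]
print(kinds)
```

Each yielded dict would be serialized with `json.dumps` and sent over the socket in the order produced.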

🌍 Endpoint

Staging:


wss://pug.stg.uglabs.app/interact

💬 Message Format

All client messages follow this format:


{
  "type": "request",
  "uid": "unique-client-id",
  "kind": "authenticate | add_audio | transcribe",
  "timestamp": "2025-10-05T12:00:00Z"
}

🐍 Example — Python


import base64
import json
import uuid
from datetime import datetime, timezone
from websocket import create_connection  # provided by the websocket-client package

AUDIO_FILE = "sample.mp3"
ACCESS_TOKEN = "<YOUR_ACCESS_TOKEN>"
URL = "wss://pug.stg.uglabs.app/interact"
LANGUAGE_CODE = "en"
UID = str(uuid.uuid4())
CHUNK_SIZE = 32000  # 32 KB

headers = [f"Authorization: Bearer {ACCESS_TOKEN}"]

def make_rpc(kind, **fields):
    return {
        "type": "request",
        "uid": UID,
        "kind": kind,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        **fields,
    }

ws = create_connection(URL, header=headers)

# Authenticate
ws.send(json.dumps(make_rpc("authenticate", access_token=ACCESS_TOKEN)))
print("Auth:", json.loads(ws.recv()))

# Send audio chunks
with open(AUDIO_FILE, "rb") as f:
    while chunk := f.read(CHUNK_SIZE):
        ws.send(json.dumps(make_rpc(
            "add_audio",
            audio=base64.b64encode(chunk).decode(),
            config={"sampling_rate": 48000, "mime_type": "audio/mpeg"}
        )))
        print("Chunk:", json.loads(ws.recv()))

# Request transcription
transcribe_req = make_rpc("transcribe", language_code=LANGUAGE_CODE)
ws.send(json.dumps(transcribe_req))

# Receive transcription
while True:
    res = json.loads(ws.recv())
    if res.get("kind") == "transcribe":
        print("✅ Transcription:", res["text"])
        break

ws.close()

⚡ Example — JavaScript (Node.js)


import WebSocket from "ws";
import fs from "fs";
import { v4 as uuidv4 } from "uuid";

const AUDIO_FILE = "sample.mp3";
const ACCESS_TOKEN = "<YOUR_ACCESS_TOKEN>";
const URL = "wss://pug.stg.uglabs.app/interact";
const LANGUAGE_CODE = "en";
const UID = uuidv4();
const CHUNK_SIZE = 32000;

const ws = new WebSocket(URL, {
  headers: { Authorization: `Bearer ${ACCESS_TOKEN}` },
});

function makeRpc(kind, fields = {}) {
  return { type: "request", uid: UID, kind, timestamp: new Date().toISOString(), ...fields };
}

ws.on("open", () => {
  console.log("Connected");

  ws.send(JSON.stringify(makeRpc("authenticate", { access_token: ACCESS_TOKEN })));

  const buffer = fs.readFileSync(AUDIO_FILE);
  for (let i = 0; i < buffer.length; i += CHUNK_SIZE) {
    const chunk = buffer.subarray(i, i + CHUNK_SIZE);
    ws.send(JSON.stringify(makeRpc("add_audio", {
      audio: chunk.toString("base64"),
      config: { sampling_rate: 48000, mime_type: "audio/mpeg" },
    })));
  }

  ws.send(JSON.stringify(makeRpc("transcribe", { language_code: LANGUAGE_CODE })));
});

ws.on("message", (data) => {
  const msg = JSON.parse(data);
  if (msg.kind === "transcribe") {
    console.log("✅ Transcription:", msg.text);
    ws.close();
  }
});

ws.on("close", () => console.log("Connection closed"));

📦 Example Response

✅ Transcription Result


{
  "type": "response",
  "uid": "d1deb6ea-6b6a-4957-b59f-741bc70c5b8a",
  "kind": "transcribe",
  "client_start_time": null,
  "server_start_time": "2025-10-05T09:20:03.188700Z",
  "server_end_time": "2025-10-05T09:20:13.425967Z",
  "text": "the amazon rainforest, ..."
}
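A response shaped like the one above can be unpacked as follows. This is a sketch under the assumption that the timestamps are always ISO 8601 strings with a trailing `Z`, as in the sample; `summarize` and `parse_ts` are hypothetical helper names.

```python
import json
from datetime import datetime


def parse_ts(value):
    """Parse an ISO 8601 timestamp, rewriting a trailing 'Z' for older Python versions."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarize(raw):
    """Return (text, server_seconds) for a transcribe response, or None for other messages."""
    msg = json.loads(raw)
    if msg.get("type") != "response" or msg.get("kind") != "transcribe":
        return None
    elapsed = parse_ts(msg["server_end_time"]) - parse_ts(msg["server_start_time"])
    return msg["text"], elapsed.total_seconds()


sample = json.dumps({
    "type": "response",
    "uid": "d1deb6ea-6b6a-4957-b59f-741bc70c5b8a",
    "kind": "transcribe",
    "client_start_time": None,
    "server_start_time": "2025-10-05T09:20:03.188700Z",
    "server_end_time": "2025-10-05T09:20:13.425967Z",
    "text": "the amazon rainforest, ...",
})
print(summarize(sample))
```

The second value is the server-side processing time in seconds, derived only from the two server timestamps.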

🧠 Notes

Audio chunk size: ≤ 32 KB per message

Format: MP3, OGG, or WAV (audio/mpeg, audio/ogg, or audio/wav)

Sample rate: 48 kHz recommended

Order: Always authenticate → add_audio → transcribe

Response: Transcription is returned in the "text" field
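The format and sample-rate notes can be folded into a small helper that builds the `config` object passed with add_audio. This is a sketch: `audio_config` is a hypothetical name, and mapping file extensions to MIME types is an assumption for illustration, not something the API performs.

```python
import os

# Assumed extension-to-MIME mapping for the three formats listed in the notes.
MIME_TYPES = {".mp3": "audio/mpeg", ".ogg": "audio/ogg", ".wav": "audio/wav"}


def audio_config(path, sampling_rate=48000):
    """Build the add_audio config dict for a file, rejecting unlisted formats."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in MIME_TYPES:
        raise ValueError(f"Unsupported audio format: {ext or path}")
    return {"sampling_rate": sampling_rate, "mime_type": MIME_TYPES[ext]}


print(audio_config("sample.mp3"))
```

Rejecting unknown extensions up front avoids sending audio with a MIME type the notes do not list.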
