
Transcription API

🧩 Transcription Real-Time Interaction API (WebSocket)

The UG Labs Interaction API allows you to stream audio and receive real-time transcription results via WebSocket.
 
💡 IMPORTANT: Each transcribe action supports up to 30 seconds of audio. If you need to transcribe more than that, split your audio into segments of 30 seconds or less and call transcribe once per segment.
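One way to stay under the 30-second limit is to cut the audio before streaming it. The sketch below is a minimal example for uncompressed WAV input only, using Python's standard `wave` module; the helper name `split_wav_segments` is hypothetical and not part of the API. MP3 input would need a decoder first, and how the server handles audio buffered between transcribe calls is not covered here, so treat this as a starting point rather than the official approach.

```python
import io
import wave

MAX_SEGMENT_SECONDS = 30  # per-transcribe limit stated in this guide


def split_wav_segments(wav_bytes, max_seconds=MAX_SEGMENT_SECONDS):
    """Split WAV data into standalone WAV byte strings of at most max_seconds each.

    Hypothetical helper for illustration; not part of the UG Labs API.
    """
    segments = []
    with wave.open(io.BytesIO(wav_bytes), "rb") as src:
        params = src.getparams()
        frames_per_segment = src.getframerate() * max_seconds
        while True:
            frames = src.readframes(frames_per_segment)
            if not frames:
                break
            out = io.BytesIO()
            with wave.open(out, "wb") as dst:
                # Copy channels, sample width and rate; the frame count in the
                # header is corrected automatically when the frames are written.
                dst.setparams(params)
                dst.writeframes(frames)
            segments.append(out.getvalue())
    return segments
```

Each returned segment is a complete WAV file, so it can be streamed with add_audio and followed by its own transcribe request.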

🔗 Live API (Staging)

Test and explore the API here →

🔐 Authentication

Each connection requires a valid access token.
You can generate one by following our Authentication Guide.

⚙️ Connection Flow

  1. Connect to the WebSocket endpoint
  2. Send an authenticate message with your access token
  3. Stream audio in chunks via add_audio
  4. Send a transcribe request
  5. Receive the final transcription response

🌍 Endpoint

Staging:
wss://pug.stg.uglabs.app/interact

💬 Message Format

All client messages follow this format:
{
  "type": "request",
  "uid": "unique-client-id",
  "kind": "authenticate | add_audio | transcribe",
  "timestamp": "2025-10-05T12:00:00Z"
}
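The envelope above can be produced by a small builder. This is a sketch under assumptions: the function name `build_request` and the client-side check on `kind` are illustrative and not part of the API. It formats the timestamp with a trailing `Z` to match the sample message; whether the server also accepts other ISO 8601 forms is not stated here.

```python
import json
import uuid
from datetime import datetime, timezone

# The three request kinds listed in the message format above.
ALLOWED_KINDS = {"authenticate", "add_audio", "transcribe"}


def build_request(kind, uid, **fields):
    """Return a client message dict in the documented envelope (hypothetical helper)."""
    if kind not in ALLOWED_KINDS:
        raise ValueError(f"unsupported kind: {kind!r}")
    # UTC timestamp in the same shape as the sample: 2025-10-05T12:00:00Z
    timestamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "type": "request",
        "uid": uid,
        "kind": kind,
        "timestamp": timestamp,
        **fields,  # kind-specific fields, e.g. access_token or language_code
    }


if __name__ == "__main__":
    uid = str(uuid.uuid4())
    print(json.dumps(build_request("transcribe", uid, language_code="en"), indent=2))
```

Rejecting an unknown `kind` locally surfaces typos before anything is sent over the socket.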

🐍 Example — Python

import base64
import json
import uuid
from datetime import datetime, timezone

from websocket import create_connection

AUDIO_FILE = "sample.mp3"
ACCESS_TOKEN = "<YOUR_ACCESS_TOKEN>"
URL = "wss://pug.stg.uglabs.app/interact"
LANGUAGE_CODE = "en"
UID = str(uuid.uuid4())
CHUNK_SIZE = 32000  # 32 KB

headers = [f"Authorization: Bearer {ACCESS_TOKEN}"]


def make_rpc(kind, **fields):
    return {
        "type": "request",
        "uid": UID,
        "kind": kind,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        **fields,
    }


ws = create_connection(URL, header=headers)

# Authenticate
ws.send(json.dumps(make_rpc("authenticate", access_token=ACCESS_TOKEN)))
print("Auth:", json.loads(ws.recv()))

# Send audio chunks
with open(AUDIO_FILE, "rb") as f:
    while chunk := f.read(CHUNK_SIZE):
        ws.send(json.dumps(make_rpc(
            "add_audio",
            audio=base64.b64encode(chunk).decode(),
            config={"sampling_rate": 48000, "mime_type": "audio/mpeg"},
        )))
        print("Chunk:", json.loads(ws.recv()))

# Request transcription
transcribe_req = make_rpc("transcribe", language_code=LANGUAGE_CODE)
ws.send(json.dumps(transcribe_req))

# Receive transcription
while True:
    res = json.loads(ws.recv())
    if res.get("kind") == "transcribe":
        print("✅ Transcription:", res["text"])
        break

ws.close()

⚡ Example — JavaScript (Node.js)

import WebSocket from "ws";
import fs from "fs";
import { v4 as uuidv4 } from "uuid";

const AUDIO_FILE = "sample.mp3";
const ACCESS_TOKEN = "<YOUR_ACCESS_TOKEN>";
const URL = "wss://pug.stg.uglabs.app/interact";
const LANGUAGE_CODE = "en";
const UID = uuidv4();
const CHUNK_SIZE = 32000;

const ws = new WebSocket(URL, {
  headers: { Authorization: `Bearer ${ACCESS_TOKEN}` },
});

function makeRpc(kind, fields = {}) {
  return {
    type: "request",
    uid: UID,
    kind,
    timestamp: new Date().toISOString(),
    ...fields,
  };
}

ws.on("open", () => {
  console.log("Connected");

  // Authenticate
  ws.send(JSON.stringify(makeRpc("authenticate", { access_token: ACCESS_TOKEN })));

  // Send audio chunks
  const buffer = fs.readFileSync(AUDIO_FILE);
  for (let i = 0; i < buffer.length; i += CHUNK_SIZE) {
    const chunk = buffer.subarray(i, i + CHUNK_SIZE);
    ws.send(JSON.stringify(makeRpc("add_audio", {
      audio: chunk.toString("base64"),
      config: { sampling_rate: 48000, mime_type: "audio/mpeg" },
    })));
  }

  // Request transcription
  ws.send(JSON.stringify(makeRpc("transcribe", { language_code: LANGUAGE_CODE })));
});

// Receive transcription
ws.on("message", (data) => {
  const msg = JSON.parse(data);
  if (msg.kind === "transcribe") {
    console.log("✅ Transcription:", msg.text);
    ws.close();
  }
});

ws.on("close", () => console.log("Connection closed"));

📦 Example Response

✅ Transcription Result

{
  "type": "response",
  "uid": "d1deb6ea-6b6a-4957-b59f-741bc70c5b8a",
  "kind": "transcribe",
  "client_start_time": null,
  "server_start_time": "2025-10-05T09:20:03.188700Z",
  "server_end_time": "2025-10-05T09:20:13.425967Z",
  "text": "the amazon rainforest, ..."
}
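A response like the one above can be unpacked with the standard library alone. The sketch below extracts the text and computes how long the server spent on the request from `server_start_time` and `server_end_time`. The function names are hypothetical, and it assumes both timestamps are always present in the format shown in the sample.

```python
import json
from datetime import datetime


def parse_utc(ts):
    """Parse an ISO 8601 timestamp with a trailing 'Z' into an aware datetime."""
    # datetime.fromisoformat() only accepts a literal "Z" from Python 3.11 on,
    # so normalise it to an explicit UTC offset first.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def summarize_transcription(raw):
    """Return (text, server_seconds) for a transcribe response, else None.

    Hypothetical helper; field names are taken from the sample response.
    """
    msg = json.loads(raw)
    if msg.get("type") != "response" or msg.get("kind") != "transcribe":
        return None  # some other message, e.g. an add_audio acknowledgement
    elapsed = parse_utc(msg["server_end_time"]) - parse_utc(msg["server_start_time"])
    return msg["text"], elapsed.total_seconds()
```

For the sample response, the two server timestamps are about 10.24 seconds apart.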

🧠 Notes

  • Audio chunk size: ≤ 32 KB per message
  • Format: MP3 or WAV (audio/mpeg or audio/wav)
  • Sample rate: 48 kHz recommended
  • Order: Always authenticate → add_audio → transcribe
  • Response: Transcription is returned in the "text" field
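The chunking and format notes above can be enforced in one place before anything is sent. This is a minimal sketch: the generator name `iter_add_audio_fields` is hypothetical, and it assumes the 32 KB limit applies to the raw audio bytes before base64 encoding, as in the examples in this guide (base64 makes the encoded string roughly a third larger).

```python
import base64

MAX_CHUNK_BYTES = 32000  # raw bytes per add_audio message, as in the examples
SUPPORTED_MIME_TYPES = ("audio/mpeg", "audio/wav")


def iter_add_audio_fields(audio_bytes, mime_type="audio/mpeg",
                          sampling_rate=48000, chunk_size=MAX_CHUNK_BYTES):
    """Yield the kind-specific fields for successive add_audio messages.

    Hypothetical helper; merge each dict into the request envelope before sending.
    """
    if mime_type not in SUPPORTED_MIME_TYPES:
        raise ValueError(f"unsupported mime type: {mime_type!r}")
    if not 0 < chunk_size <= MAX_CHUNK_BYTES:
        raise ValueError("chunk_size must be between 1 and 32000 bytes")
    for start in range(0, len(audio_bytes), chunk_size):
        chunk = audio_bytes[start:start + chunk_size]
        yield {
            "audio": base64.b64encode(chunk).decode("ascii"),
            "config": {"sampling_rate": sampling_rate, "mime_type": mime_type},
        }
```

Keeping the limits in named constants makes them easy to adjust if the service's documented values change.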