Transcription API

The UG Labs Interaction API allows you to stream audio and receive real-time transcription results via WebSocket.

Important

Each transcribe action supports up to 30 seconds of audio. If you need to transcribe more than that, split the audio into segments of 30 seconds or less and send a separate transcribe request for each segment, as in the sketch below.
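A minimal sketch of segmenting a longer recording, assuming the pydub library (and ffmpeg) is available for time-based slicing; transcribe_segment is a hypothetical helper standing in for the add_audio / transcribe flow shown later on this page.

from io import BytesIO
from pydub import AudioSegment  # assumption: pydub is available for time-based slicing

SEGMENT_MS = 30_000  # 30-second limit per transcribe action

audio = AudioSegment.from_file("long_recording.mp3")
for start in range(0, len(audio), SEGMENT_MS):  # len() is the duration in milliseconds
    buf = BytesIO()
    audio[start:start + SEGMENT_MS].export(buf, format="mp3")
    # transcribe_segment is a hypothetical helper that streams the bytes via
    # add_audio and then sends a transcribe request, as in the examples below.
    transcribe_segment(buf.getvalue())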

Try it out

Test the Transcription API interactively using our STT Tester. Record audio from your microphone or upload an audio file to see transcription in action.

Authentication

Each connection requires a valid access token.

You can generate one by following our Authentication Guide.

Connection Flow

  1. Connect to the WebSocket endpoint
  2. Send an authenticate message with your access token
  3. Stream audio in chunks via add_audio
  4. Send a transcribe request
  5. Receive the final transcription response

Endpoint

Staging:

wss://pug.stg.uglabs.app/interact

Message Format

All client messages follow this format:

{
  "type": "request",
  "uid": "unique-client-id",
  "kind": "authenticate | add_audio | transcribe",
  "timestamp": "2025-10-05T12:00:00Z"
}
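
For illustration, concrete messages for each kind might look like the following; the extra fields (access_token, audio, config, language_code) match the ones used in the examples below, and the values are placeholders.

{
  "type": "request",
  "uid": "unique-client-id",
  "kind": "authenticate",
  "timestamp": "2025-10-05T12:00:00Z",
  "access_token": "<YOUR_ACCESS_TOKEN>"
}

{
  "type": "request",
  "uid": "unique-client-id",
  "kind": "add_audio",
  "timestamp": "2025-10-05T12:00:01Z",
  "audio": "<base64-encoded audio chunk>",
  "config": { "sampling_rate": 48000, "mime_type": "audio/mpeg" }
}

{
  "type": "request",
  "uid": "unique-client-id",
  "kind": "transcribe",
  "timestamp": "2025-10-05T12:00:02Z",
  "language_code": "en"
}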

Example — Python

import base64
import json
import uuid
from datetime import datetime, timezone
from websocket import create_connection

AUDIO_FILE = "sample.mp3"
ACCESS_TOKEN = "<YOUR_ACCESS_TOKEN>"
URL = "wss://pug.stg.uglabs.app/interact"
LANGUAGE_CODE = "en"
UID = str(uuid.uuid4())
CHUNK_SIZE = 32000 # 32 KB

headers = [f"Authorization: Bearer {ACCESS_TOKEN}"]

def make_rpc(kind, **fields):
    return {
        "type": "request",
        "uid": UID,
        "kind": kind,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        **fields,
    }

ws = create_connection(URL, header=headers)

# Authenticate
ws.send(json.dumps(make_rpc("authenticate", access_token=ACCESS_TOKEN)))
print("Auth:", json.loads(ws.recv()))

# Send audio chunks
with open(AUDIO_FILE, "rb") as f:
    while chunk := f.read(CHUNK_SIZE):
        ws.send(json.dumps(make_rpc(
            "add_audio",
            audio=base64.b64encode(chunk).decode(),
            config={"sampling_rate": 48000, "mime_type": "audio/mpeg"}
        )))
        print("Chunk:", json.loads(ws.recv()))

# Request transcription
transcribe_req = make_rpc("transcribe", language_code=LANGUAGE_CODE)
ws.send(json.dumps(transcribe_req))

# Receive transcription
while True:
    res = json.loads(ws.recv())
    if res.get("kind") == "transcribe":
        print("Transcription:", res["text"])
        break

ws.close()

Example — JavaScript (Node.js)

import WebSocket from "ws";
import fs from "fs";
import { v4 as uuidv4 } from "uuid";

const AUDIO_FILE = "sample.mp3";
const ACCESS_TOKEN = "<YOUR_ACCESS_TOKEN>";
const URL = "wss://pug.stg.uglabs.app/interact";
const LANGUAGE_CODE = "en";
const UID = uuidv4();
const CHUNK_SIZE = 32000;

const ws = new WebSocket(URL, {
  headers: { Authorization: `Bearer ${ACCESS_TOKEN}` },
});

function makeRpc(kind, fields = {}) {
  return { type: "request", uid: UID, kind, timestamp: new Date().toISOString(), ...fields };
}

ws.on("open", () => {
  console.log("Connected");

  ws.send(JSON.stringify(makeRpc("authenticate", { access_token: ACCESS_TOKEN })));

  const buffer = fs.readFileSync(AUDIO_FILE);
  for (let i = 0; i < buffer.length; i += CHUNK_SIZE) {
    const chunk = buffer.subarray(i, i + CHUNK_SIZE);
    ws.send(JSON.stringify(makeRpc("add_audio", {
      audio: chunk.toString("base64"),
      config: { sampling_rate: 48000, mime_type: "audio/mpeg" },
    })));
  }

  ws.send(JSON.stringify(makeRpc("transcribe", { language_code: LANGUAGE_CODE })));
});

ws.on("message", (data) => {
  const msg = JSON.parse(data);
  if (msg.kind === "transcribe") {
    console.log("Transcription:", msg.text);
    ws.close();
  }
});

ws.on("close", () => console.log("Connection closed"));

Example Response

Transcription Result

{
  "type": "response",
  "uid": "d1deb6ea-6b6a-4957-b59f-741bc70c5b8a",
  "kind": "transcribe",
  "client_start_time": null,
  "server_start_time": "2025-10-05T09:20:03.188700Z",
  "server_end_time": "2025-10-05T09:20:13.425967Z",
  "text": "the amazon rainforest, ..."
}
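
If you want to measure server-side processing time, server_start_time and server_end_time are ISO 8601 timestamps and can be compared directly; a small sketch, assuming res holds the parsed response above:

from datetime import datetime

# res is the parsed "transcribe" response shown above
start = datetime.fromisoformat(res["server_start_time"].replace("Z", "+00:00"))
end = datetime.fromisoformat(res["server_end_time"].replace("Z", "+00:00"))
print("Server processing took", (end - start).total_seconds(), "seconds")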

Notes

  • Audio chunk size: ≤ 32 KB per message
  • Format: MP3, OGG, or WAV (audio/mpeg, audio/ogg, or audio/wav)
  • Sample rate: 48 kHz recommended
  • Order: Always authenticate → add_audio → transcribe
  • Response: The transcription is returned in the "text" field