Transcription API

The UG Labs Interaction API allows you to stream audio and receive real-time transcription results via WebSocket.

Important

Each transcribe action supports up to 30 seconds of audio. If you need to transcribe more than that, split the audio into segments of 30 seconds or less and send a separate transcribe request for each segment, as in the sketch below.
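A minimal sketch of segmenting a longer recording, assuming the pydub library (and ffmpeg) is available for time-based slicing; transcribe_segment is a hypothetical helper standing in for the add_audio / transcribe flow shown later on this page.

from io import BytesIO
from pydub import AudioSegment  # assumption: pydub is available for time-based slicing

SEGMENT_MS = 30_000  # 30-second limit per transcribe action

audio = AudioSegment.from_file("long_recording.mp3")
for start in range(0, len(audio), SEGMENT_MS):  # len() is the duration in milliseconds
    buf = BytesIO()
    audio[start:start + SEGMENT_MS].export(buf, format="mp3")
    # transcribe_segment is a hypothetical helper that streams the bytes via
    # add_audio and then sends a transcribe request, as in the examples below.
    transcribe_segment(buf.getvalue())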

Try it out

Test the Transcription API interactively using our STT Tester. Record audio from your microphone or upload an audio file to see transcription in action.

Authentication

Each connection requires a valid access token.

You can generate one by following our Authentication Guide.

Connection Flow

  1. Connect to the WebSocket endpoint
  2. Send an authenticate message with your access token
  3. Stream audio in chunks via add_audio
  4. Send a transcribe request
  5. Receive the final transcription response

Endpoint

Staging:

wss://pug.stg.uglabs.app/interact

Message Format

All client messages follow this format:

{
  "type": "request",
  "uid": "unique-client-id",
  "kind": "authenticate | add_audio | transcribe",
  "timestamp": "2025-10-05T12:00:00Z"
}
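
For illustration, concrete messages for each kind might look like the following; the extra fields (access_token, audio, config, language_code) match the ones used in the examples below, and the values are placeholders.

{
  "type": "request",
  "uid": "unique-client-id",
  "kind": "authenticate",
  "timestamp": "2025-10-05T12:00:00Z",
  "access_token": "<YOUR_ACCESS_TOKEN>"
}

{
  "type": "request",
  "uid": "unique-client-id",
  "kind": "add_audio",
  "timestamp": "2025-10-05T12:00:01Z",
  "audio": "<base64-encoded audio chunk>",
  "config": { "sampling_rate": 48000, "mime_type": "audio/mpeg" }
}

{
  "type": "request",
  "uid": "unique-client-id",
  "kind": "transcribe",
  "timestamp": "2025-10-05T12:00:02Z",
  "language_code": "en"
}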

Example — Python

import base64
import json
import uuid
from datetime import datetime, timezone
from websocket import create_connection

AUDIO_FILE = "sample.mp3"
ACCESS_TOKEN = "<YOUR_ACCESS_TOKEN>"
URL = "wss://pug.stg.uglabs.app/interact"
LANGUAGE_CODE = "en"
UID = str(uuid.uuid4())
CHUNK_SIZE = 32000 # 32 KB

headers = [f"Authorization: Bearer {ACCESS_TOKEN}"]

def make_rpc(kind, **fields):
    return {
        "type": "request",
        "uid": UID,
        "kind": kind,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        **fields,
    }

ws = create_connection(URL, header=headers)

# Authenticate
ws.send(json.dumps(make_rpc("authenticate", access_token=ACCESS_TOKEN)))
print("Auth:", json.loads(ws.recv()))

# Send audio chunks
with open(AUDIO_FILE, "rb") as f:
    while chunk := f.read(CHUNK_SIZE):
        ws.send(json.dumps(make_rpc(
            "add_audio",
            audio=base64.b64encode(chunk).decode(),
            config={"sampling_rate": 48000, "mime_type": "audio/mpeg"}
        )))
        print("Chunk:", json.loads(ws.recv()))

# Request transcription
transcribe_req = make_rpc("transcribe", language_code=LANGUAGE_CODE)
ws.send(json.dumps(transcribe_req))

# Receive transcription
while True:
    res = json.loads(ws.recv())
    if res.get("kind") == "transcribe":
        print("Transcription:", res["text"])
        break

ws.close()

Example — JavaScript (Node.js)

import WebSocket from "ws";
import fs from "fs";
import { v4 as uuidv4 } from "uuid";

const AUDIO_FILE = "sample.mp3";
const ACCESS_TOKEN = "<YOUR_ACCESS_TOKEN>";
const URL = "wss://pug.stg.uglabs.app/interact";
const LANGUAGE_CODE = "en";
const UID = uuidv4();
const CHUNK_SIZE = 32000;

const ws = new WebSocket(URL, {
  headers: { Authorization: `Bearer ${ACCESS_TOKEN}` },
});

function makeRpc(kind, fields = {}) {
  return { type: "request", uid: UID, kind, timestamp: new Date().toISOString(), ...fields };
}

ws.on("open", () => {
  console.log("Connected");

  ws.send(JSON.stringify(makeRpc("authenticate", { access_token: ACCESS_TOKEN })));

  const buffer = fs.readFileSync(AUDIO_FILE);
  for (let i = 0; i < buffer.length; i += CHUNK_SIZE) {
    const chunk = buffer.subarray(i, i + CHUNK_SIZE);
    ws.send(JSON.stringify(makeRpc("add_audio", {
      audio: chunk.toString("base64"),
      config: { sampling_rate: 48000, mime_type: "audio/mpeg" },
    })));
  }

  ws.send(JSON.stringify(makeRpc("transcribe", { language_code: LANGUAGE_CODE })));
});

ws.on("message", (data) => {
  const msg = JSON.parse(data);
  if (msg.kind === "transcribe") {
    console.log("Transcription:", msg.text);
    ws.close();
  }
});

ws.on("close", () => console.log("Connection closed"));

Example Response

Transcription Result

{
  "type": "response",
  "uid": "d1deb6ea-6b6a-4957-b59f-741bc70c5b8a",
  "kind": "transcribe",
  "client_start_time": null,
  "server_start_time": "2025-10-05T09:20:03.188700Z",
  "server_end_time": "2025-10-05T09:20:13.425967Z",
  "text": "the amazon rainforest, ..."
}
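
If you want to measure server-side processing time, server_start_time and server_end_time are ISO 8601 timestamps and can be compared directly; a small sketch, assuming res holds the parsed response above:

from datetime import datetime

# res is the parsed "transcribe" response shown above
start = datetime.fromisoformat(res["server_start_time"].replace("Z", "+00:00"))
end = datetime.fromisoformat(res["server_end_time"].replace("Z", "+00:00"))
print("Server processing took", (end - start).total_seconds(), "seconds")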

Notes

  • Audio chunk size: ≤ 32 KB per message
  • Format: MP3, OGG, or WAV (audio/mpeg, audio/ogg, or audio/wav)
  • Sample rate: 48 kHz recommended
  • Order: Always authenticate → add_audio → transcribe
  • Response: The transcription is returned in the "text" field