HTTP API for creating and polling AI jobs (transcription, translation, video analysis).
x-oidc-audience in /openapi.json).id.

Operations related to converting audio or video content into structured, time-aligned text.
event (string, required). Enum: "transcription.progress", "transcription.completed", "transcription.failed". The lifecycle event that triggered this notification.
data (object). JSON:API-style data envelope.
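A receiver can route each callback on the event field described above. The following is a minimal sketch, assuming the payload arrives as a raw JSON body. The handler functions and their return strings are hypothetical placeholders and are not part of the gateway.

```python
import json
from typing import Any, Callable

# Hypothetical handlers: one per lifecycle event in the enum above.
def on_progress(attrs: dict[str, Any]) -> str:
    return f"progress {attrs.get('progress', 0)}%"

def on_completed(attrs: dict[str, Any]) -> str:
    segments = (attrs.get("result") or {}).get("segments", [])
    return f"completed with {len(segments)} segment(s)"

def on_failed(attrs: dict[str, Any]) -> str:
    error = attrs.get("error") or {}
    return f"failed: {error.get('code', 'UNKNOWN')}"

HANDLERS: dict[str, Callable[[dict[str, Any]], str]] = {
    "transcription.progress": on_progress,
    "transcription.completed": on_completed,
    "transcription.failed": on_failed,
}

def handle_webhook(raw_body: bytes) -> str:
    """Parse a webhook body and route it on its `event` field."""
    payload = json.loads(raw_body)
    handler = HANDLERS.get(payload.get("event"))
    if handler is None:
        raise ValueError(f"unexpected event: {payload.get('event')!r}")
    # Job fields live under the JSON:API envelope: data.attributes.
    return handler(payload["data"]["attributes"])
```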
{
  "event": "transcription.completed",
  "data": {
    "id": "2f41bc1f-b608-4360-acd9-a26a296fea3c",
    "type": "transcription-job",
    "attributes": {
      "provider": "twelvelabs",
      "status": "processing",
      "progress": 72,
      "error": {
        "code": "AUDIO_UNREADABLE",
        "message": "Could not extract audio from the provided file."
      },
      "created_at": "2024-03-15T10:00:00Z",
      "processed_at": "2024-03-15T10:00:05Z",
      "completed_at": "2024-03-15T10:02:30Z",
      "input": {
        "type": "video",
        "audio_track": 0
      },
      "options": {
        "language": "en",
        "timestamps": true,
        "format": "json",
        "diarization": false,
        "priority": "standard"
      },
      "result": {
        "format": "srt",
        "duration": 183.4,
        "language": "en",
        "segments": [
          {
            "start": 12.5,
            "end": 15.8,
            "text": "Welcome to today's interview."
          }
        ]
      }
    },
    "meta": {
      "client": {},
      "system": {
        "region": "eu-west-1",
        "worker_id": "wk_789"
      }
    }
  }
}

Submits an asynchronous speech-to-text job and returns 202 Accepted with a job resource (status: pending). Poll the job until status is completed or failed, or configure attributes.webhook to receive lifecycle callbacks (see Webhooks → transcription).

Request body fields (under data.attributes, except meta.client, which sits under data.meta):

| Field | Required | Description |
|---|---|---|
| input | Yes | Source media (type, url, optional audio_track). |
| options | No | Transcription settings: source language, timestamps, output format, diarization, queue priority. |
| webhook | No | HTTPS endpoint for transcription.progress, .completed, and .failed events. |
| provider | No | Backend to use (e.g. eden-ai). If omitted, the gateway selects a provider. See GET /v1/providers. |
| meta.client | No | Opaque key-value metadata echoed back in responses and webhooks. |
Options (data.attributes.options):
- language: BCP 47 code of the spoken language (e.g. en, en-US, fr). Use auto only when the chosen provider supports language detection. The default backend (Google via Eden AI) requires an explicit code; if language is omitted, the deployment configuration may fall back to en-US.
- timestamps: when true, the result includes time-aligned segments (start, end, text).
- format: result serialization, one of json (structured segments), srt, or vtt (subtitle files).
- diarization: when true, speakers are distinguished in the transcript (provider-dependent).
- priority: queue priority, one of low, standard (default), or high.

The input.url must be reachable by the gateway and workers over HTTP or HTTPS. In local Compose stacks, use the MinIO sample URL from the request example or your own publicly accessible file.
JSON:API-style envelope. Only data.attributes.input is required; all other fields are optional.
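The following is a minimal sketch of building and submitting this envelope with the Python standard library. The gateway address (BASE_URL), the Content-Type header, and the lack of authentication are assumptions made for illustration. A real deployment may require an OIDC bearer token. The helper names are hypothetical.

```python
import json
import urllib.request
from typing import Any, Optional

BASE_URL = "http://localhost:8080"  # hypothetical gateway address

def build_transcription_request(
    media_url: str,
    media_type: str = "video",
    language: Optional[str] = None,
    fmt: str = "json",
    provider: Optional[str] = None,
) -> dict[str, Any]:
    """Build the JSON:API envelope for POST /v1/transcriptions."""
    if media_type not in ("video", "audio"):
        raise ValueError("input.type must be 'video' or 'audio'")
    if fmt not in ("json", "srt", "vtt"):
        raise ValueError("options.format must be 'json', 'srt' or 'vtt'")
    options: dict[str, Any] = {"timestamps": True, "format": fmt}
    if language is not None:
        options["language"] = language  # BCP 47 code, e.g. "en-US"
    attributes: dict[str, Any] = {
        # input is the only required attribute; url must be reachable by workers.
        "input": {"type": media_type, "url": media_url},
        "options": options,
    }
    if provider is not None:
        attributes["provider"] = provider  # omit to let the gateway choose
    return {"data": {"type": "transcription-job", "attributes": attributes}}

def submit_transcription(body: dict[str, Any]) -> dict[str, Any]:
    """POST the envelope and return the parsed 202 response (no auth in this sketch)."""
    request = urllib.request.Request(
        f"{BASE_URL}/v1/transcriptions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed media type
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

In this sketch, the 202 body returned by submit_transcription would carry the job id in data.id, which is the value used for polling.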
data (object, required). JSON:API-style resource envelope. Must include
{
  "data": {
    "type": "transcription-job",
    "attributes": {
      "options": {
        "language": "en",
        "timestamps": true,
        "format": "json",
        "diarization": false,
        "priority": "standard"
      }
    }
  }
}

{
  "data": {
    "id": "2f41bc1f-b608-4360-acd9-a26a296fea3c",
    "type": "transcription-job",
    "attributes": {
      "provider": "twelvelabs",
      "status": "processing",
      "progress": 72,
      "error": {
        "code": "AUDIO_UNREADABLE",
        "message": "Could not extract audio from the provided file."
      },
      "created_at": "2024-03-15T10:00:00Z",
      "processed_at": "2024-03-15T10:00:05Z",
      "completed_at": "2024-03-15T10:02:30Z",
      "input": {
        "type": "video",
        "audio_track": 0
      },
      "options": {
        "language": "en",
        "timestamps": true,
        "format": "json",
        "diarization": false,
        "priority": "standard"
      },
      "result": {
        "format": "srt",
        "duration": 183.4,
        "language": "en",
        "segments": [
          {
            "start": 12.5,
            "end": 15.8,
            "text": "Welcome to today's interview."
          }
        ]
      }
    },
    "meta": {
      "client": {},
      "system": {
        "region": "eu-west-1",
        "worker_id": "wk_789"
      }
    }
  }
}

Returns the current job resource, including status, progress (while processing), and result (when status is completed). Use the job id from the 202 response of POST /v1/transcriptions.

id (string <uuid>, required). Example: 2f41bc1f-b608-4360-acd9-a26a296fea3c. Job UUID returned when the transcription was created.
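Polling can be sketched as a loop that stops once status is completed or failed. In this sketch, fetch_job is a hypothetical caller-supplied function that performs the GET request for one job id and returns the parsed body. The interval and timeout defaults are arbitrary, and the sleep function is injectable so the loop can be exercised without waiting.

```python
import time
from typing import Any, Callable

TERMINAL_STATES = {"completed", "failed"}

def wait_for_job(
    fetch_job: Callable[[str], dict[str, Any]],
    job_id: str,
    interval: float = 2.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Poll until the job reaches a terminal state; return its attributes."""
    waited = 0.0
    while True:
        attrs = fetch_job(job_id)["data"]["attributes"]
        if attrs["status"] in TERMINAL_STATES:
            if attrs["status"] == "failed":
                # The error object is only present on failed jobs.
                err = attrs.get("error") or {}
                raise RuntimeError(f"{err.get('code')}: {err.get('message')}")
            return attrs
        if waited >= timeout:
            raise TimeoutError(f"job {job_id} still {attrs['status']} after {timeout}s")
        sleep(interval)
        waited += interval
```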
data (object). JSON:API-style data envelope.

{
  "data": {
    "id": "2f41bc1f-b608-4360-acd9-a26a296fea3c",
    "type": "transcription-job",
    "attributes": {
      "provider": "twelvelabs",
      "status": "processing",
      "progress": 72,
      "error": {
        "code": "AUDIO_UNREADABLE",
        "message": "Could not extract audio from the provided file."
      },
      "created_at": "2024-03-15T10:00:00Z",
      "processed_at": "2024-03-15T10:00:05Z",
      "completed_at": "2024-03-15T10:02:30Z",
      "input": {
        "type": "video",
        "audio_track": 0
      },
      "options": {
        "language": "en",
        "timestamps": true,
        "format": "json",
        "diarization": false,
        "priority": "standard"
      },
      "result": {
        "format": "srt",
        "duration": 183.4,
        "language": "en",
        "segments": [
          {
            "start": 12.5,
            "end": 15.8,
            "text": "Welcome to today's interview."
          }
        ]
      }
    },
    "meta": {
      "client": {},
      "system": {
        "region": "eu-west-1",
        "worker_id": "wk_789"
      }
    }
  }
}

Returns only the transcription output (format, language, segments, optional download_url). Available when the job status is completed; otherwise responds with 404 (RESULT_NOT_READY). The response content type depends on the requested options.format (application/json, text/srt, or text/vtt).

id (string <uuid>, required). Example: 2f41bc1f-b608-4360-acd9-a26a296fea3c. Job UUID returned when the transcription was created.

format (string). Enum: "srt", "vtt", "json". Format of the transcription result, matching the requested output format.
duration (number). Total duration of the media file in seconds.
language (string). BCP 47 language code of the transcribed audio, as detected or specified.
download_url (string <uri>). Pre-signed URL to download the full transcription file. Valid for a limited time.
segments (array of Segment objects). Time-aligned transcript segments. Present when
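When options.format is json, a client may still want subtitle output locally. The following is a minimal sketch that renders the segments array as SRT cues. It assumes the common SRT layout (cue number, HH:MM:SS,mmm --> HH:MM:SS,mmm, text, blank line). This is not the gateway's own renderer, so its output may differ in detail from a text/srt response.

```python
from typing import Any

def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)  # round to avoid float noise such as 15800.000000000002
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

def segments_to_srt(segments: list[dict[str, Any]]) -> str:
    """Render time-aligned segments (start, end, text) as numbered SRT cues."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text']}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(cues)
```

For the example segment below, this yields a single cue timed 00:00:12,500 --> 00:00:15,800.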
{
  "format": "srt",
  "duration": 183.4,
  "language": "en",
  "segments": [
    {
      "start": 12.5,
      "end": 15.8,
      "text": "Welcome to today's interview."
    }
  ]
}

event (string, required). Enum: "translation.progress", "translation.completed", "translation.failed". The lifecycle event that triggered this notification.
data (object). JSON:API-style data envelope.

{
  "event": "translation.completed",
  "data": {
    "id": "9a1bc2f3-d405-4678-bcde-f12345678901",
    "type": "translation-job",
    "attributes": {
      "provider": "twelvelabs",
      "status": "processing",
      "progress": 72,
      "error": {
        "code": "AUDIO_UNREADABLE",
        "message": "Could not extract audio from the provided file."
      },
      "created_at": "2024-03-15T10:00:00Z",
      "processed_at": "2024-03-15T10:00:05Z",
      "completed_at": "2024-03-15T10:02:30Z",
      "input": {
        "type": "text",
        "content": "Hello, how are you today?",
        "target_language": "fr"
      },
      "options": {
        "source_language": "auto",
        "formality": "formal",
        "priority": "standard"
      },
      "result": {
        "source_language": "en",
        "target_language": "fr",
        "content": "Bonjour, comment allez-vous aujourd'hui ?",
        "character_count": 1250
      }
    },
    "meta": {
      "client": {},
      "system": {
        "region": "eu-west-1",
        "worker_id": "wk_789"
      }
    }
  }
}

data (object, required). JSON:API-style data envelope.
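The following is a minimal sketch of assembling a translation request body for inline text input. The helper name and its validation rules are assumptions drawn from this schema: formality is limited to formal or informal, as listed under the translation features. A source_language of "auto" presumably only makes sense for a provider that lists language_detection.

```python
from typing import Any, Optional

FORMALITY_LEVELS = {"formal", "informal"}  # values named in the translation feature list

def build_translation_request(
    text: str,
    target_language: str,
    source_language: str = "auto",
    formality: Optional[str] = None,
    provider: Optional[str] = None,
) -> dict[str, Any]:
    """Build a translation-job envelope for inline text input (hypothetical helper)."""
    if not text:
        raise ValueError("input.content must not be empty")
    options: dict[str, Any] = {"source_language": source_language}
    if formality is not None:
        if formality not in FORMALITY_LEVELS:
            raise ValueError(f"unsupported formality: {formality!r}")
        options["formality"] = formality
    attributes: dict[str, Any] = {
        "input": {"type": "text", "content": text, "target_language": target_language},
        "options": options,
    }
    if provider is not None:
        attributes["provider"] = provider  # omit to let the gateway choose
    return {"data": {"type": "translation-job", "attributes": attributes}}
```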
{
  "data": {
    "type": "translation-job",
    "attributes": {
      "input": {
        "type": "text",
        "content": "Hello, how are you today?",
        "target_language": "fr"
      },
      "options": {
        "source_language": "auto",
        "formality": "formal",
        "priority": "standard"
      },
      "webhook": {
        "headers": {
          "property1": "string",
          "property2": "string"
        }
      },
      "provider": "twelvelabs"
    },
    "meta": {
      "client": {}
    }
  }
}

{
  "data": {
    "id": "9a1bc2f3-d405-4678-bcde-f12345678901",
    "type": "translation-job",
    "attributes": {
      "provider": "twelvelabs",
      "status": "processing",
      "progress": 72,
      "error": {
        "code": "AUDIO_UNREADABLE",
        "message": "Could not extract audio from the provided file."
      },
      "created_at": "2024-03-15T10:00:00Z",
      "processed_at": "2024-03-15T10:00:05Z",
      "completed_at": "2024-03-15T10:02:30Z",
      "input": {
        "type": "text",
        "content": "Hello, how are you today?",
        "target_language": "fr"
      },
      "options": {
        "source_language": "auto",
        "formality": "formal",
        "priority": "standard"
      },
      "result": {
        "source_language": "en",
        "target_language": "fr",
        "content": "Bonjour, comment allez-vous aujourd'hui ?",
        "character_count": 1250
      }
    },
    "meta": {
      "client": {},
      "system": {
        "region": "eu-west-1",
        "worker_id": "wk_789"
      }
    }
  }
}

id (string <uuid>, required). Example: 9a1bc2f3-d405-4678-bcde-f12345678901

data (object). JSON:API-style data envelope.

{
  "data": {
    "id": "9a1bc2f3-d405-4678-bcde-f12345678901",
    "type": "translation-job",
    "attributes": {
      "provider": "twelvelabs",
      "status": "processing",
      "progress": 72,
      "error": {
        "code": "AUDIO_UNREADABLE",
        "message": "Could not extract audio from the provided file."
      },
      "created_at": "2024-03-15T10:00:00Z",
      "processed_at": "2024-03-15T10:00:05Z",
      "completed_at": "2024-03-15T10:02:30Z",
      "input": {
        "type": "text",
        "content": "Hello, how are you today?",
        "target_language": "fr"
      },
      "options": {
        "source_language": "auto",
        "formality": "formal",
        "priority": "standard"
      },
      "result": {
        "source_language": "en",
        "target_language": "fr",
        "content": "Bonjour, comment allez-vous aujourd'hui ?",
        "character_count": 1250
      }
    },
    "meta": {
      "client": {},
      "system": {
        "region": "eu-west-1",
        "worker_id": "wk_789"
      }
    }
  }
}

id (string <uuid>, required). Example: 9a1bc2f3-d405-4678-bcde-f12345678901

source_language (string). BCP 47 language code of the source content, as detected or specified.
target_language (string). BCP 47 language code of the translated output.
content (string). Translated text content. Present when input
download_url (string <uri>). Pre-signed URL to download the translated file. Valid for a limited time. Present when input
character_count (integer). Number of characters in the source content.
{
  "source_language": "en",
  "target_language": "fr",
  "content": "Bonjour, comment allez-vous aujourd'hui ?",
  "character_count": 1250
}

event (string, required). Enum: "video-analysis.progress", "video-analysis.completed", "video-analysis.failed". The lifecycle event that triggered this notification.
data (object). JSON:API-style data envelope.

{
  "event": "video-analysis.completed",
  "data": {
    "id": "3e7dc4b2-91f0-4a1e-8c2d-b56789012345",
    "type": "video-analysis-job",
    "attributes": {
      "provider": "twelvelabs",
      "status": "processing",
      "progress": 72,
      "error": {
        "code": "AUDIO_UNREADABLE",
        "message": "Could not extract audio from the provided file."
      },
      "created_at": "2024-03-15T10:00:00Z",
      "processed_at": "2024-03-15T10:00:05Z",
      "completed_at": "2024-03-15T10:02:30Z",
      "input": {
        "features": [
          "labels",
          "scenes",
          "speech_to_text",
          "summary"
        ],
        "audio_track": 0
      },
      "options": {
        "language": "auto",
        "confidence_threshold": 0.7,
        "priority": "standard"
      },
      "result": {
        "video_metadata": {
          "duration": 3742.5,
          "width": 1920,
          "height": 1080,
          "frame_rate": 29.97,
          "format": "mp4",
          "codec": "h264"
        },
        "labels": [
          {
            "name": "Conference room",
            "confidence": 0.94,
            "instances": [
              {
                "start": 12.5,
                "end": 15.8,
                "confidence": 0.91
              }
            ]
          }
        ],
        "scenes": [
          {
            "index": 3,
            "start": 42,
            "end": 78.5
          }
        ],
        "faces": [
          {
            "track_id": 1,
            "fingerprint": "7hGkL2mXqP9nRsT4vWzA...",
            "instances": [
              {
                "start": 12.5,
                "end": 15.8,
                "confidence": 0.91
              }
            ]
          }
        ],
        "speech_to_text": [
          {
            "start": 12.5,
            "end": 15.8,
            "text": "Welcome to today's panel discussion on AI safety.",
            "speaker_id": 0,
            "language": "en",
            "confidence": 0.97
          }
        ],
        "ocr": [
          {
            "text": "Q3 Revenue: $4.2M",
            "confidence": 0.91,
            "language": "en",
            "instances": [
              {
                "start": 12.5,
                "end": 15.8,
                "confidence": 0.91
              }
            ]
          }
        ],
        "content_moderation": {
          "is_safe": true,
          "signals": [
            {
              "label": "violence",
              "confidence": 0.82,
              "instances": [
                {
                  "start": 12.5,
                  "end": 15.8,
                  "confidence": 0.91
                }
              ]
            }
          ]
        },
        "sentiment": {
          "overall": "positive",
          "score": 0.62,
          "instances": [
            {
              "start": 0,
              "end": 45,
              "label": "positive",
              "score": 0.71
            }
          ]
        },
        "topics": [
          {
            "name": "artificial intelligence",
            "confidence": 0.95,
            "instances": [
              {
                "start": 12.5,
                "end": 15.8,
                "confidence": 0.91
              }
            ]
          }
        ],
        "brands": [
          {
            "name": "Acme Corp",
            "confidence": 0.88,
            "instances": [
              {
                "start": 12.5,
                "end": 15.8,
                "confidence": 0.91
              }
            ]
          }
        ],
        "summary": "A panel discussion on AI safety featuring three researchers. The conversation covers alignment challenges, regulatory proposals, and near-term risk mitigation strategies.\n"
      }
    },
    "meta": {
      "client": {},
      "system": {
        "region": "eu-west-1",
        "worker_id": "wk_789"
      }
    }
  }
}

data (object, required). JSON:API-style data envelope.
{
  "data": {
    "type": "video-analysis-job",
    "attributes": {
      "input": {
        "features": [
          "labels",
          "scenes",
          "speech_to_text",
          "summary"
        ],
        "audio_track": 0
      },
      "options": {
        "language": "auto",
        "confidence_threshold": 0.7,
        "priority": "standard"
      },
      "webhook": {
        "headers": {
          "property1": "string",
          "property2": "string"
        }
      },
      "provider": "twelvelabs"
    },
    "meta": {
      "client": {}
    }
  }
}

{
  "data": {
    "id": "3e7dc4b2-91f0-4a1e-8c2d-b56789012345",
    "type": "video-analysis-job",
    "attributes": {
      "provider": "twelvelabs",
      "status": "processing",
      "progress": 72,
      "error": {
        "code": "AUDIO_UNREADABLE",
        "message": "Could not extract audio from the provided file."
      },
      "created_at": "2024-03-15T10:00:00Z",
      "processed_at": "2024-03-15T10:00:05Z",
      "completed_at": "2024-03-15T10:02:30Z",
      "input": {
        "features": [
          "labels",
          "scenes",
          "speech_to_text",
          "summary"
        ],
        "audio_track": 0
      },
      "options": {
        "language": "auto",
        "confidence_threshold": 0.7,
        "priority": "standard"
      },
      "result": {
        "video_metadata": {
          "duration": 3742.5,
          "width": 1920,
          "height": 1080,
          "frame_rate": 29.97,
          "format": "mp4",
          "codec": "h264"
        },
        "labels": [
          {
            "name": "Conference room",
            "confidence": 0.94,
            "instances": [
              {
                "start": 12.5,
                "end": 15.8,
                "confidence": 0.91
              }
            ]
          }
        ],
        "scenes": [
          {
            "index": 3,
            "start": 42,
            "end": 78.5
          }
        ],
        "faces": [
          {
            "track_id": 1,
            "fingerprint": "7hGkL2mXqP9nRsT4vWzA...",
            "instances": [
              {
                "start": 12.5,
                "end": 15.8,
                "confidence": 0.91
              }
            ]
          }
        ],
        "speech_to_text": [
          {
            "start": 12.5,
            "end": 15.8,
            "text": "Welcome to today's panel discussion on AI safety.",
            "speaker_id": 0,
            "language": "en",
            "confidence": 0.97
          }
        ],
        "ocr": [
          {
            "text": "Q3 Revenue: $4.2M",
            "confidence": 0.91,
            "language": "en",
            "instances": [
              {
                "start": 12.5,
                "end": 15.8,
                "confidence": 0.91
              }
            ]
          }
        ],
        "content_moderation": {
          "is_safe": true,
          "signals": [
            {
              "label": "violence",
              "confidence": 0.82,
              "instances": [
                {
                  "start": 12.5,
                  "end": 15.8,
                  "confidence": 0.91
                }
              ]
            }
          ]
        },
        "sentiment": {
          "overall": "positive",
          "score": 0.62,
          "instances": [
            {
              "start": 0,
              "end": 45,
              "label": "positive",
              "score": 0.71
            }
          ]
        },
        "topics": [
          {
            "name": "artificial intelligence",
            "confidence": 0.95,
            "instances": [
              {
                "start": 12.5,
                "end": 15.8,
                "confidence": 0.91
              }
            ]
          }
        ],
        "brands": [
          {
            "name": "Acme Corp",
            "confidence": 0.88,
            "instances": [
              {
                "start": 12.5,
                "end": 15.8,
                "confidence": 0.91
              }
            ]
          }
        ],
        "summary": "A panel discussion on AI safety featuring three researchers. The conversation covers alignment challenges, regulatory proposals, and near-term risk mitigation strategies.\n"
      }
    },
    "meta": {
      "client": {},
      "system": {
        "region": "eu-west-1",
        "worker_id": "wk_789"
      }
    }
  }
}

id (string <uuid>, required). Example: 3e7dc4b2-91f0-4a1e-8c2d-b56789012345

data (object). JSON:API-style data envelope.

{
  "data": {
    "id": "3e7dc4b2-91f0-4a1e-8c2d-b56789012345",
    "type": "video-analysis-job",
    "attributes": {
      "provider": "twelvelabs",
      "status": "processing",
      "progress": 72,
      "error": {
        "code": "AUDIO_UNREADABLE",
        "message": "Could not extract audio from the provided file."
      },
      "created_at": "2024-03-15T10:00:00Z",
      "processed_at": "2024-03-15T10:00:05Z",
      "completed_at": "2024-03-15T10:02:30Z",
      "input": {
        "features": [
          "labels",
          "scenes",
          "speech_to_text",
          "summary"
        ],
        "audio_track": 0
      },
      "options": {
        "language": "auto",
        "confidence_threshold": 0.7,
        "priority": "standard"
      },
      "result": {
        "video_metadata": {
          "duration": 3742.5,
          "width": 1920,
          "height": 1080,
          "frame_rate": 29.97,
          "format": "mp4",
          "codec": "h264"
        },
        "labels": [
          {
            "name": "Conference room",
            "confidence": 0.94,
            "instances": [
              {
                "start": 12.5,
                "end": 15.8,
                "confidence": 0.91
              }
            ]
          }
        ],
        "scenes": [
          {
            "index": 3,
            "start": 42,
            "end": 78.5
          }
        ],
        "faces": [
          {
            "track_id": 1,
            "fingerprint": "7hGkL2mXqP9nRsT4vWzA...",
            "instances": [
              {
                "start": 12.5,
                "end": 15.8,
                "confidence": 0.91
              }
            ]
          }
        ],
        "speech_to_text": [
          {
            "start": 12.5,
            "end": 15.8,
            "text": "Welcome to today's panel discussion on AI safety.",
            "speaker_id": 0,
            "language": "en",
            "confidence": 0.97
          }
        ],
        "ocr": [
          {
            "text": "Q3 Revenue: $4.2M",
            "confidence": 0.91,
            "language": "en",
            "instances": [
              {
                "start": 12.5,
                "end": 15.8,
                "confidence": 0.91
              }
            ]
          }
        ],
        "content_moderation": {
          "is_safe": true,
          "signals": [
            {
              "label": "violence",
              "confidence": 0.82,
              "instances": [
                {
                  "start": 12.5,
                  "end": 15.8,
                  "confidence": 0.91
                }
              ]
            }
          ]
        },
        "sentiment": {
          "overall": "positive",
          "score": 0.62,
          "instances": [
            {
              "start": 0,
              "end": 45,
              "label": "positive",
              "score": 0.71
            }
          ]
        },
        "topics": [
          {
            "name": "artificial intelligence",
            "confidence": 0.95,
            "instances": [
              {
                "start": 12.5,
                "end": 15.8,
                "confidence": 0.91
              }
            ]
          }
        ],
        "brands": [
          {
            "name": "Acme Corp",
            "confidence": 0.88,
            "instances": [
              {
                "start": 12.5,
                "end": 15.8,
                "confidence": 0.91
              }
            ]
          }
        ],
        "summary": "A panel discussion on AI safety featuring three researchers. The conversation covers alignment challenges, regulatory proposals, and near-term risk mitigation strategies.\n"
      }
    },
    "meta": {
      "client": {},
      "system": {
        "region": "eu-west-1",
        "worker_id": "wk_789"
      }
    }
  }
}

id (string <uuid>, required). Example: 3e7dc4b2-91f0-4a1e-8c2d-b56789012345

video_metadata (object, VideoMetadata). Technical properties of the processed video file.
labels (array of LabelDetection objects). Detected objects, scenes, and actions. Present when
scenes (array of SceneDetection objects). Scene and shot boundaries. Present when
faces (array of FaceDetection objects). Faces detected and tracked across the video. Present when
speech_to_text (array of TranscriptSegment objects). Speech-to-text segments with speaker identification. Present when
ocr (array of OcrText objects). On-screen text extracted from video frames. Present when
content_moderation (object, ContentModeration). Content moderation signals. Present when
sentiment (object, Sentiment). Overall tone and sentiment of the video. Present when
topics (array of Topic objects). Key topics and keywords extracted from the video. Present when
brands (array of BrandDetection objects). Detected brand logos and visual trademarks. Present when
summary (string). Natural-language description of the video content. Present when
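Labels, topics, and brands share the same shape: a name plus time-stamped instances. Because of this, one helper can flatten them into a timeline. The following is a minimal client-side sketch, assuming you want to re-filter instances by confidence yourself. This section does not say whether the gateway already applies options.confidence_threshold on the server. Content moderation signals (label) and OCR entries (text) use different key names and are not handled by this sketch.

```python
from typing import Any

def confident_spans(
    detections: list[dict[str, Any]], threshold: float = 0.7
) -> list[tuple[str, float, float]]:
    """Flatten name + instances detections into (name, start, end) spans.

    Fits labels, topics and brands, which share this shape. Instances below
    `threshold` are dropped; spans are sorted by start time.
    """
    spans: list[tuple[str, float, float]] = []
    for detection in detections:
        for instance in detection.get("instances", []):
            if instance.get("confidence", 0.0) >= threshold:
                spans.append((detection["name"], instance["start"], instance["end"]))
    return sorted(spans, key=lambda span: span[1])
```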
{
  "video_metadata": {
    "duration": 3742.5,
    "width": 1920,
    "height": 1080,
    "frame_rate": 29.97,
    "format": "mp4",
    "codec": "h264"
  },
  "labels": [
    {
      "name": "Conference room",
      "confidence": 0.94,
      "instances": [
        {
          "start": 12.5,
          "end": 15.8,
          "confidence": 0.91
        }
      ]
    }
  ],
  "scenes": [
    {
      "index": 3,
      "start": 42,
      "end": 78.5
    }
  ],
  "faces": [
    {
      "track_id": 1,
      "fingerprint": "7hGkL2mXqP9nRsT4vWzA...",
      "instances": [
        {
          "start": 12.5,
          "end": 15.8,
          "confidence": 0.91
        }
      ]
    }
  ],
  "speech_to_text": [
    {
      "start": 12.5,
      "end": 15.8,
      "text": "Welcome to today's panel discussion on AI safety.",
      "speaker_id": 0,
      "language": "en",
      "confidence": 0.97
    }
  ],
  "ocr": [
    {
      "text": "Q3 Revenue: $4.2M",
      "confidence": 0.91,
      "language": "en",
      "instances": [
        {
          "start": 12.5,
          "end": 15.8,
          "confidence": 0.91
        }
      ]
    }
  ],
  "content_moderation": {
    "is_safe": true,
    "signals": [
      {
        "label": "violence",
        "confidence": 0.82,
        "instances": [
          {
            "start": 12.5,
            "end": 15.8,
            "confidence": 0.91
          }
        ]
      }
    ]
  },
  "sentiment": {
    "overall": "positive",
    "score": 0.62,
    "instances": [
      {
        "start": 0,
        "end": 45,
        "label": "positive",
        "score": 0.71
      }
    ]
  },
  "topics": [
    {
      "name": "artificial intelligence",
      "confidence": 0.95,
      "instances": [
        {
          "start": 12.5,
          "end": 15.8,
          "confidence": 0.91
        }
      ]
    }
  ],
  "brands": [
    {
      "name": "Acme Corp",
      "confidence": 0.88,
      "instances": [
        {
          "start": 12.5,
          "end": 15.8,
          "confidence": 0.91
        }
      ]
    }
  ],
  "summary": "A panel discussion on AI safety featuring three researchers. The conversation covers alignment challenges, regulatory proposals, and near-term risk mitigation strategies.\n"
}

Returns all backend providers configured in the gateway, along with the resource types and features each one supports. Use this to determine which provider value to pass when creating a job, and which features are available for a given provider.

data (array of objects).
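To pick a provider for a job, a client can check each entry's resources against the features it needs. The following is a minimal sketch, assuming the response shape shown in the example below, where resource keys are "transcriptions", "translations", and "video-analyses". The function name is hypothetical.

```python
from typing import Any

def providers_supporting(
    providers_response: dict[str, Any], resource: str, required: set[str]
) -> list[str]:
    """Return ids of providers whose `resource` entry lists every required feature."""
    matches: list[str] = []
    for provider in providers_response.get("data", []):
        resources = provider.get("attributes", {}).get("resources", {})
        entry = resources.get(resource)
        if entry is None:
            continue  # resource type not offered by this provider
        # Set inclusion: every required feature must be advertised.
        if required <= set(entry.get("features", [])):
            matches.append(provider["id"])
    return matches
```

If the result is empty, one option is to omit provider and let the gateway choose. However, the requested features may then still be unsupported.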
{
  "data": [
    {
      "id": "twelvelabs",
      "type": "provider",
      "attributes": {
        "name": "TwelveLabs",
        "resources": {
          "transcriptions": {
            "features": [
              "language_detection",
              "diarization",
              "timestamps"
            ]
          },
          "translations": {
            "features": [
              "language_detection",
              "formality"
            ]
          },
          "video-analyses": {
            "features": [
              "labels",
              "scenes",
              "faces",
              "speech_to_text",
              "ocr",
              "content_moderation",
              "sentiment",
              "topics",
              "brands",
              "summary"
            ]
          }
        }
      }
    }
  ]
}

A specific transcription capability that may or may not be supported by a given provider:
- language_detection: automatically detect the source language without explicit specification.
- diarization: identify and label distinct speakers in the transcript.
- timestamps: produce time-aligned transcript segments.

Example: "language_detection"

features (array of Feature strings). Items enum: "language_detection", "diarization", "timestamps". Subset of transcription features this provider can handle.

{
  "features": [
    "language_detection",
    "diarization",
    "timestamps"
  ]
}

A specific translation capability that may or may not be supported by a given provider:
- language_detection: automatically detect the source language without explicit specification.
- formality: control the formality level (formal, informal) of the translated output.

Example: "language_detection"

features (array of translation_Feature strings). Items enum: "language_detection", "formality". Subset of translation features this provider can handle.

{
  "features": [
    "language_detection",
    "formality"
  ]
}

A specific analysis capability to apply to the video:
- labels: detect objects, scenes, and actions throughout the video.
- scenes: detect scene and shot boundaries.
- faces: detect and track faces across frames.
- speech_to_text: convert speech to text, with speaker identification.
- ocr: extract on-screen text from video frames.
- content_moderation: flag explicit or inappropriate content.
- sentiment: analyse the overall tone and emotional valence.
- topics: extract key topics and keywords from audio and visual content.
- brands: detect brand logos and visual trademarks.
- summary: generate a natural-language description of the video content.

Example: "labels"

features (array of video-analysis_Feature strings). Items enum: "labels", "scenes", "faces", "speech_to_text", "ocr", "content_moderation", "sentiment", "topics", "brands", "summary". Subset of video analysis features this provider can handle.

{
  "features": [
    "labels",
    "scenes",
    "faces",
    "speech_to_text",
    "ocr",
    "content_moderation",
    "sentiment",
    "topics",
    "brands",
    "summary"
  ]
}

name (string). Human-readable name of the provider.
resources (object). Capabilities offered by this provider per resource type. Only resource types supported by this provider appear in this object.
{
  "name": "TwelveLabs",
  "resources": {
    "transcriptions": {
      "features": [
        "language_detection",
        "diarization",
        "timestamps"
      ]
    },
    "translations": {
      "features": [
        "language_detection",
        "formality"
      ]
    },
    "video-analyses": {
      "features": [
        "labels",
        "scenes",
        "faces",
        "speech_to_text",
        "ocr",
        "content_moderation",
        "sentiment",
        "topics",
        "brands",
        "summary"
      ]
    }
  }
}

data (array of objects).

{
  "data": [
    {
      "id": "twelvelabs",
      "type": "provider",
      "attributes": {
        "name": "TwelveLabs",
        "resources": {
          "transcriptions": {
            "features": [
              "language_detection",
              "diarization",
              "timestamps"
            ]
          },
          "translations": {
            "features": [
              "language_detection",
              "formality"
            ]
          },
          "video-analyses": {
            "features": [
              "labels",
              "scenes",
              "faces",
              "speech_to_text",
              "ocr",
              "content_moderation",
              "sentiment",
              "topics",
              "brands",
              "summary"
            ]
          }
        }
      }
    }
  ]
}

type (string, required). Enum: "video", "audio". Media kind.
url (string <uri>, required). HTTP or HTTPS URL of the source file (e.g. MP4, WAV, MP3). Must be publicly accessible or reachable on the deployment network (e.g. MinIO in local Docker Compose).
audio_track (integer). Zero-based index of the audio track to transcribe when

{
  "type": "video",
  "audio_track": 0
}

language (string). BCP 47 language code of the speech in the source media (e.g.
timestamps (boolean). Default: true. When
format (string). Enum: "srt", "vtt", "json". How the transcript is returned in
diarization (boolean). Default: false. When
priority (string). Default: "standard". Enum: "low", "standard", "high". Relative queue priority for the job.

{
  "language": "en",
  "timestamps": true,
  "format": "json",
  "diarization": false,
  "priority": "standard"
}

url (string <uri>, required). The endpoint the gateway will POST event payloads to.
headers (object). Optional HTTP headers included in every webhook request. Typically used for authentication.
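Because the configured headers are sent with every webhook request, a receiver can compare one of them against a shared secret. The following is a minimal sketch. The header name X-Webhook-Token and the secret value are hypothetical choices made when configuring webhook.headers; the gateway does not define these names.

```python
import hmac
from collections.abc import Mapping

# Hypothetical values chosen when configuring webhook.headers on the job.
HEADER_NAME = "X-Webhook-Token"
EXPECTED_TOKEN = "replace-with-a-long-random-secret"

def is_authentic(headers: Mapping[str, str]) -> bool:
    """Return True if the request carries the expected shared-secret header."""
    # HTTP header names are case-insensitive, so match on lowercased keys.
    lowered = {name.lower(): value for name, value in headers.items()}
    received = lowered.get(HEADER_NAME.lower(), "")
    # compare_digest avoids leaking how much of the secret matched via timing.
    return hmac.compare_digest(received.encode("utf-8"), EXPECTED_TOKEN.encode("utf-8"))
```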
{
  "headers": {
    "property1": "string",
    "property2": "string"
  }
}

Identifier of the backend provider to use for processing this job. When omitted, the gateway selects the most appropriate provider automatically based on the requested features and availability. Use GET /v1/providers to list available providers and their supported features.

Example: "twelvelabs"

input (object, Input, required). Source audio or video file to transcribe.
options (object, Options). Optional transcription settings (language, output format, timestamps, diarization, priority). See the Options schema for field-level details.
webhook (object, Webhook). Optional callback URL. When set, the gateway POSTs JSON payloads on each lifecycle transition (
provider (string, Provider). Backend provider id for this job (e.g.

{
  "input": {
    "type": "video",
    "audio_track": 0
  },
  "options": {
    "language": "en",
    "timestamps": true,
    "format": "json",
    "diarization": false,
    "priority": "standard"
  },
  "provider": "twelvelabs"
}

client (object). Arbitrary key-value data provided by the client. Returned unchanged in all responses.
system (object). Internal metadata added by the gateway. Not exposed unless explicitly required.

{
  "client": {},
  "system": {
    "region": "eu-west-1",
    "worker_id": "wk_789"
  }
}

data (object, required). JSON:API-style resource envelope. Must include

{
  "data": {
    "type": "transcription-job",
    "attributes": {
      "options": {
        "language": "en",
        "timestamps": true,
        "format": "json",
        "diarization": false,
        "priority": "standard"
      }
    }
  }
}

Current lifecycle state of the job:
- pending: accepted, waiting to be picked up.
- processing: actively being worked on.
- completed: finished successfully.
- failed: encountered an unrecoverable error.

Example: "processing"

code (string). Machine-readable error identifier.
message (string). Human-readable explanation of the error.

{
  "code": "AUDIO_UNREADABLE",
  "message": "Could not extract audio from the provided file."
}

provider (string, Provider). Identifier of the backend provider to use for processing this job. When omitted, the gateway selects the most appropriate provider automatically based on the requested features and availability. Use
status (string, JobStatus). Enum: "pending", "processing", "completed", "failed". Current lifecycle state of the job.
progress (integer [0 .. 100]). Processing progress as a percentage. Only meaningful while status is
error (object, Error). Details of a job failure. Only present when
created_at (string <date-time>). When the job was created.
processed_at (string <date-time>). When the job transitioned from
completed_at (string <date-time>). When the job reached a terminal state (
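The three timestamps support simple timing arithmetic. The following is a minimal sketch, assuming processed_at marks the move from pending to processing; the field description above is truncated, so that meaning is not confirmed. For the example values below, the job waited 5 s in the queue, processed for 145 s, and took 150 s end to end.

```python
from datetime import datetime
from typing import Any

def parse_timestamp(value: str) -> datetime:
    # fromisoformat only accepts a trailing "Z" from Python 3.11, so normalize it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))

def job_timings(attrs: dict[str, Any]) -> dict[str, float]:
    """Split a finished job's lifetime into queue and processing time (seconds)."""
    created = parse_timestamp(attrs["created_at"])
    processed = parse_timestamp(attrs["processed_at"])  # assumed: pending -> processing
    completed = parse_timestamp(attrs["completed_at"])
    return {
        "queue_seconds": (processed - created).total_seconds(),
        "processing_seconds": (completed - processed).total_seconds(),
        "total_seconds": (completed - created).total_seconds(),
    }
```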
{- "provider": "twelvelabs",
- "status": "processing",
- "progress": 72,
- "error": {
- "code": "AUDIO_UNREADABLE",
- "message": "Could not extract audio from the provided file."
}, - "created_at": "2024-03-15T10:00:00Z",
- "processed_at": "2024-03-15T10:00:05Z",
- "completed_at": "2024-03-15T10:02:30Z"
}| start | number Start time of the segment in seconds. |
| end | number End time of the segment in seconds. |
| text | string Transcribed text for this time range. |
{- "start": 12.5,
- "end": 15.8,
- "text": "Welcome to today's interview."
}| format | string Enum: "srt" "vtt" "json" Format of the transcription result, matching the requested output format. | ||||||
| duration | number Total duration of the media file in seconds. | ||||||
| language | string BCP 47 language code of the transcribed audio, as detected or specified. | ||||||
| download_url | string <uri> Pre-signed URL to download the full transcription file. Valid for a limited time. | ||||||
Array of objects (Segment) Time-aligned transcript segments. Present when | |||||||
Array
| |||||||
```json
{
  "format": "srt",
  "duration": 183.4,
  "language": "en",
  "segments": [
    { "start": 12.5, "end": 15.8, "text": "Welcome to today's interview." }
  ]
}
```

| Field | Type | Description |
|---|---|---|
| `provider` | string | Identifier of the backend provider to use for processing this job. When omitted, the gateway selects the most appropriate provider automatically based on the requested features and availability. Use |
| `status` | string (JobStatus); enum: `pending`, `processing`, `completed`, `failed` | Current lifecycle state of the job. |
| `progress` | integer, 0 to 100 | Processing progress as a percentage. Only meaningful while status is |
| `error` | object (Error) | Details of a job failure. Only present when |
| `created_at` | string (date-time) | When the job was created. |
| `processed_at` | string (date-time) | When the job transitioned from |
| `completed_at` | string (date-time) | When the job reached a terminal state ( |
| `input` (required) | object (Input) | Source audio or video file to transcribe. |
| `options` | object (Options) | Optional transcription settings (language, output format, timestamps, diarization, priority). See the Options schema for field-level details. |
| `webhook` | object (Webhook) | Optional callback URL. When set, the gateway POSTs JSON payloads on each lifecycle transition ( |
| `result` | object (Result) | Transcription output. Populated once status is |
```json
{
  "provider": "twelvelabs",
  "status": "processing",
  "progress": 72,
  "error": {
    "code": "AUDIO_UNREADABLE",
    "message": "Could not extract audio from the provided file."
  },
  "created_at": "2024-03-15T10:00:00Z",
  "processed_at": "2024-03-15T10:00:05Z",
  "completed_at": "2024-03-15T10:02:30Z",
  "input": {
    "type": "video",
    "audio_track": 0
  },
  "options": {
    "language": "en",
    "timestamps": true,
    "format": "json",
    "diarization": false,
    "priority": "standard"
  },
  "result": {
    "format": "json",
    "duration": 183.4,
    "language": "en",
    "segments": [
      { "start": 12.5, "end": 15.8, "text": "Welcome to today's interview." }
    ]
  }
}
```

| Field | Type | Description |
|---|---|---|
| `data` | object | JSON:API-style data envelope. |
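The GET endpoint returns this envelope. To wait for completion, a client reads `data.attributes.status` until it becomes `completed` or `failed`. The Python polling sketch below is minimal. `fetch_job` is a hypothetical caller-supplied function that stands in for the authenticated HTTP request. The interval and timeout defaults are arbitrary choices, not documented limits.

```python
import time
from typing import Any, Callable, Dict

# Terminal states from the JobStatus enum: polling stops at either one.
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_job(
    fetch_job: Callable[[str], Dict[str, Any]],
    job_id: str,
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until the job is completed or failed, then return data.attributes.

    fetch_job is a hypothetical function that performs the GET request for
    job_id and returns the decoded JSON response body.
    """
    waited = 0.0
    while True:
        attributes = fetch_job(job_id)["data"]["attributes"]
        if attributes["status"] in TERMINAL_STATUSES:
            return attributes
        if waited >= timeout:
            raise TimeoutError(
                f"job {job_id} still {attributes['status']} after {waited:.0f}s"
            )
        sleep(interval)
        waited += interval
```

Passing `sleep` in as a parameter keeps the loop testable. Configuring a webhook on the job avoids polling altogether.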
```json
{
  "data": {
    "id": "2f41bc1f-b608-4360-acd9-a26a296fea3c",
    "type": "transcription-job",
    "attributes": {
      "provider": "twelvelabs",
      "status": "processing",
      "progress": 72,
      "error": {
        "code": "AUDIO_UNREADABLE",
        "message": "Could not extract audio from the provided file."
      },
      "created_at": "2024-03-15T10:00:00Z",
      "processed_at": "2024-03-15T10:00:05Z",
      "completed_at": "2024-03-15T10:02:30Z",
      "input": {
        "type": "video",
        "audio_track": 0
      },
      "options": {
        "language": "en",
        "timestamps": true,
        "format": "json",
        "diarization": false,
        "priority": "standard"
      },
      "result": {
        "format": "json",
        "duration": 183.4,
        "language": "en",
        "segments": [
          { "start": 12.5, "end": 15.8, "text": "Welcome to today's interview." }
        ]
      }
    },
    "meta": {
      "client": {},
      "system": {
        "region": "eu-west-1",
        "worker_id": "wk_789"
      }
    }
  }
}
```

| Field | Type | Description |
|---|---|---|
| `error` | object (Error) | Details of a job failure. Only present when |
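The same `Error` shape, with `code` and `message`, appears in this error body and in a job's `attributes.error`. A single helper can therefore surface either one as an exception. The Python sketch below is minimal. `JobError` and `raise_for_error` are hypothetical names. Branching on `code` assumes that values such as `AUDIO_UNREADABLE` are stable identifiers.

```python
from typing import Any, Dict


class JobError(Exception):
    """A failed job or error response, carrying the Error object's fields."""

    def __init__(self, code: str, message: str) -> None:
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


def raise_for_error(payload: Dict[str, Any]) -> None:
    """Raise JobError if payload has an `error` object; otherwise do nothing.

    Works on an error response body or on a job's `attributes`, since both
    use the same {code, message} shape.
    """
    error = payload.get("error")
    if error:
        raise JobError(error["code"], error["message"])
```

You can call the helper on an error response body as is. For job attributes, call it only after the status is `failed`.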
```json
{
  "error": {
    "code": "AUDIO_UNREADABLE",
    "message": "Could not extract audio from the provided file."
  }
}
```

| Field | Type | Description |
|---|---|---|
| `type` (required) | string; enum: `text`, `document` | Type of the source content. |
| `content` | string | The text content to translate. Required when |
| `url` | string (URI) | Publicly accessible URL of the document to translate. Required when |
| `target_language` (required) | string | BCP 47 language code of the target language. |
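`content` and `url` are each required only in some cases, so a client may want to catch an incomplete `input` before submitting it. The Python sketch below assumes that `content` goes with type `text` and `url` goes with type `document`. The field descriptions suggest this pairing but are cut off before they state it. `validate_translation_input` is a hypothetical name.

```python
from typing import Any, Dict, List


def validate_translation_input(inp: Dict[str, Any]) -> List[str]:
    """Return problems found in a translation_Input object (empty list if none).

    ASSUMPTION: `content` is required for type "text" and `url` for type
    "document"; the field descriptions are truncated.
    """
    problems: List[str] = []
    kind = inp.get("type")
    if kind not in ("text", "document"):
        problems.append("type must be 'text' or 'document'")
    if not inp.get("target_language"):
        problems.append("target_language is required")
    if kind == "text" and not inp.get("content"):
        problems.append("content is required when type is 'text'")
    if kind == "document" and not inp.get("url"):
        problems.append("url is required when type is 'document'")
    return problems
```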
```json
{
  "type": "text",
  "content": "Hello, how are you today?",
  "target_language": "fr"
}
```

| Field | Type | Description |
|---|---|---|
| `source_language` | string | BCP 47 language code of the source content. Use |
| `formality` | string; enum: `default`, `formal`, `informal` | Formality level of the translated output. |
| `priority` | string; enum: `low`, `standard`, `high` | Processing priority. Higher-priority jobs are picked up sooner. |

```json
{
  "source_language": "auto",
  "formality": "formal",
  "priority": "standard"
}
```

| Field | Type | Description |
|---|---|---|
| `input` (required) | object (translation_Input) | Source content to translate. |
| `options` | object (translation_Options) | Optional settings controlling translation behaviour. |
| `webhook` | object (Webhook) | Destination configuration for job lifecycle notifications. |
| `provider` | string (Provider) | Identifier of the backend provider to use for processing this job. When omitted, the gateway selects the most appropriate provider automatically based on the requested features and availability. Use |
```json
{
  "input": {
    "type": "text",
    "content": "Hello, how are you today?",
    "target_language": "fr"
  },
  "options": {
    "source_language": "auto",
    "formality": "formal",
    "priority": "standard"
  },
  "provider": "twelvelabs"
}
```

| Field | Type | Description |
|---|---|---|
| `data` (required) | object | JSON:API-style data envelope. |
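The request body wraps the job attributes in `data`, with `type` set to `translation-job`. The Python sketch below builds that body for a text translation. `translation_request` is a hypothetical helper. The sketch leaves out `meta` because it assumes the example's `system` block (region and worker ID) is filled in by the server and not sent by clients.

```python
import json
from typing import Any, Dict, Optional


def translation_request(
    text: str,
    target_language: str,
    source_language: str = "auto",
    formality: Optional[str] = None,
    provider: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a JSON:API-style request body for a text translation job."""
    options: Dict[str, Any] = {"source_language": source_language}
    if formality is not None:
        options["formality"] = formality  # "default", "formal" or "informal"
    attributes: Dict[str, Any] = {
        "input": {"type": "text", "content": text, "target_language": target_language},
        "options": options,
    }
    if provider is not None:
        # When omitted, the gateway chooses a provider automatically.
        attributes["provider"] = provider
    return {"data": {"type": "translation-job", "attributes": attributes}}


# Serialize the body before sending it with any HTTP client.
body = json.dumps(translation_request("Hello, how are you today?", "fr", formality="formal"))
```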
```json
{
  "data": {
    "type": "translation-job",
    "attributes": {
      "input": {
        "type": "text",
        "content": "Hello, how are you today?",
        "target_language": "fr"
      },
      "options": {
        "source_language": "auto",
        "formality": "formal",
        "priority": "standard"
      },
      "provider": "twelvelabs"
    },
    "meta": {
      "client": {},
      "system": {
        "region": "eu-west-1",
        "worker_id": "wk_789"
      }
    }
  }
}
```

| Field | Type | Description |
|---|---|---|
| `source_language` | string | BCP 47 language code of the source content, as detected or specified. |
| `target_language` | string | BCP 47 language code of the translated output. |
| `content` | string | Translated text content. Present when input |
| `download_url` | string (URI) | Pre-signed URL to download the translated file. Valid for a limited time. Present when input |
| `character_count` | integer | Number of characters in the source content. |

```json
{
  "source_language": "en",
  "target_language": "fr",
  "content": "Bonjour, comment allez-vous aujourd'hui ?",
  "character_count": 1250
}
```

| Field | Type | Description |
|---|---|---|
| `provider` | string | Identifier of the backend provider to use for processing this job. When omitted, the gateway selects the most appropriate provider automatically based on the requested features and availability. Use |
| `status` | string (JobStatus); enum: `pending`, `processing`, `completed`, `failed` | Current lifecycle state of the job. |
| `progress` | integer, 0 to 100 | Processing progress as a percentage. Only meaningful while status is |
| `error` | object (Error) | Details of a job failure. Only present when |
| `created_at` | string (date-time) | When the job was created. |
| `processed_at` | string (date-time) | When the job transitioned from |
| `completed_at` | string (date-time) | When the job reached a terminal state ( |
| `input` (required) | object (translation_Input) | Source content to translate. |
| `options` | object (translation_Options) | Optional settings controlling translation behaviour. |
| `webhook` | object (Webhook) | Destination configuration for job lifecycle notifications. |
| `result` | object (translation_Result) | Translation output. Populated once status is |

```json
{
  "provider": "twelvelabs",
  "status": "processing",
  "progress": 72,
  "error": {
    "code": "AUDIO_UNREADABLE",
    "message": "Could not extract audio from the provided file."
  },
  "created_at": "2024-03-15T10:00:00Z",
  "processed_at": "2024-03-15T10:00:05Z",
  "completed_at": "2024-03-15T10:02:30Z",
  "input": {
    "type": "text",
    "content": "Hello, how are you today?",
    "target_language": "fr"
  },
  "options": {
    "source_language": "auto",
    "formality": "formal",
    "priority": "standard"
  },
  "result": {
    "source_language": "en",
    "target_language": "fr",
    "content": "Bonjour, comment allez-vous aujourd'hui ?",
    "character_count": 25
  }
}
```

| Field | Type | Description |
|---|---|---|
| `data` | object | JSON:API-style data envelope. |
```json
{
  "data": {
    "id": "9a1bc2f3-d405-4678-bcde-f12345678901",
    "type": "translation-job",
    "attributes": {
      "provider": "twelvelabs",
      "status": "processing",
      "progress": 72,
      "error": {
        "code": "AUDIO_UNREADABLE",
        "message": "Could not extract audio from the provided file."
      },
      "created_at": "2024-03-15T10:00:00Z",
      "processed_at": "2024-03-15T10:00:05Z",
      "completed_at": "2024-03-15T10:02:30Z",
      "input": {
        "type": "text",
        "content": "Hello, how are you today?",
        "target_language": "fr"
      },
      "options": {
        "source_language": "auto",
        "formality": "formal",
        "priority": "standard"
      },
      "result": {
        "source_language": "en",
        "target_language": "fr",
        "content": "Bonjour, comment allez-vous aujourd'hui ?",
        "character_count": 25
      }
    },
    "meta": {
      "client": {},
      "system": {
        "region": "eu-west-1",
        "worker_id": "wk_789"
      }
    }
  }
}
```

| Field | Type | Description |
|---|---|---|
| `url` (required) | string (URI) | Publicly accessible URL of the video file. |
| `features` (required) | array of strings (video-analysis_Feature), non-empty; items enum: `labels`, `scenes`, `faces`, `speech_to_text`, `ocr`, `content_moderation`, `sentiment`, `topics`, `brands`, `summary` | One or more analysis capabilities to apply. At least one feature must be specified. |
| `audio_track` | integer | Index of the audio track to use for speech-related features. Defaults to the first track when omitted. |
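`features` must be non-empty and its entries must come from a fixed enum, so a client can reject bad input before it submits a job. The Python sketch below is minimal. Removing repeated entries is a client-side convenience; the gateway is not documented to do it.

```python
from typing import Iterable, List

# Feature names copied from the video-analysis_Feature enum above.
VIDEO_FEATURES = {
    "labels", "scenes", "faces", "speech_to_text", "ocr",
    "content_moderation", "sentiment", "topics", "brands", "summary",
}


def normalize_features(features: Iterable[str]) -> List[str]:
    """Validate a feature list and drop duplicates while keeping first-seen order."""
    result: List[str] = []
    for name in features:
        if name not in VIDEO_FEATURES:
            raise ValueError(f"unknown feature: {name!r}")
        if name not in result:
            result.append(name)
    if not result:
        raise ValueError("at least one feature must be specified")
    return result
```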
```json
{
  "features": ["labels", "scenes", "speech_to_text", "summary"],
  "audio_track": 0
}
```

| Field | Type | Description |
|---|---|---|
| `language` | string | BCP 47 language code for speech and text features. Use |
| `confidence_threshold` | number, 0 to 1 | Minimum confidence score (0 to 1) for a detection to be included in the result. Defaults to 0.5. |
| `priority` | string; enum: `low`, `standard`, `high` | Processing priority. Higher-priority jobs are picked up sooner. |

```json
{
  "language": "auto",
  "confidence_threshold": 0.7,
  "priority": "standard"
}
```

| Field | Type | Description |
|---|---|---|
| `input` (required) | object (video-analysis_Input) | Source video and the list of analysis features to run. |
| `options` | object (video-analysis_Options) | Optional settings controlling analysis behaviour. |
| `webhook` | object (Webhook) | Destination configuration for job lifecycle notifications. |
| `provider` | string (Provider) | Identifier of the backend provider to use for processing this job. When omitted, the gateway selects the most appropriate provider automatically based on the requested features and availability. Use |

```json
{
  "input": {
    "features": ["labels", "scenes", "speech_to_text", "summary"],
    "audio_track": 0
  },
  "options": {
    "language": "auto",
    "confidence_threshold": 0.7,
    "priority": "standard"
  },
  "provider": "twelvelabs"
}
```

| Field | Type | Description |
|---|---|---|
| `data` (required) | object | JSON:API-style data envelope. |
```json
{
  "data": {
    "type": "video-analysis-job",
    "attributes": {
      "input": {
        "features": ["labels", "scenes", "speech_to_text", "summary"],
        "audio_track": 0
      },
      "options": {
        "language": "auto",
        "confidence_threshold": 0.7,
        "priority": "standard"
      },
      "provider": "twelvelabs"
    },
    "meta": {
      "client": {},
      "system": {
        "region": "eu-west-1",
        "worker_id": "wk_789"
      }
    }
  }
}
```

| Field | Type | Description |
|---|---|---|
| `duration` | number | Total duration of the video in seconds. |
| `width` | integer | Video width in pixels. |
| `height` | integer | Video height in pixels. |
| `frame_rate` | number | Frames per second of the video. |
| `format` | string | Container format of the video file. |
| `codec` | string | Video codec used for encoding. |

```json
{
  "duration": 3742.5,
  "width": 1920,
  "height": 1080,
  "frame_rate": 29.97,
  "format": "mp4",
  "codec": "h264"
}
```

| Field | Type | Description |
|---|---|---|
| `start` | number | Start time of the occurrence in seconds. |
| `end` | number | End time of the occurrence in seconds. |
| `confidence` | number, 0 to 1 | Confidence score for this specific occurrence. |
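Labels, faces, OCR text, topics, brands and moderation signals all report their `instances` in this shape. If instances can overlap, adding up `end - start` directly would count the overlap twice. The Python sketch below totals on-screen time by merging overlapping ranges first, which is one reasonable approach. The optional `min_confidence` filter is a client-side choice and is separate from the job's `confidence_threshold` option. Both helpers are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def merged_ranges(instances: List[Dict[str, Any]], min_confidence: float = 0.0) -> List[Tuple[float, float]]:
    """Merge overlapping or touching TimedInstance ranges into disjoint (start, end) pairs.

    Instances whose confidence is below min_confidence are dropped first.
    """
    spans = sorted(
        (inst["start"], inst["end"])
        for inst in instances
        if inst.get("confidence", 1.0) >= min_confidence
    )
    merged: List[Tuple[float, float]] = []
    for start, end in spans:
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous span, so extend that span instead.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def visible_seconds(instances: List[Dict[str, Any]], min_confidence: float = 0.0) -> float:
    """Total time covered by the instances, without double-counting overlaps."""
    return sum(end - start for start, end in merged_ranges(instances, min_confidence))
```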
```json
{
  "start": 12.5,
  "end": 15.8,
  "confidence": 0.91
}
```

| Field | Type | Description |
|---|---|---|
| `name` | string | Human-readable name of the detected label. |
| `confidence` | number, 0 to 1 | Overall confidence score for this label across the video. |
| `instances` | array of objects (TimedInstance) | Time ranges in which this label was detected. |

```json
{
  "name": "Conference room",
  "confidence": 0.94,
  "instances": [
    { "start": 12.5, "end": 15.8, "confidence": 0.91 }
  ]
}
```

| Field | Type | Description |
|---|---|---|
| `index` | integer | Zero-based position of this scene in the video. |
| `start` | number | Start time of the scene in seconds. |
| `end` | number | End time of the scene in seconds. |

```json
{
  "index": 3,
  "start": 42,
  "end": 78.5
}
```

| Field | Type | Description |
|---|---|---|
| `track_id` | integer | Integer identifier grouping all appearances of the same face within this video. |
| `fingerprint` | string | Base64-encoded face embedding vector produced by the underlying model. When present, fingerprints from different videos can be compared for similarity to determine whether the same person appears across videos. Fingerprints are only comparable when produced by the same backend model; cross-model comparison is not meaningful. Not all backends populate this field. |
| `instances` | array of objects (TimedInstance) | Time ranges in which this face is visible. |
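The description above says that fingerprints from the same backend model can be compared for similarity. It does not give the vector's binary encoding or a similarity metric. The Python sketch below makes two explicit assumptions: the decoded bytes are little-endian float32 values, and cosine similarity is a suitable metric. Any threshold for declaring a match would have to be calibrated separately for each provider. Both helpers are hypothetical.

```python
import base64
import math
import struct
from typing import List


def decode_fingerprint(fingerprint: str) -> List[float]:
    """Decode a base64 fingerprint into a list of floats.

    ASSUMPTION: the bytes are packed little-endian float32 values. The API
    only says "Base64-encoded face embedding vector"; confirm with the provider.
    """
    raw = base64.b64decode(fingerprint)
    if len(raw) % 4:
        raise ValueError("payload length is not a multiple of 4 bytes")
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))


def fingerprint_similarity(a: str, b: str) -> float:
    """Cosine similarity in [-1, 1]; only meaningful for the same backend model."""
    va, vb = decode_fingerprint(a), decode_fingerprint(b)
    if len(va) != len(vb):
        raise ValueError("cannot compare embeddings of different lengths")
    dot = sum(x * y for x, y in zip(va, vb))
    norm = math.sqrt(sum(x * x for x in va)) * math.sqrt(sum(y * y for y in vb))
    if norm == 0.0:
        raise ValueError("cannot compare an all-zero embedding")
    return dot / norm
```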
```json
{
  "track_id": 1,
  "fingerprint": "7hGkL2mXqP9nRsT4vWzA...",
  "instances": [
    { "start": 12.5, "end": 15.8, "confidence": 0.91 }
  ]
}
```

| Field | Type | Description |
|---|---|---|
| `start` | number | Start time of the segment in seconds. |
| `end` | number | End time of the segment in seconds. |
| `text` | string | Transcribed speech for this time range. |
| `speaker_id` | integer | Integer identifier grouping segments from the same speaker. |
| `language` | string | BCP 47 language code detected for this segment. |
| `confidence` | number, 0 to 1 | Confidence score for this transcript segment. |
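Each segment carries a `speaker_id`, so one pass over the segments is enough for per-speaker statistics or a readable transcript. The Python sketch below is minimal. Both helpers are hypothetical, and they assume every segment includes `speaker_id`, `start` and `end`.

```python
from collections import defaultdict
from typing import Any, Dict, List


def speaking_time_by_speaker(segments: List[Dict[str, Any]]) -> Dict[int, float]:
    """Sum segment durations per speaker_id from a speech_to_text result."""
    totals: Dict[int, float] = defaultdict(float)
    for seg in segments:
        totals[seg["speaker_id"]] += seg["end"] - seg["start"]
    return dict(totals)


def transcript_lines(segments: List[Dict[str, Any]]) -> List[str]:
    """Render segments in time order as 'Speaker N: text' lines."""
    ordered = sorted(segments, key=lambda s: s["start"])
    return [f"Speaker {s['speaker_id']}: {s['text']}" for s in ordered]
```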
```json
{
  "start": 12.5,
  "end": 15.8,
  "text": "Welcome to today's panel discussion on AI safety.",
  "speaker_id": 0,
  "language": "en",
  "confidence": 0.97
}
```

| Field | Type | Description |
|---|---|---|
| `text` | string | The detected text string. |
| `confidence` | number, 0 to 1 | Confidence score for this text detection. |
| `language` | string | BCP 47 language code of the detected text. |
| `instances` | array of objects (TimedInstance) | Time ranges in which this text is visible on screen. |

```json
{
  "text": "Q3 Revenue: $4.2M",
  "confidence": 0.91,
  "language": "en",
  "instances": [
    { "start": 12.5, "end": 15.8, "confidence": 0.91 }
  ]
}
```

| Field | Type | Description |
|---|---|---|
| `label` | string; enum: `explicit_nudity`, `suggestive`, `violence`, `visually_disturbing`, `hate_symbols`, `tobacco`, `alcohol`, `gambling` | Machine-readable label identifying the type of flagged content. |
| `confidence` | number, 0 to 1 | Overall confidence score for this signal across the video. |
| `instances` | array of objects (TimedInstance) | Time ranges in which this signal was detected. |

```json
{
  "label": "violence",
  "confidence": 0.82,
  "instances": [
    { "start": 12.5, "end": 15.8, "confidence": 0.91 }
  ]
}
```

| Field | Type | Description |
|---|---|---|
| `is_safe` | boolean | Whether the video passed moderation at the requested confidence threshold. |
| `signals` | array of objects (ModerationSignal) | Individual moderation signals detected above the confidence threshold. |
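`is_safe` gives a single verdict at the job's threshold, while `signals` keeps a confidence for each label. A client can therefore apply its own label-specific rules on top. In the Python sketch below, the values in `BLOCK_THRESHOLDS` are hypothetical and only illustrate the idea. `signals` contains only detections above the job's `confidence_threshold`, so a client threshold set lower than that value has no effect.

```python
from typing import Any, Dict, List

# Hypothetical per-label policy chosen by the client, not by the API.
BLOCK_THRESHOLDS: Dict[str, float] = {
    "explicit_nudity": 0.5,
    "hate_symbols": 0.5,
    "violence": 0.8,
}


def policy_violations(
    moderation: Dict[str, Any],
    thresholds: Dict[str, float] = BLOCK_THRESHOLDS,
) -> List[str]:
    """Return labels whose overall confidence meets the client's own threshold.

    Labels missing from `thresholds` are ignored, so the outcome can differ
    from the gateway's is_safe verdict in either direction.
    """
    return sorted(
        signal["label"]
        for signal in moderation.get("signals", [])
        if signal["label"] in thresholds
        and signal["confidence"] >= thresholds[signal["label"]]
    )
```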
```json
{
  "is_safe": true,
  "signals": [
    {
      "label": "violence",
      "confidence": 0.82,
      "instances": [
        { "start": 12.5, "end": 15.8, "confidence": 0.91 }
      ]
    }
  ]
}
```

| Field | Type | Description |
|---|---|---|
| `start` | number | Start time of the segment in seconds. |
| `end` | number | End time of the segment in seconds. |
| `label` | string; enum: `positive`, `neutral`, `negative` | Sentiment label for this time range. |
| `score` | number, -1 to 1 | Sentiment score for this time range. |

```json
{
  "start": 0,
  "end": 45,
  "label": "positive",
  "score": 0.71
}
```

| Field | Type | Description |
|---|---|---|
| `overall` | string; enum: `positive`, `neutral`, `negative` | Dominant sentiment across the entire video. |
| `score` | number, -1 to 1 | Aggregate sentiment score from -1 (most negative) to 1 (most positive). |
| `instances` | array of objects (SentimentInstance) | Sentiment variations across the video timeline. |

```json
{
  "overall": "positive",
  "score": 0.62,
  "instances": [
    { "start": 0, "end": 45, "label": "positive", "score": 0.71 }
  ]
}
```

| Field | Type | Description |
|---|---|---|
| `name` | string | Topic or keyword name. |
| `confidence` | number, 0 to 1 | Confidence score for this topic. |
| `instances` | array of objects (TimedInstance) | Time ranges in which this topic is relevant. |

```json
{
  "name": "artificial intelligence",
  "confidence": 0.95,
  "instances": [
    { "start": 12.5, "end": 15.8, "confidence": 0.91 }
  ]
}
```

| Field | Type | Description |
|---|---|---|
| `name` | string | Name of the detected brand. |
| `confidence` | number, 0 to 1 | Overall confidence score for this brand detection. |
| `instances` | array of objects (TimedInstance) | Time ranges in which this brand is visible on screen. |

```json
{
  "name": "Acme Corp",
  "confidence": 0.88,
  "instances": [
    { "start": 12.5, "end": 15.8, "confidence": 0.91 }
  ]
}
```

| Field | Type | Description |
|---|---|---|
| `video_metadata` | object (VideoMetadata) | Technical properties of the processed video file. |
| `labels` | array of objects (LabelDetection) | Detected objects, scenes, and actions. Present when |
| `scenes` | array of objects (SceneDetection) | Scene and shot boundaries. Present when |
| `faces` | array of objects (FaceDetection) | Faces detected and tracked across the video. Present when |
| `speech_to_text` | array of objects (TranscriptSegment) | Speech-to-text segments with speaker identification. Present when |
| `ocr` | array of objects (OcrText) | On-screen text extracted from video frames. Present when |
| `content_moderation` | object (ContentModeration) | Content moderation signals. Present when |
| `sentiment` | object (Sentiment) | Overall tone and sentiment of the video. Present when |
| `topics` | array of objects (Topic) | Key topics and keywords extracted from the video. Present when |
| `brands` | array of objects (BrandDetection) | Detected brand logos and visual trademarks. Present when |
| `summary` | string | Natural-language description of the video content. Present when |
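The result fields use the same names as the `features` enum, and their descriptions mark them as conditionally present. A client reading a completed job may therefore want to confirm that every requested feature produced output. The Python sketch below is minimal. It assumes feature names and result keys match one-to-one, which holds for every name in the two lists above. It treats an empty list as present (nothing detected) and not as missing. `missing_features` is a hypothetical helper.

```python
from typing import Any, Dict, Iterable, List


def missing_features(requested: Iterable[str], result: Dict[str, Any]) -> List[str]:
    """List requested features that have no corresponding key in the result.

    ASSUMPTION: each feature maps to the result key of the same name
    (labels -> labels, speech_to_text -> speech_to_text, and so on).
    """
    return [name for name in requested if result.get(name) is None]
```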
```json
{
  "video_metadata": {
    "duration": 3742.5,
    "width": 1920,
    "height": 1080,
    "frame_rate": 29.97,
    "format": "mp4",
    "codec": "h264"
  },
  "labels": [
    {
      "name": "Conference room",
      "confidence": 0.94,
      "instances": [
        { "start": 12.5, "end": 15.8, "confidence": 0.91 }
      ]
    }
  ],
  "scenes": [
    { "index": 3, "start": 42, "end": 78.5 }
  ],
  "faces": [
    {
      "track_id": 1,
      "fingerprint": "7hGkL2mXqP9nRsT4vWzA...",
      "instances": [
        { "start": 12.5, "end": 15.8, "confidence": 0.91 }
      ]
    }
  ],
  "speech_to_text": [
    {
      "start": 12.5,
      "end": 15.8,
      "text": "Welcome to today's panel discussion on AI safety.",
      "speaker_id": 0,
      "language": "en",
      "confidence": 0.97
    }
  ],
  "ocr": [
    {
      "text": "Q3 Revenue: $4.2M",
      "confidence": 0.91,
      "language": "en",
      "instances": [
        { "start": 12.5, "end": 15.8, "confidence": 0.91 }
      ]
    }
  ],
  "content_moderation": {
    "is_safe": true,
    "signals": [
      {
        "label": "violence",
        "confidence": 0.82,
        "instances": [
          { "start": 12.5, "end": 15.8, "confidence": 0.91 }
        ]
      }
    ]
  },
  "sentiment": {
    "overall": "positive",
    "score": 0.62,
    "instances": [
      { "start": 0, "end": 45, "label": "positive", "score": 0.71 }
    ]
  },
  "topics": [
    {
      "name": "artificial intelligence",
      "confidence": 0.95,
      "instances": [
        { "start": 12.5, "end": 15.8, "confidence": 0.91 }
      ]
    }
  ],
  "brands": [
    {
      "name": "Acme Corp",
      "confidence": 0.88,
      "instances": [
        { "start": 12.5, "end": 15.8, "confidence": 0.91 }
      ]
    }
  ],
  "summary": "A panel discussion on AI safety featuring three researchers. The conversation covers alignment challenges, regulatory proposals, and near-term risk mitigation strategies.\n"
}
```

| Field | Type | Description |
|---|---|---|
| `provider` | string | Identifier of the backend provider to use for processing this job. When omitted, the gateway selects the most appropriate provider automatically based on the requested features and availability. Use |
| `status` | string (JobStatus); enum: `pending`, `processing`, `completed`, `failed` | Current lifecycle state of the job. |
| `progress` | integer, 0 to 100 | Processing progress as a percentage. Only meaningful while status is |
| `error` | object (Error) | Details of a job failure. Only present when |
| `created_at` | string (date-time) | When the job was created. |
| `processed_at` | string (date-time) | When the job transitioned from |
| `completed_at` | string (date-time) | When the job reached a terminal state ( |
| `input` (required) | object (video-analysis_Input) | Source video and the list of analysis features to run. |
| `options` | object (video-analysis_Options) | Optional settings controlling analysis behaviour. |
| `webhook` | object (Webhook) | Destination configuration for job lifecycle notifications. |
| `result` | object (video-analysis_Result) | Video analysis output. Populated once status is |
```json
{
  "provider": "twelvelabs",
  "status": "processing",
  "progress": 72,
  "error": {
    "code": "AUDIO_UNREADABLE",
    "message": "Could not extract audio from the provided file."
  },
  "created_at": "2024-03-15T10:00:00Z",
  "processed_at": "2024-03-15T10:00:05Z",
  "completed_at": "2024-03-15T10:02:30Z",
  "input": {
    "features": ["labels", "scenes", "speech_to_text", "summary"],
    "audio_track": 0
  },
  "options": {
    "language": "auto",
    "confidence_threshold": 0.7,
    "priority": "standard"
  },
  "result": {
    "video_metadata": {
      "duration": 3742.5,
      "width": 1920,
      "height": 1080,
      "frame_rate": 29.97,
      "format": "mp4",
      "codec": "h264"
    },
    "labels": [
      {
        "name": "Conference room",
        "confidence": 0.94,
        "instances": [
          { "start": 12.5, "end": 15.8, "confidence": 0.91 }
        ]
      }
    ],
    "scenes": [
      { "index": 3, "start": 42, "end": 78.5 }
    ],
    "faces": [
      {
        "track_id": 1,
        "fingerprint": "7hGkL2mXqP9nRsT4vWzA...",
        "instances": [
          { "start": 12.5, "end": 15.8, "confidence": 0.91 }
        ]
      }
    ],
    "speech_to_text": [
      {
        "start": 12.5,
        "end": 15.8,
        "text": "Welcome to today's panel discussion on AI safety.",
        "speaker_id": 0,
        "language": "en",
        "confidence": 0.97
      }
    ],
    "ocr": [
      {
        "text": "Q3 Revenue: $4.2M",
        "confidence": 0.91,
        "language": "en",
        "instances": [
          { "start": 12.5, "end": 15.8, "confidence": 0.91 }
        ]
      }
    ],
    "content_moderation": {
      "is_safe": true,
      "signals": [
        {
          "label": "violence",
          "confidence": 0.82,
          "instances": [
            { "start": 12.5, "end": 15.8, "confidence": 0.91 }
          ]
        }
      ]
    },
    "sentiment": {
      "overall": "positive",
      "score": 0.62,
      "instances": [
        { "start": 0, "end": 45, "label": "positive", "score": 0.71 }
      ]
    },
    "topics": [
      {
        "name": "artificial intelligence",
        "confidence": 0.95,
        "instances": [
          { "start": 12.5, "end": 15.8, "confidence": 0.91 }
        ]
      }
    ],
    "brands": [
      {
        "name": "Acme Corp",
        "confidence": 0.88,
        "instances": [
          { "start": 12.5, "end": 15.8, "confidence": 0.91 }
        ]
      }
    ],
    "summary": "A panel discussion on AI safety featuring three researchers. The conversation covers alignment challenges, regulatory proposals, and near-term risk mitigation strategies.\n"
  }
}
```

| Field | Type | Description |
|---|---|---|
| `data` | object | JSON:API-style data envelope. |
```json
{
  "data": {
    "id": "3e7dc4b2-91f0-4a1e-8c2d-b56789012345",
    "type": "video-analysis-job",
    "attributes": {
      "provider": "twelvelabs",
      "status": "processing",
      "progress": 72,
      "error": {
        "code": "AUDIO_UNREADABLE",
        "message": "Could not extract audio from the provided file."
      },
      "created_at": "2024-03-15T10:00:00Z",
      "processed_at": "2024-03-15T10:00:05Z",
      "completed_at": "2024-03-15T10:02:30Z",
      "input": {
        "features": ["labels", "scenes", "speech_to_text", "summary"],
        "audio_track": 0
      },
      "options": {
        "language": "auto",
        "confidence_threshold": 0.7,
        "priority": "standard"
      },
      "result": {
        "video_metadata": {
          "duration": 3742.5,
          "width": 1920,
          "height": 1080,
          "frame_rate": 29.97,
          "format": "mp4",
          "codec": "h264"
        },
        "labels": [
          {
            "name": "Conference room",
            "confidence": 0.94,
            "instances": [
              { "start": 12.5, "end": 15.8, "confidence": 0.91 }
            ]
          }
        ],
        "scenes": [
          { "index": 3, "start": 42, "end": 78.5 }
        ],
        "faces": [
          {
            "track_id": 1,
            "fingerprint": "7hGkL2mXqP9nRsT4vWzA...",
            "instances": [
              { "start": 12.5, "end": 15.8, "confidence": 0.91 }
            ]
          }
        ],
        "speech_to_text": [
          {
            "start": 12.5,
            "end": 15.8,
            "text": "Welcome to today's panel discussion on AI safety.",
            "speaker_id": 0,
            "language": "en",
            "confidence": 0.97
          }
        ],
        "ocr": [
          {
            "text": "Q3 Revenue: $4.2M",
            "confidence": 0.91,
            "language": "en",
            "instances": [
              { "start": 12.5, "end": 15.8, "confidence": 0.91 }
            ]
          }
        ],
        "content_moderation": {
          "is_safe": true,
          "signals": [
            {
              "label": "violence",
              "confidence": 0.82,
              "instances": [
                { "start": 12.5, "end": 15.8, "confidence": 0.91 }
              ]
            }
          ]
        },
        "sentiment": {
          "overall": "positive",
          "score": 0.62,
          "instances": [
            { "start": 0, "end": 45, "label": "positive", "score": 0.71 }
          ]
        },
        "topics": [
          {
            "name": "artificial intelligence",
            "confidence": 0.95,
            "instances": [
              { "start": 12.5, "end": 15.8, "confidence": 0.91 }
            ]
          }
        ],
        "brands": [
          {
            "name": "Acme Corp",
            "confidence": 0.88,
            "instances": [
              { "start": 12.5, "end": 15.8, "confidence": 0.91 }
            ]
          }
        ],
        "summary": "A panel discussion on AI safety featuring three researchers. The conversation covers alignment challenges, regulatory proposals, and near-term risk mitigation strategies.\n"
      }
    },
    "meta": {
      "client": {},
      "system": {
        "region": "eu-west-1",
        "worker_id": "wk_789"
      }
    }
  }
}
```

| Field | Type | Description |
|---|---|---|
| `event` (required) | string; enum: `transcription.progress`, `transcription.completed`, `transcription.failed` | The lifecycle event that triggered this notification. |
| `data` | object | JSON:API-style data envelope. |
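Every webhook event in this section is named `<job kind>.<phase>`, and each payload carries the same `data` envelope as the GET response. A single receiver can therefore route all of them. The Python sketch below is minimal and covers only the routing step. It does not verify signatures or authenticity, because this section does not document any mechanism for that. Both functions are hypothetical.

```python
import json
from typing import Any, Callable, Dict, Tuple

Handler = Callable[[Dict[str, Any]], None]


def parse_event(event: str) -> Tuple[str, str]:
    """Split e.g. 'video-analysis.completed' into ('video-analysis', 'completed')."""
    job_kind, _, phase = event.rpartition(".")
    if not job_kind or phase not in ("progress", "completed", "failed"):
        raise ValueError(f"unexpected event: {event!r}")
    return job_kind, phase


def dispatch_webhook(body: bytes, handlers: Dict[str, Handler]) -> None:
    """Route a webhook body to the handler for its phase, passing the data envelope."""
    payload = json.loads(body)
    _, phase = parse_event(payload["event"])
    handler = handlers.get(phase)
    if handler is not None:
        handler(payload["data"])
```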
```json
{
  "event": "transcription.completed",
  "data": {
    "id": "2f41bc1f-b608-4360-acd9-a26a296fea3c",
    "type": "transcription-job",
    "attributes": {
      "provider": "twelvelabs",
      "status": "completed",
      "progress": 100,
      "created_at": "2024-03-15T10:00:00Z",
      "processed_at": "2024-03-15T10:00:05Z",
      "completed_at": "2024-03-15T10:02:30Z",
      "input": {
        "type": "video",
        "audio_track": 0
      },
      "options": {
        "language": "en",
        "timestamps": true,
        "format": "json",
        "diarization": false,
        "priority": "standard"
      },
      "result": {
        "format": "json",
        "duration": 183.4,
        "language": "en",
        "segments": [
          { "start": 12.5, "end": 15.8, "text": "Welcome to today's interview." }
        ]
      }
    },
    "meta": {
      "client": {},
      "system": {
        "region": "eu-west-1",
        "worker_id": "wk_789"
      }
    }
  }
}
```

| Field | Type | Description |
|---|---|---|
| `event` (required) | string; enum: `translation.progress`, `translation.completed`, `translation.failed` | The lifecycle event that triggered this notification. |
| `data` | object | JSON:API-style data envelope. |
```json
{
  "event": "translation.completed",
  "data": {
    "id": "9a1bc2f3-d405-4678-bcde-f12345678901",
    "type": "translation-job",
    "attributes": {
      "provider": "twelvelabs",
      "status": "completed",
      "progress": 100,
      "created_at": "2024-03-15T10:00:00Z",
      "processed_at": "2024-03-15T10:00:05Z",
      "completed_at": "2024-03-15T10:02:30Z",
      "input": {
        "type": "text",
        "content": "Hello, how are you today?",
        "target_language": "fr"
      },
      "options": {
        "source_language": "auto",
        "formality": "formal",
        "priority": "standard"
      },
      "result": {
        "source_language": "en",
        "target_language": "fr",
        "content": "Bonjour, comment allez-vous aujourd'hui ?",
        "character_count": 25
      }
    },
    "meta": {
      "client": {},
      "system": {
        "region": "eu-west-1",
        "worker_id": "wk_789"
      }
    }
  }
}
```

| Field | Type | Description |
|---|---|---|
| `event` (required) | string; enum: `video-analysis.progress`, `video-analysis.completed`, `video-analysis.failed` | The lifecycle event that triggered this notification. |
| `data` | object | JSON:API-style data envelope. |
```json
{
  "event": "video-analysis.completed",
  "data": {
    "id": "3e7dc4b2-91f0-4a1e-8c2d-b56789012345",
    "type": "video-analysis-job",
    "attributes": {
      "provider": "twelvelabs",
      "status": "completed",
      "progress": 100,
      "created_at": "2024-03-15T10:00:00Z",
      "processed_at": "2024-03-15T10:00:05Z",
      "completed_at": "2024-03-15T10:02:30Z",
      "input": {
        "features": ["labels", "scenes", "speech_to_text", "summary"],
        "audio_track": 0
      },
      "options": {
        "language": "auto",
        "confidence_threshold": 0.7,
        "priority": "standard"
      },
      "result": {
        "video_metadata": {
          "duration": 3742.5,
          "width": 1920,
          "height": 1080,
          "frame_rate": 29.97,
          "format": "mp4",
          "codec": "h264"
        },
        "labels": [
          {
            "name": "Conference room",
            "confidence": 0.94,
            "instances": [
              { "start": 12.5, "end": 15.8, "confidence": 0.91 }
            ]
          }
        ],
        "scenes": [
          { "index": 3, "start": 42, "end": 78.5 }
        ],
        "faces": [
          {
            "track_id": 1,
            "fingerprint": "7hGkL2mXqP9nRsT4vWzA...",
            "instances": [
              { "start": 12.5, "end": 15.8, "confidence": 0.91 }
            ]
          }
        ],
        "speech_to_text": [
          {
            "start": 12.5,
            "end": 15.8,
            "text": "Welcome to today's panel discussion on AI safety.",
            "speaker_id": 0,
            "language": "en",
            "confidence": 0.97
          }
        ],
        "ocr": [
          {
            "text": "Q3 Revenue: $4.2M",
            "confidence": 0.91,
            "language": "en",
            "instances": [
              { "start": 12.5, "end": 15.8, "confidence": 0.91 }
            ]
          }
        ],
        "content_moderation": {
          "is_safe": true,
          "signals": [
            {
              "label": "violence",
              "confidence": 0.82,
              "instances": [
                { "start": 12.5, "end": 15.8, "confidence": 0.91 }
              ]
            }
          ]
        },
        "sentiment": {
          "overall": "positive",
          "score": 0.62,
          "instances": [
            { "start": 0, "end": 45, "label": "positive", "score": 0.71 }
          ]
        },
        "topics": [
          {
            "name": "artificial intelligence",
            "confidence": 0.95,
            "instances": [
              { "start": 12.5, "end": 15.8, "confidence": 0.91 }
            ]
          }
        ],
        "brands": [
          {
            "name": "Acme Corp",
            "confidence": 0.88,
            "instances": [
              { "start": 12.5, "end": 15.8, "confidence": 0.91 }
            ]
          }
        ],
        "summary": "A panel discussion on AI safety featuring three researchers. The conversation covers alignment challenges, regulatory proposals, and near-term risk mitigation strategies.\n"
      }
    },
    "meta": {
      "client": {},
      "system": {
        "region": "eu-west-1",
        "worker_id": "wk_789"
      }
    }
  }
}
```