AI Gateway (dev)

HTTP API for creating and polling AI jobs (transcription, translation, video analysis).

Local testing (Swagger UI)

  1. Open Swagger UI (default http://localhost:8090/swagger/).
  2. Click Authorize and authorize with oauth2ClientCredentials (the token URL points to Auth0; the audience comes from x-oidc-audience in /openapi.json).
  3. Call POST /v1/transcriptions (or translations / video analyses), then poll GET by job id.
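
The same flow can be scripted outside Swagger UI. The sketch below only builds the requests and does not send them. The base URL is assumed to be the host that serves Swagger UI, and the token URL, client id, client secret, and audience are placeholders you would take from your own Auth0 tenant and from x-oidc-audience in /openapi.json. The helper names are hypothetical and not part of the gateway.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the API is served from the same host and port as Swagger UI.
BASE_URL = "http://localhost:8090"


def build_token_request(token_url, client_id, client_secret, audience):
    """Build (but do not send) an OAuth2 client-credentials token request."""
    form = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "audience": audience,  # value of x-oidc-audience in /openapi.json
    }
    return urllib.request.Request(
        token_url,
        data=urllib.parse.urlencode(form).encode("utf-8"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def build_api_request(method, path, access_token, body=None):
    """Build an authenticated gateway request carrying a Bearer token."""
    headers = {"Authorization": f"Bearer {access_token}"}
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )
```

Passing either request to urllib.request.urlopen would send it; the access token is expected in the access_token field of the token response.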

Transcript

Operations related to converting audio or video content into structured, time-aligned text

Transcription job lifecycle event Webhook

Authorizations:
oauth2ClientCredentials
Request Body schema: application/json
required
event
required
string
Enum: "transcription.progress" "transcription.completed" "transcription.failed"

The lifecycle event that triggered this notification. transcription.progress: the job is processing; data.attributes.progress is updated. transcription.completed: the job finished successfully; data.attributes.result is populated. transcription.failed: the job encountered an unrecoverable error; data.attributes.error is populated.

object

JSON:API-style data envelope.

id
string <uuid>

Unique identifier for the transcription job.

type
string
Value: "transcription-job"

Resource type identifier. Always transcription-job.

object (transcription_Attributes)

Full attributes of a transcription job, combining input, job state, and result.

provider
string

Identifier of the backend provider to use for processing this job. When omitted, the gateway selects the most appropriate provider automatically based on the requested features and availability. Use GET /v1/providers to list available providers and their supported features.

status
string (JobStatus)
Enum: "pending" "processing" "completed" "failed"

Current lifecycle state of the job. pending: accepted, waiting to be picked up. processing: actively being worked on. completed: finished successfully. failed: encountered an unrecoverable error.

progress
integer [ 0 .. 100 ]

Processing progress as a percentage. Only meaningful while status is processing.

object (Error)

Details of a job failure. Only present when status is failed.

created_at
string <date-time>

When the job was created.

processed_at
string <date-time>

When the job transitioned from pending to processing. Subtract created_at from this value to get queue wait time.

completed_at
string <date-time>

When the job reached a terminal state (completed or failed). Subtract processed_at from this value to get processing duration.

required
object (Input)

Source audio or video file to transcribe.

object (Options)

Optional transcription settings (language, output format, timestamps, diarization, priority). See the Options schema for field-level details.

object (Result)

Transcription output. Populated once status is completed.

object (Meta)

Metadata envelope shared between client and system.

object

Arbitrary key-value data provided by the client. Returned unchanged in all responses.

object

Internal metadata added by the gateway. Not exposed unless explicitly required.
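
A webhook receiver typically branches on event and then reads the matching field under data.attributes. The following is a minimal sketch of that dispatch, assuming the payload has already been parsed from JSON into a dict; summarize_transcription_event is a hypothetical helper, not part of the gateway.

```python
def summarize_transcription_event(payload):
    """Return (job_id, kind, detail) for a transcription webhook payload."""
    event = payload["event"]
    data = payload.get("data", {})
    attrs = data.get("attributes", {})
    job_id = data.get("id")
    if event == "transcription.progress":
        # Progress events carry an updated percentage.
        return job_id, "progress", attrs.get("progress")
    if event == "transcription.completed":
        # Completed events carry the populated result object.
        return job_id, "completed", attrs.get("result")
    if event == "transcription.failed":
        # Failed events carry the error object.
        return job_id, "failed", attrs.get("error")
    raise ValueError(f"unexpected event: {event!r}")
```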

Responses

Request samples

Content type
application/json
{
  "event": "transcription.completed",
  "data": {
    "id": "2f41bc1f-b608-4360-acd9-a26a296fea3c",
    "type": "transcription-job",
    "attributes": {
      "provider": "twelvelabs",
      "status": "processing",
      "progress": 72,
      "error": {
        "code": "AUDIO_UNREADABLE",
        "message": "Could not extract audio from the provided file."
      },
      "created_at": "2024-03-15T10:00:00Z",
      "processed_at": "2024-03-15T10:00:05Z",
      "completed_at": "2024-03-15T10:02:30Z",
      "input": {},
      "options": {
        "language": "en",
        "timestamps": true,
        "format": "json",
        "diarization": false,
        "priority": "standard"
      },
      "result": {}
    },
    "meta": {
      "client": {},
      "system": {
        "region": "eu-west-1",
        "worker_id": "wk_789"
      }
    }
  }
}
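
Queue wait time and processing duration can be derived from the three timestamps under data.attributes. This is a small sketch using the values from the sample payload; parse_ts and job_timings are illustrative names, and the code assumes all three timestamps are present.

```python
from datetime import datetime


def parse_ts(value):
    """Parse a date-time string such as 2024-03-15T10:00:00Z."""
    # Older Python versions reject a trailing "Z" in fromisoformat.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def job_timings(attrs):
    """Return (queue_wait_seconds, processing_seconds) for a finished job."""
    created = parse_ts(attrs["created_at"])
    processed = parse_ts(attrs["processed_at"])
    completed = parse_ts(attrs["completed_at"])
    return (
        (processed - created).total_seconds(),
        (completed - processed).total_seconds(),
    )


sample = {
    "created_at": "2024-03-15T10:00:00Z",
    "processed_at": "2024-03-15T10:00:05Z",
    "completed_at": "2024-03-15T10:02:30Z",
}
queue_wait, processing = job_timings(sample)  # 5.0 and 145.0 seconds
```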

Create transcription job

Submits an asynchronous speech-to-text job and returns 202 Accepted with a job resource.

How it works

  1. The gateway validates the request and stores a job (status: pending).
  2. For video input, audio is extracted first, then sent to the transcription backend.
  3. For audio input, transcription starts immediately.
  4. Poll GET /v1/transcriptions/{id} until status is completed or failed, or configure attributes.webhook to receive lifecycle callbacks (see Webhooks → transcription).
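
The polling option in step 4 can be sketched as a loop that stops on a terminal status. The function below is an illustration under stated assumptions: fetch_job is a caller-supplied function that performs GET /v1/transcriptions/{id} and returns the decoded JSON body, and the interval and attempt limit are arbitrary defaults rather than values recommended by the gateway.

```python
import time

TERMINAL_STATUSES = {"completed", "failed"}


def poll_job(fetch_job, job_id, interval=2.0, max_attempts=60, sleep=time.sleep):
    """Call fetch_job(job_id) until the job status is completed or failed."""
    for attempt in range(max_attempts):
        attrs = fetch_job(job_id)["data"]["attributes"]
        if attrs["status"] in TERMINAL_STATUSES:
            return attrs
        # Wait before the next poll, but not after the final attempt.
        if attempt < max_attempts - 1:
            sleep(interval)
    raise TimeoutError(f"job {job_id} still running after {max_attempts} polls")
```

The sleep parameter is injectable so the loop can be exercised in tests without real delays.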

Request body (data.attributes)

Field        Required  Description
input        Yes       Source media (type, url, optional audio_track).
options      No        Transcription settings: source language, timestamps, output format, diarization, queue priority.
webhook      No        HTTPS endpoint for transcription.progress, .completed, and .failed events.
provider     No        Backend to use (e.g. eden-ai). If omitted, the gateway selects a provider. See GET /v1/providers.
meta.client  No        Opaque key-value metadata echoed back in responses and webhooks.
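
As an illustration of how these fields nest, the sketch below assembles a request body. It follows the request schema for this endpoint, which places meta next to type and attributes inside data. build_transcription_request is a hypothetical helper; the value passed as input_type and the shape of the webhook object are assumptions, since neither is spelled out here.

```python
def build_transcription_request(url, input_type, options=None, webhook=None,
                                provider=None, client_meta=None):
    """Assemble a JSON:API-style body for POST /v1/transcriptions."""
    # input is the only required attribute.
    attributes = {"input": {"type": input_type, "url": url}}
    if options:
        attributes["options"] = options
    if webhook:
        attributes["webhook"] = webhook  # passed through unchanged
    if provider:
        attributes["provider"] = provider
    data = {"type": "transcription-job", "attributes": attributes}
    if client_meta:
        data["meta"] = {"client": client_meta}
    return {"data": data}


body = build_transcription_request(
    "https://example.com/interview.mp3",
    "audio",  # assumed input type value
    options={"language": "en", "format": "json"},
    client_meta={"ticket": "ABC-123"},
)
```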

Options reference

  • language: BCP 47 code of the spoken language (e.g. en, en-US, fr). Use auto only when the chosen provider supports language detection. The default backend (Google via Eden AI) requires an explicit code; if language is omitted, the gateway may fall back to en-US, depending on deployment configuration.
  • timestamps: When true, the result includes time-aligned segments (start, end, text).
  • format: Result serialization: json (structured segments), srt, or vtt (subtitle files).
  • diarization: When true, speakers are distinguished in the transcript (provider-dependent).
  • priority: Queue priority: low, standard (default), or high.
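
A client can catch obviously invalid options before submitting a job. The check below is a sketch based only on the values listed in this reference; check_options is a hypothetical helper, and the gateway's own validation may be stricter or differ per provider.

```python
ALLOWED_FORMATS = {"json", "srt", "vtt"}
ALLOWED_PRIORITIES = {"low", "standard", "high"}


def check_options(options):
    """Return a list of problems found in a transcription options dict."""
    problems = []
    fmt = options.get("format")
    if fmt is not None and fmt not in ALLOWED_FORMATS:
        problems.append(f"format must be one of {sorted(ALLOWED_FORMATS)}, got {fmt!r}")
    priority = options.get("priority", "standard")  # standard is the default
    if priority not in ALLOWED_PRIORITIES:
        problems.append(f"priority must be one of {sorted(ALLOWED_PRIORITIES)}, got {priority!r}")
    for flag in ("timestamps", "diarization"):
        if flag in options and not isinstance(options[flag], bool):
            problems.append(f"{flag} must be a boolean")
    return problems
```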

Input URL

The input.url must be reachable by the gateway and workers (HTTP/HTTPS). In local Compose stacks, use the MinIO sample URL from the request example or your own publicly accessible file.

Authorizations:
oauth2ClientCredentials
Request Body schema: application/json
required

JSON:API-style envelope. Only data.attributes.input is required; all other fields are optional.

required
object

JSON:API-style resource envelope. Must include type and attributes; meta is optional.

type
required
string

Resource type discriminator. Must be exactly transcription-job for this endpoint.

required
object (AttributesCreate)

Parameters that control what is transcribed and how. Only input is required; configure options, webhook, and provider as needed.

required
object (Input)

Source audio or video file to transcribe.

object (Options)

Optional transcription settings (language, output format, timestamps, diarization, priority). See the Options schema for field-level details.

object (Webhook)

Optional callback URL. When set, the gateway POSTs JSON payloads on each lifecycle transition (transcription.progress, transcription.completed, transcription.failed).

provider
string (Provider)

Backend provider id for this job (e.g. eden-ai). When omitted, the gateway picks a provider that supports the requested options. List providers and features with GET /v1/providers.

object (Meta)

Optional client metadata (meta.client). Stored with the job and returned unchanged in GET responses and webhook payloads. Not used for routing or processing.

object

Arbitrary key-value data provided by the client. Returned unchanged in all responses.

property name*
additional property
any

Responses

Request samples

Content type
application/json
{}

Response samples

Content type
application/json
{
  "data": {
    "id": "2f41bc1f-b608-4360-acd9-a26a296fea3c",
    "type": "transcription-job",
    "attributes": {
      "provider": "twelvelabs",
      "status": "processing",
      "progress": 72,
      "error": {
        "code": "AUDIO_UNREADABLE",
        "message": "Could not extract audio from the provided file."
      },
      "created_at": "2024-03-15T10:00:00Z",
      "processed_at": "2024-03-15T10:00:05Z",
      "completed_at": "2024-03-15T10:02:30Z",
      "input": {},
      "options": {
        "language": "en",
        "timestamps": true,
        "format": "json",
        "diarization": false,
        "priority": "standard"
      },
      "result": {}
    },
    "meta": {
      "client": {},
      "system": {
        "region": "eu-west-1",
        "worker_id": "wk_789"
      }
    }
  }
}

Get transcription job

Returns the current job resource, including status, progress (while processing), and result (when status is completed). Use the job id returned in the 202 response from POST /v1/transcriptions.

Authorizations:
oauth2ClientCredentials
path Parameters
id
required
string <uuid>
Example: 2f41bc1f-b608-4360-acd9-a26a296fea3c

Job UUID returned when the transcription was created.

Responses

Response Schema: application/json
object

JSON:API-style data envelope.

id
string <uuid>

Unique identifier for the transcription job.

type
string
Value: "transcription-job"

Resource type identifier. Always transcription-job.

object (transcription_Attributes)

Full attributes of a transcription job, combining input, job state, and result.

provider
string

Identifier of the backend provider to use for processing this job. When omitted, the gateway selects the most appropriate provider automatically based on the requested features and availability. Use GET /v1/providers to list available providers and their supported features.

status
string (JobStatus)
Enum: "pending" "processing" "completed" "failed"

Current lifecycle state of the job. pending: accepted, waiting to be picked up. processing: actively being worked on. completed: finished successfully. failed: encountered an unrecoverable error.

progress
integer [ 0 .. 100 ]

Processing progress as a percentage. Only meaningful while status is processing.

object (Error)

Details of a job failure. Only present when status is failed.

created_at
string <date-time>

When the job was created.

processed_at
string <date-time>

When the job transitioned from pending to processing. Subtract created_at from this value to get queue wait time.

completed_at
string <date-time>

When the job reached a terminal state (completed or failed). Subtract processed_at from this value to get processing duration.

required
object (Input)

Source audio or video file to transcribe.

object (Options)

Optional transcription settings (language, output format, timestamps, diarization, priority). See the Options schema for field-level details.

object (Result)

Transcription output. Populated once status is completed.

object (Meta)

Metadata envelope shared between client and system.

object

Arbitrary key-value data provided by the client. Returned unchanged in all responses.

object

Internal metadata added by the gateway. Not exposed unless explicitly required.

Response samples

Content type
application/json
{
  "data": {
    "id": "2f41bc1f-b608-4360-acd9-a26a296fea3c",
    "type": "transcription-job",
    "attributes": {
      "provider": "twelvelabs",
      "status": "processing",
      "progress": 72,
      "error": {
        "code": "AUDIO_UNREADABLE",
        "message": "Could not extract audio from the provided file."
      },
      "created_at": "2024-03-15T10:00:00Z",
      "processed_at": "2024-03-15T10:00:05Z",
      "completed_at": "2024-03-15T10:02:30Z",
      "input": {},
      "options": {
        "language": "en",
        "timestamps": true,
        "format": "json",
        "diarization": false,
        "priority": "standard"
      },
      "result": {}
    },
    "meta": {
      "client": {},
      "system": {
        "region": "eu-west-1",
        "worker_id": "wk_789"
      }
    }
  }
}

Get transcription result

Returns only the transcription output (format, language, segments, optional download_url). Available when the job status is completed; otherwise responds with 404 (RESULT_NOT_READY).

Response content type depends on the requested options.format (application/json, text/srt, or text/vtt).
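
Because the body is either JSON or a subtitle file, a client needs to decode it according to the format it requested. The sketch below maps each format to the content type listed above; the helper names are hypothetical, and UTF-8 encoding of the response body is an assumption.

```python
import json

RESULT_CONTENT_TYPES = {
    "json": "application/json",
    "srt": "text/srt",
    "vtt": "text/vtt",
}


def expected_content_type(fmt):
    """Return the content type expected for a requested options.format."""
    try:
        return RESULT_CONTENT_TYPES[fmt]
    except KeyError:
        raise ValueError(f"unknown format: {fmt!r}") from None


def decode_result(fmt, raw_bytes):
    """Decode a result body: parsed JSON for json, plain text for srt and vtt."""
    text = raw_bytes.decode("utf-8")  # assumption: bodies are UTF-8 encoded
    if expected_content_type(fmt) == "application/json":
        return json.loads(text)
    return text
```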

Authorizations:
oauth2ClientCredentials
path Parameters
id
required
string <uuid>
Example: 2f41bc1f-b608-4360-acd9-a26a296fea3c

Job UUID returned when the transcription was created.

Responses

Response Schema:
format
string
Enum: "srt" "vtt" "json"

Format of the transcription result, matching the requested output format.

duration
number

Total duration of the media file in seconds.

language
string

BCP 47 language code of the transcribed audio, as detected or specified.

download_url
string <uri>

Pre-signed URL to download the full transcription file. Valid for a limited time.

Array of objects (Segment)

Time-aligned transcript segments. Present when timestamps was enabled.

Array
start
number

Start time of the segment in seconds.

end
number

End time of the segment in seconds.

text
string

Transcribed text for this time range.
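
Since each segment carries start and end in seconds, a client that requested json can still render subtitles locally. The following is a minimal sketch of converting segments to SRT cues; it is an illustration only and is not claimed to match the gateway's own srt output.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Render a list of {start, end, text} segments as SRT text."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(cues)
```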

Response samples

Content type
{}

Translation

Operations related to translating text or document content between languages

Translation job lifecycle event Webhook

Authorizations:
oauth2ClientCredentials
Request Body schema: application/json
required
event
required
string
Enum: "translation.progress" "translation.completed" "translation.failed"

The lifecycle event that triggered this notification. translation.progress: the job is processing; data.attributes.progress is updated. translation.completed: the job finished successfully; data.attributes.result is populated. translation.failed: the job encountered an unrecoverable error; data.attributes.error is populated.

object

JSON:API-style data envelope.

id
string <uuid>

Unique identifier for the translation job.

type
string
Value: "translation-job"

Resource type identifier. Always translation-job.

object (translation_Attributes)

Full attributes of a translation job, combining input, job state, and result.

provider
string

Identifier of the backend provider to use for processing this job. When omitted, the gateway selects the most appropriate provider automatically based on the requested features and availability. Use GET /v1/providers to list available providers and their supported features.

status
string (JobStatus)
Enum: "pending" "processing" "completed" "failed"

Current lifecycle state of the job. pending: accepted, waiting to be picked up. processing: actively being worked on. completed: finished successfully. failed: encountered an unrecoverable error.

progress
integer [ 0 .. 100 ]

Processing progress as a percentage. Only meaningful while status is processing.

object (Error)

Details of a job failure. Only present when status is failed.

created_at
string <date-time>

When the job was created.

processed_at
string <date-time>

When the job transitioned from pending to processing. Subtract created_at from this value to get queue wait time.

completed_at
string <date-time>

When the job reached a terminal state (completed or failed). Subtract processed_at from this value to get processing duration.

required
object (translation_Input)

Source content to translate.

object (translation_Options)

Optional settings controlling translation behaviour.

object (translation_Result)

Translation output. Populated once status is completed.

object (Meta)

Metadata envelope shared between client and system.

object

Arbitrary key-value data provided by the client. Returned unchanged in all responses.

object

Internal metadata added by the gateway. Not exposed unless explicitly required.

Responses

Request samples

Content type
application/json
{
  "event": "translation.completed",
  "data": {
    "id": "9a1bc2f3-d405-4678-bcde-f12345678901",
    "type": "translation-job",
    "attributes": {
      "provider": "twelvelabs",
      "status": "processing",
      "progress": 72,
      "error": {
        "code": "AUDIO_UNREADABLE",
        "message": "Could not extract audio from the provided file."
      },
      "created_at": "2024-03-15T10:00:00Z",
      "processed_at": "2024-03-15T10:00:05Z",
      "completed_at": "2024-03-15T10:02:30Z",
      "input": {},
      "options": {
        "source_language": "auto",
        "formality": "formal",
        "priority": "standard"
      },
      "result": {}
    },
    "meta": {
      "client": {},
      "system": {
        "region": "eu-west-1",
        "worker_id": "wk_789"
      }
    }
  }
}

Create translation job

Authorizations:
oauth2ClientCredentials
Request Body schema: application/json
required
required
object

JSON:API-style data envelope.

type
required
string

Resource type identifier. Must be translation-job.

required
object (translation_AttributesCreate)

Input fields required to create a translation job.

required
object (translation_Input)

Source content to translate.

object (translation_Options)

Optional settings controlling translation behaviour.

object (Webhook)

Destination configuration for job lifecycle notifications.

provider
string (Provider)

Identifier of the backend provider to use for processing this job. When omitted, the gateway selects the most appropriate provider automatically based on the requested features and availability. Use GET /v1/providers to list available providers and their supported features.

object (Meta)

Metadata envelope shared between client and system.

object

Arbitrary key-value data provided by the client. Returned unchanged in all responses.

property name*
additional property
any

Responses

Request samples

Content type
application/json
{}

Response samples

Content type
application/json
{
  "data": {
    "id": "9a1bc2f3-d405-4678-bcde-f12345678901",
    "type": "translation-job",
    "attributes": {
      "provider": "twelvelabs",
      "status": "processing",
      "progress": 72,
      "error": {
        "code": "AUDIO_UNREADABLE",
        "message": "Could not extract audio from the provided file."
      },
      "created_at": "2024-03-15T10:00:00Z",
      "processed_at": "2024-03-15T10:00:05Z",
      "completed_at": "2024-03-15T10:02:30Z",
      "input": {},
      "options": {
        "source_language": "auto",
        "formality": "formal",
        "priority": "standard"
      },
      "result": {}
    },
    "meta": {
      "client": {},
      "system": {
        "region": "eu-west-1",
        "worker_id": "wk_789"
      }
    }
  }
}

Get translation job

Authorizations:
oauth2ClientCredentials
path Parameters
id
required
string <uuid>
Example: 9a1bc2f3-d405-4678-bcde-f12345678901

Responses

Response Schema: application/json
object

JSON:API-style data envelope.

id
string <uuid>

Unique identifier for the translation job.

type
string
Value: "translation-job"

Resource type identifier. Always translation-job.

object (translation_Attributes)

Full attributes of a translation job, combining input, job state, and result.

provider
string

Identifier of the backend provider to use for processing this job. When omitted, the gateway selects the most appropriate provider automatically based on the requested features and availability. Use GET /v1/providers to list available providers and their supported features.

status
string (JobStatus)
Enum: "pending" "processing" "completed" "failed"

Current lifecycle state of the job. pending: accepted, waiting to be picked up. processing: actively being worked on. completed: finished successfully. failed: encountered an unrecoverable error.

progress
integer [ 0 .. 100 ]

Processing progress as a percentage. Only meaningful while status is processing.

object (Error)

Details of a job failure. Only present when status is failed.

created_at
string <date-time>

When the job was created.

processed_at
string <date-time>

When the job transitioned from pending to processing. Subtract created_at from this value to get queue wait time.

completed_at
string <date-time>

When the job reached a terminal state (completed or failed). Subtract processed_at from this value to get processing duration.

required
object (translation_Input)

Source content to translate.

object (translation_Options)

Optional settings controlling translation behaviour.

object (translation_Result)

Translation output. Populated once status is completed.

object (Meta)

Metadata envelope shared between client and system.

object

Arbitrary key-value data provided by the client. Returned unchanged in all responses.

object

Internal metadata added by the gateway. Not exposed unless explicitly required.

Response samples

Content type
application/json
{
  "data": {
    "id": "9a1bc2f3-d405-4678-bcde-f12345678901",
    "type": "translation-job",
    "attributes": {
      "provider": "twelvelabs",
      "status": "processing",
      "progress": 72,
      "error": {
        "code": "AUDIO_UNREADABLE",
        "message": "Could not extract audio from the provided file."
      },
      "created_at": "2024-03-15T10:00:00Z",
      "processed_at": "2024-03-15T10:00:05Z",
      "completed_at": "2024-03-15T10:02:30Z",
      "input": {},
      "options": {
        "source_language": "auto",
        "formality": "formal",
        "priority": "standard"
      },
      "result": {}
    },
    "meta": {
      "client": {},
      "system": {
        "region": "eu-west-1",
        "worker_id": "wk_789"
      }
    }
  }
}

Get translation result

Authorizations:
oauth2ClientCredentials
path Parameters
id
required
string <uuid>
Example: 9a1bc2f3-d405-4678-bcde-f12345678901

Responses

Response Schema:
source_language
string

BCP 47 language code of the source content, as detected or specified.

target_language
string

BCP 47 language code of the translated output.

content
string

Translated text content. Present when input type is text.

download_url
string <uri>

Pre-signed URL to download the translated file. Valid for a limited time. Present when input type is document.

character_count
integer

Number of characters in the source content.
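
The result carries either inline content (text input) or a download_url (document input), so a client usually has to check which one is present. The sketch below shows one way to do that, assuming the result has been parsed into a dict; translation_output is a hypothetical helper name.

```python
def translation_output(result):
    """Return ("text", content) or ("url", download_url) from a translation result."""
    if result.get("content") is not None:
        # Text input: the translation is returned inline.
        return "text", result["content"]
    if result.get("download_url"):
        # Document input: the translation must be fetched from the pre-signed URL.
        return "url", result["download_url"]
    raise ValueError("result has neither content nor download_url")
```

Because the pre-signed URL is valid for a limited time, it is safer to download the file promptly than to store the URL for later use.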

Response samples

Content type
{}

Video Analysis

Operations related to extracting structured metadata and insights from video content

Video analysis job lifecycle event Webhook

Authorizations:
oauth2ClientCredentials
Request Body schema: application/json
required
event
required
string
Enum: "video-analysis.progress" "video-analysis.completed" "video-analysis.failed"

The lifecycle event that triggered this notification. video-analysis.progress: the job is processing; data.attributes.progress is updated. video-analysis.completed: the job finished successfully; data.attributes.result is populated. video-analysis.failed: the job encountered an unrecoverable error; data.attributes.error is populated.

object

JSON:API-style data envelope.

id
string <uuid>

Unique identifier for the video analysis job.

type
string
Value: "video-analysis-job"

Resource type identifier. Always video-analysis-job.

object (video-analysis_Attributes)

Full attributes of a video analysis job, combining input, job state, and result.

provider
string

Identifier of the backend provider to use for processing this job. When omitted, the gateway selects the most appropriate provider automatically based on the requested features and availability. Use GET /v1/providers to list available providers and their supported features.

status
string (JobStatus)
Enum: "pending" "processing" "completed" "failed"

Current lifecycle state of the job. pending: accepted, waiting to be picked up. processing: actively being worked on. completed: finished successfully. failed: encountered an unrecoverable error.

progress
integer [ 0 .. 100 ]

Processing progress as a percentage. Only meaningful while status is processing.

object (Error)

Details of a job failure. Only present when status is failed.

created_at
string <date-time>

When the job was created.

processed_at
string <date-time>

When the job transitioned from pending to processing. Subtract created_at from this value to get queue wait time.

completed_at
string <date-time>

When the job reached a terminal state (completed or failed). Subtract processed_at from this value to get processing duration.

required
object (video-analysis_Input)

Source video and the list of analysis features to run.

object (video-analysis_Options)

Optional settings controlling analysis behaviour.

object (video-analysis_Result)

Video analysis output. Populated once status is completed.

object (Meta)

Metadata envelope shared between client and system.

object

Arbitrary key-value data provided by the client. Returned unchanged in all responses.

object

Internal metadata added by the gateway. Not exposed unless explicitly required.

Responses

Request samples

Content type
application/json
{
  "event": "video-analysis.completed",
  "data": {
    "id": "3e7dc4b2-91f0-4a1e-8c2d-b56789012345",
    "type": "video-analysis-job",
    "attributes": {
      "provider": "twelvelabs",
      "status": "processing",
      "progress": 72,
      "error": {
        "code": "AUDIO_UNREADABLE",
        "message": "Could not extract audio from the provided file."
      },
      "created_at": "2024-03-15T10:00:00Z",
      "processed_at": "2024-03-15T10:00:05Z",
      "completed_at": "2024-03-15T10:02:30Z",
      "input": {},
      "options": {
        "language": "auto",
        "confidence_threshold": 0.7,
        "priority": "standard"
      },
      "result": {
        "video_metadata": {
          "duration": 3742.5,
          "width": 1920,
          "height": 1080,
          "frame_rate": 29.97,
          "format": "mp4",
          "codec": "h264"
        },
        "labels": [{}],
        "scenes": [{}],
        "faces": [{}],
        "speech_to_text": [{}],
        "ocr": [{}],
        "content_moderation": {
          "is_safe": true,
          "signals": []
        },
        "sentiment": {
          "overall": "positive",
          "score": 0.62,
          "instances": []
        },
        "topics": [{}],
        "brands": [{}],
        "summary": "A panel discussion on AI safety featuring three researchers. The conversation covers alignment challenges, regulatory proposals, and near-term risk mitigation strategies.\n"
      }
    },
    "meta": {
      "client": {},
      "system": {
        "region": "eu-west-1",
        "worker_id": "wk_789"
      }
    }
  }
}

Create video analysis job

Authorizations:
oauth2ClientCredentials
Request Body schema: application/json
required
required
object

JSON:API-style data envelope.

type
required
string

Resource type identifier. Must be video-analysis-job.

required
object (video-analysis_AttributesCreate)

Input fields required to create a video analysis job.

required
object (video-analysis_Input)

Source video and the list of analysis features to run.

object (video-analysis_Options)

Optional settings controlling analysis behaviour.

object (Webhook)

Destination configuration for job lifecycle notifications.

provider
string (Provider)

Identifier of the backend provider to use for processing this job. When omitted, the gateway selects the most appropriate provider automatically based on the requested features and availability. Use GET /v1/providers to list available providers and their supported features.

object (Meta)

Metadata envelope shared between client and system.

object

Arbitrary key-value data provided by the client. Returned unchanged in all responses.

property name*
additional property
any

Responses

Request samples

Content type
application/json
{}

Response samples

Content type
application/json
{
  "data": {
    "id": "3e7dc4b2-91f0-4a1e-8c2d-b56789012345",
    "type": "video-analysis-job",
    "attributes": {
      "provider": "twelvelabs",
      "status": "processing",
      "progress": 72,
      "error": {
        "code": "AUDIO_UNREADABLE",
        "message": "Could not extract audio from the provided file."
      },
      "created_at": "2024-03-15T10:00:00Z",
      "processed_at": "2024-03-15T10:00:05Z",
      "completed_at": "2024-03-15T10:02:30Z",
      "input": {},
      "options": {
        "language": "auto",
        "confidence_threshold": 0.7,
        "priority": "standard"
      },
      "result": {
        "video_metadata": {
          "duration": 3742.5,
          "width": 1920,
          "height": 1080,
          "frame_rate": 29.97,
          "format": "mp4",
          "codec": "h264"
        },
        "labels": [{}],
        "scenes": [{}],
        "faces": [{}],
        "speech_to_text": [{}],
        "ocr": [{}],
        "content_moderation": {
          "is_safe": true,
          "signals": []
        },
        "sentiment": {
          "overall": "positive",
          "score": 0.62,
          "instances": []
        },
        "topics": [{}],
        "brands": [{}],
        "summary": "A panel discussion on AI safety featuring three researchers. The conversation covers alignment challenges, regulatory proposals, and near-term risk mitigation strategies.\n"
      }
    },
    "meta": {
      "client": {},
      "system": {
        "region": "eu-west-1",
        "worker_id": "wk_789"
      }
    }
  }
}

Get video analysis job

Authorizations:
oauth2ClientCredentials
path Parameters
id
required
string <uuid>
Example: 3e7dc4b2-91f0-4a1e-8c2d-b56789012345

Responses

Response Schema: application/json
object

JSON:API-style data envelope.

id
string <uuid>

Unique identifier for the video analysis job.

type
string
Value: "video-analysis-job"

Resource type identifier. Always video-analysis-job.

object (video-analysis_Attributes)

Full attributes of a video analysis job, combining input, job state, and result.

provider
string

Identifier of the backend provider to use for processing this job. When omitted, the gateway selects the most appropriate provider automatically based on the requested features and availability. Use GET /v1/providers to list available providers and their supported features.

status
string (JobStatus)
Enum: "pending" "processing" "completed" "failed"

Current lifecycle state of the job. pending: accepted, waiting to be picked up. processing: actively being worked on. completed: finished successfully. failed: encountered an unrecoverable error.

progress
integer [ 0 .. 100 ]

Processing progress as a percentage. Only meaningful while status is processing.

object (Error)

Details of a job failure. Only present when status is failed.

created_at
string <date-time>

When the job was created.

processed_at
string <date-time>

When the job transitioned from pending to processing. Subtract created_at from this value to get queue wait time.

completed_at
string <date-time>

When the job reached a terminal state (completed or failed). Subtract processed_at from this value to get processing duration.

required
object (video-analysis_Input)

Source video and the list of analysis features to run.

object (video-analysis_Options)

Optional settings controlling analysis behaviour.

object (video-analysis_Result)

Video analysis output. Populated once status is completed.

object (Meta)

Metadata envelope shared between client and system.

object

Arbitrary key-value data provided by the client. Returned unchanged in all responses.

object

Internal metadata added by the gateway. Not exposed unless explicitly required.

Response samples

Content type
application/json
{
  "data": {
    "id": "3e7dc4b2-91f0-4a1e-8c2d-b56789012345",
    "type": "video-analysis-job",
    "attributes": {
      "provider": "twelvelabs",
      "status": "processing",
      "progress": 72,
      "error": {
        "code": "AUDIO_UNREADABLE",
        "message": "Could not extract audio from the provided file."
      },
      "created_at": "2024-03-15T10:00:00Z",
      "processed_at": "2024-03-15T10:00:05Z",
      "completed_at": "2024-03-15T10:02:30Z",
      "input": {},
      "options": {
        "language": "auto",
        "confidence_threshold": 0.7,
        "priority": "standard"
      },
      "result": {
        "video_metadata": {
          "duration": 3742.5,
          "width": 1920,
          "height": 1080,
          "frame_rate": 29.97,
          "format": "mp4",
          "codec": "h264"
        },
        "labels": [
          {}
        ],
        "scenes": [
          {}
        ],
        "faces": [
          {}
        ],
        "speech_to_text": [
          {}
        ],
        "ocr": [
          {}
        ],
        "content_moderation": {
          "is_safe": true,
          "signals": []
        },
        "sentiment": {
          "overall": "positive",
          "score": 0.62,
          "instances": []
        },
        "topics": [
          {}
        ],
        "brands": [
          {}
        ],
        "summary": "A panel discussion on AI safety featuring three researchers. The conversation covers alignment challenges, regulatory proposals, and near-term risk mitigation strategies.\n"
      }
    },
    "meta": {
      "client": {},
      "system": {
        "region": "eu-west-1",
        "worker_id": "wk_789"
      }
    }
  }
}
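
Polling a job until it reaches a terminal state might look like the sketch below. The `fetch_job` callable is a hypothetical stand-in for an authenticated GET by job id (this section does not show the exact path); only the `data.attributes.status` field and its values are taken from the schema above. The sleep function is injectable so the loop can be exercised without waiting.

```python
import time
from typing import Any, Callable, Dict

# Terminal values of JobStatus as listed in the schema.
TERMINAL_STATUSES = {"completed", "failed"}


def poll_job(
    fetch_job: Callable[[], Dict[str, Any]],
    interval_seconds: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call fetch_job repeatedly until the job is completed or failed.

    fetch_job is a hypothetical callable that returns the parsed JSON
    body of a job response (the {"data": {...}} envelope).
    """
    for _ in range(max_attempts):
        body = fetch_job()
        status = body["data"]["attributes"]["status"]
        if status in TERMINAL_STATUSES:
            return body
        # Job is still pending or processing: wait before the next poll.
        sleep(interval_seconds)
    raise TimeoutError(f"job not terminal after {max_attempts} polls")
```

A caller would typically inspect `attributes.error` when the returned status is `failed` and `attributes.result` when it is `completed`.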

Get video analysis result

Authorizations:
oauth2ClientCredentials
path Parameters
id
required
string <uuid>
Example: 3e7dc4b2-91f0-4a1e-8c2d-b56789012345

Responses

Response Schema: application/json
object (VideoMetadata)

Technical properties of the processed video file.

duration
number

Total duration of the video in seconds.

width
integer

Video width in pixels.

height
integer

Video height in pixels.

frame_rate
number

Frames per second of the video.

format
string

Container format of the video file.

codec
string

Video codec used for encoding.

Array of objects (LabelDetection)

Detected objects, scenes, and actions. Present when labels was requested.

Array
name
string

Human-readable name of the detected label.

confidence
number [ 0 .. 1 ]

Overall confidence score for this label across the video.

Array of objects (TimedInstance)

Time ranges in which this label was detected.

Array of objects (SceneDetection)

Scene and shot boundaries. Present when scenes was requested.

Array
index
integer

Zero-based position of this scene in the video.

start
number

Start time of the scene in seconds.

end
number

End time of the scene in seconds.

Array of objects (FaceDetection)

Faces detected and tracked across the video. Present when faces was requested.

Array
track_id
integer

Integer identifier grouping all appearances of the same face within this video.

fingerprint
string

Base64-encoded face embedding vector produced by the underlying model. When present, fingerprints from different videos can be compared for similarity to determine whether the same person appears across videos. Fingerprints are only comparable when produced by the same backend model — cross-model comparison is not meaningful. Not all backends populate this field.

Array of objects (TimedInstance)

Time ranges in which this face is visible.

Array of objects (TranscriptSegment)

Speech-to-text segments with speaker identification. Present when speech_to_text was requested.

Array
start
number

Start time of the segment in seconds.

end
number

End time of the segment in seconds.

text
string

Transcribed speech for this time range.

speaker_id
integer

Integer identifier grouping segments from the same speaker.

language
string

BCP 47 language code detected for this segment.

confidence
number [ 0 .. 1 ]

Confidence score for this transcript segment.

Array of objects (OcrText)

On-screen text extracted from video frames. Present when ocr was requested.

Array
text
string

The detected text string.

confidence
number [ 0 .. 1 ]

Confidence score for this text detection.

language
string

BCP 47 language code of the detected text.

Array of objects (TimedInstance)

Time ranges in which this text is visible on screen.

object (ContentModeration)

Content moderation signals. Present when content_moderation was requested.

is_safe
boolean

Whether the video passed moderation at the requested confidence threshold.

Array of objects (ModerationSignal)

Individual moderation signals detected above the confidence threshold.

Array
label
string
Enum: "explicit_nudity" "suggestive" "violence" "visually_disturbing" "hate_symbols" "tobacco" "alcohol" "gambling"

Machine-readable label identifying the type of flagged content.

confidence
number [ 0 .. 1 ]

Overall confidence score for this signal across the video.

Array of objects (TimedInstance)

Time ranges in which this signal was detected.

object (Sentiment)

Overall tone and sentiment of the video. Present when sentiment was requested.

overall
string
Enum: "positive" "neutral" "negative"

Dominant sentiment across the entire video.

score
number [ -1 .. 1 ]

Aggregate sentiment score from -1 (most negative) to 1 (most positive).

Array of objects (SentimentInstance)

Sentiment variations across the video timeline.

Array
start
number

Start time of the segment in seconds.

end
number

End time of the segment in seconds.

label
string
Enum: "positive" "neutral" "negative"

Sentiment label for this time range.

score
number [ -1 .. 1 ]

Sentiment score for this time range.

Array of objects (Topic)

Key topics and keywords extracted from the video. Present when topics was requested.

Array
name
string

Topic or keyword name.

confidence
number [ 0 .. 1 ]

Confidence score for this topic.

Array of objects (TimedInstance)

Time ranges in which this topic is relevant.

Array of objects (BrandDetection)

Detected brand logos and visual trademarks. Present when brands was requested.

Array
name
string

Name of the detected brand.

confidence
number [ 0 .. 1 ]

Overall confidence score for this brand detection.

Array of objects (TimedInstance)

Time ranges in which this brand is visible on screen.

summary
string

Natural-language description of the video content. Present when summary was requested.

Response samples

Content type
application/json
{
  "video_metadata": {
    "duration": 3742.5,
    "width": 1920,
    "height": 1080,
    "frame_rate": 29.97,
    "format": "mp4",
    "codec": "h264"
  },
  "labels": [
    {
      "name": "Conference room",
      "confidence": 0.94,
      "instances": [
        {
          "start": 12.5,
          "end": 15.8,
          "confidence": 0.91
        }
      ]
    }
  ],
  "scenes": [
    {
      "index": 3,
      "start": 42,
      "end": 78.5
    }
  ],
  "faces": [
    {
      "track_id": 1,
      "fingerprint": "7hGkL2mXqP9nRsT4vWzA...",
      "instances": [
        {
          "start": 12.5,
          "end": 15.8,
          "confidence": 0.91
        }
      ]
    }
  ],
  "speech_to_text": [
    {
      "start": 12.5,
      "end": 15.8,
      "text": "Welcome to today's panel discussion on AI safety.",
      "speaker_id": 0,
      "language": "en",
      "confidence": 0.97
    }
  ],
  "ocr": [
    {
      "text": "Q3 Revenue: $4.2M",
      "confidence": 0.91,
      "language": "en",
      "instances": [
        {
          "start": 12.5,
          "end": 15.8,
          "confidence": 0.91
        }
      ]
    }
  ],
  "content_moderation": {
    "is_safe": true,
    "signals": [
      {
        "label": "violence",
        "confidence": 0.82,
        "instances": [
          {}
        ]
      }
    ]
  },
  "sentiment": {
    "overall": "positive",
    "score": 0.62,
    "instances": [
      {
        "start": 0,
        "end": 45,
        "label": "positive",
        "score": 0.71
      }
    ]
  },
  "topics": [
    {
      "name": "artificial intelligence",
      "confidence": 0.95,
      "instances": [
        {
          "start": 12.5,
          "end": 15.8,
          "confidence": 0.91
        }
      ]
    }
  ],
  "brands": [
    {
      "name": "Acme Corp",
      "confidence": 0.88,
      "instances": [
        {
          "start": 12.5,
          "end": 15.8,
          "confidence": 0.91
        }
      ]
    }
  ],
  "summary": "A panel discussion on AI safety featuring three researchers. The conversation covers alignment challenges, regulatory proposals, and near-term risk mitigation strategies.\n"
}
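
As an illustration of consuming this result, the sketch below collects the time ranges in which each label was detected with at least a chosen confidence. It assumes each TimedInstance carries `start`, `end`, and `confidence`, as in the sample above; the function name and the threshold default are illustrative and not part of the API.

```python
from typing import Any, Dict, List, Tuple


def confident_label_ranges(
    result: Dict[str, Any], min_confidence: float = 0.9
) -> Dict[str, List[Tuple[float, float]]]:
    """Map each label name to the (start, end) ranges that meet the threshold."""
    ranges: Dict[str, List[Tuple[float, float]]] = {}
    # "labels" is only present when the labels feature was requested.
    for label in result.get("labels", []):
        kept = [
            (instance["start"], instance["end"])
            for instance in label.get("instances", [])
            if instance.get("confidence", 0.0) >= min_confidence
        ]
        if kept:
            ranges[label["name"]] = kept
    return ranges
```

The same loop shape applies to `ocr`, `topics`, and `brands`, which share the `instances` array in the sample.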

Providers

Backend providers available in the gateway and their supported capabilities

List available providers

Returns all backend providers configured in the gateway, along with the resource types and features each one supports. Use this to determine which provider value to pass when creating a job, and which features are available for a given provider.

Authorizations:
oauth2ClientCredentials

Responses

Response Schema: application/json
Array of objects
Array
id
string

Unique identifier for the provider.

type
string
Value: "provider"
object (ProviderAttributes)

Capabilities of a provider across all supported resource types.

Response samples

Content type
application/json
{
  "data": [
    {
      "id": "twelvelabs",
      "type": "provider",
      "attributes": {
        "name": "TwelveLabs",
        "resources": {
          "transcriptions": {},
          "translations": {},
          "video-analyses": {}
        }
      }
    }
  ]
}
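
To pick a provider value before creating a job, a client could filter the provider list by resource type and required features. The sketch below works on the parsed body of GET /v1/providers; the function name is illustrative, and the `features` array under each resource type follows the capability schemas documented under Schemas.

```python
from typing import Any, Dict, Iterable, List


def providers_supporting(
    provider_list: Dict[str, Any], resource: str, required_features: Iterable[str]
) -> List[str]:
    """Return ids of providers that offer every required feature for a resource type."""
    required = set(required_features)
    matches: List[str] = []
    for provider in provider_list.get("data", []):
        resources = provider.get("attributes", {}).get("resources", {})
        # Only resource types a provider supports appear in "resources".
        if resource not in resources:
            continue
        features = set(resources[resource].get("features", []))
        if required <= features:
            matches.append(provider["id"])
    return matches
```

For example, `providers_supporting(body, "video-analyses", ["labels", "ocr"])` returns only the providers whose video analysis capabilities include both features.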

Schemas

Available schemas

Feature

string (Feature)
Enum: "language_detection" "diarization" "timestamps"

A specific transcription capability that may or may not be supported by a given provider. language_detection: automatically detect the source language without explicit specification. diarization: identify and label distinct speakers in the transcript. timestamps: produce time-aligned transcript segments.

"language_detection"

TranscriptionCapabilities

features
Array of strings (Feature)
Items Enum: "language_detection" "diarization" "timestamps"

Subset of transcription features this provider can handle.

{
  "features": [
    "language_detection",
    "diarization",
    "timestamps"
  ]
}

translation_Feature

string (translation_Feature)
Enum: "language_detection" "formality"

A specific translation capability that may or may not be supported by a given provider. language_detection: automatically detect the source language without explicit specification. formality: control the formality level (formal, informal) of the translated output.

"language_detection"

TranslationCapabilities

features
Array of strings (translation_Feature)
Items Enum: "language_detection" "formality"

Subset of translation features this provider can handle.

{
  "features": [
    "language_detection",
    "formality"
  ]
}

video-analysis_Feature

string (video-analysis_Feature)
Enum: "labels" "scenes" "faces" "speech_to_text" "ocr" "content_moderation" "sentiment" "topics" "brands" "summary"

A specific analysis capability to apply to the video. labels: detect objects, scenes, and actions throughout the video. scenes: detect scene and shot boundaries. faces: detect and track faces across frames. speech_to_text: convert speech to text, with speaker identification. ocr: extract on-screen text from video frames. content_moderation: flag explicit or inappropriate content. sentiment: analyse the overall tone and emotional valence. topics: extract key topics and keywords from audio and visual content. brands: detect brand logos and visual trademarks. summary: generate a natural-language description of the video content.

"labels"

VideoAnalysisCapabilities

features
Array of strings (video-analysis_Feature)
Items Enum: "labels" "scenes" "faces" "speech_to_text" "ocr" "content_moderation" "sentiment" "topics" "brands" "summary"

Subset of video analysis features this provider can handle.

{
  "features": [
    "labels",
    "scenes",
    "faces",
    "speech_to_text",
    "ocr",
    "content_moderation",
    "sentiment",
    "topics",
    "brands",
    "summary"
  ]
}

ProviderAttributes

name
string

Human-readable name of the provider.

object

Capabilities offered by this provider per resource type. Only resource types supported by this provider appear in this object.

object (TranscriptionCapabilities)

Transcription features supported by this provider.

features
Array of strings (Feature)
Items Enum: "language_detection" "diarization" "timestamps"

Subset of transcription features this provider can handle.

object (TranslationCapabilities)

Translation features supported by this provider.

features
Array of strings (translation_Feature)
Items Enum: "language_detection" "formality"

Subset of translation features this provider can handle.

object (VideoAnalysisCapabilities)

Video analysis features supported by this provider.

features
Array of strings (video-analysis_Feature)
Items Enum: "labels" "scenes" "faces" "speech_to_text" "ocr" "content_moderation" "sentiment" "topics" "brands" "summary"

Subset of video analysis features this provider can handle.

{
  "name": "TwelveLabs",
  "resources": {
    "transcriptions": {
      "features": [
        "language_detection",
        "diarization",
        "timestamps"
      ]
    },
    "translations": {
      "features": [
        "language_detection",
        "formality"
      ]
    },
    "video-analyses": {
      "features": [
        "labels",
        "scenes",
        "faces",
        "speech_to_text",
        "ocr",
        "content_moderation",
        "sentiment",
        "topics",
        "brands",
        "summary"
      ]
    }
  }
}

ProviderList

Array of objects
Array
id
string

Unique identifier for the provider.

type
string
Value: "provider"
object (ProviderAttributes)

Capabilities of a provider across all supported resource types.

{
  "data": [
    {
      "id": "twelvelabs",
      "type": "provider",
      "attributes": {
        "name": "TwelveLabs",
        "resources": {
          "transcriptions": {},
          "translations": {},
          "video-analyses": {}
        }
      }
    }
  ]
}

Input

type
required
string
Enum: "video" "audio"

Media kind. video — audio is extracted from the container, then transcribed (two-step pipeline). audio — the file is transcribed directly.

url
required
string <uri>

HTTP or HTTPS URL of the source file (e.g. MP4, WAV, MP3). Must be publicly accessible or reachable on the deployment network (e.g. MinIO in local Docker Compose).

audio_track
integer

Zero-based index of the audio track to transcribe when type is video. Omit to use the first audio track. Ignored for type audio.

{}

Options

language
string

BCP 47 language code of the speech in the source media (e.g. en, en-US, fr, de). Two-letter codes are normalized where applicable (en → en-US). Use auto to request automatic language detection when the selected provider supports it. The default Eden AI Google engine does not auto-detect; specify a language explicitly or rely on the deployment default (typically en-US) when this field is omitted.

timestamps
boolean
Default: true

When true (the default when omitted), the result includes a segments array with start, end, and text for each utterance. When false, the backend may return plain text only (provider-dependent).

format
string
Enum: "srt" "vtt" "json"

How the transcript is returned in attributes.result and GET /v1/transcriptions/{id}/result. json — structured object with segments (recommended for APIs). srt / vtt — SubRip or WebVTT subtitle text; a download_url may be provided when the provider exports a file.

diarization
boolean
Default: false

When true, requests speaker diarization (who spoke when). Support depends on the provider; see diarization under transcription features in GET /v1/providers.

priority
string
Default: "standard"
Enum: "low" "standard" "high"

Relative queue priority for the job. high jobs are scheduled before standard and low when the platform is under load. Does not change transcription quality.

{
  "language": "en",
  "timestamps": true,
  "format": "json",
  "diarization": false,
  "priority": "standard"
}

Webhook

url
required
string <uri>

The endpoint the gateway will POST event payloads to.

object

Optional HTTP headers included in every webhook request. Typically used for authentication.

property name*
additional property
string
{}
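
On the receiving side, a webhook endpoint could check the header configured here and then branch on the lifecycle event. The sketch below is framework-agnostic and makes several assumptions: the client registered an `Authorization` header with a shared value, the payload carries `event` and `data.attributes` as described for the transcription webhook, and both function names are hypothetical.

```python
import hmac
from typing import Any, Dict, Mapping


def is_authorized(
    request_headers: Mapping[str, str],
    expected_value: str,
    header_name: str = "Authorization",
) -> bool:
    """Compare the received header with the value registered in webhook.headers."""
    received = ""
    for name, value in request_headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == header_name.lower():
            received = value
            break
    # Constant-time comparison avoids leaking the secret through timing.
    return hmac.compare_digest(received.encode("utf-8"), expected_value.encode("utf-8"))


def describe_event(payload: Dict[str, Any]) -> str:
    """Turn a lifecycle event payload into a short log line."""
    event = payload.get("event")
    data = payload.get("data", {})
    job_id = data.get("id")
    attributes = data.get("attributes", {})
    if event == "transcription.progress":
        return f"{job_id}: {attributes.get('progress', 0)}% processed"
    if event == "transcription.completed":
        return f"{job_id}: completed"
    if event == "transcription.failed":
        code = attributes.get("error", {}).get("code", "unknown")
        return f"{job_id}: failed ({code})"
    raise ValueError(f"unexpected event: {event!r}")
```

Rejecting requests that fail `is_authorized` before parsing the body keeps unauthenticated traffic away from the job-handling code.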

Provider

string (Provider)

Identifier of the backend provider to use for processing this job. When omitted, the gateway selects the most appropriate provider automatically based on the requested features and availability. Use GET /v1/providers to list available providers and their supported features.

"twelvelabs"

AttributesCreate

required
object (Input)

Source audio or video file to transcribe.

type
required
string
Enum: "video" "audio"

Media kind. video — audio is extracted from the container, then transcribed (two-step pipeline). audio — the file is transcribed directly.

url
required
string <uri>

HTTP or HTTPS URL of the source file (e.g. MP4, WAV, MP3). Must be publicly accessible or reachable on the deployment network (e.g. MinIO in local Docker Compose).

audio_track
integer

Zero-based index of the audio track to transcribe when type is video. Omit to use the first audio track. Ignored for type audio.

object (Options)

Optional transcription settings (language, output format, timestamps, diarization, priority). See the Options schema for field-level details.

language
string

BCP 47 language code of the speech in the source media (e.g. en, en-US, fr, de). Two-letter codes are normalized where applicable (en → en-US). Use auto to request automatic language detection when the selected provider supports it. The default Eden AI Google engine does not auto-detect; specify a language explicitly or rely on the deployment default (typically en-US) when this field is omitted.

timestamps
boolean
Default: true

When true (the default when omitted), the result includes a segments array with start, end, and text for each utterance. When false, the backend may return plain text only (provider-dependent).

format
string
Enum: "srt" "vtt" "json"

How the transcript is returned in attributes.result and GET /v1/transcriptions/{id}/result. json — structured object with segments (recommended for APIs). srt / vtt — SubRip or WebVTT subtitle text; a download_url may be provided when the provider exports a file.

diarization
boolean
Default: false

When true, requests speaker diarization (who spoke when). Support depends on the provider; see diarization under transcription features in GET /v1/providers.

priority
string
Default: "standard"
Enum: "low" "standard" "high"

Relative queue priority for the job. high jobs are scheduled before standard and low when the platform is under load. Does not change transcription quality.

object (Webhook)

Optional callback URL. When set, the gateway POSTs JSON payloads on each lifecycle transition (transcription.progress, transcription.completed, transcription.failed).

url
required
string <uri>

The endpoint the gateway will POST event payloads to.

object

Optional HTTP headers included in every webhook request. Typically used for authentication.

property name*
additional property
string
provider
string (Provider)

Backend provider id for this job (e.g. eden-ai). When omitted, the gateway picks a provider that supports the requested options. List providers and features with GET /v1/providers.

{}

Meta

object

Arbitrary key-value data provided by the client. Returned unchanged in all responses.

property name*
additional property
any
object

Internal metadata added by the gateway. Not exposed unless explicitly required.

property name*
additional property
any
{
  "client": {},
  "system": {
    "region": "eu-west-1",
    "worker_id": "wk_789"
  }
}

CreateRequest

required
object

JSON:API-style resource envelope. Must include type and attributes; meta is optional.

type
required
string

Resource type discriminator. Must be exactly transcription-job for this endpoint.

required
object (AttributesCreate)

Parameters that control what is transcribed and how. Only input is required; configure options, webhook, and provider as needed.

required
object (Input)

Source audio or video file to transcribe.

object (Options)

Optional transcription settings (language, output format, timestamps, diarization, priority). See the Options schema for field-level details.

object (Webhook)

Optional callback URL. When set, the gateway POSTs JSON payloads on each lifecycle transition (transcription.progress, transcription.completed, transcription.failed).

provider
string (Provider)

Backend provider id for this job (e.g. eden-ai). When omitted, the gateway picks a provider that supports the requested options. List providers and features with GET /v1/providers.

object (Meta)

Optional client metadata (meta.client). Stored with the job and returned unchanged in GET responses and webhook payloads. Not used for routing or processing.

object

Arbitrary key-value data provided by the client. Returned unchanged in all responses.

object

Internal metadata added by the gateway. Not exposed unless explicitly required.

{}
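
A request body matching this schema could be assembled as in the sketch below. The top-level `data` key and the placement of `meta` inside it are assumptions inferred from the Job response sample, since the rendered schema omits the key name; the helper name is illustrative.

```python
from typing import Any, Dict, Optional


def build_transcription_request(
    media_type: str,
    url: str,
    options: Optional[Dict[str, Any]] = None,
    client_meta: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build a CreateRequest body for a transcription job."""
    if media_type not in ("video", "audio"):
        raise ValueError("input.type must be 'video' or 'audio'")
    # Only input is required; options are added when provided.
    attributes: Dict[str, Any] = {"input": {"type": media_type, "url": url}}
    if options:
        attributes["options"] = dict(options)
    data: Dict[str, Any] = {"type": "transcription-job", "attributes": attributes}
    if client_meta:
        # meta.client is returned unchanged in responses and webhook payloads.
        data["meta"] = {"client": dict(client_meta)}
    return {"data": data}
```

Serialising the returned dictionary with `json.dumps` gives the body for POST /v1/transcriptions.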

JobStatus

string (JobStatus)
Enum: "pending" "processing" "completed" "failed"

Current lifecycle state of the job. pending: accepted, waiting to be picked up. processing: actively being worked on. completed: finished successfully. failed: encountered an unrecoverable error.

"processing"

Error

code
string

Machine-readable error identifier.

message
string

Human-readable explanation of the error.

{
  "code": "AUDIO_UNREADABLE",
  "message": "Could not extract audio from the provided file."
}

Attributes

provider
string (Provider)

Identifier of the backend provider to use for processing this job. When omitted, the gateway selects the most appropriate provider automatically based on the requested features and availability. Use GET /v1/providers to list available providers and their supported features.

status
string (JobStatus)
Enum: "pending" "processing" "completed" "failed"

Current lifecycle state of the job. pending: accepted, waiting to be picked up. processing: actively being worked on. completed: finished successfully. failed: encountered an unrecoverable error.

progress
integer [ 0 .. 100 ]

Processing progress as a percentage. Only meaningful while status is processing.

object (Error)

Details of a job failure. Only present when status is failed.

code
string

Machine-readable error identifier.

message
string

Human-readable explanation of the error.

created_at
string <date-time>

When the job was created.

processed_at
string <date-time>

When the job transitioned from pending to processing. Subtract created_at from this value to get the queue wait time.

completed_at
string <date-time>

When the job reached a terminal state (completed or failed). Subtract processed_at from this value to get the processing duration.

{
  "provider": "twelvelabs",
  "status": "processing",
  "progress": 72,
  "error": {
    "code": "AUDIO_UNREADABLE",
    "message": "Could not extract audio from the provided file."
  },
  "created_at": "2024-03-15T10:00:00Z",
  "processed_at": "2024-03-15T10:00:05Z",
  "completed_at": "2024-03-15T10:02:30Z"
}
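
The queue wait time and processing duration can be derived from the three timestamps. The sketch below assumes the values are RFC 3339 strings with a trailing `Z`, as in the sample above; the function names are illustrative. A timing is `None` when either timestamp it depends on is absent.

```python
from datetime import datetime
from typing import Any, Dict, Optional


def parse_timestamp(value: str) -> datetime:
    """Parse a date-time string, accepting the 'Z' suffix for UTC."""
    # datetime.fromisoformat rejects 'Z' before Python 3.11, so normalise it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def job_timings(attributes: Dict[str, Any]) -> Dict[str, Optional[float]]:
    """Return queue wait and processing duration in seconds."""
    created = attributes.get("created_at")
    processed = attributes.get("processed_at")
    completed = attributes.get("completed_at")
    queue_wait = None
    processing = None
    if created and processed:
        queue_wait = (parse_timestamp(processed) - parse_timestamp(created)).total_seconds()
    if processed and completed:
        processing = (parse_timestamp(completed) - parse_timestamp(processed)).total_seconds()
    return {"queue_wait_seconds": queue_wait, "processing_seconds": processing}
```

With the sample timestamps above, the job waited 5 seconds in the queue and was processed for 145 seconds.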

Segment

start
number

Start time of the segment in seconds.

end
number

End time of the segment in seconds.

text
string

Transcribed text for this time range.

{
  "start": 12.5,
  "end": 15.8,
  "text": "Welcome to today's interview."
}
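
When a job was requested with format json, a client can still derive subtitles from the segments. The sketch below converts Segment objects to SubRip text on the client side; it is not the gateway's own srt export, and the function names are illustrative.

```python
from typing import Any, Dict, List


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the SubRip timestamp HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: List[Dict[str, Any]]) -> str:
    """Render segments as numbered SubRip cues separated by blank lines."""
    cues = []
    for number, segment in enumerate(segments, start=1):
        cues.append(
            f"{number}\n"
            f"{srt_timestamp(segment['start'])} --> {srt_timestamp(segment['end'])}\n"
            f"{segment['text']}\n"
        )
    return "\n".join(cues)
```

For the sample segment above, the cue spans `00:00:12,500 --> 00:00:15,800`.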

Result

format
string
Enum: "srt" "vtt" "json"

Format of the transcription result, matching the requested output format.

duration
number

Total duration of the media file in seconds.

language
string

BCP 47 language code of the transcribed audio, as detected or specified.

download_url
string <uri>

Pre-signed URL to download the full transcription file. Valid for a limited time.

Array of objects (Segment)

Time-aligned transcript segments. Present when timestamps was enabled.

Array
start
number

Start time of the segment in seconds.

end
number

End time of the segment in seconds.

text
string

Transcribed text for this time range.

{}

transcription_Attributes

provider
string

Identifier of the backend provider to use for processing this job. When omitted, the gateway selects the most appropriate provider automatically based on the requested features and availability. Use GET /v1/providers to list available providers and their supported features.

status
string (JobStatus)
Enum: "pending" "processing" "completed" "failed"

Current lifecycle state of the job. pending: accepted, waiting to be picked up. processing: actively being worked on. completed: finished successfully. failed: encountered an unrecoverable error.

progress
integer [ 0 .. 100 ]

Processing progress as a percentage. Only meaningful while status is processing.

object (Error)

Details of a job failure. Only present when status is failed.

code
string

Machine-readable error identifier.

message
string

Human-readable explanation of the error.

created_at
string <date-time>

When the job was created.

processed_at
string <date-time>

When the job transitioned from pending to processing. Subtract created_at from this value to get the queue wait time.

completed_at
string <date-time>

When the job reached a terminal state (completed or failed). Subtract processed_at from this value to get the processing duration.

required
object (Input)

Source audio or video file to transcribe.

type
required
string
Enum: "video" "audio"

Media kind. video — audio is extracted from the container, then transcribed (two-step pipeline). audio — the file is transcribed directly.

url
required
string <uri>

HTTP or HTTPS URL of the source file (e.g. MP4, WAV, MP3). Must be publicly accessible or reachable on the deployment network (e.g. MinIO in local Docker Compose).

audio_track
integer

Zero-based index of the audio track to transcribe when type is video. Omit to use the first audio track. Ignored for type audio.

object (Options)

Optional transcription settings (language, output format, timestamps, diarization, priority). See the Options schema for field-level details.

language
string

BCP 47 language code of the speech in the source media (e.g. en, en-US, fr, de). Two-letter codes are normalized where applicable (en → en-US). Use auto to request automatic language detection when the selected provider supports it. The default Eden AI Google engine does not auto-detect; specify a language explicitly or rely on the deployment default (typically en-US) when this field is omitted.

timestamps
boolean
Default: true

When true (the default when omitted), the result includes a segments array with start, end, and text for each utterance. When false, the backend may return plain text only (provider-dependent).

format
string
Enum: "srt" "vtt" "json"

How the transcript is returned in attributes.result and GET /v1/transcriptions/{id}/result. json — structured object with segments (recommended for APIs). srt / vtt — SubRip or WebVTT subtitle text; a download_url may be provided when the provider exports a file.

diarization
boolean
Default: false

When true, requests speaker diarization (who spoke when). Support depends on the provider; see diarization under transcription features in GET /v1/providers.

priority
string
Default: "standard"
Enum: "low" "standard" "high"

Relative queue priority for the job. high jobs are scheduled before standard and low when the platform is under load. Does not change transcription quality.

object (Webhook)

Optional callback URL. When set, the gateway POSTs JSON payloads on each lifecycle transition (transcription.progress, transcription.completed, transcription.failed).

url
required
string <uri>

The endpoint the gateway will POST event payloads to.

object

Optional HTTP headers included in every webhook request. Typically used for authentication.

property name*
additional property
string
object (Result)

Transcription output. Populated once status is completed.

format
string
Enum: "srt" "vtt" "json"

Format of the transcription result, matching the requested output format.

duration
number

Total duration of the media file in seconds.

language
string

BCP 47 language code of the transcribed audio, as detected or specified.

download_url
string <uri>

Pre-signed URL to download the full transcription file. Valid for a limited time.

Array of objects (Segment)

Time-aligned transcript segments. Present when timestamps was enabled.

Array
start
number

Start time of the segment in seconds.

end
number

End time of the segment in seconds.

text
string

Transcribed text for this time range.

{
  "provider": "twelvelabs",
  "status": "processing",
  "progress": 72,
  "error": {
    "code": "AUDIO_UNREADABLE",
    "message": "Could not extract audio from the provided file."
  },
  "created_at": "2024-03-15T10:00:00Z",
  "processed_at": "2024-03-15T10:00:05Z",
  "completed_at": "2024-03-15T10:02:30Z",
  "input": {},
  "options": {
    "language": "en",
    "timestamps": true,
    "format": "json",
    "diarization": false,
    "priority": "standard"
  },
  "result": {}
}

Job

object

JSON:API-style data envelope.

id
string <uuid>

Unique identifier for the transcription job.

type
string
Value: "transcription-job"

Resource type identifier. Always transcription-job.

object (transcription_Attributes)

Full attributes of a transcription job, combining input, job state, and result.

provider
string

Identifier of the backend provider to use for processing this job. When omitted, the gateway selects the most appropriate provider automatically based on the requested features and availability. Use GET /v1/providers to list available providers and their supported features.

status
string (JobStatus)
Enum: "pending" "processing" "completed" "failed"

Current lifecycle state of the job. pending: accepted, waiting to be picked up. processing: actively being worked on. completed: finished successfully. failed: encountered an unrecoverable error.

progress
integer [ 0 .. 100 ]

Processing progress as a percentage. Only meaningful while status is processing.

object (Error)

Details of a job failure. Only present when status is failed.

created_at
string <date-time>

When the job was created.

processed_at
string <date-time>

When the job transitioned from pending to processing. Subtract created_at from this value to get the queue wait time.

completed_at
string <date-time>

When the job reached a terminal state (completed or failed). Subtract processed_at from this value to get the processing duration.

required
object (Input)

Source audio or video file to transcribe.

object (Options)

Optional transcription settings (language, output format, timestamps, diarization, priority). See the Options schema for field-level details.

object (Webhook)

Optional callback URL. When set, the gateway POSTs JSON payloads on each lifecycle transition (transcription.progress, transcription.completed, transcription.failed).

object (Result)

Transcription output. Populated once status is completed.

object (Meta)

Metadata envelope shared between client and system.

object

Arbitrary key-value data provided by the client. Returned unchanged in all responses.

object

Internal metadata added by the gateway. Not exposed unless explicitly required.

{
  • "data": {
    • "id": "2f41bc1f-b608-4360-acd9-a26a296fea3c",
    • "type": "transcription-job",
    • "attributes": {
      • "provider": "twelvelabs",
      • "status": "processing",
      • "progress": 72,
      • "error": {
        • "code": "AUDIO_UNREADABLE",
        • "message": "Could not extract audio from the provided file."
        },
      • "created_at": "2024-03-15T10:00:00Z",
      • "processed_at": "2024-03-15T10:00:05Z",
      • "completed_at": "2024-03-15T10:02:30Z",
      • "input": {},
      • "options": {
        • "language": "en",
        • "timestamps": true,
        • "format": "json",
        • "diarization": false,
        • "priority": "standard"
        },
      • "result": {}
      },
    • "meta": {
      • "client": { },
      • "system": {
        • "region": "eu-west-1",
        • "worker_id": "wk_789"
        }
      }
    }
}
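The two durations described for processed_at and completed_at can be derived from the timestamps in the example above. The following is a minimal sketch, not part of the gateway: the helper names parse_ts and job_timings are hypothetical, and it assumes timestamps arrive as RFC 3339 strings with a trailing Z, as in the example.

```python
from datetime import datetime


def parse_ts(value: str) -> datetime:
    """Parse a timestamp such as 2024-03-15T10:00:00Z."""
    # Older Python versions do not accept a trailing "Z" in fromisoformat.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def job_timings(attributes: dict) -> dict:
    """Return queue wait and processing duration (None where a timestamp is missing)."""
    created = attributes.get("created_at")
    processed = attributes.get("processed_at")
    completed = attributes.get("completed_at")
    queue_wait = None
    processing = None
    if created and processed:
        queue_wait = parse_ts(processed) - parse_ts(created)
    if processed and completed:
        processing = parse_ts(completed) - parse_ts(processed)
    return {"queue_wait": queue_wait, "processing": processing}


attrs = {
    "created_at": "2024-03-15T10:00:00Z",
    "processed_at": "2024-03-15T10:00:05Z",
    "completed_at": "2024-03-15T10:02:30Z",
}
timings = job_timings(attrs)
print(timings["queue_wait"].total_seconds())  # 5.0
print(timings["processing"].total_seconds())  # 145.0
```

A job that is still pending has no processed_at, so both values come back as None rather than raising.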

ErrorResponse

object (Error)

Details of a job failure. Only present when status is failed.

code
string

Machine-readable error identifier.

message
string

Human-readable explanation of the error.

{
  • "error": {
    • "code": "AUDIO_UNREADABLE",
    • "message": "Could not extract audio from the provided file."
    }
}

translation_Input

type
required
string
Enum: "text" "document"

Type of the source content.

content
string

The text content to translate. Required when type is text.

url
string <uri>

Publicly accessible URL of the document to translate. Required when type is document.

target_language
required
string

BCP 47 language code of the target language.

{}
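The conditional requirements above (content for text, url for document) are easy to get wrong, so a client may want to check them before sending a request. This is a sketch only: validate_translation_input is a hypothetical client-side helper, and the gateway's own validation may be stricter.

```python
def validate_translation_input(inp: dict) -> list:
    """Return a list of problems; an empty list means the input looks valid."""
    problems = []
    kind = inp.get("type")
    if kind not in ("text", "document"):
        problems.append('type must be "text" or "document"')
    if not inp.get("target_language"):
        problems.append("target_language is required")
    # content and url are conditionally required, depending on type.
    if kind == "text" and not inp.get("content"):
        problems.append('content is required when type is "text"')
    if kind == "document" and not inp.get("url"):
        problems.append('url is required when type is "document"')
    return problems


print(validate_translation_input(
    {"type": "text", "content": "Hello", "target_language": "fr"}
))  # []
print(validate_translation_input({"type": "document", "target_language": "fr"}))
```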

translation_Options

source_language
string

BCP 47 language code of the source content. Use auto to detect the language automatically.

formality
string
Enum: "default" "formal" "informal"

Formality level of the translated output.

priority
string
Enum: "low" "standard" "high"

Processing priority. Higher-priority jobs are picked up sooner.

{
  • "source_language": "auto",
  • "formality": "formal",
  • "priority": "standard"
}

translation_AttributesCreate

required
object (translation_Input)

Source content to translate.

type
required
string
Enum: "text" "document"

Type of the source content.

content
string

The text content to translate. Required when type is text.

url
string <uri>

Publicly accessible URL of the document to translate. Required when type is document.

target_language
required
string

BCP 47 language code of the target language.

object (translation_Options)

Optional settings controlling translation behaviour.

source_language
string

BCP 47 language code of the source content. Use auto to detect the language automatically.

formality
string
Enum: "default" "formal" "informal"

Formality level of the translated output.

priority
string
Enum: "low" "standard" "high"

Processing priority. Higher-priority jobs are picked up sooner.

object (Webhook)

Destination configuration for job lifecycle notifications.

url
required
string <uri>

The endpoint the gateway will POST event payloads to.

object

Optional HTTP headers included in every webhook request. Typically used for authentication.

property name*
additional property
string

provider
string (Provider)

Identifier of the backend provider to use for processing this job. When omitted, the gateway selects the most appropriate provider automatically based on the requested features and availability. Use GET /v1/providers to list available providers and their supported features.

{
  • "input": {},
  • "options": {
    • "source_language": "auto",
    • "formality": "formal",
    • "priority": "standard"
    },
  • "provider": "twelvelabs"
}

translation_CreateRequest

required
object

JSON:API-style data envelope.

type
required
string

Resource type identifier. Must be translation-job.

required
object (translation_AttributesCreate)

Input fields required to create a translation job.

required
object (translation_Input)

Source content to translate.

object (translation_Options)

Optional settings controlling translation behaviour.

object (Webhook)

Destination configuration for job lifecycle notifications.

provider
string (Provider)

Identifier of the backend provider to use for processing this job. When omitted, the gateway selects the most appropriate provider automatically based on the requested features and availability. Use GET /v1/providers to list available providers and their supported features.

object (Meta)

Metadata envelope shared between client and system.

object

Arbitrary key-value data provided by the client. Returned unchanged in all responses.

object

Internal metadata added by the gateway. Not exposed unless explicitly required.

{
  • "data": {
    • "type": "translation-job",
    • "attributes": {
      • "input": {},
      • "options": {
        • "source_language": "auto",
        • "formality": "formal",
        • "priority": "standard"
        },
      • "provider": "twelvelabs"
      },
    • "meta": {
      • "client": { },
      • "system": {
        • "region": "eu-west-1",
        • "worker_id": "wk_789"
        }
      }
    }
}
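A request body with the envelope shown above might be assembled as follows. This is a hedged sketch: build_translation_request is a hypothetical helper, and leaving out meta.system is an assumption based on its description as metadata added by the gateway.

```python
import json
from typing import Optional


def build_translation_request(
    input_: dict,
    options: Optional[dict] = None,
    provider: Optional[str] = None,
    client_meta: Optional[dict] = None,
) -> dict:
    """Wrap translation input in the JSON:API-style data envelope."""
    attributes = {"input": input_}
    if options:
        attributes["options"] = options
    if provider:
        # Omitting provider lets the gateway choose one automatically.
        attributes["provider"] = provider
    data = {"type": "translation-job", "attributes": attributes}
    if client_meta:
        # Only the client part is sent (assumption: system is gateway-owned).
        data["meta"] = {"client": client_meta}
    return {"data": data}


body = build_translation_request(
    {"type": "text", "content": "Hello", "target_language": "fr"},
    options={"source_language": "auto", "formality": "formal"},
    client_meta={"ticket": "T-1"},
)
print(json.dumps(body, indent=2))
```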

translation_Result

source_language
string

BCP 47 language code of the source content, as detected or specified.

target_language
string

BCP 47 language code of the translated output.

content
string

Translated text content. Present when input type is text.

download_url
string <uri>

Pre-signed URL to download the translated file. Valid for a limited time. Present when input type is document.

character_count
integer

Number of characters in the source content.

{}

translation_Attributes

provider
string

Identifier of the backend provider to use for processing this job. When omitted, the gateway selects the most appropriate provider automatically based on the requested features and availability. Use GET /v1/providers to list available providers and their supported features.

status
string (JobStatus)
Enum: "pending" "processing" "completed" "failed"

Current lifecycle state of the job. pending: accepted, waiting to be picked up. processing: actively being worked on. completed: finished successfully. failed: encountered an unrecoverable error.

progress
integer [ 0 .. 100 ]

Processing progress as a percentage. Only meaningful while status is processing.

object (Error)

Details of a job failure. Only present when status is failed.

code
string

Machine-readable error identifier.

message
string

Human-readable explanation of the error.

created_at
string <date-time>

When the job was created.

processed_at
string <date-time>

When the job transitioned from pending to processing. Subtract created_at from this value to get the queue wait time.

completed_at
string <date-time>

When the job reached a terminal state (completed or failed). Subtract processed_at from this value to get the processing duration.

required
object (translation_Input)

Source content to translate.

type
required
string
Enum: "text" "document"

Type of the source content.

content
string

The text content to translate. Required when type is text.

url
string <uri>

Publicly accessible URL of the document to translate. Required when type is document.

target_language
required
string

BCP 47 language code of the target language.

object (translation_Options)

Optional settings controlling translation behaviour.

source_language
string

BCP 47 language code of the source content. Use auto to detect the language automatically.

formality
string
Enum: "default" "formal" "informal"

Formality level of the translated output.

priority
string
Enum: "low" "standard" "high"

Processing priority. Higher-priority jobs are picked up sooner.

object (Webhook)

Destination configuration for job lifecycle notifications.

url
required
string <uri>

The endpoint the gateway will POST event payloads to.

object

Optional HTTP headers included in every webhook request. Typically used for authentication.

property name*
additional property
string

object (translation_Result)

Translation output. Populated once status is completed.

source_language
string

BCP 47 language code of the source content, as detected or specified.

target_language
string

BCP 47 language code of the translated output.

content
string

Translated text content. Present when input type is text.

download_url
string <uri>

Pre-signed URL to download the translated file. Valid for a limited time. Present when input type is document.

character_count
integer

Number of characters in the source content.

{
  • "provider": "twelvelabs",
  • "status": "processing",
  • "progress": 72,
  • "error": {
    • "code": "AUDIO_UNREADABLE",
    • "message": "Could not extract audio from the provided file."
    },
  • "created_at": "2024-03-15T10:00:00Z",
  • "processed_at": "2024-03-15T10:00:05Z",
  • "completed_at": "2024-03-15T10:02:30Z",
  • "input": {},
  • "options": {
    • "source_language": "auto",
    • "formality": "formal",
    • "priority": "standard"
    },
  • "result": {}
}

translation_Job

object

JSON:API-style data envelope.

id
string <uuid>

Unique identifier for the translation job.

type
string
Value: "translation-job"

Resource type identifier. Always translation-job.

object (translation_Attributes)

Full attributes of a translation job, combining input, job state, and result.

provider
string

Identifier of the backend provider to use for processing this job. When omitted, the gateway selects the most appropriate provider automatically based on the requested features and availability. Use GET /v1/providers to list available providers and their supported features.

status
string (JobStatus)
Enum: "pending" "processing" "completed" "failed"

Current lifecycle state of the job. pending: accepted, waiting to be picked up. processing: actively being worked on. completed: finished successfully. failed: encountered an unrecoverable error.

progress
integer [ 0 .. 100 ]

Processing progress as a percentage. Only meaningful while status is processing.

object (Error)

Details of a job failure. Only present when status is failed.

created_at
string <date-time>

When the job was created.

processed_at
string <date-time>

When the job transitioned from pending to processing. Subtract created_at from this value to get the queue wait time.

completed_at
string <date-time>

When the job reached a terminal state (completed or failed). Subtract processed_at from this value to get the processing duration.

required
object (translation_Input)

Source content to translate.

object (translation_Options)

Optional settings controlling translation behaviour.

object (Webhook)

Destination configuration for job lifecycle notifications.

object (translation_Result)

Translation output. Populated once status is completed.

object (Meta)

Metadata envelope shared between client and system.

object

Arbitrary key-value data provided by the client. Returned unchanged in all responses.

object

Internal metadata added by the gateway. Not exposed unless explicitly required.

{
  • "data": {
    • "id": "9a1bc2f3-d405-4678-bcde-f12345678901",
    • "type": "translation-job",
    • "attributes": {
      • "provider": "twelvelabs",
      • "status": "processing",
      • "progress": 72,
      • "error": {
        • "code": "AUDIO_UNREADABLE",
        • "message": "Could not extract audio from the provided file."
        },
      • "created_at": "2024-03-15T10:00:00Z",
      • "processed_at": "2024-03-15T10:00:05Z",
      • "completed_at": "2024-03-15T10:02:30Z",
      • "input": {},
      • "options": {
        • "source_language": "auto",
        • "formality": "formal",
        • "priority": "standard"
        },
      • "result": {}
      },
    • "meta": {
      • "client": { },
      • "system": {
        • "region": "eu-west-1",
        • "worker_id": "wk_789"
        }
      }
    }
}
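Because completed and failed are the only terminal states, a client can poll until one of them is reached and then read either result or error. The sketch below assumes a caller-supplied fetch function that returns the parsed job document; poll_job and translation_output are hypothetical names, and the HTTP call itself is deliberately left out.

```python
import time
from typing import Callable

TERMINAL_STATES = {"completed", "failed"}


def poll_job(
    fetch: Callable[[], dict],
    interval: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch() until the job reaches a terminal state or attempts run out."""
    for _ in range(max_attempts):
        job = fetch()
        if job["data"]["attributes"]["status"] in TERMINAL_STATES:
            return job
        sleep(interval)
    raise TimeoutError("job did not reach a terminal state")


def translation_output(job: dict) -> str:
    """Return translated text, or the download URL for document input."""
    attrs = job["data"]["attributes"]
    if attrs["status"] == "failed":
        err = attrs.get("error", {})
        raise RuntimeError(f"{err.get('code')}: {err.get('message')}")
    result = attrs["result"]
    return result.get("content") or result["download_url"]


# Stubbed responses stand in for two successive GET calls.
responses = iter([
    {"data": {"attributes": {"status": "processing", "progress": 72}}},
    {"data": {"attributes": {"status": "completed",
                             "result": {"content": "Bonjour"}}}},
])
finished = poll_job(lambda: next(responses), sleep=lambda _: None)
print(translation_output(finished))  # Bonjour
```

Injecting sleep keeps the loop testable without waiting; a real client would keep the default.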

video-analysis_Input

url
required
string <uri>

Publicly accessible URL of the video file.

features
required
Array of strings (video-analysis_Feature) non-empty
Items Enum: "labels" "scenes" "faces" "speech_to_text" "ocr" "content_moderation" "sentiment" "topics" "brands" "summary"

One or more analysis capabilities to apply. At least one feature must be specified.

audio_track
integer

Index of the audio track to use for speech-related features. Defaults to the first track when omitted.

{}
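The constraints on this input (a required url and a non-empty features list drawn from the enum above) can be checked client-side before a job is submitted. This is an illustrative sketch: validate_video_input is a hypothetical helper and does not replace the gateway's own validation.

```python
VIDEO_FEATURES = {
    "labels", "scenes", "faces", "speech_to_text", "ocr",
    "content_moderation", "sentiment", "topics", "brands", "summary",
}


def validate_video_input(inp: dict) -> list:
    """Return a list of problems; an empty list means the input looks valid."""
    problems = []
    if not inp.get("url"):
        problems.append("url is required")
    features = inp.get("features") or []
    if not features:
        problems.append("features must contain at least one entry")
    unknown = sorted(set(features) - VIDEO_FEATURES)
    if unknown:
        problems.append("unknown features: " + ", ".join(unknown))
    track = inp.get("audio_track")
    if track is not None and not isinstance(track, int):
        problems.append("audio_track must be an integer")
    return problems


print(validate_video_input(
    {"url": "https://example.com/talk.mp4", "features": ["labels", "ocr"]}
))  # []
print(validate_video_input({"url": "https://example.com/talk.mp4", "features": ["logos"]}))
```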

video-analysis_Options

language
string

BCP 47 language code for speech and text features. Use auto to detect the language automatically.

confidence_threshold
number [ 0 .. 1 ]

Minimum confidence score (0–1) for a detection to be included in the result. Defaults to 0.5.

priority
string
Enum: "low" "standard" "high"

Processing priority. Higher-priority jobs are picked up sooner.

{
  • "language": "auto",
  • "confidence_threshold": 0.7,
  • "priority": "standard"
}

video-analysis_AttributesCreate

required
object (video-analysis_Input)

Source video and the list of analysis features to run.

url
required
string <uri>

Publicly accessible URL of the video file.

features
required
Array of strings (video-analysis_Feature) non-empty
Items Enum: "labels" "scenes" "faces" "speech_to_text" "ocr" "content_moderation" "sentiment" "topics" "brands" "summary"

One or more analysis capabilities to apply. At least one feature must be specified.

audio_track
integer

Index of the audio track to use for speech-related features. Defaults to the first track when omitted.

object (video-analysis_Options)

Optional settings controlling analysis behaviour.

language
string

BCP 47 language code for speech and text features. Use auto to detect the language automatically.

confidence_threshold
number [ 0 .. 1 ]

Minimum confidence score (0–1) for a detection to be included in the result. Defaults to 0.5.

priority
string
Enum: "low" "standard" "high"

Processing priority. Higher-priority jobs are picked up sooner.

object (Webhook)

Destination configuration for job lifecycle notifications.

url
required
string <uri>

The endpoint the gateway will POST event payloads to.

object

Optional HTTP headers included in every webhook request. Typically used for authentication.

property name*
additional property
string

provider
string (Provider)

Identifier of the backend provider to use for processing this job. When omitted, the gateway selects the most appropriate provider automatically based on the requested features and availability. Use GET /v1/providers to list available providers and their supported features.

{
  • "input": {},
  • "options": {
    • "language": "auto",
    • "confidence_threshold": 0.7,
    • "priority": "standard"
    },
  • "provider": "twelvelabs"
}

video-analysis_CreateRequest

required
object

JSON:API-style data envelope.

type
required
string

Resource type identifier. Must be video-analysis-job.

required
object (video-analysis_AttributesCreate)

Input fields required to create a video analysis job.

required
object (video-analysis_Input)

Source video and the list of analysis features to run.

object (video-analysis_Options)

Optional settings controlling analysis behaviour.

object (Webhook)

Destination configuration for job lifecycle notifications.

provider
string (Provider)

Identifier of the backend provider to use for processing this job. When omitted, the gateway selects the most appropriate provider automatically based on the requested features and availability. Use GET /v1/providers to list available providers and their supported features.

object (Meta)

Metadata envelope shared between client and system.

object

Arbitrary key-value data provided by the client. Returned unchanged in all responses.

object

Internal metadata added by the gateway. Not exposed unless explicitly required.

{
  • "data": {
    • "type": "video-analysis-job",
    • "attributes": {
      • "input": {},
      • "options": {
        • "language": "auto",
        • "confidence_threshold": 0.7,
        • "priority": "standard"
        },
      • "provider": "twelvelabs"
      },
    • "meta": {
      • "client": { },
      • "system": {
        • "region": "eu-west-1",
        • "worker_id": "wk_789"
        }
      }
    }
}

VideoMetadata

duration
number

Total duration of the video in seconds.

width
integer

Video width in pixels.

height
integer

Video height in pixels.

frame_rate
number

Frames per second of the video.

format
string

Container format of the video file.

codec
string

Video codec used for encoding.

{
  • "duration": 3742.5,
  • "width": 1920,
  • "height": 1080,
  • "frame_rate": 29.97,
  • "format": "mp4",
  • "codec": "h264"
}

TimedInstance

start
number

Start time of the occurrence in seconds.

end
number

End time of the occurrence in seconds.

confidence
number [ 0 .. 1 ]

Confidence score for this specific occurrence.

{
  • "start": 12.5,
  • "end": 15.8,
  • "confidence": 0.91
}
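Several schemas below carry a list of TimedInstance objects, and a common client-side need is the total on-screen time for a detection. A minimal sketch, assuming instances may overlap (the schema does not say whether they can); merge_instances and covered_seconds are hypothetical helpers.

```python
def merge_instances(instances: list) -> list:
    """Merge overlapping or touching (start, end) ranges, sorted by start."""
    spans = sorted((i["start"], i["end"]) for i in instances)
    merged = []
    for start, end in spans:
        if merged and start <= merged[-1][1]:
            # Extend the previous range instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_seconds(instances: list) -> float:
    """Total seconds covered by the instances, counting overlaps once."""
    return sum(end - start for start, end in merge_instances(instances))


instances = [
    {"start": 12.5, "end": 15.8, "confidence": 0.91},
    {"start": 15.0, "end": 20.0, "confidence": 0.88},
    {"start": 30.0, "end": 31.0, "confidence": 0.75},
]
print(merge_instances(instances))  # [(12.5, 20.0), (30.0, 31.0)]
print(covered_seconds(instances))  # 8.5
```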

LabelDetection

name
string

Human-readable name of the detected label.

confidence
number [ 0 .. 1 ]

Overall confidence score for this label across the video.

Array of objects (TimedInstance)

Time ranges in which this label was detected.

Array
start
number

Start time of the occurrence in seconds.

end
number

End time of the occurrence in seconds.

confidence
number [ 0 .. 1 ]

Confidence score for this specific occurrence.

{
  • "name": "Conference room",
  • "confidence": 0.94,
  • "instances": [
    • {
      • "start": 12.5,
      • "end": 15.8,
      • "confidence": 0.91
      }
    ]
}

SceneDetection

index
integer

Zero-based position of this scene in the video.

start
number

Start time of the scene in seconds.

end
number

End time of the scene in seconds.

{
  • "index": 3,
  • "start": 42,
  • "end": 78.5
}

FaceDetection

track_id
integer

Integer identifier grouping all appearances of the same face within this video.

fingerprint
string

Base64-encoded face embedding vector produced by the underlying model. When present, fingerprints from different videos can be compared for similarity to determine whether the same person appears across videos. Fingerprints are only comparable when produced by the same backend model — cross-model comparison is not meaningful. Not all backends populate this field.

Array of objects (TimedInstance)

Time ranges in which this face is visible.

Array
start
number

Start time of the occurrence in seconds.

end
number

End time of the occurrence in seconds.

confidence
number [ 0 .. 1 ]

Confidence score for this specific occurrence.

{
  • "track_id": 1,
  • "fingerprint": "7hGkL2mXqP9nRsT4vWzA...",
  • "instances": [
    • {
      • "start": 12.5,
      • "end": 15.8,
      • "confidence": 0.91
      }
    ]
}
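The fingerprint description says embeddings can be compared for similarity but does not specify the byte layout or the metric. The sketch below is therefore an assumption-laden illustration only: it assumes the decoded bytes are packed little-endian float32 values and uses cosine similarity. Check the backend's documentation before relying on either choice; decode_fingerprint and cosine_similarity are hypothetical helpers.

```python
import base64
import math
import struct


def decode_fingerprint(b64: str) -> list:
    """Decode a Base64 fingerprint, ASSUMING packed little-endian float32."""
    raw = base64.b64decode(b64)
    count = len(raw) // 4
    return list(struct.unpack(f"<{count}f", raw[: count * 4]))


def cosine_similarity(a: list, b: list) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("fingerprints have different dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def encode_for_demo(vector: list) -> str:
    """Build a fake fingerprint for the demo below (not a gateway format)."""
    return base64.b64encode(struct.pack(f"<{len(vector)}f", *vector)).decode()


same = cosine_similarity(
    decode_fingerprint(encode_for_demo([1.0, 0.0, 0.0])),
    decode_fingerprint(encode_for_demo([1.0, 0.0, 0.0])),
)
different = cosine_similarity(
    decode_fingerprint(encode_for_demo([1.0, 0.0, 0.0])),
    decode_fingerprint(encode_for_demo([0.0, 1.0, 0.0])),
)
print(same, different)  # 1.0 0.0
```

Only compare fingerprints produced by the same backend model, as noted above; a similarity threshold for "same person" is model-specific and is not given here.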

TranscriptSegment

start
number

Start time of the segment in seconds.

end
number

End time of the segment in seconds.

text
string

Transcribed speech for this time range.

speaker_id
integer

Integer identifier grouping segments from the same speaker.

language
string

BCP 47 language code detected for this segment.

confidence
number [ 0 .. 1 ]

Confidence score for this transcript segment.

{
  • "start": 12.5,
  • "end": 15.8,
  • "text": "Welcome to today's panel discussion on AI safety.",
  • "speaker_id": 0,
  • "language": "en",
  • "confidence": 0.97
}
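Since speaker_id groups segments from the same speaker, consecutive segments can be merged into speaker turns for display. This is a sketch under the assumption that segments should be ordered by start time; merge_speaker_turns is a hypothetical helper.

```python
def merge_speaker_turns(segments: list) -> list:
    """Join consecutive segments from the same speaker into one turn."""
    turns = []
    for seg in sorted(segments, key=lambda s: s["start"]):
        speaker = seg.get("speaker_id")
        if turns and turns[-1]["speaker_id"] == speaker:
            # Same speaker keeps talking: extend the current turn.
            turns[-1]["end"] = seg["end"]
            turns[-1]["text"] += " " + seg["text"]
        else:
            turns.append({
                "speaker_id": speaker,
                "start": seg["start"],
                "end": seg["end"],
                "text": seg["text"],
            })
    return turns


segments = [
    {"start": 12.5, "end": 15.8, "text": "Welcome to today's panel.", "speaker_id": 0},
    {"start": 15.8, "end": 18.0, "text": "Let's begin.", "speaker_id": 0},
    {"start": 18.2, "end": 21.0, "text": "Thanks for having me.", "speaker_id": 1},
]
for turn in merge_speaker_turns(segments):
    print(f"[{turn['start']:.1f}-{turn['end']:.1f}] speaker {turn['speaker_id']}: {turn['text']}")
```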

OcrText

text
string

The detected text string.

confidence
number [ 0 .. 1 ]

Confidence score for this text detection.

language
string

BCP 47 language code of the detected text.

Array of objects (TimedInstance)

Time ranges in which this text is visible on screen.

Array
start
number

Start time of the occurrence in seconds.

end
number

End time of the occurrence in seconds.

confidence
number [ 0 .. 1 ]

Confidence score for this specific occurrence.

{
  • "text": "Q3 Revenue: $4.2M",
  • "confidence": 0.91,
  • "language": "en",
  • "instances": [
    • {
      • "start": 12.5,
      • "end": 15.8,
      • "confidence": 0.91
      }
    ]
}

ModerationSignal

label
string
Enum: "explicit_nudity" "suggestive" "violence" "visually_disturbing" "hate_symbols" "tobacco" "alcohol" "gambling"

Machine-readable label identifying the type of flagged content.

confidence
number [ 0 .. 1 ]

Overall confidence score for this signal across the video.

Array of objects (TimedInstance)

Time ranges in which this signal was detected.

Array
start
number

Start time of the occurrence in seconds.

end
number

End time of the occurrence in seconds.

confidence
number [ 0 .. 1 ]

Confidence score for this specific occurrence.

{
  • "label": "violence",
  • "confidence": 0.82,
  • "instances": [
    • {
      • "start": 12.5,
      • "end": 15.8,
      • "confidence": 0.91
      }
    ]
}

ContentModeration

is_safe
boolean

Whether the video passed moderation at the requested confidence threshold.

Array of objects (ModerationSignal)

Individual moderation signals detected above the confidence threshold.

Array
label
string
Enum: "explicit_nudity" "suggestive" "violence" "visually_disturbing" "hate_symbols" "tobacco" "alcohol" "gambling"

Machine-readable label identifying the type of flagged content.

confidence
number [ 0 .. 1 ]

Overall confidence score for this signal across the video.

Array of objects (TimedInstance)

Time ranges in which this signal was detected.

{
  • "is_safe": true,
  • "signals": [
    • {
      • "label": "violence",
      • "confidence": 0.82,
      • "instances": [
        • {
          • "start": 12.5,
          • "end": 15.8,
          • "confidence": 0.91
          }
        ]
      }
    ]
}
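A client that needs a stricter cut-off than the one used for the job can filter the returned signals itself. A minimal sketch: flagged_ranges is a hypothetical helper, and it filters on each instance's own confidence, which is one reasonable choice rather than the gateway's rule for computing is_safe.

```python
def flagged_ranges(moderation: dict, min_confidence: float = 0.5) -> list:
    """Return (label, start, end) tuples at or above min_confidence, by start time."""
    ranges = []
    for signal in moderation.get("signals", []):
        for inst in signal.get("instances", []):
            if inst.get("confidence", 0.0) >= min_confidence:
                ranges.append((signal["label"], inst["start"], inst["end"]))
    return sorted(ranges, key=lambda r: r[1])


moderation = {
    "is_safe": True,
    "signals": [
        {
            "label": "violence",
            "confidence": 0.82,
            "instances": [{"start": 12.5, "end": 15.8, "confidence": 0.91}],
        }
    ],
}
print(flagged_ranges(moderation, min_confidence=0.9))   # [('violence', 12.5, 15.8)]
print(flagged_ranges(moderation, min_confidence=0.95))  # []
```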

SentimentInstance

start
number

Start time of the segment in seconds.

end
number

End time of the segment in seconds.

label
string
Enum: "positive" "neutral" "negative"

Sentiment label for this time range.

score
number [ -1 .. 1 ]

Sentiment score for this time range.

{
  • "start": 0,
  • "end": 45,
  • "label": "positive",
  • "score": 0.71
}

Sentiment

overall
string
Enum: "positive" "neutral" "negative"

Dominant sentiment across the entire video.

score
number [ -1 .. 1 ]

Aggregate sentiment score from -1 (most negative) to 1 (most positive).

Array of objects (SentimentInstance)

Sentiment variations across the video timeline.

Array
start
number

Start time of the segment in seconds.

end
number

End time of the segment in seconds.

label
string
Enum: "positive" "neutral" "negative"

Sentiment label for this time range.

score
number [ -1 .. 1 ]

Sentiment score for this time range.

{
  • "overall": "positive",
  • "score": 0.62,
  • "instances": [
    • {
      • "start": 0,
      • "end": 45,
      • "label": "positive",
      • "score": 0.71
      }
    ]
}
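The per-range scores in instances can be summarized client-side, for example as a duration-weighted mean. This is a sketch only: the schema does not say how the gateway derives the aggregate score, so the value computed here need not match it. weighted_sentiment is a hypothetical helper.

```python
def weighted_sentiment(instances: list):
    """Duration-weighted mean of instance scores, or None if nothing is covered."""
    total = sum(i["end"] - i["start"] for i in instances)
    if total <= 0:
        return None
    weighted = sum((i["end"] - i["start"]) * i["score"] for i in instances)
    return weighted / total


instances = [
    {"start": 0, "end": 45, "label": "positive", "score": 0.71},
    {"start": 45, "end": 60, "label": "negative", "score": -0.2},
]
# (45 * 0.71 + 15 * -0.2) / 60 = 0.4825
print(round(weighted_sentiment(instances), 4))
```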

Topic

name
string

Topic or keyword name.

confidence
number [ 0 .. 1 ]

Confidence score for this topic.

Array of objects (TimedInstance)

Time ranges in which this topic is relevant.

Array
start
number

Start time of the occurrence in seconds.

end
number

End time of the occurrence in seconds.

confidence
number [ 0 .. 1 ]

Confidence score for this specific occurrence.

{
  • "name": "artificial intelligence",
  • "confidence": 0.95,
  • "instances": [
    • {
      • "start": 12.5,
      • "end": 15.8,
      • "confidence": 0.91
      }
    ]
}

BrandDetection

name
string

Name of the detected brand.

confidence
number [ 0 .. 1 ]

Overall confidence score for this brand detection.

Array of objects (TimedInstance)

Time ranges in which this brand is visible on screen.

Array
start
number

Start time of the occurrence in seconds.

end
number

End time of the occurrence in seconds.

confidence
number [ 0 .. 1 ]

Confidence score for this specific occurrence.

{
  • "name": "Acme Corp",
  • "confidence": 0.88,
  • "instances": [
    • {
      • "start": 12.5,
      • "end": 15.8,
      • "confidence": 0.91
      }
    ]
}

video-analysis_Result

object (VideoMetadata)

Technical properties of the processed video file.

duration
number

Total duration of the video in seconds.

width
integer

Video width in pixels.

height
integer

Video height in pixels.

frame_rate
number

Frames per second of the video.

format
string

Container format of the video file.

codec
string

Video codec used for encoding.

Array of objects (LabelDetection)

Detected objects, scenes, and actions. Present when labels was requested.

Array
name
string

Human-readable name of the detected label.

confidence
number [ 0 .. 1 ]

Overall confidence score for this label across the video.

Array of objects (TimedInstance)

Time ranges in which this label was detected.

Array of objects (SceneDetection)

Scene and shot boundaries. Present when scenes was requested.

Array
index
integer

Zero-based position of this scene in the video.

start
number

Start time of the scene in seconds.

end
number

End time of the scene in seconds.

Array of objects (FaceDetection)

Faces detected and tracked across the video. Present when faces was requested.

Array
track_id
integer

Integer identifier grouping all appearances of the same face within this video.

fingerprint
string

Base64-encoded face embedding vector produced by the underlying model. When present, fingerprints from different videos can be compared for similarity to determine whether the same person appears across videos. Fingerprints are only comparable when produced by the same backend model — cross-model comparison is not meaningful. Not all backends populate this field.

Array of objects (TimedInstance)

Time ranges in which this face is visible.

Array of objects (TranscriptSegment)

Speech-to-text segments with speaker identification. Present when speech_to_text was requested.

Array
start
number

Start time of the segment in seconds.

end
number

End time of the segment in seconds.

text
string

Transcribed speech for this time range.

speaker_id
integer

Integer identifier grouping segments from the same speaker.

language
string

BCP 47 language code detected for this segment.

confidence
number [ 0 .. 1 ]

Confidence score for this transcript segment.

Array of objects (OcrText)

On-screen text extracted from video frames. Present when ocr was requested.

Array
text
string

The detected text string.

confidence
number [ 0 .. 1 ]

Confidence score for this text detection.

language
string

BCP 47 language code of the detected text.

Array of objects (TimedInstance)

Time ranges in which this text is visible on screen.

object (ContentModeration)

Content moderation signals. Present when content_moderation was requested.

is_safe
boolean

Whether the video passed moderation at the requested confidence threshold.

Array of objects (ModerationSignal)

Individual moderation signals detected above the confidence threshold.

Array
label
string
Enum: "explicit_nudity" "suggestive" "violence" "visually_disturbing" "hate_symbols" "tobacco" "alcohol" "gambling"

Machine-readable label identifying the type of flagged content.

confidence
number [ 0 .. 1 ]

Overall confidence score for this signal across the video.

Array of objects (TimedInstance)

Time ranges in which this signal was detected.

object (Sentiment)

Overall tone and sentiment of the video. Present when sentiment was requested.

overall
string
Enum: "positive" "neutral" "negative"

Dominant sentiment across the entire video.

score
number [ -1 .. 1 ]

Aggregate sentiment score from -1 (most negative) to 1 (most positive).

Array of objects (SentimentInstance)

Sentiment variations across the video timeline.

Array
start
number

Start time of the segment in seconds.

end
number

End time of the segment in seconds.

label
string
Enum: "positive" "neutral" "negative"

Sentiment label for this time range.

score
number [ -1 .. 1 ]

Sentiment score for this time range.

Array of objects (Topic)

Key topics and keywords extracted from the video. Present when topics was requested.

Array
name
string

Topic or keyword name.

confidence
number [ 0 .. 1 ]

Confidence score for this topic.

Array of objects (TimedInstance)

Time ranges in which this topic is relevant.

Array of objects (BrandDetection)

Detected brand logos and visual trademarks. Present when brands was requested.

Array
name
string

Name of the detected brand.

confidence
number [ 0 .. 1 ]

Overall confidence score for this brand detection.

Array of objects (TimedInstance)

Time ranges in which this brand is visible on screen.

summary
string

Natural-language description of the video content. Present when summary was requested.

{
  • "video_metadata": {
    • "duration": 3742.5,
    • "width": 1920,
    • "height": 1080,
    • "frame_rate": 29.97,
    • "format": "mp4",
    • "codec": "h264"
    },
  • "labels": [
    • {
      • "name": "Conference room",
      • "confidence": 0.94,
      • "instances": [
        • {
          • "start": 12.5,
          • "end": 15.8,
          • "confidence": 0.91
          }
        ]
      }
    ],
  • "scenes": [
    • {
      • "index": 3,
      • "start": 42,
      • "end": 78.5
      }
    ],
  • "faces": [
    • {
      • "track_id": 1,
      • "fingerprint": "7hGkL2mXqP9nRsT4vWzA...",
      • "instances": [
        • {
          • "start": 12.5,
          • "end": 15.8,
          • "confidence": 0.91
          }
        ]
      }
    ],
  • "speech_to_text": [
    • {
      • "start": 12.5,
      • "end": 15.8,
      • "text": "Welcome to today's panel discussion on AI safety.",
      • "speaker_id": 0,
      • "language": "en",
      • "confidence": 0.97
      }
    ],
  • "ocr": [
    • {
      • "text": "Q3 Revenue: $4.2M",
      • "confidence": 0.91,
      • "language": "en",
      • "instances": [
        • {
          • "start": 12.5,
          • "end": 15.8,
          • "confidence": 0.91
          }
        ]
      }
    ],
  • "content_moderation": {
    • "is_safe": true,
    • "signals": [
      • {
        • "label": "violence",
        • "confidence": 0.82,
        • "instances": [
          • {
            }
          ]
        }
      ]
    },
  • "sentiment": {
    • "overall": "positive",
    • "score": 0.62,
    • "instances": [
      • {
        • "start": 0,
        • "end": 45,
        • "label": "positive",
        • "score": 0.71
        }
      ]
    },
  • "topics": [
    • {
      • "name": "artificial intelligence",
      • "confidence": 0.95,
      • "instances": [
        • {
          • "start": 12.5,
          • "end": 15.8,
          • "confidence": 0.91
          }
        ]
      }
    ],
  • "brands": [
    • {
      • "name": "Acme Corp",
      • "confidence": 0.88,
      • "instances": [
        • {
          • "start": 12.5,
          • "end": 15.8,
          • "confidence": 0.91
          }
        ]
      }
    ],
  • "summary": "A panel discussion on AI safety featuring three researchers. The conversation covers alignment challenges, regulatory proposals, and near-term risk mitigation strategies.\n"
}
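Each result field is present only when the matching feature was requested, and in the example above the result keys carry the same names as the feature enum values. Relying on that correspondence, a client could check which requested features are absent from a result. missing_features is a hypothetical helper, and the name match is inferred from the example rather than stated by the schema.

```python
def missing_features(requested: list, result: dict) -> list:
    """Return requested features that have no matching key in the result."""
    return [feature for feature in requested if feature not in result]


result = {
    "video_metadata": {"duration": 3742.5, "width": 1920, "height": 1080},
    "labels": [{"name": "Conference room", "confidence": 0.94, "instances": []}],
    "summary": "A panel discussion on AI safety featuring three researchers.",
}
print(missing_features(["labels", "ocr", "summary"], result))  # ['ocr']
```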

video-analysis_Attributes

provider
string

Identifier of the backend provider to use for processing this job. When omitted, the gateway selects the most appropriate provider automatically based on the requested features and availability. Use GET /v1/providers to list available providers and their supported features.

status
string (JobStatus)
Enum: "pending" "processing" "completed" "failed"

Current lifecycle state of the job. pending: accepted, waiting to be picked up. processing: actively being worked on. completed: finished successfully. failed: encountered an unrecoverable error.

progress
integer [ 0 .. 100 ]

Processing progress as a percentage. Only meaningful while status is processing.

object (Error)

Details of a job failure. Only present when status is failed.

code
string

Machine-readable error identifier.

message
string

Human-readable explanation of the error.

created_at
string <date-time>

When the job was created.

processed_at
string <date-time>

When the job transitioned from pending to processing. Subtract created_at from this value to get the queue wait time.

completed_at
string <date-time>

When the job reached a terminal state (completed or failed). Subtract processed_at from this value to get the processing duration.
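The two durations described for processed_at and completed_at can be derived as in the minimal sketch below. The helper names parse_ts and job_timings are illustrative and not part of the gateway; the sketch assumes timestamps arrive as RFC 3339 strings with a trailing Z, as in the samples in this reference.

```python
from datetime import datetime
from typing import Optional


def parse_ts(value: str) -> datetime:
    """Parse a timestamp such as '2024-03-15T10:00:00Z'."""
    # datetime.fromisoformat() rejects a trailing 'Z' before Python 3.11,
    # so rewrite it as an explicit UTC offset.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def job_timings(attributes: dict) -> dict:
    """Return queue wait and processing duration in seconds.

    A value is None when one of the timestamps it needs is missing,
    for example while the job is still pending.
    """
    created = attributes.get("created_at")
    processed = attributes.get("processed_at")
    completed = attributes.get("completed_at")

    queue_wait: Optional[float] = None
    processing: Optional[float] = None
    if created and processed:
        queue_wait = (parse_ts(processed) - parse_ts(created)).total_seconds()
    if processed and completed:
        processing = (parse_ts(completed) - parse_ts(processed)).total_seconds()
    return {"queue_wait_s": queue_wait, "processing_s": processing}
```

With the sample timestamps used in this reference (10:00:00Z, 10:00:05Z, 10:02:30Z) this gives a queue wait of 5 seconds and a processing duration of 145 seconds.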

required
object (video-analysis_Input)

Source video and the list of analysis features to run.

url
required
string <uri>

Publicly accessible URL of the video file.

features
required
Array of strings (video-analysis_Feature) non-empty
Items Enum: "labels" "scenes" "faces" "speech_to_text" "ocr" "content_moderation" "sentiment" "topics" "brands" "summary"

One or more analysis capabilities to apply. At least one feature must be specified.

audio_track
integer

Index of the audio track to use for speech-related features. Defaults to the first track when omitted.
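As a rough illustration of the constraints on the input object (required url, non-empty features drawn from the enum, optional audio_track), a client could validate its input before sending it. The function name build_input is hypothetical; only the field names and enum values come from this reference.

```python
from typing import Iterable, Optional

# Feature names copied from the video-analysis_Feature enum.
ALLOWED_FEATURES = frozenset({
    "labels", "scenes", "faces", "speech_to_text", "ocr",
    "content_moderation", "sentiment", "topics", "brands", "summary",
})


def build_input(url: str, features: Iterable[str],
                audio_track: Optional[int] = None) -> dict:
    """Build the `input` object for a video analysis job."""
    selected = list(features)
    if not selected:
        raise ValueError("at least one feature must be specified")
    unknown = sorted(set(selected) - ALLOWED_FEATURES)
    if unknown:
        raise ValueError(f"unsupported features: {unknown}")

    payload = {"url": url, "features": selected}
    # Leave audio_track out entirely so the gateway falls back to the first track.
    if audio_track is not None:
        payload["audio_track"] = audio_track
    return payload
```

Omitting audio_track rather than sending a null keeps the documented default (first track) in effect.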

object (video-analysis_Options)

Optional settings controlling analysis behaviour.

language
string

BCP 47 language code for speech and text features. Use auto to detect the language automatically.

confidence_threshold
number [ 0 .. 1 ]

Minimum confidence score (0–1) for a detection to be included in the result. Defaults to 0.5.

priority
string
Enum: "low" "standard" "high"

Processing priority. Higher priority jobs are picked up sooner.
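The options above could be assembled on the client along these lines. This is a sketch only: build_options is a hypothetical helper, and it simply omits unset fields so that gateway defaults (such as the 0.5 confidence threshold) apply.

```python
from typing import Optional

# Values copied from the priority enum.
PRIORITIES = ("low", "standard", "high")


def build_options(language: Optional[str] = None,
                  confidence_threshold: Optional[float] = None,
                  priority: Optional[str] = None) -> dict:
    """Build the `options` object, leaving out anything not set."""
    options: dict = {}
    if language is not None:
        # A BCP 47 code such as "en", or "auto" for automatic detection.
        options["language"] = language
    if confidence_threshold is not None:
        if not 0.0 <= confidence_threshold <= 1.0:
            raise ValueError("confidence_threshold must be between 0 and 1")
        options["confidence_threshold"] = confidence_threshold
    if priority is not None:
        if priority not in PRIORITIES:
            raise ValueError(f"priority must be one of {PRIORITIES}")
        options["priority"] = priority
    return options
```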

object (Webhook)

Destination configuration for job lifecycle notifications.

url
required
string <uri>

The endpoint the gateway will POST event payloads to.

object

Optional HTTP headers included in every webhook request. Typically used for authentication.

property name*
additional property
string
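Because the configured headers are sent with every webhook request, a receiver can use one of them as a shared secret. The sketch below assumes a hypothetical header name, X-Webhook-Token, chosen by the client when configuring the webhook; the gateway does not prescribe it.

```python
import hmac
from typing import Mapping


def is_authentic(request_headers: Mapping[str, str], expected_token: str,
                 header_name: str = "X-Webhook-Token") -> bool:
    """Check that an incoming webhook request carries the expected secret.

    `header_name` is an assumption: it is whatever custom header the client
    registered in the Webhook configuration.
    """
    # HTTP header names are case-insensitive, so normalise before the lookup.
    lowered = {name.lower(): value for name, value in request_headers.items()}
    received = lowered.get(header_name.lower(), "")
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(received.encode(), expected_token.encode())
```

A receiver would typically reject the request with a 401 or 403 response when this returns False.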

object (video-analysis_Result)

Video analysis output. Populated once status is completed.

object (VideoMetadata)

Technical properties of the processed video file.

duration
number

Total duration of the video in seconds.

width
integer

Video width in pixels.

height
integer

Video height in pixels.

frame_rate
number

Frames per second of the video.

format
string

Container format of the video file.

codec
string

Video codec used for encoding.

Array of objects (LabelDetection)

Detected objects, scenes, and actions. Present when labels was requested.

Array
name
string

Human-readable name of the detected label.

confidence
number [ 0 .. 1 ]

Overall confidence score for this label across the video.

Array of objects (TimedInstance)

Time ranges in which this label was detected.

Array of objects (SceneDetection)

Scene and shot boundaries. Present when scenes was requested.

Array
index
integer

Zero-based position of this scene in the video.

start
number

Start time of the scene in seconds.

end
number

End time of the scene in seconds.

Array of objects (FaceDetection)

Faces detected and tracked across the video. Present when faces was requested.

Array
track_id
integer

Integer identifier grouping all appearances of the same face within this video.

fingerprint
string

Base64-encoded face embedding vector produced by the underlying model. When present, fingerprints from different videos can be compared for similarity to determine whether the same person appears across videos. Fingerprints are only comparable when produced by the same backend model — cross-model comparison is not meaningful. Not all backends populate this field.

Array of objects (TimedInstance)

Time ranges in which this face is visible.
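To illustrate how fingerprints from two videos might be compared, the sketch below computes cosine similarity between two decoded embeddings. The byte layout is an assumption: this reference only says the value is a Base64-encoded embedding vector, so the little-endian float32 packing used here must be confirmed for the backend in use. Cosine similarity is likewise one common choice, not a documented requirement.

```python
import base64
import math
import struct
from typing import List


def decode_fingerprint(encoded: str) -> List[float]:
    """Decode a fingerprint, ASSUMING packed little-endian float32 values."""
    raw = base64.b64decode(encoded)
    if len(raw) % 4 != 0:
        raise ValueError("byte length is not a multiple of 4")
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))


def fingerprint_similarity(a: str, b: str) -> float:
    """Cosine similarity in [-1, 1] between two fingerprints.

    Only meaningful when both fingerprints come from the same backend model.
    """
    va, vb = decode_fingerprint(a), decode_fingerprint(b)
    if len(va) != len(vb):
        raise ValueError("fingerprints have different dimensions")
    dot = sum(x * y for x, y in zip(va, vb))
    norm_a = math.sqrt(sum(x * x for x in va))
    norm_b = math.sqrt(sum(y * y for y in vb))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # similarity is undefined for a zero vector; treat as no match
    return dot / (norm_a * norm_b)
```

Any threshold for deciding that two faces match would have to be tuned per backend model; none is given in this reference.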

Array of objects (TranscriptSegment)

Speech-to-text segments with speaker identification. Present when speech_to_text was requested.

Array
start
number

Start time of the segment in seconds.

end
number

End time of the segment in seconds.

text
string

Transcribed speech for this time range.

speaker_id
integer

Integer identifier grouping segments from the same speaker.

language
string

BCP 47 language code detected for this segment.

confidence
number [ 0 .. 1 ]

Confidence score for this transcript segment.
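Since speaker_id groups segments from the same speaker, a client can rebuild a per-speaker transcript from the segment list. A minimal sketch, with a hypothetical helper name and assuming every segment carries start, text, and speaker_id:

```python
from collections import defaultdict
from typing import Dict, List


def text_by_speaker(segments: List[dict]) -> Dict[int, str]:
    """Join transcript segments per speaker, in chronological order."""
    grouped: Dict[int, List[str]] = defaultdict(list)
    # Sort by start time so each speaker's text reads in playback order.
    for segment in sorted(segments, key=lambda s: s["start"]):
        grouped[segment["speaker_id"]].append(segment["text"])
    return {speaker: " ".join(parts) for speaker, parts in grouped.items()}
```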

Array of objects (OcrText)

On-screen text extracted from video frames. Present when ocr was requested.

Array
text
string

The detected text string.

confidence
number [ 0 .. 1 ]

Confidence score for this text detection.

language
string

BCP 47 language code of the detected text.

Array of objects (TimedInstance)

Time ranges in which this text is visible on screen.

object (ContentModeration)

Content moderation signals. Present when content_moderation was requested.

is_safe
boolean

Whether the video passed moderation at the requested confidence threshold.

Array of objects (ModerationSignal)

Individual moderation signals detected above the confidence threshold.

object (Sentiment)

Overall tone and sentiment of the video. Present when sentiment was requested.

overall
string
Enum: "positive" "neutral" "negative"

Dominant sentiment across the entire video.

score
number [ -1 .. 1 ]

Aggregate sentiment score from -1 (most negative) to 1 (most positive).

Array of objects (SentimentInstance)

Sentiment variations across the video timeline.

Array of objects (Topic)

Key topics and keywords extracted from the video. Present when topics was requested.

Array
name
string

Topic or keyword name.

confidence
number [ 0 .. 1 ]

Confidence score for this topic.

Array of objects (TimedInstance)

Time ranges in which this topic is relevant.

Array of objects (BrandDetection)

Detected brand logos and visual trademarks. Present when brands was requested.

Array
name
string

Name of the detected brand.

confidence
number [ 0 .. 1 ]

Overall confidence score for this brand detection.

Array of objects (TimedInstance)

Time ranges in which this brand is visible on screen.
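The TimedInstance ranges attached to labels, faces, on-screen text, topics, and brands can be reduced to a total on-screen time. The sketch below merges overlapping ranges before summing so that overlapping detections are not counted twice; the function name is illustrative, and it assumes each instance has numeric start and end values in seconds.

```python
from typing import List


def total_visible_seconds(instances: List[dict]) -> float:
    """Sum the time covered by a list of {start, end} ranges, merging overlaps."""
    spans = sorted((item["start"], item["end"]) for item in instances)
    total = 0.0
    current_start = current_end = None
    for start, end in spans:
        if current_end is None or start > current_end:
            # Gap before this range: close the previous merged range, if any.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or touching range: extend the merged range.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total
```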

summary
string

Natural-language description of the video content. Present when summary was requested.

{
  • "provider": "twelvelabs",
  • "status": "processing",
  • "progress": 72,
  • "error": {
    • "code": "AUDIO_UNREADABLE",
    • "message": "Could not extract audio from the provided file."
    },
  • "created_at": "2024-03-15T10:00:00Z",
  • "processed_at": "2024-03-15T10:00:05Z",
  • "completed_at": "2024-03-15T10:02:30Z",
  • "input": {},
  • "options": {
    • "language": "auto",
    • "confidence_threshold": 0.7,
    • "priority": "standard"
    },
  • "result": {
    • "video_metadata": {
      • "duration": 3742.5,
      • "width": 1920,
      • "height": 1080,
      • "frame_rate": 29.97,
      • "format": "mp4",
      • "codec": "h264"
      },
    • "labels": [
      • {
        • "name": "Conference room",
        • "confidence": 0.94,
        • "instances": [
          • {
            }
          ]
        }
      ],
    • "scenes": [
      • {
        • "index": 3,
        • "start": 42,
        • "end": 78.5
        }
      ],
    • "faces": [
      • {
        • "track_id": 1,
        • "fingerprint": "7hGkL2mXqP9nRsT4vWzA...",
        • "instances": [
          • {
            }
          ]
        }
      ],
    • "speech_to_text": [
      • {
        • "start": 12.5,
        • "end": 15.8,
        • "text": "Welcome to today's panel discussion on AI safety.",
        • "speaker_id": 0,
        • "language": "en",
        • "confidence": 0.97
        }
      ],
    • "ocr": [
      • {
        • "text": "Q3 Revenue: $4.2M",
        • "confidence": 0.91,
        • "language": "en",
        • "instances": [
          • {
            }
          ]
        }
      ],
    • "content_moderation": {
      • "is_safe": true,
      • "signals": [
        • {
          • "label": "violence",
          • "confidence": 0.82,
          • "instances": [
            ]
          }
        ]
      },
    • "sentiment": {
      • "overall": "positive",
      • "score": 0.62,
      • "instances": [
        • {
          • "start": 0,
          • "end": 45,
          • "label": "positive",
          • "score": 0.71
          }
        ]
      },
    • "topics": [
      • {
        • "name": "artificial intelligence",
        • "confidence": 0.95,
        • "instances": [
          • {
            }
          ]
        }
      ],
    • "brands": [
      • {
        • "name": "Acme Corp",
        • "confidence": 0.88,
        • "instances": [
          • {
            }
          ]
        }
      ],
    • "summary": "A panel discussion on AI safety featuring three researchers. The conversation covers alignment challenges, regulatory proposals, and near-term risk mitigation strategies.\n"
    }
}

video-analysis_Job

object

JSON:API-style data envelope.

id
string <uuid>

Unique identifier for the video analysis job.

type
string
Value: "video-analysis-job"

Resource type identifier. Always video-analysis-job.

object (video-analysis_Attributes)

Full attributes of a video analysis job, combining input, job state, and result.

provider
string

Identifier of the backend provider to use for processing this job. When omitted, the gateway selects the most appropriate provider automatically based on the requested features and availability. Use GET /v1/providers to list available providers and their supported features.

status
string (JobStatus)
Enum: "pending" "processing" "completed" "failed"

Current lifecycle state of the job. pending: accepted, waiting to be picked up. processing: actively being worked on. completed: finished successfully. failed: encountered an unrecoverable error.

progress
integer [ 0 .. 100 ]

Processing progress as a percentage. Only meaningful while status is processing.

object (Error)

Details of a job failure. Only present when status is failed.

created_at
string <date-time>

When the job was created.

processed_at
string <date-time>

When the job transitioned from pending to processing. Subtract created_at from this value to get the queue wait time.

completed_at
string <date-time>

When the job reached a terminal state (completed or failed). Subtract processed_at from this value to get the processing duration.

required
object (video-analysis_Input)

Source video and the list of analysis features to run.

object (video-analysis_Options)

Optional settings controlling analysis behaviour.

object (Webhook)

Destination configuration for job lifecycle notifications.

object (video-analysis_Result)

Video analysis output. Populated once status is completed.

object (Meta)

Metadata envelope shared between client and system.

object

Arbitrary key-value data provided by the client. Returned unchanged in all responses.

object

Internal metadata added by the gateway. Not exposed unless explicitly required.

{
  • "data": {
    • "id": "3e7dc4b2-91f0-4a1e-8c2d-b56789012345",
    • "type": "video-analysis-job",
    • "attributes": {
      • "provider": "twelvelabs",
      • "status": "processing",
      • "progress": 72,
      • "error": {
        • "code": "AUDIO_UNREADABLE",
        • "message": "Could not extract audio from the provided file."
        },
      • "created_at": "2024-03-15T10:00:00Z",
      • "processed_at": "2024-03-15T10:00:05Z",
      • "completed_at": "2024-03-15T10:02:30Z",
      • "input": {},
      • "options": {
        • "language": "auto",
        • "confidence_threshold": 0.7,
        • "priority": "standard"
        },
      • "result": {
        • "video_metadata": {
          • "duration": 3742.5,
          • "width": 1920,
          • "height": 1080,
          • "frame_rate": 29.97,
          • "format": "mp4",
          • "codec": "h264"
          },
        • "labels": [
          • {
            }
          ],
        • "scenes": [
          • {
            }
          ],
        • "faces": [
          • {
            }
          ],
        • "speech_to_text": [
          • {
            }
          ],
        • "ocr": [
          • {
            }
          ],
        • "content_moderation": {
          • "is_safe": true,
          • "signals": [
            ]
          },
        • "sentiment": {
          • "overall": "positive",
          • "score": 0.62,
          • "instances": [
            ]
          },
        • "topics": [
          • {
            }
          ],
        • "brands": [
          • {
            }
          ],
        • "summary": "A panel discussion on AI safety featuring three researchers. The conversation covers alignment challenges, regulatory proposals, and near-term risk mitigation strategies.\n"
        }
      },
    • "meta": {
      • "client": { },
      • "system": {
        • "region": "eu-west-1",
        • "worker_id": "wk_789"
        }
      }
    }
}
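Polling a job until it reaches a terminal state could look like the sketch below. It deliberately takes the HTTP call as a parameter: fetch_job is a caller-supplied function (hypothetical, not part of the gateway) that performs the GET by job id and returns the parsed JSON body shaped like the sample above. The interval and timeout defaults are arbitrary.

```python
import time
from typing import Callable

# Terminal values of the JobStatus enum.
TERMINAL_STATES = frozenset({"completed", "failed"})


def wait_for_job(fetch_job: Callable[[str], dict], job_id: str,
                 interval_s: float = 2.0, timeout_s: float = 600.0,
                 sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until data.attributes.status is completed or failed.

    Returns the last response body, or raises TimeoutError if the job is
    still pending or processing once `timeout_s` has elapsed.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        body = fetch_job(job_id)
        status = body["data"]["attributes"]["status"]
        if status in TERMINAL_STATES:
            return body
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {status} after {timeout_s}s")
        sleep(interval_s)
```

Callers still need to inspect the returned status: a failed job carries data.attributes.error instead of a result.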

WebhookPayload

event
required
string
Enum: "transcription.progress" "transcription.completed" "transcription.failed"

The lifecycle event that triggered this notification. transcription.progress: the job is processing; data.attributes.progress is updated. transcription.completed: the job finished successfully; data.attributes.result is populated. transcription.failed: the job encountered an unrecoverable error; data.attributes.error is populated.

object

JSON:API-style data envelope.

id
string <uuid>

Unique identifier for the transcription job.

type
string
Value: "transcription-job"

Resource type identifier. Always transcription-job.

object (transcription_Attributes)

Full attributes of a transcription job, combining input, job state, and result.

provider
string

Identifier of the backend provider to use for processing this job. When omitted, the gateway selects the most appropriate provider automatically based on the requested features and availability. Use GET /v1/providers to list available providers and their supported features.

status
string (JobStatus)
Enum: "pending" "processing" "completed" "failed"

Current lifecycle state of the job. pending: accepted, waiting to be picked up. processing: actively being worked on. completed: finished successfully. failed: encountered an unrecoverable error.

progress
integer [ 0 .. 100 ]

Processing progress as a percentage. Only meaningful while status is processing.

object (Error)

Details of a job failure. Only present when status is failed.

created_at
string <date-time>

When the job was created.

processed_at
string <date-time>

When the job transitioned from pending to processing. Subtract created_at from this value to get the queue wait time.

completed_at
string <date-time>

When the job reached a terminal state (completed or failed). Subtract processed_at from this value to get the processing duration.

required
object (Input)

Source audio or video file to transcribe.

object (Options)

Optional transcription settings (language, output format, timestamps, diarization, priority). See the Options schema for field-level details.

object (Webhook)

Optional callback URL. When set, the gateway POSTs JSON payloads on each lifecycle transition (transcription.progress, transcription.completed, transcription.failed).

object (Result)

Transcription output. Populated once status is completed.

object (Meta)

Metadata envelope shared between client and system.

object

Arbitrary key-value data provided by the client. Returned unchanged in all responses.

object

Internal metadata added by the gateway. Not exposed unless explicitly required.

{
  • "event": "transcription.completed",
  • "data": {
    • "id": "2f41bc1f-b608-4360-acd9-a26a296fea3c",
    • "type": "transcription-job",
    • "attributes": {
      • "provider": "twelvelabs",
      • "status": "completed",
      • "progress": 100,
      • "created_at": "2024-03-15T10:00:00Z",
      • "processed_at": "2024-03-15T10:00:05Z",
      • "completed_at": "2024-03-15T10:02:30Z",
      • "input": {},
      • "options": {
        • "language": "en",
        • "timestamps": true,
        • "format": "json",
        • "diarization": false,
        • "priority": "standard"
        },
      • "result": {}
      },
    • "meta": {
      • "client": { },
      • "system": {
        • "region": "eu-west-1",
        • "worker_id": "wk_789"
        }
      }
    }
}
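A receiver can branch on the event field to decide which attribute to read, following the mapping described for WebhookPayload (progress, result, or error). The handler below is a sketch with a hypothetical name; it only builds a log line, where a real receiver would update its own job record.

```python
def summarize_transcription_event(payload: dict) -> str:
    """Turn a transcription webhook payload into a one-line summary."""
    event = payload["event"]
    data = payload["data"]
    attributes = data.get("attributes", {})
    job_id = data["id"]

    if event == "transcription.progress":
        # data.attributes.progress is updated on progress events.
        return f"{job_id}: {attributes.get('progress')}% processed"
    if event == "transcription.completed":
        # data.attributes.result is populated on completion.
        has_result = "result" in attributes
        return f"{job_id}: completed (result present: {has_result})"
    if event == "transcription.failed":
        # data.attributes.error is populated on failure.
        error = attributes.get("error", {})
        return f"{job_id}: failed [{error.get('code')}] {error.get('message')}"
    raise ValueError(f"unexpected event: {event!r}")
```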

translation_WebhookPayload

event
required
string
Enum: "translation.progress" "translation.completed" "translation.failed"

The lifecycle event that triggered this notification. translation.progress: the job is processing; data.attributes.progress is updated. translation.completed: the job finished successfully; data.attributes.result is populated. translation.failed: the job encountered an unrecoverable error; data.attributes.error is populated.

object

JSON:API-style data envelope.

id
string <uuid>

Unique identifier for the translation job.

type
string
Value: "translation-job"

Resource type identifier. Always translation-job.

object (translation_Attributes)

Full attributes of a translation job, combining input, job state, and result.

provider
string

Identifier of the backend provider to use for processing this job. When omitted, the gateway selects the most appropriate provider automatically based on the requested features and availability. Use GET /v1/providers to list available providers and their supported features.

status
string (JobStatus)
Enum: "pending" "processing" "completed" "failed"

Current lifecycle state of the job. pending: accepted, waiting to be picked up. processing: actively being worked on. completed: finished successfully. failed: encountered an unrecoverable error.

progress
integer [ 0 .. 100 ]

Processing progress as a percentage. Only meaningful while status is processing.

object (Error)

Details of a job failure. Only present when status is failed.

created_at
string <date-time>

When the job was created.

processed_at
string <date-time>

When the job transitioned from pending to processing. Subtract created_at from this value to get the queue wait time.

completed_at
string <date-time>

When the job reached a terminal state (completed or failed). Subtract processed_at from this value to get the processing duration.

required
object (translation_Input)

Source content to translate.

object (translation_Options)

Optional settings controlling translation behaviour.

object (Webhook)

Destination configuration for job lifecycle notifications.

object (translation_Result)

Translation output. Populated once status is completed.

object (Meta)

Metadata envelope shared between client and system.

object

Arbitrary key-value data provided by the client. Returned unchanged in all responses.

object

Internal metadata added by the gateway. Not exposed unless explicitly required.

{
  • "event": "translation.completed",
  • "data": {
    • "id": "9a1bc2f3-d405-4678-bcde-f12345678901",
    • "type": "translation-job",
    • "attributes": {
      • "provider": "twelvelabs",
      • "status": "completed",
      • "progress": 100,
      • "created_at": "2024-03-15T10:00:00Z",
      • "processed_at": "2024-03-15T10:00:05Z",
      • "completed_at": "2024-03-15T10:02:30Z",
      • "input": {},
      • "options": {
        • "source_language": "auto",
        • "formality": "formal",
        • "priority": "standard"
        },
      • "result": {}
      },
    • "meta": {
      • "client": { },
      • "system": {
        • "region": "eu-west-1",
        • "worker_id": "wk_789"
        }
      }
    }
}

video-analysis_WebhookPayload

event
required
string
Enum: "video-analysis.progress" "video-analysis.completed" "video-analysis.failed"

The lifecycle event that triggered this notification. video-analysis.progress: the job is processing; data.attributes.progress is updated. video-analysis.completed: the job finished successfully; data.attributes.result is populated. video-analysis.failed: the job encountered an unrecoverable error; data.attributes.error is populated.

object

JSON:API-style data envelope.

id
string <uuid>

Unique identifier for the video analysis job.

type
string
Value: "video-analysis-job"

Resource type identifier. Always video-analysis-job.

object (video-analysis_Attributes)

Full attributes of a video analysis job, combining input, job state, and result.

provider
string

Identifier of the backend provider to use for processing this job. When omitted, the gateway selects the most appropriate provider automatically based on the requested features and availability. Use GET /v1/providers to list available providers and their supported features.

status
string (JobStatus)
Enum: "pending" "processing" "completed" "failed"

Current lifecycle state of the job. pending: accepted, waiting to be picked up. processing: actively being worked on. completed: finished successfully. failed: encountered an unrecoverable error.

progress
integer [ 0 .. 100 ]

Processing progress as a percentage. Only meaningful while status is processing.

object (Error)

Details of a job failure. Only present when status is failed.

created_at
string <date-time>

When the job was created.

processed_at
string <date-time>

When the job transitioned from pending to processing. Subtract created_at from this value to get the queue wait time.

completed_at
string <date-time>

When the job reached a terminal state (completed or failed). Subtract processed_at from this value to get the processing duration.

required
object (video-analysis_Input)

Source video and the list of analysis features to run.

object (video-analysis_Options)

Optional settings controlling analysis behaviour.

object (Webhook)

Destination configuration for job lifecycle notifications.

object (video-analysis_Result)

Video analysis output. Populated once status is completed.

object (Meta)

Metadata envelope shared between client and system.

object

Arbitrary key-value data provided by the client. Returned unchanged in all responses.

object

Internal metadata added by the gateway. Not exposed unless explicitly required.

{
  • "event": "video-analysis.completed",
  • "data": {
    • "id": "3e7dc4b2-91f0-4a1e-8c2d-b56789012345",
    • "type": "video-analysis-job",
    • "attributes": {
      • "provider": "twelvelabs",
      • "status": "completed",
      • "progress": 100,
      • "created_at": "2024-03-15T10:00:00Z",
      • "processed_at": "2024-03-15T10:00:05Z",
      • "completed_at": "2024-03-15T10:02:30Z",
      • "input": {},
      • "options": {
        • "language": "auto",
        • "confidence_threshold": 0.7,
        • "priority": "standard"
        },
      • "result": {
        • "video_metadata": {
          • "duration": 3742.5,
          • "width": 1920,
          • "height": 1080,
          • "frame_rate": 29.97,
          • "format": "mp4",
          • "codec": "h264"
          },
        • "labels": [
          • {
            }
          ],
        • "scenes": [
          • {
            }
          ],
        • "faces": [
          • {
            }
          ],
        • "speech_to_text": [
          • {
            }
          ],
        • "ocr": [
          • {
            }
          ],
        • "content_moderation": {
          • "is_safe": true,
          • "signals": [
            ]
          },
        • "sentiment": {
          • "overall": "positive",
          • "score": 0.62,
          • "instances": [
            ]
          },
        • "topics": [
          • {
            }
          ],
        • "brands": [
          • {
            }
          ],
        • "summary": "A panel discussion on AI safety featuring three researchers. The conversation covers alignment challenges, regulatory proposals, and near-term risk mitigation strategies.\n"
        }
      },
    • "meta": {
      • "client": { },
      • "system": {
        • "region": "eu-west-1",
        • "worker_id": "wk_789"
        }
      }
    }
}
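The three webhook payload schemas share one naming pattern for event: a job kind, a dot, and a lifecycle phase. A single endpoint receiving all three could split the name as sketched below; parse_event is a hypothetical helper, and the allowed values are copied from the event enums in this reference.

```python
from typing import Tuple

JOB_KINDS = frozenset({"transcription", "translation", "video-analysis"})
PHASES = frozenset({"progress", "completed", "failed"})


def parse_event(event: str) -> Tuple[str, str]:
    """Split an event name such as 'video-analysis.completed' into (kind, phase)."""
    # rpartition splits on the last dot, so the hyphen in 'video-analysis' is untouched.
    kind, separator, phase = event.rpartition(".")
    if not separator or kind not in JOB_KINDS or phase not in PHASES:
        raise ValueError(f"unrecognised event name: {event!r}")
    return kind, phase
```

The kind can then select the schema to validate against, while the phase selects which of progress, result, or error to read from data.attributes.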