Identify the most compelling moments in a video using the Mux Robots API.

This workflow analyzes both audio and visual content to find segments that stand out for their hook strength, clarity, emotional intensity, novelty, or soundbite quality. It's useful for generating highlight reels, social media clips, or preview content. See the Find Key Moments API reference for the full endpoint specification. See Mux Robots pricing for unit costs.
Create a find-key-moments job:

    curl https://api.mux.com/robots/v0/jobs/find-key-moments \
      -H "Content-Type: application/json" \
      -X POST \
      -d '{
        "parameters": {
          "asset_id": "YOUR_ASSET_ID",
          "max_moments": 5
        }
      }' \
      -u "${MUX_TOKEN_ID}:${MUX_TOKEN_SECRET}"

This request is asynchronous. The POST returns immediately with the job in pending status and does not include results. We strongly recommend listening for the robots.job.find_key_moments.completed webhook: its payload contains the full completed job, so no follow-up API call is needed. If webhooks aren't an option, you can poll GET /robots/v0/jobs/find-key-moments/{JOB_ID} with the id from the response until the status is completed.
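If you do fall back to polling, the loop might look like the following minimal Python sketch. The endpoint path and the top-level data key come from this page; the function names, the 5-second interval, and the 10-minute timeout are illustrative assumptions. This page does not name any failure statuses, so the loop only treats completed as terminal by default and relies on the timeout to stop.

```python
import base64
import json
import time
import urllib.request
from typing import Callable

JOB_URL = "https://api.mux.com/robots/v0/jobs/find-key-moments/{job_id}"


def fetch_job(job_id: str, token_id: str, token_secret: str) -> dict:
    """GET the job and return the object under the top-level "data" key."""
    credentials = base64.b64encode(f"{token_id}:{token_secret}".encode()).decode()
    request = urllib.request.Request(
        JOB_URL.format(job_id=job_id),
        headers={"Authorization": f"Basic {credentials}"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["data"]


def wait_for_job(
    get_job: Callable[[], dict],
    interval_s: float = 5.0,       # assumed polling interval
    timeout_s: float = 600.0,      # assumed upper bound on total wait
    terminal_statuses: tuple = ("completed",),  # add failure statuses if your jobs report any
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call get_job() until its status is terminal or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        job = get_job()
        if job["status"] in terminal_statuses:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {job['status']!r} after {timeout_s}s")
        sleep(interval_s)
```

A call such as `wait_for_job(lambda: fetch_job(job_id, token_id, token_secret))` returns the completed job, including its outputs object.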
Key moment extraction uses transcript cues from the asset to identify compelling segments. Make sure your asset has captions, either auto-generated or manually added, before creating a find-key-moments job.
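One way to guard against submitting a job for an asset without captions is to inspect the asset's tracks first. The sketch below assumes you have already fetched the asset object from the Mux Video API, and that text tracks appear in a tracks list with a type of text and a status that reads ready once usable. Treat those field names as assumptions to verify against the asset reference; the helper name is hypothetical.

```python
def has_ready_text_track(asset: dict) -> bool:
    """Return True if the asset lists at least one text track with status "ready".

    Assumed asset shape: {"tracks": [{"type": "text", "status": "ready", ...}, ...]}
    """
    return any(
        track.get("type") == "text" and track.get("status") == "ready"
        for track in asset.get("tracks", [])
    )
```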
| Parameter | Type | Description |
|---|---|---|
| asset_id | string | Required. The Mux asset ID of the video to analyze. |
| max_moments | integer | Maximum number of key moments to extract (1-10). Defaults to 5. |
| target_duration_ms | object | Preferred highlight duration range in milliseconds. Both min and max are required when provided. |
| target_duration_ms.min | integer | Required. Preferred minimum highlight duration in milliseconds. |
| target_duration_ms.max | integer | Required. Preferred maximum highlight duration in milliseconds. |
| output_steering | object | Curated controls that guide moment selection, titles, audience, and concepts without changing the output schema. See Output steering. |

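The constraints in the table above can be checked client-side before sending a request. The following is a minimal sketch, not part of any Mux SDK: the function name and argument names are hypothetical, and the check that min does not exceed max is an extra sanity check that this page does not state.

```python
from typing import Optional


def build_parameters(
    asset_id: str,
    max_moments: int = 5,
    min_duration_ms: Optional[int] = None,
    max_duration_ms: Optional[int] = None,
) -> dict:
    """Build a find-key-moments request body, enforcing the documented limits."""
    if not asset_id:
        raise ValueError("asset_id is required")
    if not 1 <= max_moments <= 10:
        raise ValueError("max_moments must be between 1 and 10")
    # target_duration_ms is optional, but min and max must be given together.
    if (min_duration_ms is None) != (max_duration_ms is None):
        raise ValueError("target_duration_ms needs both min and max")

    parameters = {"asset_id": asset_id, "max_moments": max_moments}
    if min_duration_ms is not None and max_duration_ms is not None:
        # Sanity check added for this sketch; not a documented rule.
        if min_duration_ms > max_duration_ms:
            raise ValueError("target_duration_ms.min must not exceed max")
        parameters["target_duration_ms"] = {
            "min": min_duration_ms,
            "max": max_duration_ms,
        }
    return {"parameters": parameters}
```

The returned dictionary can be serialized with `json.dumps` and used as the POST body.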
Use output_steering when you want best-effort control over which moments are selected and how they're described. These fields guide the workflow but do not guarantee exact output.
| Field | Type | Description |
|---|---|---|
| selection_strategy | string | Preferred definition of a strong standalone moment. Supported values: standalone_hooks, educational_takeaways, story_beats, product_moments, and speaker_highlights. |
| title_style | string | Preferred style for generated moment titles. Supported values: descriptive, punchy, educational, and social. |
| audience | string | Intended audience used to guide moment selection and titles. |
| brand_terms | array of strings | Preferred brand or domain terms to use when supported by the source content. |
| rubric_priorities | array of strings | Up to 4 rubric dimensions used as tie-breakers after applying the selection strategy. Supported values: clarity_in_isolation, emotional_intensity, novelty, and soundbite_quality. |
| topic_taxonomy | object | Controlled vocabulary used to steer notable audible concepts without changing the response schema. |
| topic_taxonomy.name | string | Optional customer-facing name for the taxonomy. |
| topic_taxonomy.values | array | Controlled vocabulary values. Each value has a required label and optional description and aliases. |
| topic_taxonomy.allow_other | boolean | When true, non-taxonomy values may be used when no taxonomy value applies. |

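Because several of these fields accept only a fixed set of values, it can help to validate an output_steering object locally before submitting it. The sketch below encodes the supported values and limits listed in the table; the function name and error wording are hypothetical, and the API may apply checks that are not shown here.

```python
SELECTION_STRATEGIES = {
    "standalone_hooks", "educational_takeaways", "story_beats",
    "product_moments", "speaker_highlights",
}
TITLE_STYLES = {"descriptive", "punchy", "educational", "social"}
RUBRIC_PRIORITIES = {
    "clarity_in_isolation", "emotional_intensity", "novelty", "soundbite_quality",
}


def validate_output_steering(steering: dict) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    errors = []

    strategy = steering.get("selection_strategy")
    if strategy is not None and strategy not in SELECTION_STRATEGIES:
        errors.append(f"unsupported selection_strategy: {strategy!r}")

    title_style = steering.get("title_style")
    if title_style is not None and title_style not in TITLE_STYLES:
        errors.append(f"unsupported title_style: {title_style!r}")

    priorities = steering.get("rubric_priorities", [])
    if len(priorities) > 4:
        errors.append("rubric_priorities accepts at most 4 entries")
    for priority in priorities:
        if priority not in RUBRIC_PRIORITIES:
            errors.append(f"unsupported rubric priority: {priority!r}")

    # Each taxonomy value needs a label; description and aliases are optional.
    taxonomy = steering.get("topic_taxonomy") or {}
    for index, value in enumerate(taxonomy.get("values", [])):
        if not value.get("label"):
            errors.append(f"topic_taxonomy.values[{index}] is missing a label")

    return errors
```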

For example, a request body that uses output steering:

    {
      "parameters": {
        "asset_id": "YOUR_ASSET_ID",
        "max_moments": 5,
        "output_steering": {
          "selection_strategy": "standalone_hooks",
          "title_style": "social",
          "audience": "developers scrolling a social feed",
          "brand_terms": ["Mux Video", "Mux Data"],
          "rubric_priorities": ["soundbite_quality", "emotional_intensity"],
          "topic_taxonomy": {
            "name": "Themes",
            "values": [
              {
                "label": "Video as data",
                "description": "Treating video content as structured, queryable information",
                "aliases": ["structured video", "queryable video"]
              },
              {
                "label": "Developer experience"
              }
            ],
            "allow_other": true
          }
        }
      }
    }

The outputs object is included in the job once its status is completed. You'll receive it on the robots.job.find_key_moments.completed webhook (recommended), or you can fetch it with GET /robots/v0/jobs/find-key-moments/{JOB_ID}. It contains:
| Field | Type | Description |
|---|---|---|
| moments | array | Extracted key moments, ordered by position in the video. |
| moments[].start_ms | number | Moment start time in milliseconds. |
| moments[].end_ms | number | Moment end time in milliseconds. |
| moments[].overall_score | number | Weighted quality score (0.0-1.0) based on hook strength, clarity, emotional intensity, novelty, and soundbite quality. |
| moments[].title | string | Short catchy title for the moment (3-8 words). |
| moments[].audible_narrative | string | One-sentence summary of what is being said. |
| moments[].notable_audible_concepts | array | Key audible concepts (2-5 word phrases). |
| moments[].visual_narrative | string | One-sentence summary of what is visually happening. Present for video assets only. |
| moments[].notable_visual_concepts | array | Scored visual concepts extracted from sampled frames (video assets only). Each has concept, score, and rationale. |
| moments[].cues | array | Contiguous transcript segments with start_ms, end_ms, and text. |

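Since moments arrive ordered by position rather than by quality, a highlight reel usually needs a re-sort. The sketch below shows one way to pick the strongest moments using the fields in the table above; the helper names, the default of three moments, and the score threshold are illustrative choices, not part of the API.

```python
def duration_ms(moment: dict) -> float:
    """Length of a moment in milliseconds."""
    return moment["end_ms"] - moment["start_ms"]


def rank_moments(moments: list, top_n: int = 3, min_score: float = 0.0) -> list:
    """Return up to top_n moments scoring at least min_score, best first."""
    eligible = [m for m in moments if m["overall_score"] >= min_score]
    return sorted(eligible, key=lambda m: m["overall_score"], reverse=True)[:top_n]
```

To restore playback order after selecting, sort the result again by start_ms.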
This is the payload delivered to the robots.job.find_key_moments.completed webhook, and the same shape you get from GET /robots/v0/jobs/find-key-moments/{JOB_ID}:

    {
      "data": {
        "id": "rjob_mno345",
        "workflow": "find-key-moments",
        "status": "completed",
        "units_consumed": 1,
        "parameters": {
          "asset_id": "YOUR_ASSET_ID",
          "max_moments": 3
        },
        "outputs": {
          "moments": [
            {
              "start_ms": 12400,
              "end_ms": 28900,
              "overall_score": 0.92,
              "title": "The Future of Video Data",
              "audible_narrative": "The speaker explains how AI transforms video from passive content into structured, queryable data.",
              "notable_audible_concepts": ["video as data", "AI transformation", "structured information"],
              "visual_narrative": "The speaker gestures at a diagram showing video processing pipeline stages.",
              "notable_visual_concepts": [
                { "concept": "pipeline diagram", "score": 0.87, "rationale": "Directly illustrates the concept being discussed" }
              ],
              "cues": [
                { "start_ms": 12400, "end_ms": 16200, "text": "What's exciting is that video isn't just content anymore." },
                { "start_ms": 16200, "end_ms": 22100, "text": "Every video you upload is a dataset waiting to be queried." }
              ]
            }
          ]
        }
      }
    }
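As an illustration of consuming this payload, the sketch below turns a completed job into one readable line per moment, joining the cue text into a transcript snippet. The function names and output format are hypothetical; only the field names come from the payload above.

```python
def format_timestamp(ms: float) -> str:
    """Format milliseconds as MM:SS.mmm."""
    total_ms = int(round(ms))
    minutes, remainder = divmod(total_ms, 60_000)
    seconds, millis = divmod(remainder, 1000)
    return f"{minutes:02d}:{seconds:02d}.{millis:03d}"


def summarize_completed_job(payload: dict) -> list:
    """Return one summary line per moment from a completed find-key-moments job."""
    job = payload["data"]
    if job["status"] != "completed":
        raise ValueError(f"job {job['id']} is {job['status']!r}, not completed")

    lines = []
    for moment in job["outputs"]["moments"]:
        # Cues are contiguous transcript segments, so joining them gives the spoken text.
        transcript = " ".join(cue["text"] for cue in moment.get("cues", []))
        lines.append(
            f"{format_timestamp(moment['start_ms'])}-{format_timestamp(moment['end_ms'])} "
            f"[{moment['overall_score']:.2f}] {moment['title']}: {transcript}"
        )
    return lines
```

For the example payload, the single returned line begins `00:12.400-00:28.900 [0.92] The Future of Video Data:` followed by the two cue sentences.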