
Extracting subtitles and captions from video files with FFmpeg

Extracting subtitles or captions from a video can be helpful for things like accessibility and localization. Using FFmpeg you can easily extract subtitle tracks from a video file in a variety of formats. In this article we will demonstrate different methods for extracting subtitles and show how to work around subtle gotchas in text formatting.

Why Extract Subtitles?

There are several reasons you may want to extract subtitles:

  • Localization: When translating a video, having the subtitles as a separate file is easier for editors to work with.
  • Accessibility: Adding subtitles to different platforms or reformatting them for screen readers often requires them in a separate file.
  • Automation: If you're feeding subtitles or captions into AI tools for sentiment analysis or automatic translation, you'll likely want to automate the extraction step.

Types of Subtitles in Video Files

Videos can contain different types of subtitles and only some of them are easily extracted:

Embedded subtitles

Subtitles embedded as part of the video file. These can be extracted without altering the video at all because they are stored as a text track alongside the other media tracks inside the video container. Because they are independent tracks, extracting them is usually fast: the audio and video do not need to be re-encoded.

Soft subtitles

These are stored in a separate file alongside the video file (a sidecar file such as .srt or .vtt), so they are already separate from the video itself.

Hardcoded subtitles

These are burned directly into the video and can't be extracted as a separate file. They are literally part of the video image and don't exist as text in any parsable way. Extracting these types of subtitles would require analyzing the frames themselves and attempting to convert the image into text. This is not something we'll cover in this article.

We'll be extracting embedded subtitles with the examples below.

Things to consider before extracting subtitles and captions

Videos can have multiple subtitle streams, each in a different language or format, so you'll need to identify the stream you want to extract and its ID within the list of streams in the file. Running FFprobe against the file, for example ffprobe my-file.mp4, outputs information about which streams are available and their respective IDs.

Some subtitles also use legacy character sets, so it's best to specify the encoding where needed, especially with non-Latin languages. We'll show an example of how to do this later on.

How to Extract Subtitles and Captions with FFmpeg

Identifying which streams to extract

To identify the subtitle streams in a video, run:

bash
ffprobe video.mp4

This command lists all streams within the file, including video, audio, and subtitle streams. Every stream is shown as Stream #0:x, where x is the stream index. Subtitle streams are the ones labeled Subtitle, often with a language code in parentheses after the index.
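
When a file carries many streams, the full ffprobe listing can be noisy. As a minimal sketch, the helper below asks ffprobe for the subtitle streams only and prints each one's index, codec, and language tag (when the file has one). The function name list_subtitle_streams is our own for illustration and is not part of FFmpeg.

```bash
# Hypothetical helper: print one line per subtitle stream as index,codec,language.
list_subtitle_streams() {
  local input="$1"
  ffprobe -v error \
    -select_streams s \
    -show_entries stream=index,codec_name:stream_tags=language \
    -of csv=p=0 \
    "$input"
}

# Usage: list_subtitle_streams video.mp4
```

The index column is the x from Stream #0:x. The codec column is worth checking too: text codecs such as subrip, ass, or mov_text convert to text formats, while image-based codecs such as hdmv_pgs_subtitle or dvd_subtitle cannot be written out as SRT or VTT.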

Extracting subtitles to an SRT File

Once you know the subtitle stream index, you can extract it like this:

bash
ffmpeg -i video.mp4 -map 0:s:0 subtitles.srt

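-map 0:s:0 selects the stream to extract. The first 0 identifies the input file, which will always be 0 when working with a single input. s restricts the selection to subtitle streams, and the last 0 is a zero-based position among those subtitle streams, not the absolute stream index that ffprobe prints. So 0:s:0 is the first subtitle stream, 0:s:1 the second, and so on. To select by absolute index instead, drop the s: -map 0:2 selects Stream #0:2.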
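
Counting subtitle tracks by position can be brittle when files differ in track order. FFmpeg's stream specifiers can also match on metadata, so one possible approach is to select by language tag. The sketch below assumes the file has exactly one subtitle stream carrying the requested tag (an SRT file can hold only one stream) and that tags use codes like eng. The function name is hypothetical.

```bash
# Hypothetical helper: extract the subtitle stream tagged with a given language.
# Assumes exactly one subtitle stream in the file carries that language tag.
extract_subtitles_by_language() {
  local input="$1" lang="$2" output="$3"
  # 0 = first input, s = subtitle streams, m:language:<code> = metadata match
  ffmpeg -i "$input" -map "0:s:m:language:${lang}" "$output"
}

# Usage: extract_subtitles_by_language video.mp4 eng subtitles_eng.srt
```

The match is against the tag exactly as stored in the file, so check the ffprobe output first to see which code the file actually uses.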

Extracting subtitles to VTT (WebVTT)

bash
ffmpeg -i video.mp4 -map 0:s:0 subtitles.vtt

Extracting subtitles to ASS (Advanced SubStation Alpha)

For more complex styling and positioning:

bash
ffmpeg -i video.mp4 -map 0:s:0 subtitles.ass
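
If you need more than one format, FFmpeg can write several outputs in a single run, choosing the subtitle encoder from each file extension. The following is a sketch rather than a definitive recipe. The function name and the base-name argument are our own.

```bash
# Hypothetical helper: write the first subtitle stream as SRT, WebVTT, and ASS in one run.
extract_subtitles_all_formats() {
  local input="$1" base="$2"
  # Each -map applies to the output file that follows it.
  ffmpeg -i "$input" \
    -map 0:s:0 "${base}.srt" \
    -map 0:s:0 "${base}.vtt" \
    -map 0:s:0 "${base}.ass"
}

# Usage: extract_subtitles_all_formats video.mp4 subtitles
```

Note that the ASS output only carries rich styling and positioning if the source track contained it. Converting a plain-text track to ASS produces default styling.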

Dealing with Character Encoding

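If the extracted subtitles don't render correctly, specify the character set of the source subtitles with -sub_charenc. This is an input option, so it must appear before the -i it applies to, and its value names the encoding the subtitles are currently stored in. For example, to declare the source as UTF-8: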

bash
ffmpeg -sub_charenc UTF-8 -i video.mp4 -map 0:s:0 subtitles.srt
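
Encoding problems show up most often with standalone subtitle files saved in a legacy code page. As a hedged sketch, the helper below re-encodes such a file by telling FFmpeg what the source charset is. The function name is hypothetical, the charset is something you have to know or guess (FFmpeg does not detect it here), and the conversion assumes an FFmpeg build with iconv support.

```bash
# Hypothetical helper: convert a standalone subtitle file from a legacy charset.
# The source charset is supplied by the caller; nothing here detects it.
convert_subtitles_charset() {
  local input="$1" charset="$2" output="$3"
  # -sub_charenc is an input option, so it must precede -i.
  ffmpeg -sub_charenc "$charset" -i "$input" "$output"
}

# Usage: convert_subtitles_charset legacy.srt CP1252 fixed.srt
```

If the output still shows garbled characters, the assumed charset is probably wrong, so try another candidate for the language in question.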

Automating Subtitle Extraction for Multiple Streams

If a video has multiple subtitle streams, you can extract each with a loop in a bash script like this:

bash
# stream=index returns absolute stream indexes, so map with 0:$i rather than 0:s:$i
for i in $(ffprobe -v error -select_streams s -show_entries stream=index -of csv=p=0 video.mp4); do
  ffmpeg -i video.mp4 -map "0:$i" "subtitles_$i.srt"
done

This command finds each subtitle stream, extracting them sequentially into separate .srt files.
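
For files with several languages, it can help to put the language tag in each output name. The variant below is one way to do that, not the only one. The function name, the default prefix, and the und fallback for untagged streams are our own choices, and it assumes every subtitle stream is text-based so it can be written as SRT.

```bash
# Hypothetical helper: extract every subtitle stream to <prefix>_<index>_<lang>.srt.
extract_all_subtitles() {
  local input="$1" prefix="${2:-subtitles}"
  local idx lang
  ffprobe -v error -select_streams s \
      -show_entries stream=index:stream_tags=language \
      -of csv=p=0 "$input" |
  while IFS=, read -r idx lang; do
    [ -n "$idx" ] || continue
    # idx is the absolute stream index, so map it as 0:<idx>, not 0:s:<idx>.
    # -nostdin keeps ffmpeg from consuming the loop's piped input.
    ffmpeg -nostdin -i "$input" -map "0:${idx}" "${prefix}_${idx}_${lang:-und}.srt"
  done
}

# Usage: extract_all_subtitles video.mkv subtitles
```

Reading ffprobe's output line by line keeps the index and language paired, and streams without a language tag fall back to und in the file name.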

