Automatic translation and dubbing with AI

Learn how to use AI models to translate or dub a video

Mux already has a feature for creating auto-generated captions. This can produce a transcript for you in the same language being spoken in a video. But what if you want the audio translated (dubbed) into different languages?

We're going to outline a general workflow that you can follow to translate (dub) your videos by grabbing their audio track and sending it to a third-party service to handle the translation process. We'll then take the new language tracks and add them back to our video as additional audio tracks.

Workflow

  • Upload a video to Mux that uses the plus video quality level. Basic video quality assets don't support MP4s, which we will need in the following steps.
  • When uploading your video, make sure mp4_support is set to a value that supports audio renditions, like audio-only. This will make an audio.m4a file available alongside the normal processing done to prepare the video for streaming.
  • Wait for the video.asset.static_renditions.ready webhook, which tells you that the audio-only MP4 file is ready to be used.
  • Give the audio file URL to an external service, like Sieve, to handle processing.
  • Attach the new audio tracks to your video using the create asset track API endpoint. The multi-track audio guide goes into more detail on how to do this.
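
The Mux side of the steps above can be sketched roughly as follows. This is a minimal illustration, not the code from the example endpoint: the helper names (buildCreateAssetBody, audioUrlFromWebhook, buildCreateTrackBody, muxPost) are hypothetical, the request fields reflect our reading of the Mux REST API and should be checked against the current API reference, and it assumes the asset has a public playback ID. The call to the dubbing service is left out because its API depends on the provider you choose.

```typescript
// Sketch only: verify field names against the current Mux API reference.
const MUX_API = "https://api.mux.com/video/v1";

// Step 1 and 2: body for POST /assets, using the plus quality level and
// requesting an audio-only static rendition (audio.m4a).
function buildCreateAssetBody(inputUrl: string) {
  return {
    input: [{ url: inputUrl }],
    playback_policy: ["public"],
    video_quality: "plus",
    mp4_support: "audio-only",
  };
}

// Minimal shape of the webhook payload fields this sketch reads.
interface MuxWebhookEvent {
  type: string;
  data: { id: string; playback_ids?: { id: string; policy: string }[] };
}

// Step 3: once the static rendition is ready, derive the audio file URL.
// Returns null for unrelated events or assets without a public playback ID.
function audioUrlFromWebhook(event: MuxWebhookEvent): string | null {
  if (event.type !== "video.asset.static_renditions.ready") return null;
  const playbackId = event.data.playback_ids?.find(
    (p) => p.policy === "public",
  )?.id;
  return playbackId ? `https://stream.mux.com/${playbackId}/audio.m4a` : null;
}

// Step 5: body for POST /assets/{ASSET_ID}/tracks, attaching one dubbed file.
function buildCreateTrackBody(
  dubbedAudioUrl: string,
  languageCode: string,
  name: string,
) {
  return {
    url: dubbedAudioUrl,
    type: "audio",
    language_code: languageCode,
    name,
  };
}

// Sends a JSON POST to the Mux API using an access token ID and secret.
async function muxPost(
  path: string,
  body: unknown,
  tokenId: string,
  tokenSecret: string,
): Promise<unknown> {
  const res = await fetch(`${MUX_API}${path}`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Basic ${btoa(`${tokenId}:${tokenSecret}`)}`,
    },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`Mux API ${path} failed: ${res.status}`);
  return res.json();
}

// Usage, once your dubbing service (step 4) has produced a Spanish file:
//   await muxPost(`/assets/${assetId}/tracks`,
//     buildCreateTrackBody(spanishUrl, "es", "Spanish"), tokenId, tokenSecret);
```

Keeping the request bodies in small pure functions makes them easy to check without touching the network; only muxPost talks to the API.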

You can then play back your video with Mux Player using a playback ID, the same way you would with any other video. Mux Player will automatically detect the alternate audio tracks and show an audio menu for switching between them.

A complete example

Here’s an example endpoint running on Val.town that puts all of this together. You give it an asset ID and a list of languages, and it returns multi-language audio tracks. If you want to experiment with it, you can fork it into your own account.

Here’s a video demo of some translations created with this process:

And below is the code from the endpoint:

Considerations

Relying on AI to translate your audio means that you likely won't be manually checking each translation. Because AI translation isn't perfect, you may want to indicate in your app that the translations are auto-generated, in case they contain inconsistencies or inaccuracies.
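
One lightweight way to do this, sketched under the assumption that your player's audio menu displays each track's name, is to label machine-dubbed tracks when you attach them. The labeledTrackBody helper and the "(auto-generated)" suffix below are illustrative choices, not part of the Mux API.

```typescript
// Hypothetical input describing one dubbed audio file.
interface DubbedTrack {
  url: string; // where the dubbed audio file is hosted
  languageCode: string; // language code, e.g. "es"
  languageName: string; // human-readable name, e.g. "Spanish"
}

// Builds a create-track request body whose display name flags the track
// as machine-dubbed. The suffix text is an arbitrary choice.
function labeledTrackBody(track: DubbedTrack) {
  return {
    url: track.url,
    type: "audio",
    language_code: track.languageCode,
    name: `${track.languageName} (auto-generated)`,
  };
}
```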

For example, AI may not always understand the contextual meaning of a conversation, and it might not correctly translate jokes, slang, or culturally specific expressions. Depending on the type of content being translated, you may want to add a manual verification step before publishing the translations.
