Handling Captions via the YouTube Player API

I rolled out a new version of Able Player (2.2.1) over the weekend, and it now supports YouTube captions and subtitles. It was already possible to use Able Player to play YouTube videos, but until now we had relied entirely on YouTube to handle captions on its own, which isn’t necessarily intuitive or convenient for users. If a user has turned captions on via the YouTube website, then they get captions in embedded YouTube players for any video that has captions (automated captions don’t count). Conversely, if they’ve turned captions off on YouTube, they won’t see captions in an embedded player, even if the video has captions available. This preference is stored per browser, so it only reflects the user’s caption setting in the browser they’re currently using.

Ideally, users would have the flexibility to toggle YouTube’s captions on and off and change the language of subtitles from any player, not just on the YouTube website. Based on my reading of the YouTube API Reference I hadn’t thought it was possible to control captions from an external player, but it turns out the YouTube API has undocumented secrets, which I thought I’d document here to save other developers some headaches.

First, here are some relevant links:

Next, here’s what you won’t find in the YouTube API documentation…

Two different modules (captions vs cc)

The API Reference says “Currently, the only module that you can set options for is the cc module, which handles closed captioning in the player.” This is only true if the user is getting the ActionScript 3 (Flash) player. If they’re getting the HTML5 player, the API uses the captions module, which differs not only in name but also in its available options. To summarize:

  • cc – captions module for the AS3 player
  • captions – captions module for the HTML5 player

The table below lists the options that each of these modules supports. The API Reference only documents two of these options (fontSize and reload). I’m not sure whether that means the API Reference is out of date or the undocumented options are unofficial and might disappear without notice. Since the latter is a real possibility, you should exercise caution in using these options.

Module | Option | Description
(both) | reload | This option is described in the API reference, and presumably behaves the same for both modules, though I didn’t test it.
(both) | fontSize | Range is -1 to 3. This option is described in the API reference.
cc | track | In the cc module, the track option returns an object with three properties:

  • languageCode – the ISO 639-1 language code of the current track
  • name – from my experience this is always an empty string (“”)
  • kind – from my experience this is either null or “asr” (for automatic speech recognition). Although the availability of an ASR-generated track doesn’t trigger the loading of the module on its own, it is included as one of the available tracks exposed via the API.
captions | track | In the captions module, the track option from my experience always returns an empty object.
cc | tracklist | This option returns an array of tracks. Each item is the same as an individual track, as defined two rows up in this table. The only useful and reliable information is the languageCode for each track. This can be used to create a popup menu listing all available subtitle tracks by language. Since the name of each track isn’t reliably exposed, you’ll need to provide your own function that converts language codes to readable language names.
captions | tracklist | This option returns an array of tracks, but the tracks in the array contain many more properties than the tracks in the cc module:

  • languageCode – the ISO 639-1 language code, same in both modules
  • languageName – a readable name for the track, e.g. “English – captions”. This seems to be reliably populated in the captions module
  • displayName – seems to always be the same as languageName, but is probably the better data to use for a subtitle menu in case the video owner has provided a custom name for a track
  • name – from my experience this is always null
  • kind – from my experience this is always an empty string (“”)
  • is_default – from my experience, this is always null, even for the default track, so unfortunately it’s not a reliable way to check for the default track.
  • id – always null
  • is_servable – always null
  • is_translatable – an integer, from my experience always 1
  • format – an integer, from my experience always 2. Not sure what it means.
captions | displaySettings | This option returns the following properties:

  • background – a string with a hex color code (e.g., “#000”)
  • backgroundOpacity – an integer (e.g., 1)
  • charEdgeStyle – a string (e.g., “uniform”)
  • color – a string with a hex color code
  • fontFamily – an integer (e.g., 4); would need to be decoded to be usable
  • fontFamilyOption – a string (e.g., “propSans”)
  • fontSizeIncrement – an integer (e.g., 0)
  • textOpacity – an integer (e.g., 1)
  • windowColor – a string with a hex color code
  • windowOpacity – an integer (e.g., 0)
cc | displaySettings | This option returns all of the same properties as in the captions module, plus nine additional properties. All have values of either true or false:

  • backgroundOpacityOverride
  • backgroundOverride
  • colorOverride
  • fontEdgeStyleOverride
  • fontFamilyOverride
  • fontSizeOverride
  • textOpacityOverride
  • windowColorOverride
  • windowOpacityOverride
captions | translationLanguages | This option returns an array of 91 languages, each having the following properties:

  • Oid – from my experience, always null
  • id – from my experience, always null
  • languageCode – ISO 639-1 language code
  • languageName – readable name of the language
  • isDefault – from my experience, always false for all 91 languages
captions | sampleSubtitle | From my experience this option always returns null.

With Able Player, we’re using player.getOptions() to see which module is loaded. Then we use that to control how we go about querying the module for additional information. Unfortunately that means we have to maintain two sets of code, plus we can’t get everything we want from both modules.
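Here’s a rough sketch of what that detection step might look like. It assumes a player object from the YouTube API whose caption module has already loaded. The helper names detectCaptionModule and getCaptionTracks are made up for illustration and are not Able Player’s actual code; getOptions() and getOption() are the calls described in the API Reference.

```javascript
// Hypothetical helpers; names are illustrative, not taken from Able Player.

// Returns 'captions' (HTML5 player), 'cc' (AS3 player), or null if neither
// module is currently loaded. getOptions() with no arguments returns the
// names of the loaded modules.
function detectCaptionModule(player) {
  const modules = player.getOptions() || [];
  if (modules.includes('captions')) return 'captions';
  if (modules.includes('cc')) return 'cc';
  return null;
}

// Reads the tracklist from whichever module is loaded and reduces each track
// to the fields that looked reliable in testing: languageCode in both
// modules, plus a readable label when the captions module provides one.
function getCaptionTracks(player) {
  const moduleName = detectCaptionModule(player);
  if (!moduleName) return { moduleName: null, tracks: [] };
  const tracklist = player.getOption(moduleName, 'tracklist') || [];
  const tracks = tracklist.map((track) => ({
    languageCode: track.languageCode,
    // displayName/languageName are only populated by the captions module.
    label: track.displayName || track.languageName || null,
  }));
  return { moduleName, tracks };
}
```

Normalizing both modules into one shape keeps the two code paths from leaking into the rest of the player, although the missing labels from the cc module still have to be filled in somewhere.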

Here’s a summary of what I think is especially useful from this data, given the inconsistencies between modules:

  1. The tracklist is useful for knowing what caption or subtitle tracks are available. Both modules provide a languageCode for each track. You can also get a readable name (displayName or languageName) from the captions module, but not the cc module. If YouTube loads the cc module, you have to convert the languageCode to a readable language name yourself.
  2. The track option in the cc module is useful for knowing which track is the default track. Unfortunately there seems to be no way to know this from the captions module. The scaffolding is there: the captions module includes both a track option and an is_default property in the tracklist array, but neither of these seems to be working. With Able Player, we check for the default track; if it’s still unknown, we default to the language of the web page if there’s a track available in that language, and otherwise to English.
  3. The displaySettings could be useful if you want to provide a custom interface for displaying captions and subtitles, rather than using YouTube’s interface. Then you could honor users’ font and background preferences. Most of these properties are available in both modules.
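To make points 1 and 2 concrete, here’s a minimal sketch of labeling tracks for a subtitle menu and picking a default. Everything named here is an assumption for illustration: LANGUAGE_NAMES is a tiny stand-in for a real language table, currentTrack stands for whatever the module’s track option returned, and pageLang would typically come from the lang attribute of the page.

```javascript
// Illustrative lookup only; a real player would need a full ISO 639-1 table.
const LANGUAGE_NAMES = { en: 'English', es: 'Spanish', fr: 'French', de: 'German' };

// Label for a subtitle menu item: prefer the name YouTube exposes (captions
// module), otherwise convert the language code ourselves (cc module).
// Falls back to the raw code if the language is not in the lookup.
function trackLabel(track) {
  return track.displayName || track.languageName ||
    LANGUAGE_NAMES[track.languageCode] || track.languageCode;
}

// Picks a default track in the order described above:
//   1. the track reported by the module's track option, if any (cc module);
//   2. a track matching the language of the web page;
//   3. English.
// Returns null if none of these match.
function chooseDefaultTrack(tracks, currentTrack, pageLang) {
  const byCode = (code) => tracks.find((t) => t.languageCode === code) || null;
  const reported = currentTrack && currentTrack.languageCode;
  // The page language may be a tag such as "fr-CA"; compare the primary subtag.
  const pageCode = (pageLang || '').toLowerCase().split('-')[0];
  return (reported && byCode(reported)) ||
    (pageCode && byCode(pageCode)) ||
    byCode('en');
}
```

In a page this might be called as chooseDefaultTrack(tracks, player.getOption('cc', 'track'), document.documentElement.lang). With the captions module the track option came back empty in my testing, so the function would fall through to the page language.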

Forcing HTML5

Although there’s data in the cc module (AS3 player) that we’d love to have (e.g., the default track), we still prefer the HTML5 player over the AS3 Flash player. An age-old accessibility problem with Flash is that if keyboard users manage to tab into the Flash object, it can become a keyboard trap that they can’t tab back out of. This is not a problem in YouTube’s HTML5 player.

Supposedly HTML5 is now the default player on YouTube, but that’s not consistent with my experience. I think it might be the default for a first-time YouTube user in supporting browsers but if someone has already visited YouTube the previous default (the AS3 player) seems to be saved as their preferred player. To switch players, users still need to visit the YouTube HTML5 Video page.

That said, it turns out there’s an html5 parameter that can be set when the player is initialized that allows developers to force the user to get the HTML5 player if their browser supports it. That’s not documented on the YouTube Embedded Players and Player Parameters reference, but it does work. The only supported value seems to be 1, which has the effect of always using the HTML5 player. Setting it to any other value has no effect.
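As a sketch, setting the parameter might look like the following. It assumes the YouTube API script has loaded and provides the YT namespace; createHtml5Player is a hypothetical wrapper, and YT is passed in as an argument only so the function has no hidden dependency on a global.

```javascript
// Hypothetical wrapper around the YouTube API's YT.Player constructor.
// In a page you would pass window.YT as the first argument.
function createHtml5Player(YT, elementId, videoId, events = {}) {
  return new YT.Player(elementId, {
    videoId,
    playerVars: {
      // Undocumented parameter described above: 1 requests the HTML5 player
      // where the browser supports it; other values appeared to do nothing.
      html5: 1,
    },
    events,
  });
}
```

Because the parameter is undocumented, it seems safest to treat it as a request rather than a guarantee and still check which caption module actually loads.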

Unfortunately this doesn’t relieve us of having to write special code to handle the two different caption modules as explained above, since some users might be using a browser that doesn’t support HTML5, or isn’t one of YouTube’s supported browsers for the HTML5 player, in which case they would presumably get the Flash version despite this parameter being set.

Problem: Modules are exposed onApiChange

As the API Reference explains, the onApiChange event “is fired to indicate that the player has loaded (or unloaded) a module with exposed API methods. Your application can listen for this event and then poll the player to determine which options are exposed for the recently loaded module. Your application can then retrieve or update the existing settings for those options.”

Why is that a problem? Because the onApiChange event never fires until the video starts playing. Therefore it’s impossible to know whether captions or subtitles are available when building the player. With Able Player, our intent is to add a CC button only if captions are available, but we’re unable to do that until the user clicks Play. If a user needs captions and there’s no CC button, I suspect there’s a high probability of their not clicking Play at all.

Our workaround is to autostart the video and play just long enough, then stop it and scrub back to the start. But how long is long enough? We can’t wait for the onApiChange event to fire, because if the video has no captions that will never happen. So instead we listen for the onStateChange event, which signals that the video has gone from an “unstarted” state to a started state (playing or buffering), at which point we immediately stop the video. That brief moment of startedness seems to be enough to trigger loading of the caption API if captions exist, so we can keep listening for onApiChange and set up captions if that event fires, without having played more than a fraction of a second of video. The player initially appears with no CC button, then there’s a brief moment of funkiness as the player plays, then stops, then voila! A CC button is born. This is pretty clumsy and unfortunate, but I think it’s the best we can do at the moment.
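Here’s a rough sketch of that workaround as a set of event handlers, not Able Player’s actual code. The factory name createCaptionProbe and the onCaptionsAvailable callback are hypothetical. The numeric states are the values of YT.PlayerState.PLAYING and YT.PlayerState.BUFFERING. Using pauseVideo() followed by seekTo(0, true) is my assumption about one way to “stop and scrub back”; stopVideo() would be another candidate.

```javascript
// Numeric values of YT.PlayerState, copied here so the sketch is standalone.
const PLAYING = 1;
const BUFFERING = 3;

// Hypothetical factory returning handlers for the player's `events` option.
// onCaptionsAvailable is called once with the module name if a caption
// module loads; it is never called for videos without captions.
function createCaptionProbe(onCaptionsAvailable) {
  let probing = true;    // true until the first start has been intercepted
  let announced = false; // ensures the callback fires only once

  return {
    onReady(event) {
      // Autostart so the player begins loading its modules.
      event.target.playVideo();
    },
    onStateChange(event) {
      if (!probing) return;
      if (event.data === PLAYING || event.data === BUFFERING) {
        probing = false;
        // Assumption: pause and rewind; the text above only says "stop".
        event.target.pauseVideo();
        event.target.seekTo(0, true);
      }
    },
    onApiChange(event) {
      if (announced) return;
      const modules = event.target.getOptions() || [];
      const moduleName = modules.includes('captions') ? 'captions'
        : modules.includes('cc') ? 'cc' : null;
      if (moduleName) {
        announced = true;
        onCaptionsAvailable(moduleName, event.target);
      }
    },
  };
}
```

The returned object can be passed as the events option when the player is created, with the callback being the place to add the CC button. The probing flag matters: without it, the user’s own later click on Play would be intercepted and paused as well.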


Google changes often, so everything you’ve just read is likely to be incorrect any moment now, if it isn’t already.