When HTML5 was published, it introduced the <video> and <audio> elements, as well as the <track> element. The latter provides a standard means of synchronizing text with media for a variety of purposes. The HTML5 spec specifically defines five kinds of track: captions, subtitles, chapters, metadata, and descriptions. The last of these is particularly interesting, and is the topic of this post.
Description, historically known as "audio description" or various other terms, is a supplemental narration track designed for people who are unable to see the video. It describes content that is presented visually, and is needed if that content is not otherwise accessible via the program audio. Historically we've outsourced description to professionals, but with prices starting at $15 per video minute, we've never gotten the kind of participation we need if video everywhere is going to be fully accessible.
With HTML5, description can now be written in any text editor. All five kinds of tracks, including description, are written in the same file format: WebVTT, which is essentially a time-stamped text file. Imagine that you have a video that begins with a short interview with someone notable, say the president of your university. The president's name and affiliation appear in an on-screen graphic but are never specifically identified in the program audio, so someone who can't see the video has no idea who's speaking or whether to take them seriously. This is a really simple use case, but common among videos I see in higher education. The video can easily be made accessible by creating a WebVTT file that includes the speaker's name and affiliation, with start and end times indicating when this text should be spoken. There's a bit of thought that must go into timing, as you want to avoid colliding with other important audio, but otherwise it's a really simple task. The result would be a file that looks something like this:
WEBVTT

00:00:05.000 --> 00:00:08.000
Ana Mari Cauce, President,
University of Washington
Save that as a VTT file. Done. Thirty seconds and your video has description.
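In a browser, a file like the one above can be attached to a video with HTML5's <track> element, using kind="descriptions". A minimal sketch, with hypothetical file names:

```html
<!-- Minimal sketch; the file names are hypothetical -->
<video controls>
  <source src="campus-welcome.mp4" type="video/mp4">
  <track kind="descriptions" src="campus-welcome-desc.vtt"
         srclang="en" label="English descriptions">
</video>
```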
With text-based description, it's easy to edit the content or the timing (not true if the description narration is mixed into the video). Plus, computers can read the text aloud themselves; we don't need to hire professional voice actors to do that. This makes description affordable and easy to do for most of the videos we use in academia, and it increases the likelihood that video owners will actually do it.
But how can text-based description be delivered to users?
This post comes live from the National Student Electronic Media Convention, the annual fall convention of College Broadcasters, Inc (CBI). It's in Seattle this year, and I was invited to present on web/media accessibility along with folks from WKNC, North Carolina State University's student-run radio station. This is a very cool coincidence, since my co-presenters weren't even aware that I was once employed by NC State, and was a regular listener to WKNC in those days.
In fact, I'm a huge fan and supporter of college radio! Two of the six stations on my car radio dial are KEXP (UW) and KUGS (WWU), and before moving to Washington I was a regular listener of WKNC (as I mentioned) and before that, KJHK, "The Sound Alternative" in Lawrence, Kansas. Also, as an independent musician, the only radio stations that have ever given my music air time are college stations.
To prepare for my presentation, I thought I would do a quick informal assessment of the state of accessibility on college radio station websites. Hence, this blog post.
In 2017, a small group of colleagues and I collaborated on a series of accessibility workshops that we delivered as pre-conference sessions at three national conferences: AHEAD, EDUCAUSE, and Accessing Higher Ground. If you were a participant in any of these workshops, you're about to receive a follow-up survey. This blog post documents my quest for an online tool for conducting the survey. My #1 criterion for choosing a tool is whether the tool generates accessible output. My #2 criterion is whether the tool is accessible to survey authors with disabilities, but I didn't specifically evaluate that for this blog post.
To keep things simple, I tested only one question type: Multiple choice with radio buttons.
The first question on my survey is this: "Where did you attend our accessibility workshop?" There are three possible answers: Accessing Higher Ground, AHEAD, and EDUCAUSE. Users are required to select one of the answers.
For this to be fully accessible to screen reader users, the following information should be communicated via their screen reader:
Each of the answers
The question itself
That the field is required
The current state of each radio button ("checked" or "not checked")
The number of options, and the user's position within those options (e.g., "2 of 3")
If I were to hand-code the survey from scratch using standard HTML markup, my code would look something like this:
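A minimal sketch of such markup (the name and id values are illustrative):

```html
<fieldset>
  <legend>Where did you attend our accessibility workshop?</legend>
  <input type="radio" name="workshop" id="ahg" value="ahg" required>
  <label for="ahg">Accessing Higher Ground</label>
  <input type="radio" name="workshop" id="ahead" value="ahead" required>
  <label for="ahead">AHEAD</label>
  <input type="radio" name="workshop" id="educause" value="educause" required>
  <label for="educause">EDUCAUSE</label>
</fieldset>
```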
Here again are the five requirements for full accessibility, with a brief explanation of how each is attained using the above markup.
1. Each answer
The label associated with each radio button (e.g., "Accessing Higher Ground") has a <label> element with a for attribute, the value of which matches the id of the radio button <input> element. This explicitly associates the label with its matching radio button. Screen readers announce the matching label when a radio button has focus, and mouse users and touch screen users can click or tap anywhere on the label to select that button (more convenient since it's a larger target than the button alone).
2. The question
Web developers often make the answers accessible, but often overlook the question. And users should never answer "Yes" if they don't know what they're agreeing to! The standard method for making the question accessible is to wrap the question in a <legend> element, then wrap that plus the group of radio buttons inside a <fieldset>. With this markup, screen readers announce both the legend and the label when a radio button receives focus, though they differ in their implementation: some screen readers announce both the legend and the label for each button as the user navigates between the buttons; others announce the legend only once, when the first button in the group receives focus, assume that's enough, and on subsequent buttons announce only that button's label.
3. That the field is required
The required attribute was introduced in HTML5. The proper technique for using it with radio buttons is described in the HTML 5.2 spec, Example 22.
To paraphrase: It's only required on one of the radio buttons in the group, but authors are encouraged to add it to all radio buttons in the group to avoid confusion.
4. The current state
If the radio button is correctly coded as a radio button, all screen readers automatically announce whether the current radio button is "checked" or "not checked".
5. Position within the total
If the group of radio buttons is coded correctly, all screen readers will announce something like "2 of 3". One exception is JAWS in Internet Explorer 11, but this is probably an IE issue, as JAWS does announce this information in Firefox (tested using JAWS 2018).
How Screen Readers Render Standard HTML
Putting all the pieces together, screen readers typically announce the following information when the first radio button in a group receives focus:
What this is, i.e., "Radio button"
The label for the button
The question (e.g., legend)
The current state ("checked" or "not checked")
Position within the total (e.g., "1 of 3")
Screen readers vary on the sequence of these items. Also, as noted above, screen readers vary on whether they continue to announce the legend for each button as the user navigates through their choices.
I created a simple survey with one required question using the following tools:
Then I tested the output with keyboard alone, NVDA 2017.4 in Firefox 57, JAWS 2018 in Firefox 57 and Internet Explorer 11, VoiceOver in Safari on MacOS Sierra, and VoiceOver in Safari on iOS 11 (using an iPhone X).
The following sections show the code generated by each of the survey tools, edited to just show the relevant markup for accessibility. All tools add a lot of extra <div> and <span> elements plus class attributes to help with styling, but these have little or no impact on accessibility and have been removed here for readability. Also, each of the tools auto-generates name and id attributes - I've edited all those so they match my original example.
Do you have control or influence over the design of one or more web pages? If so, then I encourage you to add this New Year's resolution to your list: Underline your links!
Since the dawn of the Web, browsers have underlined links so users could distinguish link text from surrounding text. In fact, all major browsers still do this by default. Ever wonder why? Answer: Because it continues to be an effective way to communicate "this is a link".
Unfortunately there's been a growing trend over the last few years among web designers to remove underlines from links, relying on color alone to distinguish link text from surrounding text.
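If your site's stylesheet has removed underlines, restoring the browser default takes a single CSS rule:

```css
/* Restore the browser's default underline on links */
a {
  text-decoration: underline;
}
```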
At CSUN 2017 I opened the conference on Wednesday morning with a presentation on audio description. The purpose of my presentation was to muse about how organizations with large quantities of videos might meet Success Criterion 1.2.5 of the W3C Web Content Accessibility Guidelines (WCAG) 2.0:
1.2.5 Audio Description (Prerecorded): Audio description provided for all prerecorded video content in synchronized media. (Level AA)
The purpose of audio description is to ensure that visual content is accessible to people who can't see it. In some cases the information is sufficiently communicated via the program audio. However, when that isn't the case, a supplemental audio track must be provided that includes brief description of the visual content.
The recommendations accompanying WCAG 2.0, How To Meet SC 1.2.5, include several "Sufficient Techniques" for accomplishing this, all of which focus on providing a second, user-selectable audio track or movie with human-narrated audio descriptions mixed in.
The recommendations also include an "Advisory Technique", Using the track element to provide audio descriptions. This is the technique supported within HTML5, using the <track> element with kind="descriptions" (more on this below). This is presumably an "Advisory Technique" because it isn't well supported yet by media players. However, I'm convinced that this technique has merit and is more scalable than any of the "Sufficient Techniques" for describing tens of thousands of videos, which is the scale of the problem at most universities.
In my presentation at CSUN, and in this follow-up blog post, I took a closer look at the two methods.