Experiments With Language and Accessibility

The 2010 FIFA World Cup championship match is only a few hours away. For over a month now, Earth has been buzzing with the ambient hum of vuvuzelas, and football (soccer) conversations have spanned the globe in every major language. I watched some of this conversation on Twitter, where at times the #worldcup backchannel was racking up thousands of tweets per minute in a wide variety of languages. The Twitter Search results page includes a "Translate to English" link, which uses the Google AJAX Language API to translate the content of any non-English tweets.

Unfortunately, Twitter does not use lang attributes. For multilingual screen reader users who follow multilingual Twitterers, lang attributes are critical. Without them, screen readers grossly mispronounce foreign language content, as demonstrated on my Lang Attribute Tests page.

Many screen readers support multiple languages, and can switch on-the-fly between the pronunciation engines for those languages, but it's up to web developers to provide the necessary cues. This is really a trivial task. Just mark up the overall document with a lang attribute indicating the language of the document, like this example, in which the language is English ("en"):

<html lang="en">

If a document that is primarily in one language contains content in another, just mark up the foreign language content like this:

<p lang="es"> una cerveza más, por favor </p>

Trivial, but Twitter doesn't do it. Even more shockingly, Google Translate doesn't do it either! This makes no sense to me. When the Google AJAX Language API detects the language of a block of text, it actually returns, in its results, the standard two-character code for the detected language. All that Google, Twitter, or anyone else using the API would need to do at that point is add lang attributes to the page content, populated with the language codes from the detection results.
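That final step, copying a detected language code onto the content as its lang attribute, fits in a few lines of JavaScript. This is a minimal sketch, assuming the detection result is an object with a `language` property holding the code (that shape is an assumption for illustration, not a documented API); `applyDetectedLang` and the `LANG_TAG` pattern are hypothetical helpers. The pattern also accepts subtags such as `zh-CN` in case a detector returns them, and the function skips empty or malformed codes rather than writing a bad value.

```javascript
// Hypothetical helper: copy a detected language code onto an element's lang attribute.
// `detection` is assumed to look like { language: "es" }; this is not a documented API.

// Loose check: a 2-3 letter primary language subtag plus optional subtags,
// e.g. "en", "es", "zh-CN". This is not a full BCP 47 validator.
const LANG_TAG = /^[a-z]{2,3}(-[A-Za-z0-9]{2,8})*$/;

function applyDetectedLang(element, detection) {
  const code = detection && detection.language;
  // Leave the element alone if detection produced nothing usable.
  if (typeof code !== "string" || !LANG_TAG.test(code)) {
    return false;
  }
  // Screen readers use this attribute to pick a pronunciation engine.
  element.setAttribute("lang", code);
  return true;
}
```

In a browser, `element` would be the container holding one tweet's text, for example `applyDetectedLang(tweetContainer, { language: "es" })`, where `tweetContainer` is whatever element the page uses for that tweet.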

To prove how simple this is, I decided to try it myself. The result is The Accessible Twitter Translation & Lang Experiment (TATTLE).

Here's how this page works in a nutshell:

  1. It retrieves the most recent #worldcup tweets from Twitter in the form of an RSS feed
  2. It displays the content of each tweet, but since Twitter doesn't provide any language information, there are no lang attributes initially.
  3. When the page is loaded, a script is executed that retrieves the content of each tweet from the DOM and sends it to the google.language.detect service, which returns a result that includes a two-character language code. The script adds a lang attribute to the tweet content container, and assigns the Google-provided language code as the value of the new attribute. Voila! Now screen readers that support the various languages on this page will correctly pronounce them.
  4. The script also adds a select field near the top of the page that allows users to select their preferred output language, and adds a link to each tweet that allows users to translate the content of that tweet.
  5. If a user selects a tweet's translation link, the content of that tweet is sent to the google.language.translate service, which translates it into the user's preferred output language. The script then displays the translated text with an appropriate lang attribute and sends focus to the div containing the new translation, so screen reader users can access it immediately.
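Step 5 of the walkthrough above can be sketched in JavaScript as follows. This is a minimal sketch, not the TATTLE source: `showTranslation` is a hypothetical name, `doc` is the page's `document`, and `translate` is an assumed async callback that takes the text and a target language code and resolves to the translated text, standing in for the actual translation service call. The new container gets a lang attribute for the target language, plus `tabindex="-1"` so script can move focus to it without adding it to the normal Tab order.

```javascript
// Hypothetical sketch of step 5: show a translation with its own lang
// attribute and move focus to it.
//   doc        - the page's document object
//   tweetEl    - element holding the original tweet text
//   targetLang - the user's preferred output language code, e.g. "en"
//   translate  - assumed async (text, targetLang) => translated text,
//                a stand-in for the real translation service call
async function showTranslation(doc, tweetEl, targetLang, translate) {
  const translated = await translate(tweetEl.textContent, targetLang);

  const box = doc.createElement("div");
  // Mark the translation with the target language so screen readers
  // switch to the matching pronunciation engine.
  box.setAttribute("lang", targetLang);
  // tabindex="-1" makes the div focusable from script without
  // putting it in the Tab order.
  box.setAttribute("tabindex", "-1");
  box.textContent = translated;

  // Place the translation right after the original tweet, then focus it.
  tweetEl.after(box);
  box.focus();
  return box;
}
```

In a page, this might be called from each translation link's click handler, with `targetLang` read from the preferred-language select field and `event.preventDefault()` keeping the link from navigating away.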

Conclusion: Yes, this is a fairly trivial task. I'm all thumbs when it comes to JavaScript, and I had this up and running within a couple of hours. Ironically, Google provides us with everything we need to add lang attributes to any content. Now if only they would use it on their own Google Translate site!

This may seem like a lot of fuss over a minor issue, but it can have a huge impact on multilingual people who use audible web interfaces. I first became aware of the magnitude of the problem when I was asked to help a blind student who was taking an online college Spanish course. The course had hundreds of bilingual pages with no lang attributes. Until this problem was fixed, the student was being asked to learn Spanish from grossly mispronounced course content.

As our society has become increasingly global, our communications have become increasingly multilingual, and what's at stake here is not just one student's ability to take Spanish 101. Missing lang attributes can have a significant impact on screen reader users' ability to fully participate in our global society. So as we work to make international communication accessible to Everyone via language detection and translation, we need to remember that Everyone includes people using audible web interfaces, and take the extra step (it's just one simple step!) to ensure accessibility for these folks.
