My Post-CSUN Comparison of Web Accessibility Checkers

Two weeks have now passed since the 2016 CSUN Conference, and I'm still inspired by many of the bright ideas generated in sessions, conversations, tweets, etc., and still considering how to apply them.

I gave two sessions at CSUN, What's New With Able Player? and Web Accessibility 101 with Accessible University 3.0. In the second of these sessions, I modeled how to use our Accessible University (AU) demo site in an interactive training session on web accessibility. The AU site consists of three core pages: a "before" page with at least 18 accessibility problems, an "after" page with those problems fixed, and an intermediary page that describes the problems and solutions.

One session I intended to attend, but was locked out of due to a capacity crowd, was Luis Garcia's Automated Testing Tool Showdown. Fortunately, Luis shared his slide deck. After looking over his findings, I found myself wondering how the various accessibility checkers would do with a page like AU's "before" page, the one with at least 18 known accessibility problems. I decided to find out.

The Tools I Tested

Below are the tools I set out to test, listed in alphabetical order. This includes all the tools Luis Garcia tested, plus a few others.

  1. AInspector Sidebar - from the OpenAjax Alliance; essentially the Functional Accessibility Evaluator (FAE) 2.0 service provided as a Firefox plugin.
  2. aXe - from Deque Systems, tested using the aXe Developer Tools extension for Firefox
  3. Bobby - the original web accessibility checker, from CAST
  4. Cynthia Says - free accessibility checker from Cryptzone (formerly HiSoftware)
  5. Google Accessibility Developer Tools - A Chrome extension from Google Accessibility
  6. HTML CodeSniffer - tested using the Accessibility Auditor Bookmarklet, and verified via the command-line interface of pa11y ("your automated accessibility testing pal")
  7. QUAIL - jQuery accessibility plugin (see also QUAIL on GitHub).
  8. SiteImprove - a commercial website quality and compliance checker, included here because the UW is a subscriber so I have easy access
  9. Tenon.io - from Karl Groves and the gang
  10. WAVE - "Web Accessibility Evaluation Tool" from the web accessibility gurus at WebAIM.

Ok, I didn't really test with Bobby

Bobby is the original web accessibility checker, created by CAST and launched in 1995, but was officially closed a decade later, in 2005. The accessibility engine lived on for a while in tools by Watchfire, then IBM, but there is no more Bobby. So if you see claims of "Bobby-compliance", you know the claimant isn't really up on accessibility. These claims are still out there though.

The Bobby home page is still available via the Internet Archive's Wayback Machine. However, it's not a functional service. I did try though.

Ok, I didn't test with Cynthia Says either

I did run the page through Cynthia Says. However, when I tried to sift through the report (a tree with expandable/collapsible sections) I found it to be an incredibly painful experience for two reasons:

  1. Tiny fonts. The default font size in Firefox is 16 pixels. The calculated font size of text in the Cynthia Says report ranges from 6.8 to 8.8 pixels (less than half the default size). I couldn't see the results at all, no matter how hard I squinted. I could increase the font size, and the report scaled reasonably well, but I was a little miffed that this was necessary. What finally stopped me in my tracks, though, was the second problem...
  2. Truncated error messages. The top level of the tree consists of WCAG 2.0 success criteria, each labeled with the criterion number plus a human-readable word or phrase (e.g., "Criterion 1.3.1 [Info and Relationships]"). This may or may not be intuitive to users who are not already WCAG scholars. However, the bigger problem occurs as users drill into these categories and find issues such as "Use the title attribute to identify form contro...", "Ensuring that the Web Page contains another CAP...", and "Failure of Success Criterion 1.1.1 and 1.2.1 du...". Given the tiny font, and the fact that the report spans the full width of the web page, I'm stumped as to why they feel a need to truncate these messages. The truncated messages are never spelled out in full, neither after the item is expanded nor in a title attribute. Consequently, I found it very challenging to make any sense of the report, and ultimately gave up out of frustration.

Ok, I didn't test with QUAIL either

I actually really wanted to test with QUAIL. First, I'm a big fan of recursive acronyms (QUAIL = "QUAIL Accessibility Information Library"). Second, I've seen it in action and find it to be a very thorough and useful tool. It is most commonly used as an accessibility checker integrated into content management systems (e.g., the Accessibility Drupal Module and the CKEditor Accessibility Checker plugin). I had some initial trouble building the software, but fortunately the problem I was experiencing was a known issue, so I eventually got past it and attempted a test from the command line with a slightly older version. However, both of my attempts resulted in PhantomJS crashing, so I ultimately decided it wasn't meant to be. Despite my failure, I suspect QUAIL would perform pretty well, since it has over 250 rules, which are well documented in the QUAIL documentation.

The Eighteen Issues

The eighteen known problems I checked against are described on the AU: List of Accessibility Issues page. Here's a short summary of each:

  1. No headings: The page has clear visible headings, but no HTML heading markup.
  2. No alt text: There are several informative images with no alt attributes.
  3. Bad alt text: There are three decorative images with alt="horizontal line graphic".
  4. Color contrast: The navigation menu has poor color contrast.
  5. Dropdown menu: The dropdown menu has many problems, the greatest being that the submenus are hidden from everyone (including screen reader users), and can only be revealed by hovering with a mouse.
  6. Visible focus: There is no visible focus indicator for links. In fact, the browser's default indicator has been overridden in CSS.
  7. Link text: There are three instances of "click here" as link text.
  8. Color used for information: The application form says "required fields are in blue". Also, hyperlinks in the main body are not underlined, therefore only distinguishable as links by color.
  9. Language: There is no lang attribute that identifies the language of the page as English. There is also a block of Spanish content that has no lang attribute identifying it as Spanish.
  10. Form fields: There are no <label>, <fieldset>, or <legend> elements in the application form.
  11. CAPTCHA: The application form includes a visual CAPTCHA.
  12. Form validation: If the user submits the form with errors, an error message appears at the top of the form that says "Your form has errors. Please correct them and resubmit." This error message provides insufficient help to the user, and is not coded in a way that exposes it to screen reader users.
  13. ARIA Landmarks: There are no ARIA landmark roles on this page.
  14. Modal dialog: One of the "click here" links triggers a modal dialog, which is not truly modal. It fails to trap keyboard focus, cannot be closed by pressing the escape key, and is not easily accessible to screen reader users.
  15. Carousel: Carousels are complex widgets with many parts. The carousel on this page is a fairly typical carousel with many problems: Keyboard users are unable to access all components; screen reader users are unable to operate the controls or access and understand all content; and people who are distracted by movement or who need more time to read the content are unable to pause the animation.
  16. Data table: The page includes a data table that does not use <th> elements to identify row and column headers.
  17. Abbreviations: The data table includes several abbreviations that could be difficult for all users to understand, including at least one ambiguous one ("ECO" could be "Ecology" or "Economics"). Providing a mechanism for expanding abbreviations (e.g., the <abbr> element) is only a Level AAA success criterion in WCAG 2.0, but in this case would be extremely helpful.
  18. HTML validation: Validation of this page yields five errors, all related to images with no alt text.
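
Issue 4 is one of the few items on this list that reduces to pure arithmetic, which helps explain why nearly every checker caught it. As a minimal sketch (not code from any of the tools tested), here is the WCAG 2.0 contrast ratio computed in Python from the relative-luminance definition in the WCAG 2.0 glossary. It assumes six-digit #RRGGBB colors, and the colors in the examples are illustrative, not the actual AU navigation colors.

```python
def _linearize(channel_8bit: int) -> float:
    """Convert one 8-bit sRGB channel to linear light (WCAG 2.0 relative luminance definition)."""
    c = channel_8bit / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a color given as '#RRGGBB' (three-digit shorthand not handled)."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(fg: str, bg: str) -> float:
    """(L1 + 0.05) / (L2 + 0.05), where L1 is the luminance of the lighter color."""
    lighter, darker = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa(fg: str, bg: str, large_text: bool = False) -> bool:
    """SC 1.4.3 (Level AA): at least 4.5:1 for normal text, 3:1 for large text."""
    return contrast_ratio(fg, bg) >= (3.0 if large_text else 4.5)
```

For example, #767676 text on a white background comes out at roughly 4.54:1 and passes the Level AA threshold for normal text, while #777777 comes out at roughly 4.48:1 and fails. A checker can flag this with certainty, which is not true of most of the other issues above.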

Regarding that last item, none of the accessibility checkers I tested specifically includes HTML validation checks, but I didn't expect them to. That's what the W3C Markup Validation Service is for. Also, since all of the validation errors on this page are related to images with no alt text, every accessibility checker is likely to catch them anyway, along with other explicitly measurable accessibility errors, even without checking HTML validity as such.
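
To illustrate why items such as missing alt text and a missing lang attribute are caught by every tool, while the dropdown menu, modal dialog, and carousel are not, here is a toy checker built on Python's standard-library html.parser module. It is a minimal sketch, not how any of the tested tools work: the rule wording, the "suspicious alt text" word list, and the landmark sets are assumptions chosen for illustration.

```python
from html.parser import HTMLParser

# Illustrative assumptions, not any real tool's rule set.
SUSPICIOUS_ALT_WORDS = ("graphic", "image", "picture", "spacer")
VAGUE_LINK_TEXT = {"click here", "here", "more", "read more"}
# Simplification: <header>/<footer> only count as landmarks outside sectioning content.
LANDMARK_TAGS = {"main", "nav", "header", "footer", "aside"}
LANDMARK_ROLES = {"banner", "navigation", "main", "contentinfo",
                  "complementary", "search", "region", "form"}
HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class MiniChecker(HTMLParser):
    """Flags a few issues that are decidable from static markup alone."""

    def __init__(self) -> None:
        super().__init__()
        self.issues: list[str] = []
        self.has_lang = False
        self.headings = 0
        self.landmarks = 0
        self.in_link = False
        self.link_text: list[str] = []
        self.tables: list[bool] = []  # one entry per open <table>: seen a <th> yet?

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "html" and a.get("lang"):
            self.has_lang = True
        elif tag == "img":
            alt = a.get("alt")
            if alt is None:
                self.issues.append(f"img missing alt: {a.get('src', '?')}")
            elif any(word in alt.lower() for word in SUSPICIOUS_ALT_WORDS):
                self.issues.append(f"suspicious alt text: {alt!r}")
        elif tag in HEADING_TAGS:
            self.headings += 1
        elif tag == "a":
            self.in_link, self.link_text = True, []
        elif tag == "table":
            self.tables.append(False)
        elif tag == "th" and self.tables:
            self.tables[-1] = True
        if tag in LANDMARK_TAGS or a.get("role") in LANDMARK_ROLES:
            self.landmarks += 1

    def handle_data(self, data):
        if self.in_link:
            self.link_text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.in_link:
            self.in_link = False
            # Collapse whitespace before comparing against the vague-text list.
            text = " ".join("".join(self.link_text).split()).lower()
            if text in VAGUE_LINK_TEXT:
                self.issues.append(f"uninformative link text: {text!r}")
        elif tag == "table" and self.tables and not self.tables.pop():
            self.issues.append("table has no <th>: data table or layout table?")

    def report(self) -> list[str]:
        page_level = []
        if not self.has_lang:
            page_level.append("<html> has no lang attribute")
        if self.headings == 0:
            page_level.append("page has no heading elements")
        if self.landmarks == 0:
            page_level.append("page has no landmarks")
        return self.issues + page_level


def check(html: str) -> list[str]:
    checker = MiniChecker()
    checker.feed(html)
    checker.close()
    return checker.report()
```

Everything this sketch flags can be decided from the markup alone. Nothing in it could tell you whether the dropdown submenus are reachable without a mouse, whether the modal dialog traps focus, or whether the carousel can be paused, which is consistent with how poorly all of the tools did on those issues.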

Results

Issue                                    | AInspector | aXe     | Google  | HTML CodeSniffer | Siteimprove | Tenon   | WAVE
-----------------------------------------|------------|---------|---------|------------------|-------------|---------|-----
1. No headings                           | Yes        | No      | No      | Partial          | Yes         | No      | Yes
2. No alt text                           | Yes        | Yes     | Yes     | Yes              | Yes         | Yes     | Yes
3. Bad alt text (decorative)             | No         | No      | No      | Sort of          | No          | No      | Yes
4. Insufficient color contrast           | Yes        | Yes     | Yes     | Yes              | Yes         | No      | Yes
5. Inaccessible dropdown menu            | No         | No      | No      | No               | No          | No      | No
6. Insufficient visible focus            | Sort of    | No      | No      | No               | No          | No      | No
7. Redundant, uninformative link text    | Yes        | No      | Yes     | Yes              | Yes         | Yes     | Yes
8. Color used to communicate information | Sort of    | No      | No      | No               | Yes         | No      | No
9. Language not specified                | Yes        | Yes     | Yes     | Yes              | Yes         | Yes     | Yes
10. Missing accessible form markup       | Yes        | Yes     | Yes     | Yes              | Yes         | Yes     | Yes
11. Inaccessible CAPTCHA                 | No         | No      | No      | No               | No          | No      | No
12. Inaccessible form validation         | Sort of    | No      | No      | No               | No          | No      | No
13. Missing ARIA landmarks               | Yes        | No      | No      | No               | No          | No      | No
14. Inaccessible modal dialog            | Sort of    | No      | No      | No               | No          | No      | No
15. Inaccessible carousel                | Sort of    | No      | No      | No               | No          | No      | No
16. Missing accessible table markup      | Sort of    | Yes     | No      | Sort of          | No          | Yes     | Sort of
17. Missing abbreviation tags            | No         | No      | No      | No               | No          | No      | No

Notes

  • All but one of the tools that returned errors related to headings simply detected the absence of headings. HTML CodeSniffer is a bit different: it returns a heading error if entire lines of text are marked up with <b> elements (possibly <strong> too, though I didn't test that). However, large bold text that's styled using CSS does not return the same error.
  • WAVE is the only tool that detected "suspicious alternate text" on the three decorative images.
  • Karl Groves of Tenon says he's "in the process of adding 63 new tests and revising nearly all of the others". Included among new tests is a test for at least one heading and a contrast checker. So expect a few more "Yes" cells in the Tenon column in the near future.
  • No tool specifically identified the problems with the dropdown menus, although some tools list relevant issues such as keyboard operability as a "manual check". These manual checks appear on all pages, though, and are not customized for the current page.
  • AInspector Sidebar includes a manual check that says "Focus must be visible", but it does this for all pages.
  • AInspector includes a manual check for "Use of color". However, the only tool that recognizes a specific problem with the non-underlined link text is SiteImprove.
  • Although all tools return an error for missing document language, none recognize this problem for the block of Spanish text. AInspector Sidebar does include "Identify language changes" as a manual check.
  • Although all tools return an error for missing labels on form fields, most oversimplify the solution. HTML CodeSniffer and WAVE both describe various methods for labeling form fields in their accompanying help text (e.g., title, aria-label, and aria-labelledby). However, AInspector Sidebar is the only tool that goes into even greater depth and explains the need for fieldset and legend.
  • Although I gave up early on sifting through the Cynthia Says results, it does deserve credit for being the only tool that recognized the presence of a CAPTCHA on the page. It turns out the truncated message mentioned above "Ensuring that the Web Page contains another CAP..." is actually about CAPTCHA. When I expanded this item, I found the following sub-item: "CAPTCHA found, please verify that the information is conveyed through audio as well as visual." I'm not satisfied with this solution, as it's overly simple and fails to address the needs of users who are deaf-blind. Still, it's impressive that the tool recognized the CAPTCHA at all, whereas all other tools only identified the CAPTCHA as an image that needs alt text.
  • No tool specifically detected problems with the methods used for reporting form validation errors to users. However, AInspector Sidebar has a wide variety of relevant issues that appear under "manual check".
  • AInspector is the only tool that returns errors for missing ARIA landmark roles. It returns three violations: "All content must be contained within landmarks", "MAIN landmark: At least one", and "NAVIGATION landmark: At least one". Some may dispute whether these are valid rules.
  • All tools recognized the need for alt text on the images in the carousel. However, the images are only one small problem in the overall carousel interface. None of the tools identified the larger issues, nor should we expect them to. AInspector Sidebar comes closest: it returns the error "Keyboard/mouse/drag events must have a role" on all the navigation controls within the carousel, and the accompanying help text for this issue provides additional details about the need for ARIA markup.
  • Tools that received a "Sort of" for the data table issue assumed the table was a layout table, since it doesn't have <th> elements. AInspector didn't make that assumption, but its error "Identify table markup as data or layout" is similar. The two tools that received a "Yes" (aXe and Tenon) assumed (correctly) that this was a data table missing <th> elements. Whether this was just a lucky guess would require further testing. However, it's plausible to determine automatically, with some confidence, whether a table is a layout or data table (screen readers do this fairly well).
  • The need for clarifying abbreviations (e.g., with the <abbr> element) is only a Level AAA WCAG 2.0 success criterion so one can perhaps forgive all tools for failing to detect this issue. However, several of the tools claim to support Level AAA yet still didn't detect this issue. This is an instance where I wish I had been successful at running a QUAIL test, because I know QUAIL has tests for abbreviations.
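
On the data-versus-layout table question raised in the notes above, here is a hypothetical sketch of the kind of scoring a checker might use. The feature names and thresholds are assumptions for illustration only; this is not the heuristic used by aXe, Tenon, or any screen reader, and a real implementation would extract these features from the DOM rather than take them as input.

```python
from dataclasses import dataclass


@dataclass
class TableFeatures:
    """Hypothetical summary of one <table>, as a checker might extract it from the DOM."""
    rows: int
    cols: int
    has_th: bool = False
    has_caption: bool = False
    role_presentation: bool = False
    contains_nested_table: bool = False
    avg_cell_text_len: float = 0.0


def classify_table(t: TableFeatures) -> str:
    """Return 'data' or 'layout'. All thresholds are illustrative assumptions."""
    # Explicit author signals win outright.
    if t.role_presentation:
        return "layout"
    if t.has_th or t.has_caption:
        return "data"
    # Structural hints toward layout: nesting, or a single row or column.
    if t.contains_nested_table or t.rows <= 1 or t.cols <= 1:
        return "layout"
    # A grid of several short cells (e.g., course codes) looks like data.
    score = 0
    if t.rows >= 3 and t.cols >= 2:
        score += 1
    if t.avg_cell_text_len <= 20:
        score += 1
    return "data" if score >= 2 else "layout"
```

Under these assumptions, a table like AU's, with several rows of short cells but no <th> elements, would likely be classified as a data table, so a checker using a rule like this would report the missing headers rather than assume a layout table.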

Conclusion

Ultimately, I think none of these details are all that important. More important is whether the tool's user interface, feature set, and accompanying help text are effective in helping designers and developers improve the accessibility of their websites. The tools I tested have widely different philosophies about how to accomplish that. AInspector Sidebar/FAE seems to be the most comprehensive. Its combination of violations, warnings, and manual checks covers a wide swath of web scenarios, and its help text provides more specific technical detail, including ARIA techniques, than any other tool. If users fully utilize all of this information, they will likely get much better at developing accessible web applications. The question is: is it too much? Other tools, most notably aXe and Tenon, are developed with the belief that too much information can be counterproductive. They focus on the low-hanging fruit, testing issues that can be accurately measured without overwhelming the user with false positives, duplicates, and the need to (as stated on the aXe website) "wade through reams of accessibility issues that have nothing to do with the feature you are developing".

Way back in 2002 (when Bobby was still a thing), Melody Ivory and Aline Chevalier published A Study of Automated Web Site Evaluation Tools (PDF). In their study, they asked experienced developers, most of whom had little exposure to web accessibility, to fix the accessibility of five websites using various combinations of automated assistance: no assistance, or either the W3C HTML Validator, Bobby, LIFT (another defunct tool that was popular back then), or all three. They found that the automated evaluation tools identified more errors than designers identified without them, but designers made more page modifications when not using the tools (they had a time limit). Also, the modifications they made without the automated tools resulted in better user performance than modifications made using any of the automated tools (user performance was subsequently measured with a sample of users with and without disabilities). Granted, this was 2002. Accessibility checkers have come a long way, as have web design and development techniques. Today's developers are accustomed to using a variety of tools, and nearly all of the checkers I tested have APIs that enable them to be seamlessly integrated into developers' existing toolsets. It would be interesting to replicate this earlier study with a more contemporary set of tools, a larger sample of developers, and better controls for prior accessibility knowledge and experience. To my knowledge, nobody has done such a study.

Meanwhile, I think web accessibility checkers have a place in our overall accessibility strategy, but perhaps it's a relatively minor place. We certainly should not depend on them. In order to truly understand the issues surrounding the web pages we're developing today, which include dropdown menus, carousels, and other complex interactive widgets, there's no substitute for designers and developers learning all they can about accessibility techniques. Automated accessibility checkers can help a little with that, but we need to appreciate their limitations.

6 comments on “My Post-CSUN Comparison of Web Accessibility Checkers”

  1. Pingback: The importance of manual testing alongside automated accessibility tools – Media Access Australia

  2. Simone, I was limited partly by time and partly by the desire to keep the number of columns in the data table reasonably small. Otherwise, I have no particular reason for excluding AChecker, or any other checker that I overlooked. It just wasn't at the forefront of my consciousness that day. I've used it in the past though, and it's a fine checker.

  3. This is a great analysis. I thought I'd provide some commentary on some of the "No" elements, at least in regard to WAVE.

    Because of browser security limitations, it's nearly impossible to evaluate the focus style states of links. This makes detecting missing focus indicators very difficult - unless you were to parse the CSS of a page, which is very difficult and expensive - and would likely result in false errors.

    Determining whether color is used to convey content is also very difficult to evaluate. One could only identify all instances of color and direct the user to evaluate these. And seeing as color is already readily identifiable, having a tool indicate this would just result in unnecessary overhead.

    A test to identify Re-CAPTCHA (or other popular CAPTCHAs) would be possible and we'll consider this. Of note is that the AU site doesn't use a standard CAPTCHA implementation, so such a test would not have flagged this anyway.

    Inaccessible form validation, modal window, and carousel are nearly impossible to automatically detect. And it would be difficult for a tool to differentiate accessible versions from inaccessible ones. I'd be happy to hear feedback on how you think this could be done in a meaningful way.

    Missing ARIA landmarks is an interesting proposition. We'll consider alerting (not a failure) the user to pages that have no ARIA landmarks or HTML5 structural elements (similar to how we alert if there is no h1). It would be rare to have a page that wouldn't benefit from these.

    Because abbr has insufficient screen reader support, is not keyboard or touch accessible, and when supported creates an inequivalent experience for screen reader users, we do not generally advocate it as the best method for identifying acronyms and abbreviations. For a tool to suggest this would, I believe, result in a less accessible experience.

    In conclusion, what you identify as potential deficiencies in the tool are either not readily possible in an automated way, or are perhaps conscious decisions to ensure that the evaluation feedback is most meaningful to the tool users.

    • Thanks for your thoughtful reply, @Jared. While I appreciate your careful consideration of each of the issues identified in my test, I want to clarify that my intent with this post was not to criticize automated tools for being unable to identify particular issues. I was mostly motivated to perform this testing to satisfy my curiosity.

      My conclusion is *not* that automated testing is bad and should not be used, or that any of the tools tested is better than the others. It's simply that automated tools have to be part of a comprehensive effort.

      At the University of Washington we've used a commercial automated tool since 2011, and every year when the license comes up for renewal we always choose to renew because it is indeed an important part of our arsenal. However, despite our best efforts to educate our community on proper use of the tools, we still get individuals and departments who believe their websites and web applications are accessible because they "passed the checker". So I feel it's my duty to continually remind folks that it's not that simple.

  4. Thanks for your detailed comparison, but I would have to disagree about AInspector being comprehensive. Per my lead developer:
    • Take, for example, WCAG Criterion 3.3.4, which is claimed as unique to AInspector. On the page, the tool marks this rule with “MC”, which stands for “manual check”, and provides the following recommended action: “If the 100 form controls and widgets on this page are used for legal and/or financial transactions, make sure the actions are either reversible or requires the user to confirm the information before the transaction is finalized”. So AInspector counted all input elements on the page and recommended manually verifying that all of these elements comply with WCAG Criterion 3.3.4. It would be a real stretch to call that a “check”.
    • The same is true for WCAG Criterion 3.3.3 (Error Suggestion). The tool recommends manually checking compliance with the rule: “If any of the 0 form controls are required, add the REQUIRED attribute or if HTML4 compatibility is required the ARIA-REQUIRED="TRUE" attribute”. Note that the number of form controls is listed as “0”, even though all of the textboxes, dropdowns, etc. are listed on the details page. In addition, note that the tool’s recommendation is to use the required attribute or the aria-required attribute, even though that is only one of the four acceptable approaches.
    • The same can be said about criteria 3.3.1, 3.2.4, and many, many others, which are not even listed in the “Unique Checks” list! The number of manual checks AInspector suggests for all possible elements/rules is incredible.

    On the other hand, there is the matter of the validity of the check results.

    • For Criterion 3.3.2 (Labels or Instructions), AInspector erroneously marks all radio buttons and checkboxes on the page as missing labels. On the page, it provides the user with the following action for this “violation”: “Change the 12 LABEL elements to use the FOR attribute to label their respective form controls”. Analysis of the page shows that all 12 label elements already have the for attribute specified. This is one of the main criteria to be checked, and given the nature of the application, we are bound to have false positives on this criterion all over. WAVE doesn't generate false positives on this one.
    • AInspector marks the absence of a label for the UL element (left navigation menu) as a violation of Criterion 3.3.2. WAVE does it.
    • AInspector marks all WF1 dropdowns as having low text contrast (2.9 vs. 4.5), while the text in the dropdowns is the same as in the textboxes, etc. WAVE does it.