PDF remediation is a time-consuming but necessary part of any organization’s digital accessibility work. It’s tempting to click Adobe’s “auto-tagging” feature and shortcut the task of making PDFs accessible for people with disabilities. The problem is that auto-tagging on its own often won’t produce an accessible PDF. No matter what kind of automation tool you use, you will need to verify manually that the document is tagged correctly.
Adobe is Not an Accessibility Tool
Adobe Acrobat wasn’t designed as an accessibility tool. It is a publishing tool, like Microsoft Word or Google Docs (both of which actually have better native accessibility features). Auto-tagging is not the one-click accessibility solution some people think it is: it can be a useful starting point for remediation, but it is not a one-and-done fix. After running auto-tag, the remediator still needs to verify the accuracy of the tags manually and resolve the errors that remain, which means working in Acrobat’s complex and intricate tag tree.
Adobe’s auto-tagging features are not “smart.” They rely on static definitions derived from the styling used in popular publishing tools, so they cannot interpret the variations in styling across the many documents that may pass through the program.
Let’s take a look at some of the shortcomings of auto-tagging PDFs in Adobe.
Auto-Tagging a Sample Document
Here is a sample document with a few common elements: headings, images, text that is actually an image, a list, and a table.
Auto-tagging this document in Adobe Acrobat produces a number of errors. Not all of them are caught by Acrobat’s built-in Accessibility Checker, so another method of validation is required to confirm the document is accessible. These auto-tagging errors make the document inaccessible: someone reading it with assistive technology (such as a screen reader or a refreshable Braille display) will not be able to access the information it contains.
Adobe Tag Tree After Auto-Tagging
We can examine the Adobe tag tree after auto-tagging this PDF and see the results.
Text
The first issue is the text at the top of the page: Adobe doesn’t identify the word “Letterhead” as text. Instead, it is tagged as part of an image.
Headings
The next issue is that the word “Memorandum” is tagged as a level 3 heading (H3) when it should be a level 1 heading (H1). It evidently matches a style definition that tells the auto-tagger this font size and weight correspond to level 3.
Adobe has also auto-tagged the subsequent headings as level 2 headings (H2). Because the H3 comes first, the heading structure is illogical: the document starts at level 3 and then moves up to level 2.
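To make the heading rule concrete, here is a minimal Python sketch that checks a flat list of heading levels in reading order. The rules it applies (start at H1, never go more than one level deeper at a time) are a simplified assumption rather than a reproduction of Acrobat’s checker, and the function name heading_problems is hypothetical.

```python
def heading_problems(levels: list[int]) -> list[str]:
    """Flag ordering problems in a document's heading levels, in reading order.

    Simplified rules (an assumption, not Acrobat's checker logic):
    - the first heading should be H1;
    - a heading should go at most one level deeper than the heading before it.
    """
    problems = []
    if levels and levels[0] != 1:
        problems.append(f"first heading is H{levels[0]}, expected H1")
    for i in range(1, len(levels)):
        prev, cur = levels[i - 1], levels[i]
        if cur > prev + 1:
            # e.g. an H1 followed directly by an H3
            problems.append(f"heading {i + 1}: H{prev} -> H{cur} skips a level")
    return problems


# Hypothetical sequence modeled on the sample: "Memorandum" as H3, then two H2s
print(heading_problems([3, 2, 2]))  # ['first heading is H3, expected H1']
# After correcting "Memorandum" to H1
print(heading_problems([1, 2, 2]))  # []
```

A check like this only catches ordering problems; it cannot tell whether a piece of text should be a heading in the first place, which still takes a human reviewer.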
Text Not OCR’d
Next is the text inside the box on the left, below the first paragraph. Because it is identified as an image and the text isn’t OCR’d, anyone using assistive technology will be unable to read the information in the box. Adobe’s auto-tagging does not apply Acrobat’s OCR (optical character recognition) capabilities, so auto-tagging alone cannot resolve this issue. Note also that the image is tagged at the end of the reading order, which is not where it belongs; it should come before the “Variations” heading to its right.
List
Now let’s look at the list. Although it has bullets, they are not “commonly used” bullet characters, so Adobe has auto-tagged the list as plain text in a P tag. Again, Adobe’s auto-tag definition of a list doesn’t allow for “unusual” bullets. An assistive technology user will simply receive a string of words with no indication of how they relate, and will not know it is a list.
Table
The Adobe auto-tagging also failed to identify the tabular information at the bottom of the page as a table.
This text is contained in a P tag (inside the indicated container in the tag tree), so it will be presented to an assistive technology user as an unrelated string of words instead of a table of related information. This particular table apparently doesn’t match Adobe’s definition of a table made up of cells.
Auto-Tagging Results in Errors and is a Liability
Even this simple one-page document was not auto-tagged correctly by Adobe. It contains an unacceptable number of tagging errors and is not fully usable by someone relying on assistive technology. Distributing this auto-tagged document on your website or by email could violate the ADA and Section 508, and it could result in a complaint or even a lawsuit. More importantly, the end user would be unable to obtain the information you intend to convey. Accessibility is about providing the same information to EVERYONE.
Fixing Auto-Tagging Errors in Adobe Requires Many Steps
To make this document fully accessible, the remediator would have to go back into the Adobe tag tree and correct every error the auto-tagging created, item by item. Because the software is a publishing tool rather than a dedicated accessibility solution, this remediation process is manual, tedious, and time-consuming.
The process of fixing just the list could take as much as half an hour. Here is how it works:
- Using the TouchUp Reading Order tool, select each bullet and separate it from its list item, because the bullets and the list items are all contained in a single tag.
- Change the new tags containing the bullets into “Lbl” tags.
- Change the tags containing the text into “LBody” tags.
- Then open the tag tree and manually create “LI” tags.
- Nest the “Lbl” and “LBody” tags inside the “LI” tags, one by one.
- Then put all the “LI” tags into a list (“L”) tag by hand.
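The end state those steps build by hand is one L tag (the PDF structure type for a list) containing an LI per item, each holding a Lbl and an LBody. The Python sketch below models that nesting with a hypothetical Tag class; it does not edit a PDF or call any Acrobat API, and only illustrates the shape the finished tag tree should have.

```python
from dataclasses import dataclass, field


@dataclass
class Tag:
    """Simplified, hypothetical model of a PDF structure tag (not an Acrobat API)."""
    kind: str                      # structure type, e.g. "L", "LI", "Lbl", "LBody"
    text: str = ""
    children: list["Tag"] = field(default_factory=list)


def build_list(items: list[tuple[str, str]]) -> Tag:
    """Nest each (bullet, text) pair as LI > Lbl + LBody inside one L tag."""
    lst = Tag("L")
    for bullet, text in items:
        lst.children.append(
            Tag("LI", children=[Tag("Lbl", bullet), Tag("LBody", text)])
        )
    return lst


def show(tag: Tag, depth: int = 0) -> None:
    """Print the tree with indentation, roughly as a tag-tree panel displays it."""
    suffix = f" {tag.text!r}" if tag.text else ""
    print("  " * depth + f"<{tag.kind}>{suffix}")
    for child in tag.children:
        show(child, depth + 1)


show(build_list([("*", "First item"), ("*", "Second item")]))
```

Each list item adds three tags (LI, Lbl, LBody), which is why doing this by hand for even a short list takes so long.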
A similarly complex process is required to fix the table in this document.
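For the table, the target is a Table tag containing TR (row) tags, with header cells tagged TH and data cells tagged TD. The sketch below is a minimal illustration that represents the tag tree as nested Python dicts; that layout, the function name build_table, and the column names in the example are all hypothetical. It also ignores optional THead/TBody grouping and merged cells, which real tables may need.

```python
def build_table(header: list[str], rows: list[list[str]]) -> dict:
    """Build a simplified Table > TR > TH/TD tree as nested dicts.

    The dict layout is a hypothetical model for illustration, not an Acrobat API.
    """
    def row(cells: list[str], cell_type: str) -> dict:
        # One TR tag whose children are all TH (header) or all TD (data) cells
        return {"type": "TR",
                "children": [{"type": cell_type, "text": c} for c in cells]}

    # This simple model has no merged cells, so every row must match the header width
    if any(len(r) != len(header) for r in rows):
        raise ValueError("every row needs one cell per header column")
    return {"type": "Table",
            "children": [row(header, "TH")] + [row(r, "TD") for r in rows]}


# Hypothetical data, not taken from the sample document
table = build_table(["Item", "Quantity"], [["Paper", "10"], ["Toner", "2"]])
for tr in table["children"]:
    print([f'{cell["type"]}:{cell["text"]}' for cell in tr["children"]])
```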
Much of the time spent remediating in Adobe goes into simply FINDING the offending tag within the long, complex tag tree. Even this one-page document has a tag tree with many items, not all of them in the correct order.
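Part of that search can be scripted if the tag tree is available as data. The sketch below assumes the tree has already been exported to nested dicts (how to export it is outside the sketch’s scope, and the dict layout is hypothetical). It walks the tree depth-first and reports the path to every P tag whose text starts with a bullet-like character, a likely sign of a list that was tagged as plain text. The BULLET_CHARS set is an assumption; a document with truly unusual bullets may need more characters added.

```python
from collections.abc import Callable, Iterator

# Assumed set of bullet-like characters; extend it for the bullets your documents use.
BULLET_CHARS = set("\u2022\u25e6\u25aa\u25a0\u25cf\u27a2\u25ba*-")


def find_tags(tag: dict, predicate: Callable[[dict], bool],
              path: tuple[str, ...] = ()) -> Iterator[tuple[tuple[str, ...], dict]]:
    """Depth-first walk yielding (path, tag) for every tag that matches predicate."""
    here = path + (tag["type"],)
    if predicate(tag):
        yield here, tag
    for child in tag.get("children", []):
        yield from find_tags(child, predicate, here)


def looks_like_untagged_list(tag: dict) -> bool:
    """True for a P tag whose text begins with a bullet-like character."""
    return tag["type"] == "P" and tag.get("text", "").lstrip()[:1] in BULLET_CHARS


# Hypothetical, heavily simplified tag tree
tree = {"type": "Document", "children": [
    {"type": "H3", "text": "Memorandum"},
    {"type": "Sect", "children": [
        {"type": "P", "text": "* First item * Second item"},
        {"type": "P", "text": "An ordinary paragraph."},
    ]},
]}

for path, tag in find_tags(tree, looks_like_untagged_list):
    print(" > ".join(path), repr(tag["text"]))
```

A heuristic like this only narrows the search; every flagged tag still needs a human to confirm and fix it.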
Can Automation Result in Accessible PDFs?
So does this mean automation is never the solution? No, it doesn’t. In Adobe, auto-tagging at least provides a starting point, although the result will almost certainly need manual correction from there. Remediating the table, for example, would take an experienced user just a few minutes. Always validate your work with a screen reader, not just an accessibility checker.
Choose your PDF remediation solution wisely: select the one that saves you the most time and produces the most accessible results.