The Adobe Acrobat PDF logo surrounded by various line icons.

Auto-Tagging PDFs: The Ugly Truth

PDF remediation is a time-consuming but necessary part of digital accessibility for any organization. It’s tempting to just click Adobe’s “auto-tagging” feature as a shortcut to making PDFs accessible for people with disabilities. The problem is that auto-tagging often won’t produce an instantly accessible PDF on its own. No matter what kind of automation tool you use, you will need to manually verify that the document is tagged correctly.

Adobe is Not an Accessibility Tool

Adobe Acrobat wasn’t designed as an accessibility tool. It is a publishing tool, like Microsoft Word or Google Docs (both of which actually have better native accessibility tools built in). Auto-tagging a document can be a useful starting point for remediation, but it is not the one-click, one-and-done solution some may think it is. After using the auto-tag function, the remediator needs to manually verify the accuracy of the tags and resolve the errors that remain. This means working directly in the complex and intricate tag tree structure in Adobe.

The Adobe auto-tagging features are not “smart.” They are based on static definitions derived from styling in popular publishing tools, so they cannot interpret the variations in styling among the many documents that may pass through the program.

Let’s take a look at some of the shortcomings of auto-tagging PDFs in Adobe. 

Auto-Tagging a Sample Document

Here is a sample document that contains a few common elements. It contains headings, images, text that is really an image, a list, and a table.

The sample document with the Adobe tags panel open.

Auto-tagging this document using Adobe produces a number of errors. Additionally, another method of validation is required to ensure accessibility because not all errors will be caught by the Adobe Checker. Auto-tagging errors make the document inaccessible, meaning someone attempting to read it using assistive technology (such as a screen reader or connected Braille display) will not be able to access the information it contains.

The sample document with the accessibility checker tab open.

Adobe Tag Tree After Auto-Tagging

We can examine the Adobe tag tree after auto-tagging this PDF and see the results. 

Text

The first issue lies with the text at the top of the page. The word “Letterhead” isn’t identified by Adobe as text; it is tagged as part of an image.

The sample document with the Adobe tags panel open.

Headings

The next issue is that the word “Memorandum” is tagged as Heading level 3, but should be tagged as Heading level 1. It apparently matches some style definition that tells the auto-tagger that this size and weight of font is level 3.

The sample document with the tags panel open pointing to an incorrect heading tag.

Additionally, Adobe has auto-tagged the subsequent headings as Heading level 2. With an H3 coming first, this is an error in logical heading structure.

The sample document with the tags panel open pointing to an incorrect heading tag.
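
The heading problem described above can be expressed as a simple rule check. The following is a minimal sketch, not part of any Adobe tool: the function name `find_heading_issues` is hypothetical, and it assumes the heading levels have already been read out of the tag tree in reading order. It flags a document that does not start at level 1 and any heading that skips a level on the way down.

```python
def find_heading_issues(levels):
    """Return human-readable problems found in a sequence of heading levels.

    `levels` lists the heading levels in reading order, e.g. [3, 2, 2].
    """
    issues = []
    previous = None
    for position, level in enumerate(levels, start=1):
        if previous is None:
            # The first heading in a document is expected to be level 1.
            if level != 1:
                issues.append(
                    f"heading {position}: document starts at H{level}, expected H1"
                )
        elif level > previous + 1:
            # Going deeper by more than one level skips part of the outline.
            issues.append(
                f"heading {position}: jumps from H{previous} to H{level}"
            )
        previous = level
    return issues


# Levels as auto-tagged in the sample: "Memorandum" as H3, then H2 headings.
# The number of H2 headings used here is illustrative.
auto_tagged = [3, 2, 2]
corrected = [1, 2, 2]

print(find_heading_issues(auto_tagged))
print(find_heading_issues(corrected))
```

A check like this only confirms that the levels form a logical outline. It cannot tell whether the right text was tagged as a heading in the first place, so manual review is still needed.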

Text Not OCR’d

Next is the text on the left, below the first paragraph inside the box. Because it is identified as an image, and the text isn’t OCR’d, anyone using assistive technology will be unable to read the information contained in the box. Adobe auto-tagging does not include its OCR capabilities, so there is no way for auto-tagging to resolve this issue. Note also that the image is tagged at the end of the reading order, which is not where it belongs. It should be slotted before the “Variations” heading to its right.

The sample document with an OCR reading order issue.

List

Now let’s take a look at the list. Although the items have bullets, they are not “commonly used” bullets, so Adobe has auto-tagged the list as text using a P-tag. Again, the Adobe auto-tag definition of a list doesn’t allow for “unusual” bullets. An assistive technology user will just be given a string of words with no indication of their relationship, and will not know it is a list.

A sample document with the tags panel open pointing to an issue with the auto-tagged list.

Table

The Adobe auto-tagging also failed to identify the tabular information at the bottom of the page as a table. 

The sample document with the tags panel open pointing to the incorrectly tagged table.

This text is contained in a P-tag (inside the indicated container in the tag tree). The content will be presented to an assistive technology user as an unrelated string of words instead of a table of related information. This particular table apparently doesn’t fit the Adobe definition of a table containing cells. 
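
To make the difference concrete, here is a minimal sketch of what a table check could look like. It is an illustration only: the `Tag` class and `table_problems` function are hypothetical stand-ins for a real tag tree, and the cell text is placeholder data rather than the content of the sample document. The tag names `Table`, `TR`, `TH`, and `TD` are the standard PDF structure types for tables.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Tag:
    """A simplified stand-in for one node in a PDF tag tree."""
    name: str
    text: str = ""
    children: List["Tag"] = field(default_factory=list)


def table_problems(tag):
    """List the reasons `tag` is not a well-formed table.

    Simplification: rows are expected directly under <Table>. Real documents
    may also group rows under THead, TBody, or TFoot.
    """
    if tag.name != "Table":
        return [f"expected <Table>, found <{tag.name}>"]
    problems = []
    if not tag.children:
        problems.append("<Table> has no rows")
    for row_number, row in enumerate(tag.children, start=1):
        if row.name != "TR":
            problems.append(
                f"child {row_number} of <Table> is <{row.name}>, expected <TR>"
            )
            continue
        for cell in row.children:
            if cell.name not in ("TH", "TD"):
                problems.append(
                    f"row {row_number} contains <{cell.name}>, expected <TH> or <TD>"
                )
    return problems


# What auto-tagging produced: all of the tabular text inside one P-tag.
auto_tagged = Tag("P", text="Column A Column B value 1 value 2")

# What a remediated table looks like: rows of header and data cells.
remediated = Tag("Table", children=[
    Tag("TR", children=[Tag("TH", text="Column A"), Tag("TH", text="Column B")]),
    Tag("TR", children=[Tag("TD", text="value 1"), Tag("TD", text="value 2")]),
])

print(table_problems(auto_tagged))
print(table_problems(remediated))
```

The first call reports that a P-tag was found where a table was expected, and the second reports no problems. A structural check like this says nothing about whether header cells are correctly associated with data cells, which still has to be confirmed with a screen reader.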

Auto-Tagging Results in Errors and Is a Liability

Even this simple one-page document was not correctly auto-tagged by Adobe. It contains an unacceptable number of tagging errors and is not fully usable by someone accessing it with assistive technology. Publishing this document on your website or sending it via email with only Adobe’s auto-tagging applied could constitute a violation of the ADA and Section 508, and could result in a complaint or even a lawsuit. More importantly, the end user would be unable to obtain the information you intend to convey. Accessibility is about providing the same information to EVERYONE.

Fixing Auto-Tagging Errors in Adobe Requires Many Steps

In order to make this document fully accessible, the remediator would have to go back into the Adobe tag tree, item by item, and correct all the errors created by the auto-tagging. The Adobe remediation process is very manual, tedious, and time-consuming. The software is a publishing tool, not a dedicated accessibility solution.  

The process of fixing just the list could take as much as half an hour. Here is how it works:

  1. Using the touch-up reading tool, select each bullet and separate it from each list item, because they are contained inside a single tag.

  2. Change the new tags containing the bullets into “Lbl” tags.

  3. Change the tags containing the text into “LBody” tags.

  4. Then open the tag tree and manually create “LI” tags.

  5. Nest the “Lbl” and “LBody” tags inside the “LI” tags, one by one.

  6. Then put all the “LI” tags into a “List” (“L”) tag by hand.
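
The structure those six steps produce can be sketched in a few lines of code. This is an illustration of the target tag structure only, not a way to edit a PDF: the `Tag` class and the `split_bullet`, `build_list`, and `outline` helpers are hypothetical, and the list text is placeholder data. It assumes each list item arrives as one line with the bullet glyph first, followed by a space.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Tag:
    """A simplified stand-in for one node in a PDF tag tree."""
    name: str
    text: str = ""
    children: List["Tag"] = field(default_factory=list)


def split_bullet(line):
    """Separate the leading bullet glyph from the item text (step 1)."""
    bullet, _, text = line.strip().partition(" ")
    return bullet, text.strip()


def build_list(lines):
    """Build an L > LI > (Lbl, LBody) structure from bulleted lines."""
    list_tag = Tag("L")                        # step 6: one list container
    for line in lines:
        bullet, text = split_bullet(line)
        label = Tag("Lbl", text=bullet)        # step 2: the bullet becomes Lbl
        body = Tag("LBody", text=text)         # step 3: the text becomes LBody
        item = Tag("LI", children=[label, body])  # steps 4 and 5: wrap in LI
        list_tag.children.append(item)
    return list_tag


def outline(tag, depth=0):
    """Return the tree as indented lines, similar to a tags panel view."""
    label = "  " * depth + f"<{tag.name}>"
    if tag.text:
        label += f" {tag.text}"
    lines = [label]
    for child in tag.children:
        lines.extend(outline(child, depth + 1))
    return lines


# Placeholder items with an "unusual" bullet glyph.
tree = build_list(["❖ First item", "❖ Second item"])
print("\n".join(outline(tree)))
```

In Acrobat each of these nodes has to be created and nested by hand, which is why a short list can take so long to fix.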

A similarly complex process is required to fix the table in this document.

Much of the time spent remediating in Adobe goes to simply FINDING the offending tag within the lengthy and complex tag tree in order to make corrections. This document is only one page, yet its tag tree contains many items, not all of which are in the correct order.

Can Automation Result in Accessible PDFs?

So does this mean automation is never the solution? No, it doesn’t. For Adobe, auto-tagging at least provides a starting point. From there, the document will more than likely need to be manually corrected. Remediating the table, for example, would take just a few minutes for an experienced user. You should always validate your work using a screen reader, not just an accessibility checker.

Choose your PDF remediation solution wisely, and select one that saves you the most time and produces the most accessible results.

