Microsoft updates Office for accessibility – new templates, ALT tags, and machine learning

Microsoft announced on their Mechanics blog upcoming changes to the Office suite of programs to improve the accessibility of content created using Office. Embedded within the blog post is a video demoing the changes. For convenience, I’ve embedded that YouTube video below, and discuss the impact of these changes later in my post.


Microsoft have been updating their templates for some time, trying to make them work better with modern screen readers such as JAWS and VoiceOver. This announcement takes us further down that road. The templates target a range of Office products including Word, PowerPoint and, interestingly, Outlook emails.

I’m particularly interested to see what Microsoft have done for Outlook emails. At CNIB we use Campaign Monitor for our external email campaigns and newsletters, and I had to work quite hard to ensure that our emails worked across the breadth of email clients out there, both from a usability point of view and for accessibility. HTML email is problematic because there are no real standards yet for how much of HTML and CSS an email client supports, and the subset of widely supported HTML elements and CSS is frighteningly small. One of the worst offenders was the Outlook 2010/JAWS combination. I also had to play tricks to stop iOS (the operating system on iPhones and iPads) resizing my accessible text when it felt that the text was a tad too long.

Basically, the HTML email situation is analogous to the Netscape/Microsoft browser wars of the 1990s. I’m waiting to see what Microsoft have built into their templates.

ALT tags

One of the most talked-about accessibility problems, and certainly one of those obsessed over by static accessibility checker vendors, is ALT tags: the fields in HTML used to describe images in text for screen-reading technology (they can also provide some context-sensitive help to sighted users).

Currently in Microsoft Word and PowerPoint there are two fields for ALT text: a title and a description. In reality, for web pages at least, there is only one field for ALT text, so Microsoft cheat and format the ALT text for you so that the ALT tag reads “Title: image title Description: image description”. This reads very well in a screen reader and forces some structure on the text. Unfortunately it seems to have caused a degree of confusion among content creators, in part because images often also have captions, and what is the difference between a caption and an ALT text title? Partly to clear up the confusion, Microsoft are dropping “title”.
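As a rough illustration (this is my own sketch, not Microsoft’s actual code), the single ALT string that current Word and PowerPoint exports produce from the two fields can be thought of as a simple formatting step; once “title” is dropped, only the description part survives:

```python
def build_alt_text(title, description):
    """Combine the two Office ALT fields into one alt attribute value,
    roughly the way current Word/PowerPoint exports format it.
    An empty field is simply omitted from the result."""
    parts = []
    if title:
        parts.append(f"Title: {title}")
    if description:
        parts.append(f"Description: {description}")
    return " ".join(parts)
```

So `build_alt_text("Lake", "Bright blue glacial lake")` yields the combined “Title: … Description: …” string the post describes, while leaving the title blank reduces the output to the description alone.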

Machine learning

Arguably the main reason Microsoft are dropping “title” from the ALT text for images is that they want to fill the field automatically. They are applying machine learning to both Word and PowerPoint that scans photographs for content and describes them in a text summary. For example, the following image would ideally be described as “Bright blue glacial lake surrounded by white ice on a dark mountain”. Chances are that it won’t be. The machine learning algorithm is good, but far from perfect.

[Image: bright blue glacial lake surrounded by white ice on a dark mountain]

The difficulty in algorithmically producing ALT text of that quality is why there are weasel words in the Microsoft announcement: “high confidence”.

we will offer you automatic suggestions for alt-text when you insert a photographic image that can be recognized with high confidence.

And that “high confidence” caveat is currently necessary. I’ve played with the algorithm, with varying results; for sheer amusement value, the results are difficult to beat. The image below is one result. It is a photograph taken in a restaurant. CaptionBot is unsure, but thinks that it is a cat eating out of a suitcase. This is the algorithm coming to an ALT tag near you.

[Image: machine learning describing a restaurant scene]
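The announcement doesn’t say how that confidence threshold will be applied, but the client-side gate it implies presumably looks something like the sketch below (the function name, data shape, and threshold value are all mine, purely illustrative):

```python
def suggest_alt_text(captions, threshold=0.8):
    """Offer the highest-confidence machine-generated caption as an
    ALT text suggestion only if it clears the confidence threshold.
    Returning None means no suggestion is surfaced and the author
    should write the ALT text by hand."""
    if not captions:
        return None
    best = max(captions, key=lambda c: c["confidence"])
    return best["text"] if best["confidence"] >= threshold else None
```

Under this scheme, a low-confidence guess like “a cat eating out of a suitcase” at 0.41 would never be surfaced, while a 0.93-confidence caption would be offered for the author to review.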

My biggest concern about the technology isn’t so much cats eating out of suitcases (image recognition will improve); it is how content creators will behave as a result. They will get accustomed to the automated text and may well stop reading it carefully, if at all. Images convey a lot of information, and ALT text needs to succinctly describe what is important about the image. There is a big difference between “children playing beside a train” and “children playing on the tracks as a train approaches them”.
