Can AI Write Your alt Attributes?

In 2016, Facebook unveiled their AI (artificial intelligence) technology, which attempts to automatically describe images to benefit people who are blind or visually impaired. While we are pretty excited about its potential, it’s important to understand where this technology might be helpful, and what its limitations are.
One of the most fundamental tasks when making a website accessible is ensuring that all of your images are also accessible to people who can’t see them. This generally involves adding alt attributes to the image tags. But what if, like Facebook, you allow your users to upload their own photos? How do you ensure that the thousands of images being uploaded each minute are properly described? Until recently, you couldn’t. Automated image description now offers a way, but this solution is far from perfect.
To illustrate the problems with this technology, let’s look at a few examples of some randomly chosen images, along with what some AI image recognition services think they contain. You’ll notice that most of this output would need to be massaged into an appropriate format for an alt attribute, but that’s a relatively simple matter. The more important question is whether or not the information generated by these services sufficiently describes the images.
Image | Human Crafted Alt Attribute | Facebook Alt Attribute | Amazon Output | Google Output | Microsoft Output |
---|---|---|---|---|---|
![]() | couple standing with labradoodle and holding 8 puppies | Image may contain: 2 people, people smiling, dog, tree, and outdoor | | | |
![]() | woman wearing sunglasses and smiling in swimming pool | Image may contain: 1 person, smiling, pool and outdoor | | | |
![]() | new yorker style cartoon of man being visited in the hospital. He’s completely in traction and has a balloon tied to the foot of his bed. The caption says quote any improvement since I brought the balloon? end quote | Image may contain: drawing | | | |
![]() | graphite pencil sketch of a person’s face | Image may contain: drawing | | | |
![]() | blacktop parking spaces with the updated, forward leaning wheelchair accessible icons painted in each space | No automatic alt text available | | | |
![]() | muscular man lifting barbell in gym | Image may contain: 1 person | | | |
![]() | Star Wars droid BB-8 | No automatic alt text available | | | |
![]() | row of parked school buses | Image may contain: outdoor | | | |
As the table shows, even when AI recognizes what is in an image, the detail it provides is far from sufficient in most cases. These algorithms also have difficulty determining which elements of an image are important and which are simply incidental. As a result, while you can sometimes tell which objects might be in the image, it can be quite difficult to work out what the image is trying to convey.
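As for the formatting step mentioned before the table, here is a minimal sketch of what "massaging" the Facebook-style strings might look like. The function names `to_alt_text` and `img_tag` are hypothetical, and the choice to drop the leading word "Image" (since screen readers typically announce images already) is an assumption rather than an established rule. Only the two string patterns that appear in the table are handled.

```python
from html import escape
from typing import Optional

# The two output patterns seen in the Facebook column of the table.
PREFIX = "Image may contain:"
NO_DESCRIPTION = "No automatic alt text available"


def to_alt_text(raw: str) -> Optional[str]:
    """Convert a raw service string into candidate alt text.

    Returns None when the service produced no description, so the
    caller can decide what to do instead of publishing a non-answer.
    """
    text = raw.strip()
    if not text or text == NO_DESCRIPTION:
        return None
    if text.startswith(PREFIX):
        # Assumption: drop the redundant word "Image", keep the hedge.
        text = "May contain: " + text[len(PREFIX):].strip()
    return text


def img_tag(src: str, alt: str) -> str:
    """Build an img element, escaping both attribute values."""
    return f'<img src="{escape(src, quote=True)}" alt="{escape(alt, quote=True)}">'


if __name__ == "__main__":
    raw = "Image may contain: 1 person, smiling, pool and outdoor"
    alt = to_alt_text(raw)
    if alt is not None:
        print(img_tag("pool.jpg", alt))
```

The formatting really is the easy part: nothing in this step adds the detail that the human-written descriptions in the table contain.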
In short, while this technology does show a great deal of promise, it should, at present, only be used in situations where it is entirely impossible to provide descriptions in any other way. Most alt attributes on the web should continue to be written by humans for the foreseeable future. On the other hand, this technology is better than nothing, though perhaps only just.
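One way to express that recommendation in code is a simple preference order: use a human-written description whenever one exists, and fall back to machine output only when nothing else is available. The sketch below is illustrative only. The function name `choose_alt` and the final placeholder string are assumptions, not part of any service's API.

```python
from typing import Optional

# Hypothetical last-resort text when neither source has a description.
PLACEHOLDER = "Photo, no description available"


def choose_alt(human: Optional[str], machine: Optional[str]) -> str:
    """Pick alt text, always preferring the human-written description."""
    if human and human.strip():
        return human.strip()
    if machine and machine.strip():
        # Machine output is a fallback, used only when no human text exists.
        return machine.strip()
    return PLACEHOLDER


if __name__ == "__main__":
    print(choose_alt("row of parked school buses", "Image may contain: outdoor"))
    print(choose_alt(None, "Image may contain: outdoor"))
    print(choose_alt(None, None))
```

A site that accepts uploads could pair this with a prompt asking the uploader for a description, so the machine-generated branch is reached as rarely as possible.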
For more on what Facebook has been doing in this regard recently, visit https://www.cnet.com/news/facebook-ai-blind-photo-recognition-caption/.