
I want to write a Java tool that scans the HTML pages of an existing site and, if any image has no alt attribute, inserts alt="" into that image. One approach is to use an HTML parser (such as HtmlCleaner) to build the DOM, add the alt attribute to the images in the DOM, and then write the HTML back out.

However, this approach won't keep the original HTML intact and will probably cause unpredictable side effects, especially since the number of existing HTML pages is huge and there is no guarantee that they are well-formed.

Is there a safer way to accomplish this (i.e., one that keeps the original HTML intact and only adds the alt attribute)?

A:

Short of writing some horrible mess of regexp or other string manipulation code, I don't believe that there is another way of doing this.
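For comparison, here is what the string-manipulation route might look like if it is limited to that one job. This is a minimal sketch, not a hardened tool: the class AltInserter and its helpers are hypothetical names, and it assumes that "<img" only appears as real markup outside comments and script/style bodies (tag-like text inside another tag's attribute value would be misread). Because it inserts alt="" directly after "<img" and copies every other character unchanged, the rest of each page stays byte-for-byte the same.

```java
/**
 * Hypothetical helper (not from any library): inserts alt="" into <img> start
 * tags that have no alt attribute and copies every other character unchanged.
 * Assumption: "<img" outside comments and <script>/<style> bodies is real markup.
 */
final class AltInserter {

    static String addMissingAlt(String html) {
        StringBuilder out = new StringBuilder(html.length() + 64);
        int i = 0;
        int n = html.length();
        while (i < n) {
            int skipTo = skipVerbatimRegion(html, i);
            if (skipTo > i) {
                // Comments and <script>/<style> elements are copied as-is.
                out.append(html, i, skipTo);
                i = skipTo;
            } else if (startsWithTag(html, i, "img")) {
                int tagEnd = findTagEnd(html, i);
                out.append(html, i, i + 4); // "<img" in its original casing
                if (!hasAltAttribute(html.substring(i, tagEnd))) {
                    out.append(" alt=\"\"");
                }
                out.append(html, i + 4, tagEnd); // rest of the tag, untouched
                i = tagEnd;
            } else {
                out.append(html.charAt(i++));
            }
        }
        return out.toString();
    }

    // End index of a comment or <script>/<style> element starting at i, else -1.
    private static int skipVerbatimRegion(String html, int i) {
        if (html.startsWith("<!--", i)) {
            int end = html.indexOf("-->", i + 4);
            return end < 0 ? html.length() : end + 3;
        }
        for (String name : new String[] {"script", "style"}) {
            if (startsWithTag(html, i, name)) {
                int close = indexOfIgnoreCase(html, "</" + name, i + name.length() + 1);
                int gt = close < 0 ? -1 : html.indexOf('>', close);
                return gt < 0 ? html.length() : gt + 1;
            }
        }
        return -1;
    }

    // True if "<name" starts at i (any case) and is followed by whitespace, '>', '/' or end of input.
    private static boolean startsWithTag(String html, int i, String name) {
        if (!html.regionMatches(true, i, "<" + name, 0, name.length() + 1)) return false;
        int after = i + name.length() + 1;
        if (after >= html.length()) return true;
        char c = html.charAt(after);
        return Character.isWhitespace(c) || c == '>' || c == '/';
    }

    // Index just past the tag's closing '>', ignoring '>' inside quoted values.
    private static int findTagEnd(String html, int start) {
        char quote = 0;
        for (int j = start; j < html.length(); j++) {
            char c = html.charAt(j);
            if (quote != 0) {
                if (c == quote) quote = 0;
            } else if (c == '"' || c == '\'') {
                quote = c;
            } else if (c == '>') {
                return j + 1;
            }
        }
        return html.length();
    }

    // Walks attribute names (skipping their values) and looks for "alt", case-insensitively.
    private static boolean hasAltAttribute(String tag) {
        int j = 4; // skip "<img"
        int n = tag.length();
        while (j < n && tag.charAt(j) != '>') {
            char c = tag.charAt(j);
            if (Character.isWhitespace(c) || c == '/') { j++; continue; }
            int nameStart = j;
            while (j < n && !Character.isWhitespace(tag.charAt(j)) && "=>/".indexOf(tag.charAt(j)) < 0) j++;
            if (tag.substring(nameStart, j).equalsIgnoreCase("alt")) return true;
            while (j < n && Character.isWhitespace(tag.charAt(j))) j++;
            if (j < n && tag.charAt(j) == '=') {
                j++;
                while (j < n && Character.isWhitespace(tag.charAt(j))) j++;
                if (j < n && (tag.charAt(j) == '"' || tag.charAt(j) == '\'')) {
                    int close = tag.indexOf(tag.charAt(j), j + 1);
                    j = close < 0 ? n : close + 1;
                } else {
                    while (j < n && !Character.isWhitespace(tag.charAt(j)) && tag.charAt(j) != '>') j++;
                }
            }
        }
        return false;
    }

    private static int indexOfIgnoreCase(String s, String target, int from) {
        for (int k = from; k + target.length() <= s.length(); k++) {
            if (s.regionMatches(true, k, target, 0, target.length())) return k;
        }
        return -1;
    }
}
```

A driver would read each page into a String using the page's actual encoding, call AltInserter.addMissingAlt, and write the file back only when the result differs, so untouched pages are left exactly as they were.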

I do question why you want to do this. The only reason I can imagine is to pass some sort of automatic validation, but the reason for requiring alt tags is a matter of usability. Adding empty alt tags does not help that in any way. You are just hiding the problem.

Instead, I'd suggest writing a bit of JavaScript that puts a red border around any image missing an alt tag, and making the front-end designers add meaningful alt tags to every image thus flagged.

Kris
One of the Section 508 compliance rules requires decorative images to have an empty alt tag (to help screen readers distinguish between images that are missing alt text and those that are decorative).
Buu Nguyen
But your proposal is to identify all the images that are missing alt text and flag them all as decorative. If the content authors are producing images without alt text, and have been trained sufficiently to include alt text on non-decorative images, then training them further to add empty alt attributes (not tags) for decorative images should not be difficult. You shouldn't trust that a missing alt attribute is because the image is decorative rather than human error or ignorance.
David Dorward
Good point, though I definitely don't assume that a missing alt automatically means an image is decorative. What leads me to consider this approach is that the site already exists and needs fixing, and either a human or a machine has to add the empty alt attribute to decorative images. Either way, content people have to write descriptions for all non-decorative images, but at least the tool can help them with the other task.
Buu Nguyen
Ultimately only a human can add a meaningful alt tag (even if by meaningful you mean an empty tag to signify a decorative element). Any programmatic 'fixing' here runs the risk of "hiding" a problem. Aside from failing to validate, missing alt tags do not cause a problem or degrade usability. It is better to leave them out until they can be fixed by a human rather than automatically inject them without any human oversight. Just my $.02. Also, if you care enough to validate, then you must want it to validate because it is actually right, not just technically right!
Kris
I appreciate your opinion. But the context is that the site already exists and was not built with compliance in mind, so we have to look through every image that either lacks an alt attribute or has an empty one anyway. During that review, we'll add descriptions to the non-decorative images. After that, we can run the tool to do the rest, i.e., add an empty alt attribute to the remaining images (which by then are the decorative ones). I think this is no more error-prone than having a human do both tasks, and it is more efficient given the number of pages on the site.
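The review step described above could be supported by a read-only report. The sketch below is a hypothetical AltReport class (not part of any library) that lists each img tag whose alt attribute is missing or blank, with its line number, so reviewers can decide which ones need a description before anything is rewritten. It uses rough regular expressions and assumes there is no ">" inside img attribute values; text such as title="see alt text" can be misread as an alt attribute. That is tolerable for a report a human checks, less so for an automatic rewrite.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical review report: img tags whose alt is missing or blank. */
final class AltReport {

    // Rough pattern (assumption): no '>' inside attribute values of <img> tags.
    private static final Pattern IMG_TAG =
            Pattern.compile("<img\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    // "alt" preceded by whitespace, with an optional quoted or unquoted value.
    private static final Pattern ALT_ATTR = Pattern.compile(
            "\\salt(?:\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)'|([^\\s\"'>]+)))?(?=[\\s/>])",
            Pattern.CASE_INSENSITIVE);

    /** Returns entries like "line 2: MISSING <img ...>" or "line 3: EMPTY <img ...>". */
    static List<String> findImagesNeedingReview(String html) {
        List<String> findings = new ArrayList<>();
        Matcher tags = IMG_TAG.matcher(html);
        int line = 1;
        int counted = 0; // newlines before this index have already been counted
        while (tags.find()) {
            for (; counted < tags.start(); counted++) {
                if (html.charAt(counted) == '\n') line++;
            }
            String tag = tags.group();
            Matcher alt = ALT_ATTR.matcher(tag);
            if (!alt.find()) {
                findings.add("line " + line + ": MISSING " + tag);
                continue;
            }
            // A bare "alt" (no value) is treated as empty, as in HTML.
            String value = firstNonNull(alt.group(1), alt.group(2), alt.group(3));
            if (value == null || value.trim().isEmpty()) {
                findings.add("line " + line + ": EMPTY " + tag);
            }
        }
        return findings;
    }

    private static String firstNonNull(String... values) {
        for (String v : values) {
            if (v != null) return v;
        }
        return null;
    }
}
```

Keeping the report separate from the rewrite means reviewers see exactly which images the later alt="" pass would touch.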
Buu Nguyen
A:

It's kind of pointless to add empty alt tags to your layout. I second Kris that it defeats the purpose of having alt tags in the first place, and I agree with David Dorward's comment.

But if there is some ulterior motive here, you could do it after the fact in the browser with JavaScript (or, preferably, jQuery). The script only changes the page in the client's browser, never the original HTML, and the browser is smart enough to parse the page even if it's not perfectly well-formed.

Using jQuery, place this script in the head section of your page:

<script type="text/javascript">
// Once the DOM is ready, give every img that lacks an alt attribute an empty one.
$(function() {
    $('img:not([alt])').attr('alt', '');
});
</script>

And make sure the jQuery library is included before this script.

KyleFarris
Interesting. However, in my case the alt attribute is supposed to help screen readers, and I suspect not all of them can really "read" a DOM that JavaScript has modified dynamically. Therefore, I'm looking for a solution that statically modifies the HTML source to add the alt attribute.
Buu Nguyen
A: 

I've used the Jericho HTML Parser library successfully in the past for parsing HTML. It's supposed to work well with poorly formed HTML. This would still alter the original HTML, though.

Owen
A: 

Hi Owen,

Can you provide an example of using Jericho for this? I can't find element.setAttribute or anything similar in the API. It would be very helpful if you could post some example code.

Jude