I am parsing some HTML source. Is there a regex to find out whether the alt attributes ("alt tags") in an HTML document are empty?

I want to see if the alt tags are empty or not.

Is regex suitable for this or should I use string manipulation in C#?

+2  A: 

You have to parse the HTML and check the tags. The following link includes a C# library for parsing HTML tags; with it you can loop through the tags and count them: Parsing HTML tags.

Ahmy
A: 

If this is valid XHTML, why do you need Regex at all? If you simply search for the string:

alt=""

... you should be able to find all empty alt tags.

In any case, it shouldn't be too complicated to construct a Regex for the search too, taking into account poorly written HTML markup (especially with spaces):

alt\s*=\s*"\s*"
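To see what such a pattern does and does not catch, here is a minimal sketch in Python (the same pattern syntax works with .NET's Regex class). The single-quote handling and the names EMPTY_ALT and count_empty_alts are my own additions for illustration, not part of the pattern above. Note that it also counts matches in ordinary text, which is the weakness of any purely textual search.

```python
import re

# The pattern above, extended (as an assumption) to accept single quotes too.
# \1 requires the closing quote to be the same character as the opening one.
EMPTY_ALT = re.compile(r"""alt\s*=\s*(["'])\s*\1""", re.IGNORECASE)

def count_empty_alts(html: str) -> int:
    """Count textual occurrences of an empty alt attribute."""
    return len(EMPTY_ALT.findall(html))

sample = (
    '<img src="a.png" alt="">'
    "<img src=\"b.png\" alt = ' '>"
    '<img src="c.png" alt="logo">'
)
print(count_empty_alts(sample))  # the first two images match

# A false positive: the text merely mentions the attribute.
print(count_empty_alts('<p>Use alt="" for decorative images</p>'))
```

Unquoted values (alt= with nothing after it) are not covered here; handling them reliably is where a real parser starts to pay off.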
Cerebrus
What if 'alt=""' is part of normal text?
Tomalak
And the double quotes could be single quotes; there could even be no quotes at all if the HTML is really sloppy.
Sam Hasler
Yep, I'm well aware of these possibilities. The OP hasn't provided enough info for us to make an intelligent assumption.
Cerebrus
Apologies. Yeah, that wouldn't be possible because of the above issues. However, it is possible to look for the attribute with both double and single quotes, and also to check for spaces around the "alt" and before the closing quotation mark. Reading character by character would make this pretty easy. I am now using WatiN to find alt tags, as it has the relevant methods. Parsing is an interesting topic, though.
dotnetdev
A: 

If you want to do it just looking at the page then CSS selectors might be better, assuming your browser supports the :not selector.

Install the SelectorGadget bookmarklet. Activate it on your page, then put the following selector in the input box and press Enter.

img:not([alt])

If you are automating it, and have access to the DOM for the HTML you could use the same selector.
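One thing to keep in mind: img:not([alt]) matches images that have no alt attribute at all, while img[alt=""] matches images whose alt is present but empty. A rough sketch of both checks against a DOM, using Python's standard xml.etree.ElementTree: it assumes the input is well-formed XHTML without a namespace declaration, and audit_alts is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

def audit_alts(xhtml: str):
    """Return (missing, empty): src values of images without alt / with blank alt."""
    root = ET.fromstring(xhtml)  # assumption: well-formed, namespace-free XHTML
    missing, empty = [], []
    for img in root.iter("img"):
        alt = img.get("alt")
        if alt is None:
            # Equivalent of the CSS selector img:not([alt])
            missing.append(img.get("src"))
        elif alt.strip() == "":
            # Equivalent of img[alt=""], also treating whitespace-only as empty
            empty.append(img.get("src"))
    return missing, empty

doc = (
    "<div>"
    '<img src="a.png"/>'
    '<img src="b.png" alt=""/>'
    '<img src="c.png" alt="logo"/>'
    "</div>"
)
missing, empty = audit_alts(doc)
print(missing)  # images with no alt attribute
print(empty)    # images with an empty alt attribute
```

If the document declares the XHTML namespace, the tag name would need the namespace prefix in braces, so treat this as a starting point only.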

Sam Hasler
A: 

Regexes are fundamentally bad at parsing HTML (see Can you provide some examples of why it is hard to parse XML and HTML with a regex? for why). What you need is an HTML parser. See Can you provide an example of parsing HTML with your favorite parser? for examples using a variety of parsers.
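As a rough illustration of the parser approach, here is a sketch using Python's standard html.parser module; a C# HTML parser would follow the same idea. The class and function names (EmptyAltFinder, find_empty_alts) are made up for this example. Because the parser only reports real attributes on real tags, text that merely mentions alt="" is ignored, and single-quoted values are handled for free.

```python
from html.parser import HTMLParser

class EmptyAltFinder(HTMLParser):
    """Collects the position of every <img> whose alt attribute is empty."""

    def __init__(self):
        super().__init__()
        self.empty_alts = []  # (line, column) pairs reported by getpos()

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        for name, value in attrs:
            # value is None when the attribute is written without "=value"
            if name == "alt" and (value is None or value.strip() == ""):
                self.empty_alts.append(self.getpos())

def find_empty_alts(html: str):
    finder = EmptyAltFinder()
    finder.feed(html)
    finder.close()
    return finder.empty_alts

sample = (
    '<p>Write alt="" for decorative images.</p>\n'
    "<img src='a.png' alt=''>\n"
    '<img src="b.png" alt="logo">\n'
)
for line, column in find_empty_alts(sample):
    print(f"empty alt at line {line}, column {column}")
```

Only the image on the second line is reported; the alt="" inside the paragraph text is treated as plain data.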

Chas. Owens