I am looking for a regular expression that finds all input fields of type hidden in HTML output. Does anyone know an expression that does this?

A: 

See http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags#1732454

Radomir Dopieralski
+1 x 1000. Don't parse (X)HTML with regex. Full stop. It gets asked here almost every day, and the answer never changes.
spender
He's not talking about parsing all HTML, just a specific case. (http://www.codinghorror.com/blog/2009/11/parsing-html-the-cthulhu-way.html) "It's considered good form to demand that regular expressions be considered verboten, totally off limits for processing HTML, but I think that's just as wrongheaded as demanding every trivial HTML processing task be handled by a full-blown parsing engine. It's more important to understand the tools, and their strengths and weaknesses, than it is to knuckle under to knee-jerk dogmatism."
Robert Greiner
This is a specific case of... parsing HTML. Unless you limit yourself to one particular, known text input that just happens to be HTML (and you don't care that it is HTML), you are going to get subtle and random failures no matter what regular expression you come up with. Some common breakers include <!-- comments with <tags> inside -->, <tags with="funny </tags> inside attributes">, <tags>with<tags/>inside of </tags>, etc.
Radomir Dopieralski
If I don't use a regular expression, what should I use?
Niall Collins
How about an HTML parser? Preferably a ready-made and tested library, so that you have even less work with it than with implementing the regexp solution.
Radomir Dopieralski
A: 

Regular expressions are generally the wrong tool for the job when trying to search or manipulate HTML or XML; a parsing library would likely be a much cleaner and easier solution.
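To give a rough idea of what the parser route can look like, here is a minimal sketch using Python's standard-library `html.parser`. The class name `HiddenInputFinder`, the helper `find_hidden_inputs`, and the sample markup are all made up for this illustration; a dedicated HTML library may cope better with badly broken real-world pages.

```python
from html.parser import HTMLParser


class HiddenInputFinder(HTMLParser):
    """Collects the attributes of every <input> whose type is "hidden"."""

    def __init__(self):
        super().__init__()
        self.hidden_inputs = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names; values keep their case.
        if tag != "input":
            return
        attributes = dict(attrs)
        # An attribute without a value is reported as None, hence the "or".
        if (attributes.get("type") or "").lower() == "hidden":
            self.hidden_inputs.append(attributes)


def find_hidden_inputs(html):
    """Return one attribute dict per hidden input found in the markup."""
    finder = HiddenInputFinder()
    finder.feed(html)
    finder.close()
    return finder.hidden_inputs


# Hypothetical sample: a commented-out input, a text input whose value
# merely mentions type='hidden', and two real hidden inputs.
sample = '''
<!-- <input type="hidden" name="commented-out"> -->
<input type="hidden" name="token" value="abc">
<input type=text name=info value="type='hidden's how to carry data between pages.">
<INPUT TYPE=HIDDEN NAME=page VALUE=2>
'''

hidden_names = [attrs.get("name") for attrs in find_hidden_inputs(sample)]
print(hidden_names)
```

The commented-out input and the text input are skipped because the parser reports comments and attribute values separately from tags, so there is no pattern to confuse.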

That said, if you're just looking through a big file and accuracy isn't critical, you can probably do reasonably well with something like <input[^>]*type="?hidden"?.
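For a quick look at how the pattern above behaves, here is a small sketch with Python's `re` module. The sample markup is invented for the illustration. Note that each match stops partway through the tag, which is fine for locating tags but not for extracting or replacing them.

```python
import re

# The pattern from the paragraph above, unchanged.
HIDDEN_INPUT = re.compile(r'<input[^>]*type="?hidden"?')

# Hypothetical sample markup, made up for this illustration.
sample = '''
<form>
  <input type="hidden" name="token" value="abc">
  <input type="text" name="user">
  <input name="page" type=hidden value="2">
</form>
'''

matches = HIDDEN_INPUT.findall(sample)
for match in matches:
    print(match)
```

Only the two hidden inputs are reported, and each match ends right after the `type` attribute rather than at the closing `>`.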

ngroot
ngroot, that expression is only a partial match.
Brad
That's correct. He asked for an expression that would find these tags, which this will usually do. What's it matter if it matches on the whole tag?
ngroot
I don't think finding *half* of the tag will really help, but I see your point. It won't let me un-down-vote you though.
Brad
Half the tag is just fine if you are, as the author requested, just looking to *find* the tags. That's what I usually do if I'm doing ad-hoc searches of documents; I use the shortest expression that will take me to what I'm looking for. If he wants to do something more complex, like replace them, a regex is really not a safe tool to be using anyway.
ngroot
+2  A: 

I agree with the link Radomir suggested: HTML should not be parsed with regular expressions. However, I do not agree that nothing meaningful can be gleaned from using the two together, and the ensuing rant is totally counterproductive.

To correct ngroot's regex:

<([^<]*)type=('|")hidden('|")>[^<]*(/>|</.+?>)

Brad
Not even close. For example, try `<input type =hidden name =surname value =smith>` or `<input type=text name=info value="type='hidden's how to carry data between pages." >`. Both of those examples are valid HTML, never mind the problems you hit when processing real-world HTML. Use a parser.
Alohci
@Alohci, *no doubt* you should use a parser if you can for ANYTHING xml. @Niall, if you need the optional spaces in the expression to handle the cases Alohci brought up, it shouldn't be too hard. Ugly, yes, but not too hard.
Brad
+1  A: 

I know you asked for a regular expression, but you could download the Html Agility Pack and do the following:

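// Assumes htmlDoc is an HtmlAgilityPack.HtmlDocument that has already been loaded.
var inputs = htmlDoc.DocumentNode.Descendants("input");
foreach (var input in inputs)
{
    // GetAttributeValue returns the default ("") when the attribute is missing,
    // which avoids a NullReferenceException on inputs without a type attribute.
    var type = input.GetAttributeValue("type", "");
    if (string.Equals(type, "hidden", StringComparison.OrdinalIgnoreCase))
    {
        // do something with the hidden input
    }
}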

You can also use XPath with the Html Agility Pack, for example with an expression such as `//input[@type='hidden']`.

Mikael Svenson