I'd like to use grep to find out if/where an HTML class is used across a bunch of files. The regex pattern should find not only <p class="foo"> but also <p class="foo bar foo-bar">.

So far I'm able to find class="foo" with the example below, but I can't make it work with multiple class names:

grep -Ern "class=\"result+(\"| )" *

Any suggestions? Thanks! Mike

A: 

It depends on which metacharacters your grep supports. Try:

'class=\"([a-z]+ ?)+\"'
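
To check for one specific class among several, the name can be anchored to the quotes or to surrounding whitespace, which also keeps hyphenated names such as foo-bar from counting as foo. A minimal sketch, assuming double-quoted attributes that sit on a single line; the class name foo and the sample files in a temporary directory are made up for illustration:

```shell
# Hypothetical sample files in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' '<p class="foo">a</p>' '<p class="foo bar foo-bar">b</p>' > "$dir/a.html"
printf '%s\n' '<p class="foo-bar">c</p>' '<p class="bar foo">d</p>' '<p class="foobar">e</p>' > "$dir/b.html"

# Match "foo" only as a whole class name: it must be preceded by the opening
# quote or whitespace, and followed by whitespace or the closing quote.
pattern='class[[:space:]]*=[[:space:]]*"([^"]*[[:space:]])?foo([[:space:]][^"]*)?"'
matches=$(grep -Ern "$pattern" "$dir" | sort)
printf '%s\n' "$matches"

rm -rf "$dir"
```

With these files, lines a, b and d are reported, while the foo-bar-only and foobar lines are skipped.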

Paul Creasey
+7  A: 

How about something like this:

grep -Erno 'class[[:blank:]]*=[[:blank:]]*"[^"]+"' *

That will also allow for more whitespace and should give you output similar to:

1:class="foo bar baz"
3:class = "haha"

To see all classes used, you can pipe output from the above into the following:

cut -f2 -d'"' | xargs -n1 | sort -u
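
The whole pipeline can be tried on throwaway input. This is only a sketch, assuming double-quoted attributes: the sample file is invented, -h is added to drop file names from the recursive output, [[:blank:]] stands in for space-or-tab because \t inside a bracket expression is not portable, and tr is used here to put each class name on its own line before sorting:

```shell
# Hypothetical sample file in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' '<p class="foo bar baz">a</p>' '<div>no class</div>' '<p class = "haha foo">b</p>' > "$dir/sample.html"

# Extract each class attribute, keep the quoted value, split it into
# one class name per line, then sort and de-duplicate.
classes=$(grep -Erhno 'class[[:blank:]]*=[[:blank:]]*"[^"]+"' "$dir" \
  | cut -f2 -d'"' \
  | tr -s ' \t' '\n' \
  | sort -u)
printf '%s\n' "$classes"

rm -rf "$dir"
```

For this input the result is bar, baz, foo and haha, one per line, with the duplicate foo collapsed.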
Kaleb Pederson
The -o flag is nice. I didn't know about it--sure beats the perl command I usually use to print the match string.
Ken Fox
Thanks Kaleb! Still wrapping my head around regex... Really like the use of the star for "zero or more" spaces or tabs... then I don't need to use those conditionals. Very helpful.
Mike
+1  A: 

Regular expressions are a pretty poor tool for parsing HTML. Try looking into SimpleXML ( http://php.net/manual/en/book.simplexml.php ). Rolling your own regex for HTML is begging for trouble.

Erik
See http://www.codinghorror.com/blog/archives/001311.html
Wim
Find a parser e.g. here: http://stackoverflow.com/questions/773340/can-you-provide-an-example-of-parsing-html-with-your-favorite-parser
Svante
This is not parsing HTML; this is pattern matching, which is what regular expressions were made for.
Paul Creasey
Can you post a command line example to do something similar to Kaleb's grep? What you say is the conventional wisdom, but it seems a bit over-complicated for this problem.
Ken Fox
-1 because Mike is looking for a solution using grep and not php and this doesn't really address the question.
Dave Paroulek
+1  A: 

Don't do it. It will drive you insane: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454

Instead, use an HTML parser. It's not hard.

EDIT: Here's an example in PowerShell

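# Assumes each file is well-formed XML (XHTML); the [xml] cast fails on tag-soup HTML.
Get-ChildItem -Recurse *.html | where { 
    ([xml](Get-Content $_.FullName)).SelectNodes( '//*' ) |
        where { ($_.GetAttribute( "class" ) -split '\s+') -contains "foo" }  # whole class name, not a substring
}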
Jay Bazuzi
From the command line? I haven't found any yet. Care to develop one for the OP?
slebetman
@slebetman: done.
Jay Bazuzi