Hi,
I've got a string like this:
<block trace="true" name="AssignResources: Append Resources">
I need to get the word (or the characters up to the next whitespace) after < (in this case block) and the words before = (here trace and name).
I tried several regex patterns, but all my attempts return the word with the "delimiter" characters included, like <block.
I'm sure it's not that hard, but I've not found the solution yet.
Anybody's got a hint?
Thanks.
Btw: I want to replace the pattern matches with gsub.
EDIT:
Solved it with the following regexes:
1)
/\s(\w+)="(.*?)"/
matches all attributes and their values, captured in $1 and $2.
2)
/<!--.*-->/
matches comments
3)
/<([\/|!|\?]?)([A-Za-z0-9]+)[^\s|>|\/]*/
matches all tag names, whether they're in a closing tag, a self-closing tag, an <?xml> tag or a DTD tag. $1 contains the optional prefix /, ! or ? (or nothing), and $2 contains the tag name.
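
For anyone trying this out, here is a minimal sketch of how the three patterns above might be used with scan, match and gsub. The helper names (attributes, tag_parts, strip_comments, upcase_attr_names) are made up for illustration and are not part of the original solution. It assumes each tag sits on a single line.

```ruby
# Hypothetical helpers; the patterns are copied unchanged from the post above.

# Returns [[name, value], ...] for every attr="value" pair in the line.
def attributes(line)
  line.scan(/\s(\w+)="(.*?)"/)
end

# Returns [prefix, tag_name] for the first tag in the line, or nil if none.
def tag_parts(line)
  m = line.match(/<([\/|!|\?]?)([A-Za-z0-9]+)[^\s|>|\/]*/)
  m && [m[1], m[2]]
end

# Removes a comment that opens and closes on the same line.
def strip_comments(line)
  line.gsub(/<!--.*-->/, '')
end

# gsub with a block: $1 and $2 hold the captures of the current match,
# so the delimiters are written back and only the attribute name changes.
# Note: the leading whitespace character is rewritten as a single space.
def upcase_attr_names(line)
  line.gsub(/\s(\w+)="(.*?)"/) { " #{$1.upcase}=\"#{$2}\"" }
end

line = '<block trace="true" name="AssignResources: Append Resources">'
p attributes(line)
p tag_parts(line)
p tag_parts('</block>')
p strip_comments('<a><!-- note --></a>')
puts upcase_attr_names(line)
```

Two things to keep in mind: inside a character class | is a literal pipe rather than alternation, so [\/|!|\?] also accepts a | prefix ([\/!?] would be the stricter form). And .* in the comment pattern is greedy, so two comments on one line are matched as a single span.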