Hi,

I've got a string like this:

<block trace="true" name="AssignResources: Append Resources">

I need to get the word (or the characters to next whitespace) after < (in this case block) and the words before = (here trace and name).

I tried several regex patterns, but all my attempts return the word with the "delimiters" characters included... like ;block.

I'm sure it's not that hard, but I've not found the solution yet.

Anybody's got a hint?
Thanks.

Btw: I want to replace the pattern matches with gsub.

EDIT:

Solved it with the following regexes:

1) /\s(\w+)="(.*?)"/ matches all attr and their values in $1 and $2.

2) /<!--.*-->/ matches comments

3) /&lt;([\/|!|\?]?)([A-Za-z0-9]+)[^\s|&gt;|\/]*/ matches all tag names, whether they're in a closing tag, self-closing tag, <?xml> tag or DTD tag. $1 holds the optional prefix /, ! or ? (or nothing) and $2 contains the tag name.
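
As a rough illustration of how regex 1) might be plugged into gsub, here is a minimal sketch. It assumes the input holds the escaped form (&lt; and &gt;), as the patterns in this thread do; the span class names "attr" and "value" are made up for the example and are not from the original post.

```ruby
# Assumption: the line is already HTML-escaped text, not parsed XML.
line = '&lt;block trace="true" name="AssignResources: Append Resources"&gt;'

# Regex 1) from above: $1 is the attribute name, $2 its value.
# The block form of gsub rebuilds each match with <span> wrappers
# (hypothetical class names) and keeps the leading space.
highlighted = line.gsub(/\s(\w+)="(.*?)"/) do
  %( <span class="attr">#{$1}</span>="<span class="value">#{$2}</span>")
end

puts highlighted
```

One thing to watch when chaining several gsub calls: the inserted <span class="..."> markup itself contains name="value" pairs, so running an attribute pattern over already-highlighted text would match the new markup too. Doing the attribute pass only once, or last, avoids that.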

A: 

You can try:

&lt;([^ ]*)\s([^=]*)=
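
A small sketch of how this pattern could be tried out with String#match, assuming the string carries the escaped &lt; form as in the pattern. Note that it only captures the tag name and the first attribute name; the variable names are illustrative.

```ruby
# Assumption: the input uses "&lt;" / "&gt;" rather than literal angle brackets.
line = '&lt;block trace="true" name="AssignResources: Append Resources"&gt;'

m = line.match(/&lt;([^ ]*)\s([^=]*)=/)
tag_name   = m[1]  # everything between "&lt;" and the first space
first_attr = m[2]  # everything between that whitespace and the first "="

puts tag_name
puts first_attr
```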
codaddict
A: 
'&lt;block trace="true" name="AssignResources: Append Resources"&gt;'[/&lt;(\w+)/, 1]
#=> "block"

If you pass a regex and an index i to String#[], it'll return the value of the ith capturing group.

Edit:

In 1.9 you can use /(?<=&lt;)\w+/ to require the presence of the &lt; without matching it. In 1.8 there is no lookbehind support, so the best you can do is to put the part you don't want to replace in a capturing group and access that group in the replacement, like this:

"lo&lt;la li".gsub(/(&lt;)(\w+)/, '\1 --\2--')
 #=> "lo&lt; --la-- li"
sepp2k
Thanks for that hint, but I need the regex pattern as a parameter to the gsub method, to replace all pattern matches with another string. I'm thinking about how to make it fit gsub.
Sebastian
A: 
&lt;block trace="true" name="AssignResources: Append Resources"&gt;

&lt;([^\s]+)\s+([^=]+)="([^"]*)"\s+([^=]+)="([^"]*)"\s*&gt;

#result:

$1 block
$2 trace
$3 true
$4 name
$5 AssignResources: Append Resources

Update: I don't know Ruby, but based on the description of gsub here, I believe that something like the following should do the trick.

str = '&lt;block trace="true" name="AssignResources: Append Resources"&gt;'
repl = str.gsub(/&lt;([^\s]+)\s+([^=]+)="([^"]*)"\s+([^=]+)="([^"]*)"\s*&gt;/, 
    "tag name: \\1\n\\2 is \\3 and \\4 is \\5\n")
print repl
Amarghosh
Thanks Amarghosh, very nice solution, but I forgot to mention, that I need it as pattern parameter for gsub... But thx anyway.
Sebastian
@Sebastian check the update
Amarghosh
Thx Amarghosh I'll try this tomorrow at work...
Sebastian
A: 

This looks a lot like parsing HTML with regex to me.

Ruby has a very good HTML parser called Nokogiri.

Here is how to do it with that:

require 'nokogiri'

html=Nokogiri::HTML('<block trace="true" name="AssignResources: Append Resources">')

html.xpath("//*").each do |s|
    puts s.node_name #block
    puts s.keys #trace, name
    puts s.values #true, AssignResources: Append Resources
end
S.Mark
Hey S.Mark, I already use Nokogiri for the XML parsing and it's great. I will think about my application flow again; maybe I can do that replacement earlier, with Nokogiri. At the time I do the replacement, it's no longer XML. It has been converted to one huge string. That's necessary because it is presented as text, with the values of the former XML tag attributes turned into HTML <a> tags linking to other HTML pages, as defined by the attribute value. The replacements via gsub and pattern matching are done to surround parts of an XML tag with different <span> tags.
Sebastian
And no: doing the syntax highlighting via JavaScript is not a solution in this case. At the moment I use "prettify", but with documents of more than 2 thousand lines and x times more tags, it's no fun to use. That's why I want to prepare the output already in my parsing app.
Sebastian
syntax highlighting? have you considered using existing library like shjs? http://shjs.sourceforge.net/
S.Mark
Yes, I tried that; as I said, I'm using Prettify (http://code.google.com/p/google-code-prettify/). I think the problems are the same: with huge contents to highlight, the site is not usable anymore (30+ secs). Huge content => 7000+ lines of XML. Sometimes weird requirements ask for weird solutions ;)
Sebastian
I think regex can't be fast for 7000+ lines of data though.
S.Mark
That doesn't matter, because it's done beforehand. The app is executed at night every two weeks, and whether it takes 15 secs (as now), 5 minutes or 5 hours, nobody cares... Hmm, now I've got an idea: I could run the JavaScript stuff not after the page is requested in the browser (<body onload="highlightIt">) but before, externally. I should think about that. Hmm, is it possible to read and write files with JavaScript? :^)
Sebastian
/me blinks. I'm with @S.Mark on this one.
macek
Everything works fine now; I'm doing it with regex and it's nice. Execution time increased by 50 secs, which doesn't matter. I agree with you both when it is "real" XML parsing and tag names and attribute values are the center of interest. But at the time I'm processing it, it's not XML anymore, it's a huge string, and I'm not really interested in the name of a tag. I only need to decide whether something is a tag name and then color it (by surrounding it with some special <span> tags). I do the same with quotes "", brackets and special chars.
Sebastian
Btw: Prettify does it with regexes, too. And I don't think all the other JavaScript libraries do it differently.
Sebastian
A: 

Most probably you should go with Nokogiri or something similar. I couldn't fit it in one gsub, but in two:

>> s = '&lt;block trace="true" name="AssignResources: Append Resources"&gt;'
>> m,r=0,["&lt;blockie ", " tracie=", " namie="]
>> s.gsub(/&lt;.*?([^\s]+)\s/, r[0]).gsub(/\s([^=]+)=/) {|ma| m+=1; r[m]}
=> "&lt;blockie tracie="true" namie="AssignResources: Append Resources"&gt;"
Jonas Elfström