ansaurus

Question

Ruby Regex matching string before and after certain characters

Answer 1

A:

You can try:

<([^ ]*)\s([^=]*)=

codaddict 2010-02-24 14:43:09
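As a rough illustration of how the pattern above behaves, here is a minimal sketch that applies it with String#match. The sample string is borrowed from the other answers in this thread, and the variable names are made up for the example.

```ruby
# Sample input taken from the other answers (an assumption about the real data).
str = '<block trace="true" name="AssignResources: Append Resources">'

# Group 1 captures everything between "<" and the first space,
# group 2 captures everything between that space and the first "=".
m = str.match(/<([^ ]*)\s([^=]*)=/)

tag_name   = m[1]
first_attr = m[2]

puts tag_name   # block
puts first_attr # trace
```

Only the first attribute name is captured, because the pattern stops at the first "=".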

Answer 2

A:

'<block trace="true" name="AssignResources: Append Resources">'[/<(\w+)/, 1]
#=> "block"

If you pass a regex and an index i to String#[], it'll return the value of the ith capturing group.

Edit:

In 1.9 you can use /(?<=<)\w+/ to require the presence of the < without matching it. In 1.8 there is no way to do that. The best you can do is to put the part you don't want to replace in a capturing group and access that group in the replacement, like this:

"lo&lt;la li".gsub(/(&lt;)(\w+)/, '\1 --\2--')
 #=> "lo&lt; --la-- li"

sepp2k 2010-02-24 14:43:55
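For the 1.9 lookbehind variant mentioned in the answer above, a minimal sketch might look like the following. It assumes Ruby 1.9 or later; the sample string is the same one the answer uses, and the block form of gsub is just one way to build the replacement.

```ruby
# Requires Ruby 1.9+ (lookbehind is not available in 1.8).
str = "lo<la li"

# (?<=<) asserts that a "<" precedes the word without consuming it,
# so only the word itself is replaced and the "<" stays untouched.
result = str.gsub(/(?<=<)\w+/) { |word| "--#{word}--" }

puts result # lo<--la-- li
```

Because the "<" is not part of the match, no capturing group is needed to put it back.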

Thanks for that hint, but I need the regex pattern as a parameter to the gsub method, to replace all of these pattern matches with another string. I'm thinking about how to make it fit gsub.

Sebastian 2010-02-24 14:51:21

Answer 3

A:

<block trace="true" name="AssignResources: Append Resources">

<([^\s]+)\s+([^=]+)="([^"]*)"\s+([^=]+)="([^"]*)"\s*>

#result:

$1 block
$2 trace
$3 true
$4 name
$5 AssignResources: Append Resources

Update: I don't know ruby, but based on the description of gsub here, I believe that something like the following should do the trick.

str = '<block trace="true" name="AssignResources: Append Resources">'
repl = str.gsub(/<([^\s]+)\s+([^=]+)="([^"]*)"\s+([^=]+)="([^"]*)"\s*>/,
    "tag name: \\1\n\\2 is \\3 and \\4 is \\5\n")
print repl

Amarghosh 2010-02-24 14:49:40

Thanks Amarghosh, very nice solution, but I forgot to mention that I need it as the pattern parameter for gsub... But thanks anyway.

Sebastian 2010-02-24 15:05:34

@Sebastian check the update

Amarghosh 2010-02-24 15:32:35

Thx Amarghosh I'll try this tomorrow at work...

Sebastian 2010-02-24 18:07:05

Answer 4

+1 A:

It looks so much like parsing HTML with regex to me.

Ruby has a very good HTML parser called Nokogiri.

And here is how to do that:

require 'nokogiri'

html=Nokogiri::HTML('<block trace="true" name="AssignResources: Append Resources">')

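# Nokogiri::HTML wraps the fragment in <html> and <body>, so select the block element itself
html.xpath("//block").each do |s|
    puts s.node_name # block
    puts s.keys      # trace, name (one per line)
    puts s.values    # true, AssignResources: Append Resources (one per line)
end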

S.Mark 2010-02-24 15:19:57

Hey S.Mark, I already use Nokogiri for that (XML parsing) and it's great. I will think about my application flow again; maybe I can do that replacement earlier and with Nokogiri. At the time I do that replacement, it's not XML anymore: it has been converted to one huge string. That's necessary because it is to be presented as text, with the values of former XML tag attributes turned into HTML <a> tags linking to other HTML pages defined by the attribute value. The replacements via gsub and pattern matching are done to surround parts of an XML tag with different <span> tags.

Sebastian 2010-02-24 15:27:15

And no: doing the syntax highlighting via JavaScript is not a solution in this case. At the moment I have "prettify" in use, but with documents of more than two thousand lines and x times more tags, it's no fun to use. That's why I want to prepare the output already in my parsing app.

Sebastian 2010-02-24 15:34:07

Syntax highlighting? Have you considered using an existing library like shjs? http://shjs.sourceforge.net/

S.Mark 2010-02-24 15:46:08

Yes, I tried it, as I said, using Prettify (http://code.google.com/p/google-code-prettify/). I think the problems are the same: with huge content to highlight, the site is not usable anymore (30+ secs). Huge content => 7000+ lines of XML. Sometimes weird requirements ask for weird solutions ;)

Sebastian 2010-02-24 15:53:31

I think regex can't be fast for 7000+ lines of data though.

S.Mark 2010-02-24 16:19:16

That doesn't matter, because it's done beforehand. The app is executed at night every two weeks, and whether it takes 15 secs (as now), 5 minutes or 5 hours, nobody cares... Hmm... now I've got an idea. I could do the JavaScript stuff not after the page is requested in the browser (<body onload="highlightIt">) but before, externally... I should think about that. Hmm, is it possible to read and write files with JavaScript? :^)

Sebastian 2010-02-24 18:11:07

/me blinks. I'm with @S.Mark on this one.

macek 2010-02-25 17:22:06

Everything works fine now. I'm doing it with regex and it's nice. Execution time increased by 50 secs, which doesn't matter. I agree with you both if it is "real" XML parsing and tag names and attribute values are the center of interest. But at the time I'm parsing it, it's not XML anymore, it's a huge string, and I'm not really interested in the name of a tag; I'm interested in deciding whether something represents a tag name and then coloring it (by surrounding it with some special <span> tags). I do the same with quotes "", brackets and special chars.

Sebastian 2010-02-26 14:58:53
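The approach described in the comment above could be sketched roughly as follows. This is not the poster's actual code: the CSS class names, the variable names and the sample input are assumptions made for illustration, and it assumes Ruby 1.9+ for the lookbehind. A single gsub with an alternation is used so that the inserted <span> markup is never rescanned by a later pass.

```ruby
# Sample input borrowed from the answers in this thread (an assumption).
xml_text = '<block trace="true" name="AssignResources: Append Resources">'

# One pass over the original string: either a tag name directly after "<"
# or a double-quoted attribute value. The block picks a (hypothetical)
# CSS class depending on which alternative matched.
highlighted = xml_text.gsub(/(?<=<)\w+|"[^"]*"/) do |match|
  css_class = match.start_with?('"') ? "attr-value" : "tag-name"
  %(<span class="#{css_class}">#{match}</span>)
end

puts highlighted
```

Running two separate gsub calls instead would risk the second pattern matching the quotes inside the class="..." attributes inserted by the first, which is why the sketch does everything in one call. Escaping the remaining "<" and ">" for display in a browser is left out here.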

Btw: Prettify does it with regexes, too. And I don't think all the other JavaScript libraries do it differently.

Sebastian 2010-02-26 15:01:20

Answer 5

A:

Most probably you should go with Nokogiri or something similar. I couldn't fit it in one gsub, but in two:

>> s = '<block trace="true" name="AssignResources: Append Resources">'
>> m,r=0,["<blockie ", " tracie=", " namie="]
>> s.gsub(/<.*?([^\s]+)\s/, r[0]).gsub(/\s([^=]+)=/) {|ma| m+=1; r[m]}
=> "<blockie tracie=\"true\" namie=\"AssignResources: Append Resources\">"

Jonas Elfström 2010-02-24 15:47:24
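On Ruby 1.9 or later, the two passes in the answer above can arguably be collapsed into a single gsub by combining a lookbehind, a lookahead and a lookup table. The following is only a sketch under that assumption; the replacements hash and variable names are hypothetical and simply mirror the sample output of the answer.

```ruby
s = '<block trace="true" name="AssignResources: Append Resources">'

# Hypothetical mapping from original names to their replacements.
replacements = { "block" => "blockie", "trace" => "tracie", "name" => "namie" }

# Match a word directly after "<" (the tag name) or a word directly
# before "=" (an attribute name); unknown names are left unchanged.
result = s.gsub(/(?<=<)\w+|\w+(?==)/) { |word| replacements.fetch(word, word) }

puts result
```

Attribute values are not touched because they are followed by a quote or a space rather than by "=".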
