ansaurus

Question

Ruby, gsub and regex

Answer 1

+4 A:

If your ids always end at a word boundary, you can match on that:

page_ids.each do |id|
  str = str.gsub(/##{id}\b/, link_to("##{id}", page_path(id)))
end

You only need to add the word boundary symbol \b to the search pattern; it is not needed in the replacement string.
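
For anyone who wants to try this outside Rails, here is a minimal sketch. The `link_to` and `page_path` methods below are hypothetical stand-ins for the Rails helpers, and the sample string and ids are made up for illustration.

```ruby
# Hypothetical stand-ins for the Rails helpers, so the snippet runs on its own.
def page_path(id)
  "/pages/#{id}"
end

def link_to(text, path)
  %(<a href="#{path}">#{text}</a>)
end

page_ids = %w[1 12]             # assumed sample ids
str = "See #1, #12 and #123."   # assumed sample text

page_ids.each do |id|
  # \b keeps #1 from matching inside #12 or #123
  str = str.gsub(/##{id}\b/, link_to("##{id}", page_path(id)))
end

puts str
```

Only #1 and #12 are linked here; #123 is left alone. If page_ids can contain duplicates, calling uniq on it first is probably wise, since the generated link text itself contains #id and would be matched again on a second pass.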

Pinochle 2009-08-17 12:41:28

Marvellous. I didn't know about \b. You sir, are a life saver.

JimNeath 2009-08-17 12:48:20

Answer 2

+5 A:

Instead of extracting the ids first and then replacing them, you can simply find and replace them in one go:

str = str.gsub(/#(\d*)/) { link_to("##{$1}", page_path($1)) }

Even if you can't leave out the extraction step because you need the ids somewhere else as well, this should be much faster, since it doesn't have to go through the entire string for each id.

PS: If str isn't referred to from anywhere else, you can use str.gsub!, which modifies the string in place, instead of str = str.gsub.
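
A minimal sketch of the block form that runs without Rails. The `link_to` and `page_path` methods are hypothetical stand-ins for the Rails helpers, and the sample text is made up. Note one deliberate difference: this sketch uses `\d+` rather than `\d*`, because `\d*` also matches a lone # with no digits after it.

```ruby
# Hypothetical stand-ins for the Rails helpers, so the snippet runs on its own.
def page_path(id)
  "/pages/#{id}"
end

def link_to(text, path)
  %(<a href="#{path}">#{text}</a>)
end

str = "Issue #7 duplicates #42; # alone is left untouched."  # assumed sample text

# The block runs once per match; $1 holds the digits captured for that match.
linked = str.gsub(/#(\d+)/) { link_to("##{$1}", page_path($1)) }

puts linked
```

The string is scanned once regardless of how many ids it contains, which is where the speed-up over the per-id loop comes from.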

sepp2k 2009-08-17 13:07:37

This is the right solution.

Magnar 2009-08-17 13:11:18

This is efficient but, depending on the content of the text, could produce false positives. Imagine that he has 125 pages to reference and there are strings like #112325 in the text of the pages (order numbers, etc.). Each such false positive would produce a link to a dead page. While searching using the list of pages and word boundaries is not foolproof, it is more robust than this solution, despite its elegance.

Pinochle 2009-08-17 15:12:59

If there was a string like #112325 it would be in the page_ids array, so it would produce a dead link either way. Note that my gsub uses the same regex as the OP's scan. So they will find the exact same ids.

sepp2k 2009-08-17 16:09:15

Your regex will find any sequence of digits following a pound symbol. Feeding them in from the array will make sure the sequence of digits following the pound symbol is the one that was wanted. The only ones that JimNeath wants are ones that correspond to his already established index of pages. #112325 or #100000000000000 are not likely to be in that index of pages, but they will be captured by `/#(\d*)/`.

Pinochle 2009-08-17 16:39:35

What are you talking about? My regex is the same as Jim's regex for scan. Anything captured by `/#(\d*)/` will be in the page_ids array because that's the regex used to populate it.

sepp2k 2009-08-17 17:13:02

Whoops, I "scanned" over that part of the question. You are right, that regex does work as long as the list is assembled this way and hasn't been subsequently filtered. If he's lucky enough that matching sequences never represent anything but pages he wants to index, /#(\d*)/ will work well. BTW, I didn't know you could pass a block to gsub like that. Very nice, thanks for the tip.

Pinochle 2009-08-17 18:37:23
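
If the list of ids has been filtered, so that not every #number in the text is a real page, the two approaches discussed above can be combined: one pass with a block, linking only ids found in a known set. This is a sketch under assumptions, not something proposed in the thread. `known_ids`, the sample text, and the `link_to` / `page_path` stand-ins are all hypothetical.

```ruby
require "set"

# Hypothetical stand-ins for the Rails helpers, so the snippet runs on its own.
def page_path(id)
  "/pages/#{id}"
end

def link_to(text, path)
  %(<a href="#{path}">#{text}</a>)
end

known_ids = Set["1", "12", "125"]          # assumed index of existing pages
str = "Page #12 mentions order #112325."   # assumed sample text

linked = str.gsub(/#(\d+)/) do |match|
  # Link only ids that are in the index; leave everything else as it was.
  known_ids.include?($1) ? link_to(match, page_path($1)) : match
end

puts linked
```

Because the regex consumes the whole run of digits before the lookup, #112325 is never mistaken for #1 or #12, and no word boundary is needed.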
