Quick background: I have a string which contains references to other pages. The pages are linked to using the format: "#12". A hash followed by the ID of the page.

Say I have the following string:

str = 'This string links to the pages #12 and #125'

I already know the IDs of the pages that need linking:

page_ids = str.scan(/#(\d*)/).flatten
=> ["12", "125"]

How can I loop through the page ids and link the #12 and #125 to their respective pages? The problem I've run into is if I do the following (in rails):

page_ids.each do |id|
  str = str.gsub(/##{id}/, link_to("##{id}", page_path(id)))
end

This works fine for #12 but it links the "12" part of #125 to the page with ID of 12.

Any help would be awesome.

+4  A: 

If your IDs always end at word boundaries, you can match that:

page_ids.each do |id|
  str = str.gsub(/##{id}\b/, link_to("##{id}", page_path(id)))
end

You only need to add the word boundary anchor \b to the search pattern; it is not needed in the replacement string.
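
To try the word-boundary idea outside Rails, here is a minimal sketch in plain Ruby. The `fake_link` helper is a hypothetical stand-in for `link_to("##{id}", page_path(id))`, and the `/pages/...` URL shape is an assumption. The `uniq` call is also an addition: without it, an ID that appears twice in the list would be matched again inside the anchor text generated on the first pass.

```ruby
# Hypothetical stand-in for Rails' link_to + page_path so this runs in plain Ruby.
def fake_link(id)
  %(<a href="/pages/#{id}">##{id}</a>)
end

# Replace each known ID in turn; \b stops "#12" from matching inside "#125".
def link_pages_with_boundary(text, page_ids)
  page_ids.uniq.reduce(text) do |result, id|
    result.gsub(/##{id}\b/) { fake_link(id) }
  end
end

puts link_pages_with_boundary('See #12 and #125', %w[12 125])
# prints: See <a href="/pages/12">#12</a> and <a href="/pages/125">#125</a>
```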

Pinochle
Marvellous. I didn't know about \b. You sir, are a life saver.
JimNeath
+5  A: 

Instead of extracting the ids first and then replacing them, you can simply find and replace them in one go:

str = str.gsub(/#(\d*)/) { link_to("##{$1}", page_path($1)) }

Even if you can't leave out the extraction step because you need the ids somewhere else as well, this should be much faster, since it doesn't have to go through the entire string for each id.

PS: If str isn't referred to from anywhere else, you can use str.gsub! instead of str = str.gsub
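
For anyone who wants to run this without Rails, a rough sketch of the same single-pass idea follows. `fake_link` is a hypothetical stand-in for `link_to`/`page_path`, and the URL shape is assumed. The sketch uses `\d+` rather than `\d*`, a deliberate deviation, because `\d*` also matches a bare `#` with no digits after it.

```ruby
# Hypothetical stand-in for Rails' link_to + page_path so this runs in plain Ruby.
def fake_link(id)
  %(<a href="/pages/#{id}">##{id}</a>)
end

# One pass over the string: the block runs once per match and its
# return value replaces that match.
def link_pages_single_pass(text)
  text.gsub(/#(\d+)/) { fake_link(Regexp.last_match(1)) }
end

puts link_pages_single_pass('This string links to the pages #12 and #125')
# prints: This string links to the pages <a href="/pages/12">#12</a> and <a href="/pages/125">#125</a>
```

Because the regex is greedy, `#125` is consumed as a whole match, so the `#12` prefix problem from the question does not arise.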

sepp2k
This is the right solution.
Magnar
This is efficient but, depending on the content of the text, could produce false positives. Imagine that he has 125 pages to reference and the text of the pages contains strings like #112325 (order numbers, etc.). Each such false positive would produce a link to a dead page. While searching using the list of pages and word boundaries is not foolproof, it is more robust than this solution, elegant as it is.
Pinochle
If there was a string like #112325 it would be in the page_ids array, so it would produce a dead link either way. Note that my gsub uses the same regex as the OP's scan. So they will find the exact same ids.
sepp2k
Your regex will find any sequence of digits following a pound symbol. Feeding them in from the array will make sure the sequence of digits following the pound symbol is the one that was wanted. The only ones that JimNeath wants are ones that correspond to his already established index of pages. #112325 or #100000000000000 are not likely to be in that index of pages, but they will be captured by `/#(\d*)/`.
Pinochle
What are you talking about? My regex is the same as Jim's regex for scan. Anything captured by `/#(\d*)/` will be in the page_ids array because that's the regex used to populate it.
sepp2k
Whoops, I "scanned" over that part of the question. You are right, that regex does work as long as the list is assembled this way and hasn't been subsequently filtered. If he's lucky enough that every matching sequence represents a page he wants to link, /#(\d*)/ will work well. BTW, I didn't know you could pass a block to gsub like that. Very nice, thanks for the tip.
Pinochle
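
If the list of valid pages has been filtered, or comes from somewhere other than the scan, the two approaches discussed above could be combined: one pass with a block, linking only IDs that are in a known list. This is only a sketch; `fake_link`, `link_known_pages` and the URL shape are hypothetical names and assumptions, not part of either answer.

```ruby
require 'set'

# Hypothetical stand-in for Rails' link_to + page_path so this runs in plain Ruby.
def fake_link(id)
  %(<a href="/pages/#{id}">##{id}</a>)
end

# Single pass: link a "#digits" match only when the ID is in known_ids,
# otherwise return the matched text unchanged.
def link_known_pages(text, known_ids)
  known = known_ids.map(&:to_s).to_set
  text.gsub(/#(\d+)/) do |match|
    id = Regexp.last_match(1)
    known.include?(id) ? fake_link(id) : match
  end
end

puts link_known_pages('Page #12, order #112325', [12])
# prints: Page <a href="/pages/12">#12</a>, order #112325
```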