I'm working on a text editor in Ruby, and I need to support a "Find" feature with user-provided regular expression patterns. Here's a simple (familiar) use case:

Joe User is editing a text file, and has positioned the cursor somewhere in the middle of the file. He wants to search backwards from the current cursor location for the nearest substring matching an arbitrary regular expression.

I'm thinking that this problem amounts to applying the user's pattern to the entire string preceding the cursor location in the file. Sure, I could loop through all matches from the beginning of the file and use the last match, but this seems painfully inefficient. It would be better to search "right to left," but I haven't found a way to do this with Ruby's Regexp. Can you help?

+6  A: 

Use the rindex method on your string. Like this:

>> 'ssBssBss'.rindex(/B/)
=> 5
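
For the editor use case, one possible way to apply this is sketched below. The helper name `find_backwards` and the idea of storing the cursor as a character offset are assumptions for illustration, not part of the original question. The sketch cuts the buffer at the cursor so a match cannot extend past it, then re-matches at the index `rindex` returned to get a MatchData object.

```ruby
# Hypothetical helper: nearest match lying entirely before the cursor.
# `text` is the buffer contents, `cursor` is a character offset into it.
def find_backwards(text, pattern, cursor)
  prefix = text[0, cursor]         # only the text before the cursor
  start = prefix.rindex(pattern)   # rightmost position where the pattern matches
  return nil if start.nil?
  pattern.match(prefix, start)     # re-match at that position to get MatchData
end

text = "foo bar foo baz foo"
m = find_backwards(text, /fo+/, 12)
puts "#{m[0].inspect} at #{m.begin(0)}" if m
```

Calling `text.rindex(pattern, cursor)` without slicing is also possible, but then a match only has to start at or before the cursor and may run past it.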
ionut bizau
A: 

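Looking for .*(foo) should also locate the rightmost foo, thanks to the greedy nature of .* (but rindex may be faster; it needs a microbenchmark to check!). Note that in Ruby . does not match newlines by default, so on multi-line text the pattern needs the /m flag for .* to reach past the first line.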
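
A small sketch of the greedy approach, assuming the user's pattern arrives as a Regexp (the variable names are made up for illustration). Interpolating a Regexp into another keeps its own flags, so the /m on the wrapper only affects the leading .* part.

```ruby
text = "foo one\nfoo two\nbar"
user_pattern = /fo+/                     # stands in for the user's input

# Greedy .* consumes as much as it can, then backtracks to the last match.
wrapped = /.*(#{user_pattern})/m         # /m lets . cross newlines
last_start = wrapped.match(text).begin(1)

# Without /m, .* stops at the first newline, so only line one is searched.
first_line_start = /.*(#{user_pattern})/.match(text).begin(1)

puts last_start
puts first_line_start
```

If the user's pattern contains capture groups of its own, they are numbered from 2 here, since group 1 is the wrapper's.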

Alex Martelli
+1  A: 

I think rindex is the way to go. It seems like rindex will actually iterate through the string backwards. Check out line 957 of string.c

It looks like someone figured out a way to reverse regular expressions in Perl back in 2001. So you would reverse the string and reverse the regex, then use the left-to-right method.

I am sure that is overkill now and you can go with rindex.

Tony
+2  A: 

If you are writing a text editor, you will certainly be doing more regex work. I strongly recommend www.rubular.com. This is a real-time, web-based regex console for Ruby. I guarantee it will save lots of time figuring out Ruby regexes.

Also, in terms of performance, using a regex pattern with rindex will be slow if you have a long file. A quick and dirty optimisation is to use string.rindex('B') instead of string.rindex(/B/) when the search term is a plain string; in other words, lose the regex pattern. This will be faster for longer search strings and longer files.
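
One way to check that claim on your own data is a quick timing with the standard Benchmark module. This is only a rough sketch: the sample text and iteration count are arbitrary assumptions, and results will vary by Ruby version and input.

```ruby
require 'benchmark'

# Arbitrary sample buffer: the search term sits in the middle of the text.
text = ("lorem ipsum " * 10_000) + "needle" + (" dolor" * 10_000)

str_idx = text.rindex('needle')    # plain string search
re_idx  = text.rindex(/needle/)    # same search as a regex

str_time = Benchmark.realtime { 100.times { text.rindex('needle') } }
re_time  = Benchmark.realtime { 100.times { text.rindex(/needle/) } }

puts format("string: %.4fs  regex: %.4fs", str_time, re_time)
```

Both calls return the same index, so the string form is a drop-in replacement whenever the search term contains no regex syntax.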

Remember: regexes are convenient as hell, but expensive computationally. Good luck!

crunchyt