I am trying to find a way to dynamically create a Regexp object from a string (taken from the database) and then use it to filter another string. This example extracts data from a git commit message, but in theory any valid regexp could be stored in the database as a string.

What happens

>> string = "[ALERT] Project: Revision ...123456 committed by Me <[email protected]>\n on 2009-   07-28 21:21:47\n\n    Fixed typo\n"
>> r = Regexp.new("[A-Za-z]+: Revision ...[\w]+ committed by [A-Za-z\s]+")
>> string[r]
=> nil

What I want to happen

>> string = "[ALERT] Project: Revision ...123456 committed by Me <[email protected]>\n on 2009-   07-28 21:21:47\n\n    Fixed typo\n"
>> string[/[A-Za-z]+: Revision ...[\w]+ committed by [A-Za-z\s]+/]
=> "Project: Revision ...123456 committed by Me "
A: 

You're only missing one thing:

>> Regexp.new "\w"
=> /w/
>> Regexp.new "\\w"
=> /\w/

Backslashes are escape characters in double-quoted string literals. If you want a literal backslash in the pattern, you have to double it.
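
As a rough sketch of what happens to the backslash, assuming the pattern is typed as a literal in Ruby source (the variable names are made up for illustration): a double-quoted literal drops the backslash from `\w` and turns `\s` into a plain space, while a single-quoted literal is another way to keep the backslash without doubling it.

```ruby
# Double-quoted literal: "\w" is not a known escape, so Ruby keeps only the "w".
double_quoted = "[\w]+"
# Doubled backslash in a double-quoted literal: one real backslash survives.
escaped = "[\\w]+"
# Single-quoted literal: only \\ and \' are special, so \w is kept as written.
single_quoted = '[\w]+'

puts double_quoted   # prints [w]+
puts escaped         # prints [\w]+
puts single_quoted   # prints [\w]+

# In a double-quoted literal "\s" is a real escape: it is a single space.
puts "\s" == " "     # prints true

# Only the escaped and single-quoted forms build the intended regexp.
broken  = Regexp.new(double_quoted)   # matches runs of the letter "w"
working = Regexp.new(single_quoted)   # matches runs of word characters
```

With these, `"123456"[working]` returns the whole string, while `"123456"[broken]` returns nil.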

>> string = "[ALERT] Project: Revision ...123456 committed by Me <[email protected]>\n on 2009-   07-28 21:21:47\n\n    Fixed typo\n"
=> "[ALERT] Project: Revision ...123456 committed by Me <[email protected]>\n on 2009-   07-28 21:21:47\n\n    Fixed typo\n"
>> r = Regexp.new("[A-Za-z]+: Revision ...[\\w]+ committed by [A-Za-z\\s]+")
=> /[A-Za-z]+: Revision ...[\w]+ committed by [A-Za-z\s]+/
>> string[r]
=> "Project: Revision ...123456 committed by Me "

If you'd pasted the output from your "broken" lines rather than just the input, you'd probably have spotted that the \w and \s escapes hadn't survived.

Gareth
Perfect, thanks - I knew I had to be doing something subtly wrong.
davidsmalley
A: 

Option 1:

# Escape the backslashes:
r = Regexp.new("[A-Za-z]+: Revision ...[\\w]+ committed by [A-Za-z\\s]+")

Disadvantage: every backslash in the pattern has to be doubled by hand.

Option 2:

# Pass a regexp literal to the constructor
r = Regexp.new(/[A-Za-z]+: Revision ...[\w]+ committed by [A-Za-z\s]+/)

Disadvantage: None

Swanand
For option 2: the argument to the constructor is always a string, because the regex is pulled from the database, so that won't work in this scenario.
davidsmalley
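
A note on the database case raised in that comment: escape processing only happens when Ruby parses a string literal in source code, so a pattern that arrives at runtime already contains its backslashes and should be usable with `Regexp.new` as-is. The sketch below assumes this; `pattern_from_db` is a hypothetical stand-in for the stored value, `safe_regexp` is an illustrative helper (not part of any library), and the message text is shortened from the one in the question.

```ruby
# Hypothetical stand-in for a value read from the database at runtime.
# A single-quoted literal is used so the text holds real backslashes,
# just as the stored string would.
pattern_from_db = '[A-Za-z]+: Revision ...[\w]+ committed by [A-Za-z\s]+'

# Illustrative helper: build a Regexp from stored text, returning nil
# instead of raising when the stored pattern is not a valid regexp.
def safe_regexp(source)
  Regexp.new(source)
rescue RegexpError
  nil
end

# Shortened sample message; the address part is a placeholder.
message = "[ALERT] Project: Revision ...123456 committed by Me <address elided>\n"

r = safe_regexp(pattern_from_db)
matched = r ? message[r] : nil
puts matched.inspect               # prints "Project: Revision ...123456 committed by Me "

# An unterminated character class is rejected, so the helper returns nil.
puts safe_regexp("[a-z").inspect   # prints nil
```

Rescuing `RegexpError` is a design choice for this sketch: since any string could be stored in the database, an invalid pattern is treated as "no filter" rather than as a crash.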