I have a string like "{some|words|are|here}" or "{another|set|of|words}"

So in general the string consists of an opening curly bracket, words delimited by pipes, and a closing curly bracket.

What is the most efficient way to get a selected word out of that string?

I would like to do something like this:

@my_string = "{this|is|a|test|case}"
@my_string.get_column(0) # => "this"
@my_string.get_column(2) # => "a"
@my_string.get_column(4) # => "case"

What should the method get_column contain ?

+2  A: 

So this is the solution I like right now:

class String
  def get_column(n)
    self =~ /\A\{(?:\w*\|){#{n}}(\w*)(?:\|\w*)*\}\Z/ && $1
  end
end

We use a regular expression to make sure that the string is of the correct format, while simultaneously grabbing the correct column.

Explanation of regex:

  • \A is the beginning of the string and \Z is the end (strictly, \Z also matches just before a trailing newline), so this regex matches the entire string.
  • Since curly braces have a special meaning we escape them as \{ and \} to match the curly braces at the beginning and end of the string.
  • next, we want to skip the first n columns - we don't care about them.
    • A previous column is some number of letters followed by a vertical bar, so we use the standard \w to match a word-like character (includes numbers and underscore, but why not) and * to match any number of them. Vertical bar has a special meaning, so we have to escape it as \|. Since we want to group this, we enclose it all inside non-capturing parens (?:\w*\|) (the ?: makes it non-capturing).
    • We want to skip n of these previous columns, so we tell the regex to match the column pattern n times using a repetition count: just put a number in curly braces after a pattern. We use standard string interpolation, so we put in {#{n}} to mean "match the previous pattern exactly n times".
  • the first non skipped column after that is the one we care about, so we put that in capturing parens: (\w*)
  • then we skip the rest of the columns, if any exist: (?:\|\w*)*.

Capturing the column puts it into $1, so we return that value if the regex matched. If not, we return nil, since this String has no nth column.
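
To make the interpolation step concrete, here is a small sketch of what the pattern looks like for a given n and what it captures. The helper name column_regex is hypothetical (it is not part of the solution above); it builds the same pattern outside the String class so the pieces can be inspected.

```ruby
# Hypothetical helper, for illustration only: builds the same pattern
# that get_column uses, for a given column index n.
def column_regex(n)
  /\A\{(?:\w*\|){#{n}}(\w*)(?:\|\w*)*\}\Z/
end

sample = "{this|is|a|test|case}"

# The interpolated source shows the literal repetition count.
puts column_regex(2).source

# String#[] with a regex and a group number returns that capture, or nil.
puts sample[column_regex(0), 1]          # this
puts sample[column_regex(2), 1]          # a
puts sample[column_regex(4), 1]          # case
p    sample[column_regex(5), 1]          # nil: there is no column 5
p    "not columns"[column_regex(0), 1]   # nil: wrong format
```

Using str[regex, 1] here avoids the $1 global, but it is the same match underneath.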

In general, if you wanted to have more than just words in your columns (like "{a phrase or two|don't forget about punctuation!|maybe some longer strings that have\na newline or two?}"), then just replace all the \w in the regex with [^|{}] so you can have each column contain anything except a curly-brace or a vertical bar.
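
As a rough sketch of that substitution, with every \w swapped for [^|{}]: the method name get_phrase_column is made up for this example, and it is written as a plain method rather than a String extension only to keep it self-contained.

```ruby
# Hypothetical name; same regex as above with \w replaced by [^|{}].
def get_phrase_column(str, n)
  str[/\A\{(?:[^|{}]*\|){#{n}}([^|{}]*)(?:\|[^|{}]*)*\}\Z/, 1]
end

phrases = "{a phrase or two|don't forget about punctuation!|maybe some longer strings that have\na newline or two?}"

puts get_phrase_column(phrases, 0)   # a phrase or two
puts get_phrase_column(phrases, 1)   # don't forget about punctuation!
puts get_phrase_column(phrases, 2)   # the last column, newline included
p    get_phrase_column(phrases, 3)   # nil
```

The negated character class also matches newlines, which is why the last column comes back with its line break intact.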


Here's my previous solution

class String
  def get_column(n)
    raise "not a column string" unless self =~ /\A\{\w*(?:\|\w*)*\}\Z/
    self[1 .. -2].split('|')[n]
  end
end

We use a similar regex to make sure the String contains a set of columns or raise an error. Then we strip the curly braces from the front and back (using self[1 .. -2] to limit to the substring starting at the first character and ending at the next to last), split the columns using the pipe character (using .split('|') to create an array of columns), and then find the n'th column (using standard Array lookup with [n]).
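
The intermediate values for the strip-and-split steps look like this. It is only a walk-through of the steps just described, plus one edge case worth checking against your own data: split drops trailing empty fields by default, so an empty last column behaves differently here than in the regex version.

```ruby
sample = "{this|is|a|test|case}"

inner   = sample[1..-2]      # "this|is|a|test|case"
columns = inner.split('|')   # ["this", "is", "a", "test", "case"]

puts columns[2]              # a
p    columns[7]              # nil (index past the end)

# Edge case: an empty final column is dropped unless a negative limit is given.
p "{a|b|}"[1..-2].split('|')       # ["a", "b"]
p "{a|b|}"[1..-2].split('|', -1)   # ["a", "b", ""]
```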

I just figured as long as I was using the regex to verify the string, I might as well use it to capture the column.

rampion