ansaurus

Question

How to use a one-line regular expression to get the matched content

Answer 1

+2 A:

'[ruby] regex'.scan(/\[(.*?)\](.*)/)

will return

[["ruby", " regex"]]

you can read more about String#scan here: http://ruby-doc.org/core/classes/String.html#M000812. In short, it returns an array of all non-overlapping matches; in this case the outer array is the array of matches, and each inner array holds the capture groups of one match.
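A small sketch of the two shapes String#scan can return, using a made-up sample string: without capture groups each element is the matched text, and with capture groups each element is an array of that match's captures.

```ruby
text = '[ruby] [regex] one line'

# No capture groups: each element is the matched string itself.
words = text.scan(/\[\w+\]/)
p words  # => ["[ruby]", "[regex]"]

# One capture group: each element is an array of that match's captures.
tags = text.scan(/\[(\w+)\]/)
p tags   # => [["ruby"], ["regex"]]

# With a block, scan yields each match's captures in turn instead.
text.scan(/\[(\w+)\]/) { |captures| puts captures.first }  # prints ruby, then regex
```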

to do the assignment you can rewrite it like this (assuming you will only ever have one match in the string):

tag, keyword = '[ruby] regex'.scan(/\[(.*?)\](.*)/).flatten

depending on exactly what you want to accomplish you may want to change the regex to

/^\s*\[(.*?)\]\s*(.+)\s*$/

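which matches the whole input line and trims the leading spaces from the second capture group (the greedy (.+) still keeps any trailing spaces; a lazy (.+?) would leave them to the final \s*). Anchoring the pattern to the start and end will make it a bit more efficient, and it will avoid getting false or duplicate matches in some cases (but that very much depends on the input). Note that in Ruby ^ and $ match at the start and end of every line, not only of the whole string, so the guarantee that the returned array never holds more than one match (and can therefore safely be used in assignment) only holds for single-line input; use \A and \z to anchor to the whole string.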
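A quick check of both points, as a sketch: the strict variant (whole-string anchors plus a lazy text group) is a hypothetical alternative for comparison, not part of the answer's suggestion, and the inputs are made up.

```ruby
pattern = /^\s*\[(.*?)\]\s*(.+)\s*$/      # the regex suggested above
strict  = /\A\s*\[(.*?)\]\s*(.+?)\s*\z/   # hypothetical variant: \A/\z anchors, lazy text group

# The greedy (.+) keeps trailing spaces; the lazy (.+?) leaves them to \s*.
p '[ruby]   regex  '.scan(pattern)  # => [["ruby", "regex  "]]
p '[ruby]   regex  '.scan(strict)   # => [["ruby", "regex"]]

# ^ and $ match at every line boundary, so multi-line input can match twice.
multi = "[a] x\n[b] y"
p multi.scan(pattern)  # => [["a", "x"], ["b", "y"]]
p multi.scan(strict)   # => []
```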

As for the follow-up question, this is what I would do:

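def tags_and_keyword(input)
  input.scan(/^\s*\[(.+)\]\s+(.+)\s*$/) do |match|
    tags = match[0].split(/\]\s*\[/)
    line = match[1]
    return tags, line  # returns from the method on the first match
  end
  nil  # no match; otherwise the method would return scan's result (the input string)
end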

tags, keyword = tags_and_keyword('[ruby] [regex] [rails] one line')
tags # => ["ruby", "regex", "rails"]
keyword # => "one line"
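When String#scan is given a block it returns the receiver itself, so a non-matching input is easy to mistake for a result. A sketch of the same parse built on String#match instead, which returns nil when nothing matches; the method name tags_and_keyword_or_nil is hypothetical.

```ruby
# Hypothetical variant: same regex as tags_and_keyword, but String#match
# gives nil on no match instead of scan's return value (the input string).
def tags_and_keyword_or_nil(input)
  m = input.match(/^\s*\[(.+)\]\s+(.+)\s*$/)
  return nil unless m

  [m[1].split(/\]\s*\[/), m[2]]
end

p tags_and_keyword_or_nil('[ruby] [regex] [rails] one line')  # => [["ruby", "regex", "rails"], "one line"]
p tags_and_keyword_or_nil('no tags here')                      # => nil
```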

it can be rewritten in one line, but I wouldn't:

tags, keyword = catch(:match) { input.scan(/^\s*\[(.+)\]\s+(.+)\s*$/) { |match| throw :match, [match[0].split(/\]\s*\[/), match[1]] } }
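The one-liner relies on catch and throw: catch returns the value thrown to its tag, or the block's own value if nothing is thrown. A toy illustration with made-up numbers:

```ruby
# throw ends the catch block early and becomes catch's return value.
found = catch(:match) do
  [1, 2, 3].each { |n| throw :match, n * 10 if n == 2 }
  :nothing_thrown
end
p found    # => 20

# Without a throw, catch returns the block's last expression.
missing = catch(:match) do
  [1, 3].each { |n| throw :match, n * 10 if n == 2 }
  :nothing_thrown
end
p missing  # => :nothing_thrown
```

In the one-liner the block's own value is what scan returns, which (when scan is given a block) is the input string, so an input that doesn't match would end up assigned to tags.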

My solution assumes all tags come before the keyword, and that there's only one tags/keyword expression in each input. The first capture globs all tags, but then I split that string, so it's a two-step process (which, as @Tim wrote in his comment, is required unless you have an engine capable of recursive matching).
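The two-step requirement follows from how capture groups behave in Ruby's engine: a group inside a repetition only remembers its last iteration. A small illustration, assuming tags are word characters (\w+) for the sake of the example:

```ruby
input = '[ruby] [regex] [rails] one line'

# The repeated group matches every tag, but only the last one is captured.
m = input.match(/\A(?:\[(\w+)\]\s*)+(.*)\z/)
p m.captures  # => ["rails", "one line"]

# Hence the two steps: capture all tags as one string, then split it.
all_tags = input[/\A\s*\[(.+)\]/, 1]
p all_tags                  # => "ruby] [regex] [rails"
p all_tags.split(/\]\s*\[/)  # => ["ruby", "regex", "rails"]
```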

Theo 2010-06-22 06:34:51

@Theo, thanks for your detailed answer; it was very helpful. Could you take a look at my updated question? I've made it harder.

Freewind 2010-06-22 07:24:20

@Theo, I don't know how to thank you for your answers. I'm sorry: I clicked the 'vote up' triangle again and cancelled my vote by mistake, and now it is locked until you edit your answer again. Please edit your answer so I can give you the vote. I've learned a lot from you; thank you very much!

Freewind 2010-06-22 12:19:11

I've edited it.

Theo 2010-06-22 13:12:55

Answer 2

+2 A:

You need the Regexp#match method. If you write /\[(.*?)\](.*)/.match('[ruby] regex'), this will return a MatchData object (or nil if the string doesn't match). If we call that object matches, then, among other things:

matches[0] returns the whole matched string.
matches[n] returns the nth capturing group ($n).
matches.to_a returns an array consisting of matches[0] through matches[N], where N is the number of capturing groups.
matches.captures returns an array consisting of just the capturing groups (matches[1] through matches[N]).
matches.pre_match returns everything before the matched string.
matches.post_match returns everything after the matched string.
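To make the list concrete, here is the same regex run against a made-up input with some text before the tag, so that pre_match is not empty:

```ruby
m = /\[(.*?)\](.*)/.match('find [ruby] regex')

p m[0]          # => "[ruby] regex"   (whole match)
p m[1]          # => "ruby"           (first group, like $1)
p m[2]          # => " regex"         (second group, like $2)
p m.to_a        # => ["[ruby] regex", "ruby", " regex"]
p m.captures    # => ["ruby", " regex"]
p m.pre_match   # => "find "
p m.post_match  # => ""
```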

There are more methods, which correspond to other special variables, etc.; you can check MatchData's docs for more. Thus, in this specific case, all you need to write is

tag, keyword = /\[(.*?)\](.*)/.match('[ruby] regex').captures
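Because Regexp#match returns nil when there is no match, calling .captures directly raises NoMethodError on input without a tag. A sketch using the safe-navigation operator &. (available in Ruby 2.3 and later, so newer than this answer):

```ruby
pattern = /\[(.*?)\](.*)/

# &. calls captures only when match returned a MatchData.
tag, keyword = pattern.match('[ruby] regex')&.captures
p [tag, keyword]  # => ["ruby", " regex"]

# On no match the right-hand side is nil, so both variables become nil.
tag, keyword = pattern.match('no tag here')&.captures
p [tag, keyword]  # => [nil, nil]
```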

Edit 1: Alright, for your harder task you'll want the String#scan method instead, which @Theo used; however, we're going to use a different regex. The following code should work:

# You could inline the regex, but comments would probably be nice.
tag_and_text = / \[([^\]]*)\] # Match a bracket-delimited tag,
                 \s*          # ignore spaces,
                 ([^\[]*) /x  # and match non-tag search text.
input        = '[ruby] [regex] [rails] one line [foo] [bar] baz'
tags, texts  = input.scan(tag_and_text).transpose

The input.scan(tag_and_text) will return a list of tag–search-text pairs:

[ ["ruby", ""], ["regex", ""], ["rails", "one line "]
, ["foo", ""], ["bar", "baz"] ]

The transpose call flips that, so that you have a pair consisting of a tag list and a search-text list:

[["ruby", "regex", "rails", "foo", "bar"], ["", "", "one line ", "", "baz"]]

You can then do whatever you want with the results. I might suggest, for instance

search_str = texts.join(' ').strip.gsub(/\s+/, ' ')

This will concatenate the search snippets with single spaces, get rid of leading and trailing whitespace, and replace runs of multiple spaces with a single space.
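The steps above can be wrapped in a small helper; parse_query is a hypothetical name, and the empty-input guard is an addition beyond the answer: if the input has no bracketed tag, scan returns [], [].transpose is also [], and the multiple assignment would leave texts as nil, so texts.join would raise NoMethodError.

```ruby
# Hypothetical helper combining the scan, transpose and join steps above.
def parse_query(input)
  pairs = input.scan(/\[([^\]]*)\]\s*([^\[]*)/)  # same regex, without /x
  return [[], ''] if pairs.empty?                # no tags in the input

  tags, texts = pairs.transpose
  [tags, texts.join(' ').strip.gsub(/\s+/, ' ')]
end

tags, search_str = parse_query('[ruby] [regex] [rails] one line [foo] [bar] baz')
p tags        # => ["ruby", "regex", "rails", "foo", "bar"]
p search_str  # => "one line baz"
```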

Antal S-Z 2010-06-22 06:41:58

@Antal S-Z, good answer, clean and useful, thank you. I hope you can take a look at my updated question; I've made it a little harder.

Freewind 2010-06-22 07:25:33

@Antal, thank you so much! I never thought we could do so much work with so little code (I was a Java developer), and you taught me a lot. Thanks again, you are very nice :)

Freewind 2010-06-22 12:24:05
