I am doing this in Groovy.

Input:

hip_abc_batch   hip_ndnh_4_abc_copy_from_stgig abc_copy_from_stgig
hiv_daiv_batch  hip_a_de_copy_from_staging abc_a_de_copy_from_staging

I want to get the last column, basically anything that starts with abc_.

I tried the following regex (it works for the second line but not the first):

\abc_.*\

but on the first line that gives me everything from abc_batch onward.

I am looking for a regex that will fetch me anything that starts with abc_, but I cannot use \^abc_.*\ since the whole string does not start with abc_.

+3  A: 

Try this:

/\s(abc_.*)$/m

Here is a commented version so you can understand how it works:

\s          # match one whitespace character
(abc_.*)    # capture a string that starts with "abc_" and is followed
            # by any character zero or more times
$           # match the end of the string

Since the regular expression has the "m" switch it will be a multi-line expression. This allows the $ to match the end of each line rather than the end of the entire string itself.

Edit: You don't need to trim the whitespace, as the first capture group (index 1 of the match; index 0 is the whole match) contains just the text. After a cursory scan of this tutorial I believe this is the way to grab the value of a capture group using Groovy:

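matcher = (yourString =~ /(?m)\s(abc_.*)$/)
// Groovy has no trailing /m switch, so the multi-line flag is written inline as (?m).
// matcher[0] is the first match; index 1 within it is the first capture group.
matcher[0][1]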
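
As a rough sketch of how this plays out on the sample input from the question, the snippet below runs the pattern over both lines and compares the whole match with the capture group. It assumes the multi-line switch is written as the inline `(?m)` flag of the underlying Java regex engine; the variable names (`input`, `wholeFirstMatch`, `lastColumns`) are made up for illustration.

```groovy
// Sample input copied from the question; `input` is a name chosen for this sketch.
def input = '''hip_abc_batch   hip_ndnh_4_abc_copy_from_stgig abc_copy_from_stgig
hiv_daiv_batch  hip_a_de_copy_from_staging abc_a_de_copy_from_staging'''

// (?m) is the inline multi-line flag, so $ matches at the end of every line.
def matcher = (input =~ /(?m)\s(abc_.*)$/)

// Index 0 of a match is the whole match, including the whitespace consumed by \s.
def wholeFirstMatch = matcher[0][0]

// Index 1 is the first capture group: just the text that starts with abc_.
def lastColumns = matcher.collect { it[1] }

println "whole match: '${wholeFirstMatch}'"
println "last columns: ${lastColumns}"
```

Reading index 0 instead of index 1 is what brings the leading space along, so taking the capture group should make a separate trim step unnecessary.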
Andrew Hare
That's working! But it is returning the spaces in front of abc_ as well. Is there a way to not get the leading spaces?
And thanks for the explanation!
You want to use the \1 or $1 value returned, not sure how it works in groovy. The "(...)" section captures a group, and generally $0 is the whole thing and $1 is the first group.
Adam W
@josh - Adam is correct, I captured just the text you needed in the parentheses. @Adam - thanks! :)
Andrew Hare
Thanks guys. Perl does not even compare to Groovy :) but I was able to fix the leading space by doing the following: s.replaceAll(/^\s+/, "")
@josh - Please see my edit and I apologize as this is the first Groovy code I have ever written in my life.
Andrew Hare
In Groovy 1.6, findAll is idiomatic for using regular expressions, much nicer than the old matcher stuff. `theString.findAll(/\b(abc_\w*)\b/)` results in a list of matches in `theString`.
Ted Naleid
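
A minimal sketch of the findAll approach from the comment above, run against the sample input from the question. The variable names are illustrative only. Called with just a pattern, findAll returns the whole text of each match, so no capture-group indexing is needed.

```groovy
// Sample input copied from the question.
def theString = '''hip_abc_batch   hip_ndnh_4_abc_copy_from_stgig abc_copy_from_stgig
hiv_daiv_batch  hip_a_de_copy_from_staging abc_a_de_copy_from_staging'''

// \b only matches where a word starts; the abc_ inside hip_abc_batch follows an
// underscore (a word character), so it is not at a boundary and is skipped.
def found = theString.findAll(/\b(abc_\w*)\b/)

println found
```

Because `\w*` stops at whitespace, this variant needs neither the multi-line flag nor an end-of-line anchor.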
+3  A: 

It sounds like you're looking for "words" (i.e., sequences that don't include spaces) that begin with abc_. You might try:

/\babc_.*\b/

The \b means (in some regular expression flavors) "word boundary."

VoteyDisciple
a `.*` can be too greedy. `\S+` might be better
hhaamu
+1. I'd use `\w+` instead of `.*`, but the meat of the answer is the `\b` at the beginning of the regex.
Alan Moore
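
To illustrate the greediness point raised in the comments, here is a small sketch on a hypothetical line (not from the question) in which an abc_ column is followed by other columns. On the question's own sample the greedy `.*` happens to work, because the abc_ column is the last one on each line.

```groovy
// Hypothetical input: an abc_ column that is NOT the last column on the line.
def line = 'abc_first_col other_col abc_last_col'

// .* runs to the end of the line, so everything after the first abc_ is swallowed.
def greedy = line.findAll(/\babc_.*\b/)

// \w+ and \S+ both stop at the first whitespace character.
def wordOnly = line.findAll(/\babc_\w+\b/)
def nonSpace = line.findAll(/\babc_\S+/)

println greedy
println wordOnly
println nonSpace
```

`\w+` only accepts letters, digits, and underscores, while `\S+` accepts any non-whitespace character, so the choice between them depends on what the column values may contain.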
A: 

RegexBuddy (paid) and RegExr (free) can be a big help in learning regex, if you are interested.

cgreeno
A link to a non-free application is hardly an answer to the question.
Andrew Hare
yeah :( i have to buy that.
A: 

I think you are looking for this: \s(abc_[a-zA-Z_]*)$

If you are using Perl and you read all lines into one string, don't forget to set the "m" option on your regex (that stands for "Treat string as multiple lines").

Oh, and Regex Coach is your free friend.

Adrian Grigore