I have a list of objects output from ldapsearch as follows:

dn: cn=HPOTTER,ou=STUDENTS,ou=HOGWARTS,o=SCHOOL
dn: cn=HGRANGER,ou=STUDENTS,ou=HOGWARTS,o=SCHOOL
dn: cn=RWEASLEY,ou=STUDENTS,ou=HOGWARTS,o=SCHOOL
dn: cn=DMALFOY,ou=STUDENTS,ou=HOGWARTS,o=SCHOOL
dn: cn=SSNAPE,ou=FACULTY,ou=HOGWARTS,o=SCHOOL
dn: cn=ADUMBLED,ou=FACULTY,ou=HOGWARTS,o=SCHOOL


So far, I have the following regex:

/\bcn=\w*,/g

Which returns results like this:

cn=HPOTTER,
cn=HGRANGER,
cn=RWEASLEY,
cn=DMALFOY,
cn=SSNAPE,
cn=ADUMBLED,


I need a regex that returns results like this:

HPOTTER
HGRANGER
RWEASLEY
DMALFOY
SSNAPE
ADUMBLED

What do I need to change in my regex so the pattern (the cn= and the comma) is not included in the results?

EDIT: I will be using sed to do the pattern matching, and piping the output to other command line utilities.

A: 

This sounds more like a simple parsing problem than a regex problem. An ANTLR grammar would sort this out in no time.

duffymo
Wow. A grammar is way overkill for this.
Robert P
+13  A: 

You need a capture group. Modify the regex to:

/\bcn=\(\w*\),/g

The text matched inside the parentheses is then stored in a capture group. How you extract it depends on the language or tool you are using. (With sed, it is available as \1 in the replacement.)

Note that in most regex flavors you don't have to escape the parentheses (), but since sed uses basic regular expressions by default, you need to escape them as shown above.
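As a rough illustration of the grouping approach with sed, the sketch below replaces each line with the captured group and prints only the lines where the substitution succeeded. The sample input is taken from the question; the variable names `input` and `names` are made up for this example, and the bracket expression `[A-Za-z0-9_]` stands in for `\w` (which, like `\b`, is a GNU extension) so that the command should also work with non-GNU sed.

```shell
# Two sample lines of ldapsearch output from the question.
input='dn: cn=HPOTTER,ou=STUDENTS,ou=HOGWARTS,o=SCHOOL
dn: cn=SSNAPE,ou=FACULTY,ou=HOGWARTS,o=SCHOOL'

# -n suppresses the default output; the trailing p prints only lines where
# the substitution matched. \1 is the text captured by \( ... \).
names=$(printf '%s\n' "$input" | sed -n 's/.*cn=\([A-Za-z0-9_]*\),.*/\1/p')

printf '%s\n' "$names"
```

Because the pattern consumes the whole line (`.*` on both sides), only the captured name is left after the substitution.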

For an excellent resource on Regular Expressions I suggest: Mastering Regular Expressions

Gavin Miller
+2  A: 

Check out Expresso. I have used it in the past to build my regexes, and it is a good learning aid too.

Brawndo
+1, great free tool, helps immensely with debugging these rat's nests called regexes.
jcollum
probably got voted down because the OP is using Linux
jcollum
Or perhaps because it is not a direct answer to the question. Although it is a very useful tool.
EBGreen
+2  A: 

The quick and dirty method is to use submatches, assuming your engine supports them:

/\bcn=(\w*),/g

Then you would want to get the first submatch.
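With sed specifically, one way to use the unescaped-parentheses form is to switch to extended regular expressions. This is a minimal sketch, assuming a sed that accepts `-E` (recent GNU and BSD versions do); `line` and `name` are hypothetical variable names, and `\b` and `\w` are replaced with portable equivalents.

```shell
line='dn: cn=HGRANGER,ou=STUDENTS,ou=HOGWARTS,o=SCHOOL'

# With -E the parentheses form a group without backslashes;
# \1 in the replacement is the first submatch.
name=$(printf '%s\n' "$line" | sed -E 's/.*cn=([A-Za-z0-9_]*),.*/\1/')

printf '%s\n' "$name"
```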

EBGreen
+2  A: 

Without knowing what language you're using, we can't tell for sure, but in most regular expression parsers, if you use parentheses, such as

/\bcn=(\w*),/g

then you'll be able to get the first captured group (often \1), which is exactly what you are searching for. To be more specific, we need to know what language you are using.

Eddie
+4  A: 

OK, the place where you asked the more specific question was closed as "exact duplicate" of this, so I'm copying my answer from there to here:

If you want to use sed, you can use something like the following:

sed -e 's/dn: cn=\([^,]*\),.*$/\1/'

You have to use [^,]* because in sed, .* is "greedy", meaning it matches as much as it possibly can. That means if you use \(.*\), in your pattern, the group will match up to the last comma, not up to the first comma.
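The difference is easy to see by running both patterns against one line from the question. This is only a sketch; the variable names `line`, `greedy`, and `bounded` are invented for the comparison.

```shell
line='dn: cn=HPOTTER,ou=STUDENTS,ou=HOGWARTS,o=SCHOOL'

# Greedy group: \(.*\) grabs everything up to the LAST comma.
greedy=$(printf '%s\n' "$line" | sed -e 's/dn: cn=\(.*\),.*$/\1/')

# Bounded group: \([^,]*\) cannot cross a comma, so it stops at the FIRST one.
bounded=$(printf '%s\n' "$line" | sed -e 's/dn: cn=\([^,]*\),.*$/\1/')

printf 'greedy:  %s\n' "$greedy"    # HPOTTER,ou=STUDENTS,ou=HOGWARTS
printf 'bounded: %s\n' "$bounded"   # HPOTTER
```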

Eddie
Put your sed command in `backquotes` so that it doesn't change your asterisks into Markdown formatting. :-)
Ben Blank
Thanks for the tip! I learned about that shortly after posting this answer, but never came back to fix this.
Eddie
+2  A: 

If your regex supports Lookaheads and Lookbehinds then you can use

/(?<=\bcn=)\w*(?=,)/g

That will match

HPOTTER
HGRANGER
RWEASLEY
DMALFOY
SSNAPE
ADUMBLED

The cn= and the comma on either side are not part of the match. They still have to be present for the match to succeed; they just aren't included in the result.
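On the command line, one tool that can run this pattern as written is GNU grep, assuming it was built with PCRE support (`-P`). The sketch below is illustrative rather than a sed solution; `input` and `matches` are hypothetical variable names.

```shell
input='dn: cn=DMALFOY,ou=STUDENTS,ou=HOGWARTS,o=SCHOOL
dn: cn=ADUMBLED,ou=FACULTY,ou=HOGWARTS,o=SCHOOL'

# -o prints only the matched text; -P enables Perl-compatible syntax,
# which is what provides the lookbehind (?<=...) and lookahead (?=...).
matches=$(printf '%s\n' "$input" | grep -oP '(?<=\bcn=)\w*(?=,)')

printf '%s\n' "$matches"
```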

Grant
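Standard sed does not support lookbehinds. As far as I know, the Perl-mode switch (-R) exists only in super-sed (ssed); otherwise you'll need a PCRE-capable tool such as Perl or GNU grep with -P.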
Gavin Miller