I'm still kind of new to RegEx in general. I'm trying to retrieve the names from a field so I can split them for further use (using Pentaho Data Integration/Kettle for the data extraction). Here's an example of the string I'm given:

CN=Name One/OU=Site/O=Domain;CN=Name Two/OU=Site/O=Domain;CN=Name Three/OU=Site/O=Domain

I would like to have the following format returned:

Name One;Name Two;Name Three

Kettle uses Java Regular Expressions.

A: 

Assuming you have the string in file.txt:

sed -e 's/\/OU=Site\/O=Domain//g' -e 's/CN=//g' file.txt
Tomasz Kowalczyk
Tried to load that Regex in and it wasn't able to match.
DBA_Alex
A: 

That sounds like you want a regex-based search and replace. How to do that correctly depends on your language, but with sed I would do it like this:

echo "CN=Name One/OU=Site/O=Domain;CN=Name Two/OU=Site/O=Domain;CN=Name Three/OU=Site/O=Domain" |\
sed 's/CN=\([^\/]*\)[^;]*/\1/g'
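
Since Kettle uses Java regular expressions, the same substitution can be sketched in plain Java. This is only an illustration run outside PDI: the class and method names (CnReplace, extractNames) are made up for the example, and how a given PDI step applies the pattern is not covered here.

```java
class CnReplace {
    // Hypothetical helper: reduce each "CN=.../OU=.../O=..." entry to its CN value.
    static String extractNames(String input) {
        // CN=([^/]*) captures the name up to the first slash;
        // [^;]* consumes the rest of the entry up to the next semicolon.
        // "$1" puts back only the captured name.
        return input.replaceAll("CN=([^/]*)[^;]*", "$1");
    }

    public static void main(String[] args) {
        String line = "CN=Name One/OU=Site/O=Domain;"
                + "CN=Name Two/OU=Site/O=Domain;"
                + "CN=Name Three/OU=Site/O=Domain";
        // Prints: Name One;Name Two;Name Three
        System.out.println(extractNames(line));
    }
}
```

String.replaceAll replaces every match, so no separate "global" flag is needed in this form.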

If you intend to split it later anyway, you probably want to just match the names and return them in a loop. Example code in Perl:

#!/usr/bin/perl
use strict;
use warnings;

# Capture the text after each "CN=" up to the next "/".
my $line = "CN=Name One/OU=Site/O=Domain;CN=Name Two/OU=Site/O=Domain;CN=Name Three/OU=Site/O=Domain";
for my $match ($line =~ /CN=([^\/]*)/g) {
  print "Name: $match\n";
}
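
A rough Java counterpart of the Perl loop, for comparison. Java has no /g flag; calling Matcher.find() repeatedly plays that role. The class and method names (CnFindAll, findNames) are invented for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CnFindAll {
    // Same pattern as the Perl example: capture everything after "CN=" up to the next "/".
    private static final Pattern CN = Pattern.compile("CN=([^/]*)");

    // Hypothetical helper: collect every captured name in order of appearance.
    static List<String> findNames(String input) {
        List<String> names = new ArrayList<>();
        Matcher m = CN.matcher(input);
        while (m.find()) {          // each call moves on to the next match
            names.add(m.group(1));  // group 1 is the name without "CN="
        }
        return names;
    }

    public static void main(String[] args) {
        String line = "CN=Name One/OU=Site/O=Domain;"
                + "CN=Name Two/OU=Site/O=Domain;"
                + "CN=Name Three/OU=Site/O=Domain";
        for (String name : findNames(line)) {
            System.out.println("Name: " + name);
        }
    }
}
```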
Sec
I'll be able to break the data apart in another step with PDI. Just want to do the initial cleaning.
DBA_Alex
I just checked the documentation at http://wiki.pentaho.com/display/EAI/Regex+Evaluation. It looks like their implementation of regex substitution is quite limited. The regex to get a single name is CN=([^\/]*), but I don't see an option for a "g"lobal flag to get all the names. You could try (CN=([^/]*)[^;]*;)* and then enable "Create fields for capture groups".
Sec
That last exp left me with just the text after the last semi-colon. Getting there though =p
DBA_Alex
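
One possible explanation for that result, assuming the step deletes whatever the pattern matches: each repetition of the group has to end in a semicolon, so the last entry (which has none) is never matched and is left behind. The plain-Java sketch below reproduces this outside PDI; the names CnRepeatedGroup and stripMatches are hypothetical.

```java
class CnRepeatedGroup {
    // Hypothetical helper: delete everything the suggested pattern matches.
    static String stripMatches(String input) {
        // Each repetition must end with ';', so the final entry cannot be part of a match.
        return input.replaceAll("(CN=([^/]*)[^;]*;)*", "");
    }

    public static void main(String[] args) {
        String line = "CN=Name One/OU=Site/O=Domain;"
                + "CN=Name Two/OU=Site/O=Domain;"
                + "CN=Name Three/OU=Site/O=Domain";
        // Prints: CN=Name Three/OU=Site/O=Domain
        System.out.println(stripMatches(line));
    }
}
```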
As I have never used PDI, I'm not sure what it's doing there. As far as I understood it, it should return the captured parts somewhere. If it is instead removing what is matched, you could probably do it with two regexes, one (CN=) and one (/[^;]*). Could you try that?
Sec
That did it! I learned something new today \o/ lol. For future reference, PDI can use RegEx in many different places. I was using it in a Replace in String step (replacing matching characters with blanks), and it needed the expression split into two parts. Many kudos!
DBA_Alex
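
For reference, the two-part replacement that worked can be reproduced in plain Java as a quick check of the two expressions. This is a sketch outside PDI: the class and method names (CnTwoStep, clean) are made up, and chaining two replaceAll calls merely stands in for two replacements with an empty string.

```java
class CnTwoStep {
    // Hypothetical helper: apply the two patterns from the thread, each replaced with nothing.
    static String clean(String input) {
        return input
                .replaceAll("CN=", "")      // step 1: drop every "CN=" prefix
                .replaceAll("/[^;]*", "");  // step 2: drop from the first "/" up to the next ";"
    }

    public static void main(String[] args) {
        String line = "CN=Name One/OU=Site/O=Domain;"
                + "CN=Name Two/OU=Site/O=Domain;"
                + "CN=Name Three/OU=Site/O=Domain";
        // Prints: Name One;Name Two;Name Three
        System.out.println(clean(line));
    }
}
```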