Hi, I'm new to this site, and new to Python.

I'm learning about regular expressions and was working through Google's examples.

I was doing one of the 'search' examples, but I changed 'search' to 'split' and tweaked the pattern a bit just to play with it. Here's the line:

print re.split(r'i', 'piiig')

(notice that there are 3 'i's in the text 'piiig')

The output has only two empty strings where it was split:

['p', '', '', 'g']

I'm just wondering why this gives that output. This isn't a real-life problem, but I think I could run into it later on and want to understand what's going on.

Anybody know what's going on???

+2  A: 

split removes each match it finds. The two blank strings are the two empty strings between consecutive 'i's.
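To make "removes each match" concrete, here is a small sketch (assuming Python 3) that finds every 'i' with re.finditer and then slices out the text between one match's end and the next match's start. The variable names are illustrative only; the point is that three separate matches leave four pieces, two of which are empty.

```python
import re

text = "piiig"

# Each 'i' is its own match; record where each one starts and ends.
spans = [m.span() for m in re.finditer(r"i", text)]
print(spans)  # [(1, 2), (2, 3), (3, 4)]

# A piece is whatever lies between the end of one match and the start of the next.
pieces = []
start = 0
for match_start, match_end in spans:
    pieces.append(text[start:match_start])
    start = match_end
pieces.append(text[start:])  # the text after the last match

print(pieces)                # ['p', '', '', 'g']
print(re.split(r"i", text))  # ['p', '', '', 'g']
```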

If you joined the list back together using 'i' as the separator, you'd get the original string back.

In that respect, piiig is p i - i - i g (here I'm using a dash for the empty string).
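A quick check of the round-trip claim, assuming Python 3: putting an 'i' back between every pair of neighbouring pieces restores the input, which only works because the empty strings are kept.

```python
import re

parts = re.split(r"i", "piiig")
print(parts)       # ['p', '', '', 'g']

# Re-inserting 'i' between neighbouring pieces gives back the original string.
rebuilt = "i".join(parts)
print(rebuilt)     # piiig

# Three separators always produce four pieces.
print(len(parts))  # 4
```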

Kobi
+3  A: 

Your example might make more sense if you replace 'i' with ',':

print re.split(r',', 'p,,,g')

In this case, splitting on the comma yields four fields: a 'p', a 'g', and two empty ones ('') in the middle.
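The same behaviour is easy to confirm in Python 3, and plain str.split treats a repeated separator the same way. As a side note, if a run of commas should count as a single separator, a pattern such as r',+' (one or more commas) matches the whole run at once.

```python
import re

regex_fields = re.split(r",", "p,,,g")
print(regex_fields)  # ['p', '', '', 'g']

# Plain str.split keeps the empty fields too.
str_fields = "p,,,g".split(",")
print(str_fields)    # ['p', '', '', 'g']

# Matching the whole run of commas as one separator collapses the empty fields.
collapsed = re.split(r",+", "p,,,g")
print(collapsed)     # ['p', 'g']
```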

Greg Hewgill
Thanks man, now that I look at it like that it makes it really trivial and totally makes sense
KyleGraves
A: 

Think of it this way (in Java, as I am not so good at Python):

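import java.util.ArrayList;
import java.util.List;

public class SplitDemo {
    public static void main(String[] args) {
        String       Text     = "piiig";
        List<String> Spliteds = new ArrayList<String>();
        String       Match    = "";
        int  I;
        char c;
        for (I = 0; I < Text.length(); I++) {
            c = Text.charAt(I);
            if (c == 'i') {
                // Every 'i' closes the current piece, even when that piece is empty.
                Spliteds.add(Match);
                Match = "";
            } else {
                Match += c;
            }
        }
        // The text after the last 'i' is always kept (re.split keeps it even when empty).
        Spliteds.add(Match);
        System.out.println(Spliteds); // [p, , , g]
    }
}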

So when you run it, this is the state at the end of each loop iteration:
When: (I == 0) => c = 'p'; Match = "p"; Spliteds = {};
When: (I == 1) => c = 'i'; Match =  ""; Spliteds = {"p"};
When: (I == 2) => c = 'i'; Match =  ""; Spliteds = {"p", ""};
When: (I == 3) => c = 'i'; Match =  ""; Spliteds = {"p", "", ""};
When: (I == 4) => c = 'g'; Match = "g"; Spliteds = {"p", "", ""};
At the end of the program (the loop has exited and the last Match has been added):
      (I == 5) => c = 'g'; Match = "g"; Spliteds = {"p", "", "", "g"};

The regex engine simply collects the string between each pair of neighbouring 'i's, and that includes the empty string between an 'i' and the 'i' right after it.
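For comparison, here is a rough Python 3 port of the same character-by-character idea. The function name split_on_char is hypothetical, and this version always appends the final piece, because that is what re.split does when the text ends with a separator. The loop at the bottom checks it against re.split on a few edge cases.

```python
import re

def split_on_char(text, sep):
    """Hypothetical hand-rolled split on a single character."""
    pieces = []
    current = ""
    for c in text:
        if c == sep:
            # Every separator closes the current piece, even an empty one.
            pieces.append(current)
            current = ""
        else:
            current += c
    # Whatever follows the last separator is kept, even if it is empty.
    pieces.append(current)
    return pieces

# Compare against re.split, including leading, trailing and only-separator cases.
for sample in ["piiig", "pi", "ip", "iii", ""]:
    assert split_on_char(sample, "i") == re.split(r"i", sample), sample

print(split_on_char("piiig", "i"))  # ['p', '', '', 'g']
```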

Hope this helps.

NawaMan