Hello guys. I'm not sure if I understand properly how os.walk stores its results.

I'm trying to do the following:

I'm checking a root folder for subfolders. There are several hundred of them, and they are nested in a somewhat uniform way.

I'm trying to check each subfolder and, if its name ends with a four-digit number, store it in a list.

I wrote some highly procedural code that works, but it uses os.listdir, meaning that I need to call the function for each folder I want to check.

Is there a better way?

import os
import re

def ListadorPastas(pasta):

    resultado = []

    regex = "^[0-9]{4}"
    padrao = re.compile(regex)

    for p in os.listdir(pasta):
        # Only the last four characters of each name are tested
        regexObject = re.match(padrao, p[-4:])
        if regexObject is not None:
            resultado.append(regexObject.string)
    return resultado

Also, I have a regex problem: this regex is matched against the last four characters sliced from the name. Sometimes I have folders with five digits at the end, which will ALSO match. I tried using "$[0-9]{4}" but it returns nothing. Any ideas why?

Thanks in advance.

George

A: 

The regex you should be using is:

pattern = re.compile(r'(?<!\d)\d{4}$')
re.search(pattern, p)

As for os.walk, your explanation is not entirely clear.

SilentGhost
A: 

About the regex: If you use p[-4:], you'll always look at the last four characters of p, so you don't get a chance to see if there really are five.

So instead, use

regex = "(?<![0-9])[0-9]{4}$"
padrao = re.compile(regex)

regexObject = re.search(padrao, p)

Unlike re.match, which only matches at the start of the string, re.search will also match parts of the string.
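
A small sketch of the difference, using the pattern above on a few made-up folder names (the names and the `resultados` dictionary are only for illustration, not from the original question):

```python
import re

# Pattern from the answer above: four digits at the end, not preceded by another digit.
padrao = re.compile(r"(?<![0-9])[0-9]{4}$")

# Hypothetical folder names, for illustration only.
nomes = ["projeto2009", "projeto12345", "2009", "2009_backup"]

# For each name, record whether re.search and re.match find the pattern.
resultados = {
    nome: (padrao.search(nome) is not None, padrao.match(nome) is not None)
    for nome in nomes
}

for nome, (achou_search, achou_match) in resultados.items():
    print(nome, "search:", achou_search, "match:", achou_match)
```

Only re.search finds the year in "projeto2009", because re.match insists on starting at the first character. Neither call accepts "projeto12345", since the lookbehind rejects four digits that follow another digit.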

Tim Pietzcker
Your regex will fail for a string that's exactly 4 digits (nothing before them): you need to accept a nondigit OR start-of-string before the 4 digits-then-end-of-string (see my answer for details on this and more).
Alex Martelli
Yeah, I noticed that too right after I posted it - just corrected it because it also returns a different-length match object. Now, with lookbehind, it should behave like the poster's solution (but meet the requirement "only four digits")
Tim Pietzcker
+3  A: 

using "$[0-9]{4}" but it returns me nothing. Any ideas why?

$ means end-of-(line or string) in a regex pattern, so I wonder how you expected "end of string then four digits" to ever possibly match anything...? By definition of "end" it won't be followed by 4 digits! r'(^|\D)\d{4}$' should work better if I understand what you want, to match strings that are just 4 digits, or end with exactly 4 digits, not 5 or more (\D means non-digit, just like \d means digit -- no reason to use [0-9] or [^0-9]!).
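
To make that concrete, here is a minimal sketch comparing the two patterns; the variable names `errado` and `certo` and the sample folder names are made up for illustration:

```python
import re

errado = re.compile(r"$[0-9]{4}")    # "end of string, then four digits": cannot match a folder name
certo = re.compile(r"(^|\D)\d{4}$")  # start-of-string or a non-digit, then four digits, then end

# Hypothetical folder names, for illustration only.
nomes = ["pasta2009", "pasta12345", "2009"]

for nome in nomes:
    m = certo.search(nome)
    # m.group() includes the non-digit matched before the four digits, if there is one.
    print(nome, errado.search(nome), m.group() if m else None)
```

Note that when a non-digit precedes the digits, it is part of the match, so the matched text can be five characters long rather than four. If you only need a yes/no answer that does not matter; if you need the digits themselves, wrap `\d{4}` in its own group.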

os.walk does not need to store much -- a couple pointers on the implicit tree it's walking -- but why do you care how it's implemented internally? Just use it...:

import os

def ListadorPastas(pasta):
    resultado = []
    for root, dirs, files in os.walk(pasta):
        for d in dirs:
            # Either exactly 4 characters, or the character before the last four is not a digit
            if (len(d) == 4 or len(d) > 4 and not d[-5].isdigit()
                    ) and d[-4:].isdigit():
                resultado.append(d)
    return resultado

where I'm also taking the opportunity to show a non-regex way to do the checks you want on the subdirectory's name.
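
If you would rather keep a regex, the same walk can be sketched as below. This is only one possible variant, not something from the question: the function name `listar_pastas_regex` is hypothetical, it uses a lookbehind pattern for "exactly four trailing digits", and it stores the full path instead of the bare folder name (which may or may not be what you want).

```python
import os
import re
import tempfile

# Four digits at the end of the name, not preceded by another digit.
PADRAO = re.compile(r"(?<!\d)\d{4}$")

def listar_pastas_regex(pasta):
    """Return full paths of subfolders (at any depth) whose names end in exactly four digits."""
    resultado = []
    for root, dirs, files in os.walk(pasta):
        for d in dirs:
            if PADRAO.search(d):
                resultado.append(os.path.join(root, d))
    return resultado

# Build a small throwaway tree to try it on (folder names are made up).
with tempfile.TemporaryDirectory() as raiz:
    for caminho in ["obra2009/fase0001", "obra12345", "docs"]:
        os.makedirs(os.path.join(raiz, caminho))
    for achado in sorted(listar_pastas_regex(raiz)):
        print(os.path.relpath(achado, raiz))
```

With that sample tree, obra2009 and the nested fase0001 should be listed, while obra12345 and docs are skipped. A single call covers the whole tree, so there is no need to call the function once per folder.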

Alex Martelli
I will try it your way! Thanks, Alex!
George