Hello, friends

I'm writing an AppleScript playlist generator. Part of the process is reading the iTunes Library XML file to get a list of all the genres in a user's library. This is the Python implementation, which works the way I'd like:

#!/usr/bin/env python

# script to get all of the genres from itunes

import re, sys, sets, htmlentitydefs


## Boosted from the internet to handle HTML entities in Genre names
def unescape(text):
    def fixup(m):
        text = m.group(0)
        if text[:2] == "&#":
            # character reference
            try:
                if text[:3] == "&#x":
                    return unichr(int(text[3:-1], 16))
                else:
                    return unichr(int(text[2:-1]))
            except ValueError:
                pass
        else:
            # named entity
            try:
                text = unichr(htmlentitydefs.name2codepoint[text[1:-1]])
            except KeyError:
                pass
        return text # leave as is
    return re.sub("&#?\w+;", fixup, text)


# probably faster to use a regex than to try to walk
# the entire xml document and aggregate the genres
try:
    xml_path = "/Users/%s/Music/iTunes/iTunes Music Library.xml" % sys.argv[1]
except IndexError:
    print '\tUsage: python '+sys.argv[0]+' <your OSX username>'
    raise SystemExit

pattern = "<key>Genre</key><string>([^<]+)</string>" 

try:
    xml = file(xml_path,'r').read()
except IOError:
    print '\tUnable to load your iTunes Library XML file'
    raise SystemExit

matches = re.findall(pattern,xml)
uniques = map(unescape,list(sets.Set(matches)))
## need to write these out somewhere so the applescript can read them
sys.stdout.write('|'.join(uniques))
raise SystemExit

The problem is that I'd like the AppleScript to be self-contained and not require this additional file to be present (I plan on making it available to other people). As far as I can tell, AppleScript doesn't offer any regular expression capabilities out of the box. I could loop over each track in the library to collect the genres, but that is a prohibitively long process that I already go through once when building the playlist. So I'm looking for alternatives.

Since AppleScript lets me run a shell script and capture the results, I imagine I can get the same behavior from some kind of shell command, be it grep, perl, or something else. My *nix command-line skills are extremely rusty and I'm looking for some guidance.

In short, I'd like to translate the above Python code into something I can call directly from the shell and get a similar result. Thanks!

+3  A: 

Why are you using regex to parse XML? Why not use a proper XML library? Python has some great utilities like ElementTree that make walking the DOM a lot easier, and it yields nice, friendly objects rather than untyped strings.
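
To make that concrete, here is a minimal sketch of the ElementTree route. It assumes Python 3 (on Python 2.5 you would use `getiterator` instead of `iter`) and assumes the library file follows the usual plist layout, where each `<dict>` holds alternating `<key>` and value elements. The function name `collect_genres` is made up for this example.

```python
import xml.etree.ElementTree as ET


def collect_genres(xml_text):
    """Return the sorted, de-duplicated genre names found in plist-style XML."""
    root = ET.fromstring(xml_text)
    genres = set()
    for dict_el in root.iter("dict"):
        children = list(dict_el)
        # Assumption: a plist <dict> alternates <key> elements and value elements.
        for key_el, value_el in zip(children[::2], children[1::2]):
            if key_el.tag == "key" and key_el.text == "Genre" and value_el.text:
                # The parser has already decoded references such as &#38;
                genres.add(value_el.text)
    return sorted(genres)
```

Calling `"|".join(collect_genres(open(xml_path).read()))` would give the same pipe-separated output as the script in the question, without a hand-rolled unescape step.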

Here are some ways of parsing XML using AppleScript:

AppleScript XML Parser (apparently available since Tiger)

XML Tools, which you can also use with AppleScript

Remember, just as AppleScript can hook into iTunes, it can hook into other installed utilities like these.

Lastly, why not just write the whole thing in Python? It has far better development and debugging tools and runs a lot faster. If you're running Leopard, you have Python 2.5.1 pre-installed.

Soviut
+1. regex for parsing XML is the Wrong Thing. There are a thousand things that can and will break. Why does everyone continue to insist on regex as the first resort against text parsing problems it cannot ever cover?
bobince
A: 

Is creating a standalone app the solution?

Look at py2app:

py2app works like py2exe but targets Mac OS.

Blauohr
A: 

If you're already working in AppleScript, why not just ask iTunes directly?

tell application "iTunes" to get genre of every track of library playlist 1
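
That command returns one genre per track, so the result still needs de-duplicating. If you end up calling it from a script anyway, a rough sketch of that step in Python could look like the following. It assumes Python 3 and assumes `osascript` prints the list as comma-separated text, which means a genre name containing ", " would be split incorrectly. The names `unique_genres` and `fetch_genres` are invented for this example.

```python
import subprocess

SCRIPT = 'tell application "iTunes" to get genre of every track of library playlist 1'


def unique_genres(osascript_output):
    """Collapse comma-separated osascript output into a sorted list of unique genres."""
    names = [name.strip() for name in osascript_output.split(", ")]
    # Tracks without a genre come back as empty strings; drop them.
    return sorted(set(name for name in names if name))


def fetch_genres():
    """Run the AppleScript one-liner through osascript (macOS only)."""
    output = subprocess.check_output(["osascript", "-e", SCRIPT])
    return unique_genres(output.decode("utf-8"))
```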