
I'm basically trying to do this (pseudo code, not valid python):

limit = 10
results = [xml_to_dict(artist) for artist in xml.findall('artist') while limit--]

So how could I code this in a concise and efficient way? The XML file can contain anywhere from 0 to 50 artists, I can't control how many are returned at a time, and AFAIK there's no XPath expression that says something like "get me up to 10 nodes".

Thanks!

+2  A: 
limit = 10
limited_artists = [artist for artist in xml.findall('artist')][:limit]
results = [xml_to_dict(artist) for artist in limited_artists]
S.Lott
+1 for synchronicity!
jathanism
+3  A: 

Assuming that xml is an ElementTree object, the findall() method returns a list, so just slice that list:

limit = 10
limited_artists = xml.findall('artist')[:limit]
results = [xml_to_dict(artist) for artist in limited_artists]
jathanism
+1: For being seconds faster.
S.Lott
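A self-contained sketch of the slicing approach above, assuming the standard library's `xml.etree.ElementTree` and a hypothetical `xml_to_dict` helper (the question doesn't show the real one). Slicing past the end of a list never raises, so the same line copes with anywhere from 0 to 50 artists:

```python
import xml.etree.ElementTree as ET


def xml_to_dict(element):
    """Hypothetical stand-in for the asker's helper: maps child tag -> text."""
    return {child.tag: child.text for child in element}


def first_artists(root, limit=10):
    """Convert at most `limit` <artist> children of `root` to dicts."""
    # findall() returns a list; slicing it is safe even when it has fewer than `limit` items.
    return [xml_to_dict(artist) for artist in root.findall('artist')[:limit]]


def make_doc(n):
    """Build an <artists> document with n <artist> children (test data only)."""
    items = ''.join(f'<artist><name>Artist {i}</name></artist>' for i in range(n))
    return ET.fromstring(f'<artists>{items}</artists>')


for n in (0, 3, 50):
    print(n, len(first_artists(make_doc(n))))  # prints "0 0", "3 3", "50 10"
```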
+1  A: 

This avoids the problems with slicing: it doesn't change the order of operations (the limit can be applied after any filtering, without building the full list first), and it doesn't construct a new list, which can matter for large inputs when the list comprehension filters.

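def first(it, count):
    it = iter(it)
    for _ in range(count):
        try:
            yield next(it)
        except StopIteration:
            # Source ran out early. Just return: since Python 3.7 (PEP 479),
            # a StopIteration escaping a generator becomes a RuntimeError.
            return

print([i for i in first(range(1000), 5)])  # [0, 1, 2, 3, 4]

It also works with generators and other lazy iterables, which can't be sliced at all; materializing one into a list just so you can slice it could exhaust memory:

exp = (i for i in first(range(1000000000), 10000000))
for i in exp:
    print(i)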
Glenn Maynard
You don't really **need** to raise `StopIteration`. Simply ending the function will do.
S.Lott
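The standard library already ships a lazy equivalent of `first`: `itertools.islice`. The sketch below (Python 3; the parameter is renamed `n` here to avoid shadowing `itertools.count`) checks that the two agree, both when the source is shorter than requested and when it is infinite, which is a case slicing can't handle at all:

```python
from itertools import count, islice


def first(iterable, n):
    """Yield at most n items from iterable (same idea as the answer above)."""
    it = iter(iterable)
    for _ in range(n):
        try:
            yield next(it)
        except StopIteration:
            # Source exhausted early: end the generator normally.
            return


# Shorter source than requested: both simply stop early.
print(list(first(range(3), 5)), list(islice(range(3), 5)))

# Infinite source: works only because both are lazy; count() cannot be sliced.
print(list(first(count(), 4)), list(islice(count(), 4)))
```

In practice `islice(xml.iterfind('artist'), limit)` gives the same lazy behaviour without a hand-written helper.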
+4  A: 

Are you using lxml? You could use XPath to limit the items at the query level, e.g.:

>>> from lxml import etree
>>> from io import StringIO
>>> xml = etree.parse(StringIO('<foo><bar>1</bar><bar>2</bar><bar>4</bar><bar>8</bar></foo>'))
>>> [bar.text for bar in xml.xpath('bar[position()<=3]')]
['1', '2', '4']

You could also use itertools.islice to limit any iterable, e.g.

>>> from itertools import islice
>>> [bar.text for bar in islice(xml.iterfind('bar'), 3)]
['1', '2', '4']
>>> [bar.text for bar in islice(xml.iterfind('bar'), 5)]
['1', '2', '4', '8']
KennyTM
Do you think the XPath solution is faster than the slicing alternatives? I think Elements are lazy, but it might just be faster not to fetch the extra elements at all, right? Not sure.
Infinity
@Infinity: I think `islice` is faster because XPath is more complicated. I haven't verified though. You need to benchmark it yourself.
KennyTM
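One way to run that benchmark, as a sketch: it times the list-slicing and `islice` approaches using the standard library's `xml.etree.ElementTree` rather than lxml (so the XPath `position()` variant is not covered) on a synthetic 50-artist document. Timings depend on your machine, parser and data, so no numbers are quoted here:

```python
import timeit
import xml.etree.ElementTree as ET
from itertools import islice

# Synthetic test document: 50 <artist> elements, the question's upper bound.
DOC = ET.ElementTree(ET.fromstring(
    '<artists>' + ''.join(f'<artist>{i}</artist>' for i in range(50)) + '</artists>'
))
LIMIT = 10


def with_slice():
    # findall() builds the full list of matches, then the slice copies the first LIMIT.
    return [a.text for a in DOC.findall('artist')[:LIMIT]]


def with_islice():
    # iterfind() yields matches lazily; islice stops pulling after LIMIT.
    return [a.text for a in islice(DOC.iterfind('artist'), LIMIT)]


# Sanity check: both approaches must return the same elements before timing them.
assert with_slice() == with_islice()

for func in (with_slice, with_islice):
    seconds = timeit.timeit(func, number=10_000)
    print(f'{func.__name__}: {seconds:.3f}s for 10,000 runs')
```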