
All,

As you know, with a Python iterator we can call iter.next() to get the next item of data. Take a list for example:

l =  [x for x in range(100)]
itl = iter(l)
itl.next()            # 0
itl.next()            # 1

Now I want a buffer that can store fixed-size slices of the data a *general* iterator points to. I'll use the list iterator above to demonstrate my question.

class IterPage(object):
    def __init__(self, iterable, size):
        pass  # class code here

itp = IterPage(itl, 5)

what I want is

print itp.first()   # [0,1,2,3,4]
print itp.next()    # [5,6,7,8,9]
print itp.prev()    # [0,1,2,3,4]
len(itp)            # 20   # 100 item / 5 fixed size = 20    
print itp.last()    # [95,96,97,98,99]


for y in itp:       # IterPage should also work in a "for" loop (and with len()), like this
    print y
[0,1,2,3,4]
[5,6,7,8,9]
...
[95,96,97,98,99]

It is not homework, but as a Python beginner I know little about designing an iterator class. Could someone show me how to code the "IterPage" class?

Also, from the answers below I found that if the raw data I want to slice is very big, for example an 8 GB text file or a database table with 10^100 records, I may not be able to read all of it into a list, because I don't have that much physical memory. Take this snippet from the Python documentation for example:

http://docs.python.org/library/sqlite3.html#

>>> c = conn.cursor()
>>> c.execute('select * from stocks order by price')
>>> for row in c:
...    print row
...
(u'2006-01-05', u'BUY', u'RHAT', 100, 35.14)
(u'2006-03-28', u'BUY', u'IBM', 1000, 45.0)
(u'2006-04-06', u'SELL', u'IBM', 500, 53.0)
(u'2006-04-05', u'BUY', u'MSOFT', 1000, 72.0)

If there are about 10^100 records here, is it possible to store only the lines/records I want with this class, e.g. itp = IterPage(c, 5)? When I call itp.next(), could itp fetch just the next 5 records from the database?

Thanks!

PS: I found an approach at the link below: http://code.activestate.com/recipes/577196-windowing-an-iterable-with-itertools/

I also found that someone proposed an itertools.iwindow() function, but it was rejected: http://mail.python.org/pipermail/python-dev/2006-May/065304.html

A:

The raw data that I want to slice is very big, for example an 8 GB text file... I may not be able to read all of it into a list - I do not have so much physical memory. In that case, is it possible to get only the lines/records I want with this class?

No. As it stands, the class originally proposed below converts the iterator into a list, which makes it 100% useless for your situation.

Just use the grouper idiom (also mentioned below). You'll have to be smart about remembering previous groups. To save on memory, only store those previous groups that you need. For example, if you only need the most recent previous group, you could store that in a single variable, previous_group.

If you need the 5 most recent previous groups, you could use a collections.deque with a maximum size of 5.
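A minimal sketch of that idea (Python 3), assuming the data arrives as a one-pass iterator: read groups with the grouper idiom and keep only the last few in a `collections.deque(maxlen=...)`, so memory stays bounded however long the input is. The function name `walk_with_history` and its `history` parameter are illustrative, not from any library.

```python
from collections import deque

def walk_with_history(iterable, size, history=5):
    """Yield (current_group, previous_groups) pairs from a one-pass iterable.

    Only the last `history` groups are retained; older ones are dropped
    automatically by the bounded deque.
    """
    previous = deque(maxlen=history)
    for group in zip(*[iter(iterable)] * size):  # grouper idiom (Python 3 zip is lazy)
        yield group, list(previous)
        previous.append(group)

for group, prev in walk_with_history(range(30), 5, history=2):
    print(group, "previous:", prev)
```

With `history=1` this reduces to the single `previous_group` variable described above.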

Or, you could use the window idiom to get a sliding window of n groups of groups...
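A sketch of a sliding window over an iterator (Python 3), built on a bounded deque; this is one common way to write it and not the exact code from the ActiveState recipe linked in the question. Applied to the groups from the grouper idiom, a window of 2 gives each page together with the one before it.

```python
from collections import deque
from itertools import islice

def window(iterable, n):
    """Yield overlapping tuples of length n: (a, b, c), (b, c, d), ..."""
    it = iter(iterable)
    win = deque(islice(it, n), maxlen=n)
    if len(win) == n:
        yield tuple(win)
    for item in it:
        win.append(item)  # the oldest item falls off the left end
        yield tuple(win)

# a window of 2 over groups of 5: each step shows the previous and current page
groups = zip(*[iter(range(20))] * 5)
for prev_page, page in window(groups, 2):
    print(prev_page, page)
```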

Given what you've told us so far, I would not define a class for this, because I don't see many reusable elements to the solution.


Mainly, what you want can be done with the grouper idiom:

In [22]: l =  xrange(100)    
In [23]: itl=iter(l)    
In [24]: import itertools    
In [25]: for y in itertools.izip(*[itl]*5):
   ....:     print(y)
(0, 1, 2, 3, 4)
(5, 6, 7, 8, 9)
(10, 11, 12, 13, 14)
...
(95, 96, 97, 98, 99)
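The session above is Python 2 (`xrange`, `itertools.izip`). A minimal sketch of the same grouper idiom in Python 3, where the built-in `zip` is already lazy. One thing worth knowing: `zip` silently drops a trailing partial group, while `itertools.zip_longest` keeps it by padding with a fill value. The helper name `grouper` is illustrative.

```python
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    """Yield successive n-tuples from iterable, padding the last one."""
    # n references to the *same* iterator, so each tuple consumes n items
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

# zip() drops the incomplete tail: 12 items -> two full groups only
print(list(zip(*[iter(range(12))] * 5)))  # [(0, 1, 2, 3, 4), (5, 6, 7, 8, 9)]
# zip_longest() keeps it, padded with the fill value
print(list(grouper(range(12), 5)))
# [(0, 1, 2, 3, 4), (5, 6, 7, 8, 9), (10, 11, None, None, None)]
```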

Calling next is no problem:

In [28]: l =  xrange(100)

In [29]: itl=itertools.izip(*[iter(l)]*5)

In [30]: next(itl)
Out[30]: (0, 1, 2, 3, 4)

In [31]: next(itl)
Out[31]: (5, 6, 7, 8, 9)

But making a previous method is a big problem, because iterators don't work this way. Iterators are meant to produce values without remembering past values. If you need all past values, then you need a list, not an iterator:

In [32]: l =  xrange(100)
In [33]: ll=list(itertools.izip(*[iter(l)]*5))

In [34]: ll[0]
Out[34]: (0, 1, 2, 3, 4)

In [35]: ll[1]
Out[35]: (5, 6, 7, 8, 9)

# Get the last group
In [36]: ll[-1]
Out[36]: (95, 96, 97, 98, 99)

Now getting the previous group is just a matter of keeping track of the list index.

unutbu
Thanks for your answer. I have to say it is amazing what I can do with itertools. Actually, this problem came from using sqlite3.Row to fetch huge amounts of data from a table; in my case I may run out of memory if I read all of the records into a list/tuple with fetchall(). BTW, since an iterator is also an object, can I put iterators into a list [iter1, iter2, iter3, ...]?
user478514
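For the sqlite3 case specifically, the DB-API cursor already offers a middle ground between iterating row by row and `fetchall()`: `cursor.fetchmany(size)` returns at most `size` rows, and an empty list once the result set is exhausted, so no single list of every row is ever built. A minimal sketch (Python 3), assuming an in-memory database with a made-up `stocks` table standing in for the real one; the `pages` helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stocks (symbol TEXT, qty INTEGER)")  # hypothetical table
conn.executemany("INSERT INTO stocks VALUES (?, ?)",
                 [("SYM%d" % i, i) for i in range(12)])

def pages(cursor, size):
    """Yield lists of at most `size` rows until the cursor is exhausted."""
    while True:
        rows = cursor.fetchmany(size)
        if not rows:
            break
        yield rows

cur = conn.execute("SELECT symbol, qty FROM stocks ORDER BY qty")
for page in pages(cur, 5):
    print(page)
```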
Then I can use those iterators in a "for" loop, as if they were C pointers to the real data?
user478514
Python is very consistent. Objects are objects. Sure, you can put iterators in a list, and use `for x in iterator` as usual. As long as you know what type of object you are dealing with, all its methods, and all Python syntax appropriate for that object is at your disposal. Am I answering your question? If not, please update your original post, with code to make it clearer.
unutbu
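For example (Python 3), a list holding several iterators can be walked with an ordinary `for` loop, or flattened with `itertools.chain.from_iterable`. One caveat: unlike a C pointer, an iterator cannot be rewound, so each one can be consumed only once.

```python
from itertools import chain

iters = [iter([1, 2]), iter("ab"), iter(range(3))]
for it in iters:
    print(list(it))  # [1, 2] then ['a', 'b'] then [0, 1, 2]

# the iterators above are now exhausted, so build fresh ones to chain them
iters = [iter([1, 2]), iter("ab"), iter(range(3))]
print(list(chain.from_iterable(iters)))  # [1, 2, 'a', 'b', 0, 1, 2]
```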
Yes, your answer is OK and useful for me. I just want to learn a bit more from your answer, so I have updated my question. Thanks.
user478514
@unutbu, thanks for sharing. You do not need to remove the class you wrote; it is still very useful for beginners like me to study. By the way, I now see that a Pythonista may find a better solution to the very-big-file problem: for example, write a generator that yields the next record of the data using the itertools.izip trick I learned from you, and combine it somehow with THC4k's class to get a general solution. It would work like the Unix "more" command: class.next() is like pressing PgDn in "more", and class.prev() is like pressing PgUp.
user478514
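A minimal sketch of that "more"-like idea (Python 3). The class `LazyPager` and its methods are hypothetical: it pulls fixed-size pages from any iterator only when `next()` moves past what has already been read, and caches pages seen so far so `prev()` can go back. The cache grows with every page visited, so for truly huge inputs you would bound it (for example with a `deque(maxlen=...)`) and accept that `prev()` can only go back that far. `len()` and `last()` are deliberately left out, since a general iterator cannot report them without being read to the end.

```python
from itertools import islice

class LazyPager:
    """Page through any iterable, reading ahead only on demand."""

    def __init__(self, iterable, size):
        self._it = iter(iterable)
        self._size = size
        self._pages = []   # pages read so far (the cache)
        self._pos = -1     # index of the current page; -1 means "before the start"

    def _read_page(self):
        # pull up to `size` items from the source; report whether anything came back
        page = list(islice(self._it, self._size))
        if page:
            self._pages.append(page)
        return bool(page)

    def first(self):
        if not self._pages and not self._read_page():
            raise IndexError("no data")
        self._pos = 0
        return self._pages[0]

    def next(self):
        # only touch the source when we are on the newest cached page
        if self._pos + 1 == len(self._pages) and not self._read_page():
            raise IndexError("no more pages")
        self._pos += 1
        return self._pages[self._pos]

    def prev(self):
        if self._pos <= 0:
            raise IndexError("already at the first page")
        self._pos -= 1
        return self._pages[self._pos]

    def __iter__(self):
        # replay cached pages, then keep reading; does not move the cursor
        i = 0
        while i < len(self._pages) or self._read_page():
            yield self._pages[i]
            i += 1

p = LazyPager(range(100), 5)
print(p.first())  # [0, 1, 2, 3, 4]
print(p.next())   # [5, 6, 7, 8, 9]
print(p.prev())   # [0, 1, 2, 3, 4]
```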
A:

Since you asked about design, I'll write a bit about what you want - it's not an iterator.

The defining property of an iterator is that it only supports iteration, not random access. But methods like .first and .last do random access, so what you are asking for is not an iterator.

There are of course containers that allow this. They are called sequences, and the simplest of them is the list. Its .first method is written as [0] and its .last is [-1].
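The difference is easy to see in a few lines (Python 3): a list answers `[0]` and `[-1]` directly, while an iterator over the same data refuses indexing and can only hand out items forward, one at a time.

```python
data = list(range(10))
print(data[0], data[-1])  # 0 9  (a list supports random access)

it = iter(data)
try:
    it[0]
except TypeError:
    print("iterators do not support indexing")
print(next(it), next(it))  # 0 1  (only forward, one item at a time)
```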

So here is such an object that slices a given sequence. It stores a list of slice objects, which is what Python uses to slice out parts of a list. The methods that a class must implement to be a sequence are given by the abstract base class Sequence. It's nice to inherit from it because it raises an error if you forget to implement a required method.

try:
    from collections.abc import Sequence   # Python 3.3+
except ImportError:
    from collections import Sequence       # Python 2

class SlicedList(Sequence):
    def __init__(self, iterable, size):
        self.seq = list(iterable)
        self.slices = [slice(i, i + size) for i in range(0, len(self.seq), size)]

    def __contains__(self, item):
        # checks whether an item occurs in the underlying data
        # (not whether a whole page is in this sequence)
        return item in self.seq

    def __iter__(self):
        """ iterates over all slices """
        return (self.seq[s] for s in self.slices)

    def __len__(self):
        """ implements len( .. ) """
        return len(self.slices)

    def __getitem__(self, n):
        # two forms of getitem ..
        if isinstance(n, slice):
            # implements sliced[a:b]
            return [self.seq[s] for s in self.slices[n]]
        else:
            # implements sliced[a]
            return self.seq[self.slices[n]]

s = SlicedList(range(100), 5)

# length
print len(s) # 20

#iteration
print list(s) # [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9], ... , [95, 96, 97, 98, 99]]
# explicit iteration:
it = iter(s)
print next(it) # [0, 1, 2, 3, 4]

# we can slice it too
print s[0], s[-1] # [0, 1, 2, 3, 4] [95, 96, 97, 98, 99]
# get the first two
print s[0:2] # [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]
# every other item
print s[::2] # [[0, 1, 2, 3, 4], [10, 11, 12, 13, 14], [20, 21, 22, 23, 24], ... ]
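The abstract base class mentioned above can be seen at work in a small sketch (Python 3, where `Sequence` lives in `collections.abc`); the classes `Incomplete` and `Complete` are hypothetical. Instantiating a subclass that lacks `__len__` raises `TypeError`, and once both `__getitem__` and `__len__` exist, `Sequence` supplies `__iter__`, `__contains__`, `index` and `count` for free. Note that the inherited `__iter__` stops only when `__getitem__` raises `IndexError`, so that has to happen at the end.

```python
from collections.abc import Sequence

class Incomplete(Sequence):
    # hypothetical class that forgets to define __len__
    def __getitem__(self, n):
        if not 0 <= n < 3:
            raise IndexError(n)  # tells the inherited __iter__ where to stop
        return n * 10

try:
    Incomplete()
except TypeError as exc:
    print("TypeError:", exc)

class Complete(Incomplete):
    def __len__(self):
        return 3

c = Complete()
print(len(c), list(c), 20 in c, c.index(20))  # 3 [0, 10, 20] True 2
```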

Now, if you really want methods like .start (what for, anyway? It's just a verbose way of writing [0]), you can write a class like this:

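class Navigator(object):
    """Keeps a cursor (self.c) into a sequence of pages."""
    def __init__(self, seq):
        self.c = 0
        self.seq = seq

    def next(self):
        # refuse to move past the last page instead of leaving self.c out of range
        if self.c + 1 >= len(self.seq):
            raise IndexError("already at the last page")
        self.c += 1
        return self.seq[self.c]

    def prev(self):
        # a negative index would silently wrap around to the last page
        if self.c == 0:
            raise IndexError("already at the first page")
        self.c -= 1
        return self.seq[self.c]

    def start(self):
        self.c = 0
        return self.seq[self.c]

    def end(self):
        self.c = len(self.seq) - 1
        return self.seq[self.c]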

n = Navigator(SlicedList(range(100), 5))

print n.start(), n.next(), n.prev(), n.end()
THC4k
Wow, thanks THC4k, I think your answer is now even better than what I wanted. My only concern is "s = SlicedList(range(100), 5)": the raw data range(100) is no problem here, but if it is an 8 GB text file or 10^100 records in a database, I have no way to read all of it into memory except through an iterator. May I know how to slice such data with an iterator and then use start, next, etc. as in the Navigator? Using Python sqlite3 as an example: cursor.execute('select * from all_stock_history') may return too many records, and [row for row in cursor] will run out of memory...
user478514
@user478514: You're completely missing the point of a database! The very reason it was invented was that you could query large datasets without loading them all ... use a query like "SELECT * FROM all_stock LIMIT 100 OFFSET 100" - don't try to solve this in python!
THC4k
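A minimal sketch of paging in the database rather than in Python (Python 3, `sqlite3` with an in-memory database and a made-up `all_stock` table; table, column and function names are placeholders). The `ORDER BY` matters, since without it SQL does not promise a stable row order between queries. Each call fetches one page, so "previous" is just a query with a smaller offset. For very large offsets the database still has to skip over the earlier rows, so filtering on the last seen key (`WHERE id > ?`) is a common alternative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE all_stock (id INTEGER PRIMARY KEY, price REAL)")
conn.executemany("INSERT INTO all_stock (price) VALUES (?)",
                 [(float(i),) for i in range(23)])

def get_page(conn, page, size):
    """Fetch page number `page` (0-based) of `size` rows, ordered by id."""
    return conn.execute(
        "SELECT id, price FROM all_stock ORDER BY id LIMIT ? OFFSET ?",
        (size, page * size),
    ).fetchall()

print(get_page(conn, 0, 5))  # first page
print(get_page(conn, 1, 5))  # "next"
print(get_page(conn, 4, 5))  # last page: only 3 rows
```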
@THC4k, thanks for sharing. The reason I don't want to use "SELECT * FROM all_stock LIMIT 100 OFFSET 100" and instead try to solve this in Python is that the raw data may come in different formats: plain text, XML, CSV, database tables, Python generators... My idea is that each entry of raw data can be retrieved via iter.next() no matter what its raw format is, and whether it is finite or not, right? That is why I want an iterator- or sequence-like class. It could be used as a common buffer with a size limit and navigation functions for the raw data.
user478514
@user478514: Again, if you want an iterator, you can only support next(); you cannot have .prev, .first or .last! But if you just want an iterator, then this whole problem is already solved: the libraries for SQLite, text files and many other data sources already provide iterators. And if they are sequences, you can always get an iterator from them with `iter( seq )`. But that does not mean the underlying source magically turns into an iterator...
THC4k
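The point about text files can be illustrated with a small sketch (Python 3): a file object is already an iterator over its lines, so `itertools.islice` can pull one fixed-size page at a time without reading the whole file. Here `io.StringIO` stands in for a real (possibly huge) file, and the helper name `read_pages` is hypothetical.

```python
import io
from itertools import islice

def read_pages(lines, size):
    """Yield lists of up to `size` lines, reading lazily from `lines`."""
    it = iter(lines)   # for a file object, iter() returns the file itself
    while True:
        page = list(islice(it, size))
        if not page:
            return
        yield page

fake_file = io.StringIO("".join("line %d\n" % i for i in range(12)))
for page in read_pages(fake_file, 5):
    print([line.rstrip("\n") for line in page])
```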