tags:

views:

100

answers:

6

[Edit]

From the feedback/answers I have received, I gather there is some confusion regarding the original question. Consequently, I have reduced the problem to its most rudimentary form

Here are the relevant facts of the problem:

  1. I have a sorted sequence: S
  2. I have an item (denoted by i) that is GUARANTEED to be contained in S
  3. I want a find() algorithm that returns an iterator (iter) that points to i
  4. After obtaining the iterator, I want to be able to iterate FORWARD (BACKWARD?) over the elements in S, starting FROM (and including) i

For my fellow C++ programmers who can also program in Python, what I am asking for, is the equivalent of:

const_iterator std::find (const key_type& x ) const;

The iterator returned can then be used to iterate the sequence. I am just trying to find (pun unintended), if there is a similar inbuilt algorithm in Python, to save me having to reinvent the wheel.

A: 

One simpler way (albeit slower) would be to use filter and filter for keys before/after that date. Filter has to process each element in the list as opposed to slicing not needing to.

+1  A: 

yes , you can do like this:

import itertools
from datetime import datetime

data = {
      "2008-11-10 17:53:59":"data",
      "2005-11-10 17:53:59":"data",
}

list_ = data.keys()
new_list = [datetime.strptime(x, "%Y-%m-%d %H:%M:%S") for x in list_]

begin_date = datetime.strptime("2007-11-10 17:53:59", "%Y-%m-%d %H:%M:%S")

for i in itertools.ifilter(lambda x: x > begin_date, new_list):
    print i
singularity
why not use a generator expression instead of creating a list for `new_list`?
aaronasterling
@aaronasterling: yes, it will work perfectly and it can also save us some memory usage too when dealing with a large dictionary.
singularity
A: 

You can do

def on_or_after(date):
    from itertools import dropwhile
    sorted_items = sorted(date_dictionary.iteritems())
    def before_date(pair):
        return pair[0] < date
    on_or_after_date = dropwhile(before_date, sorted_items)

which I think is about as efficient as it's going to get if you're just doing one such lookup on each sorted collection. on_or_after_date will iterate (date, value) pairs.

Another option would be to build a dictionary as a separate index into the sorted list:

sorted_items = sorted(date_dictionary.iteritems())
date_index = dict((key, i) for i, key in enumerate(sorted_items.keys()))

and then get the items on or after a date with

def on_or_after(date):
    return sorted_items[date_index[date]:]

This second approach will be faster if you're going to be doing a lot of lookups on the same series of sorted dates (which it sounds like you are).

If you want really speedy slicing of the sorted dates, you might see some improvement by storing it in a tuple instead of a list. I could be wrong about that though.

note the above code is untested, let me know if it doesn't work and you can't sort out why.

intuited
A: 

First off, this question isn't related to dicts. You're operating on a sorted list. You're using the results on a dict, but that's not relevant to the question.

You want the bisect module, which implements binary searching. Starting from your code:

import bisect
mydict = {
      "2001-01-01":"data1",
      "2005-01-02":"data2",
      "2002-01-01":"data3",
      "2004-01-02":"data4",
}

# ['2001-01-01', '2002-01-01', '2004-01-02', '2005-01-02']:
sorted_dates = sorted(mydict)

# Iterates over 2002-01-01, 2004-01-02 and 2005-01-02:
offset = bisect.bisect_left(sorted_dates, "2002-01-01")
for item in sorted_dates[offset:]:
    print item
Glenn Maynard
(The original question is scattered and confusing and poorly-stated, so I'm not sure if this is what he's actually asking. I read it a couple more times and I'm still not sure. *shrug*)
Glenn Maynard
+1  A: 

If you know for a fact that the items in your sequence are sorted, you can just use a generator expression:

(item for item in seq if item >= 5)

This returns a generator; it doesn't actually traverse the list until you iterate over it, i.e.:

for item in (item for item in seq if item > 5)
    print item

will only traverse seq once.

Using a generator expression like this is pretty much identical to using itertools.ifilter, which produces a generator that iterates over the list returning only values that meet the filter criterion:

>>> import itertools
>>> seq = [1, 2, 3, 4, 5, 6, 7]
>>> list(itertools.ifilter(lambda x: x>=3, seq))
[3, 4, 5, 6, 7]

I'm not sure why (except for backwards compatibility) we need itertools.ifilter anymore now that we have generator expressions, but other methods in itertools are invaluable.

If, for instance, you don't know that your sequence is sorted, and you still want to return everything in the sequence from a known item and beyond, you can't use a generator expression. Instead, use itertools.dropwhile. This produces a generator that iterates over the list skipping values until it finds one that meets the filter criterion:

>>> seq = [1, 2, 4, 3, 5, 6, 7]
>>> list(itertools.dropwhile(lambda x: x != 3, seq))
[3, 5, 6, 7]

As far as searching backwards goes, this will only work if the sequence you're using is actually a sequence (like a list, i.e. something that has an end and can be navigated backwards) and not just any iterable (e.g. a generator that returns the next prime number). To do this, use the reversed function, e.g.:

(item for item in reversed(seq) if item >= 5)
Robert Rossney
This is the sort of thing I am looking for. Unfortunately, I have no experience using Generators or lambda functions - so I didn't understand most of what you wrote (I'm quite new to Python). I have to admit that most of what I want to do seems to involve generators and/or lambdas - so I guess its time to bite the bullet. I have seen this documentation: http://heather.cs.ucdavis.edu/~matloff/Python/PyIterGen.pdf Is that a good start/introduction or do you have a better resource link?
skyeagle
On generators, David Beazley to the rescue: http://www.dabeaz.com/generators/. As far as lambda functions go, a lambda function's just a function that's declared inline. You could just as easily write a function called `item_in_range(x)` and write `itertools.ifilter(item_in_range, seq)`.
Robert Rossney
A: 

Given your relevant facts:

>>> import bisect
>>> def find_fwd_iter(S, i):
...     j = bisect.bisect_left(S, i)
...     for k in xrange(j, len(S)):
...         yield S[k]
...
>>> def find_bkwd_iter(S, i):
...     j = bisect.bisect_left(S, i)
...     for k in xrange(j, -1, -1):
...         yield S[k]
...
>>> L = [100, 150, 200, 300, 400]
>>> list(find_fwd_iter(L, 200))
[200, 300, 400]
>>> list(find_bkwd_iter(L, 200))
[200, 150, 100]
>>>
John Machin
Nice, simple, short, sweet and above all understandable - even to a neophyte like me. Now that's what I call poetry
skyeagle