tags:

views:

191

answers:

5

Alright so I am making a commandline based implementation of a website search feature. The website has a list of all the links I need in alphabetical order.

Usage would be something like

./find.py  LinkThatStartsWithB

So it would navigate to the webpage associated with the letter B. My questions is what is the most efficient/smartest way to use the input by the user and navigate to the webpage?

What I was thinking at first was something along the lines of using a list and then getting the first letter of the word and using the numeric identifier to tell where to go in list index.

(A = 1, B = 2...) Example code:

#Use base url as starting point then add extension on end.
Base_URL = "http://www.website.com/"

#Use list index as representation of letter
Alphabetic_Urls = [
       "/extensionA.html",
       "/extensionB.html",
       "/extensionC.html",
       ]

Or would Dictionary be a better bet?

Thanks

+1  A: 

Dictionary! O(1)

Macarse
A: 

The smartest way here will be whatever makes the code simplest to read. When you've only got 26 items in a list, who cares what algorithm it uses to look through it? You'd have to use something really, really stupid to make it have an impact on performance.

If you're really interested in the performance though, you'd need to benchmark different options. Looking at just the complexity doesn't tell the whole story, because it hides the factors involved. For instance, a dictionary lookup will involve computing the hash of the key, looking that up in tables, then checking equality. For short lists, a simple linear search can sometimes be more efficient, depending on how costly the hashing algorithm is.

If your example is really accurate though, can't you just take the first letter of the input string and predict the URL from that? ("/extension" + letter + ".html")

Jon Skeet
Well, that is why I specified efficient/smartest. I was also questioning if using one instead of the other was better a practice. I am always trying to improve my programming skill.
LogicKills
But my point is that efficient and smartest aren't the same thing here. What code is going to be simplest?
Jon Skeet
The URLs sadly aren't arranged in any particular order. Just numbers.
LogicKills
Okay. I go back to my "work out the simplest thing that will possibly work" idea then :)
Jon Skeet
A: 

Dictionary would be a good choice if you have (and will always have) a small number of items. If the list of URL's is going to expand in the future you will probably actually want to sort the URL's by their letter and then match the input against that instead of hard-coding the dictionary for each one.

DShook
Actually, with a small number of items it's not going to matter what you pick, and a dictionary may well be slower than a simple linear search. The advantage of dictionaries is that they scale well to *large* numbers of items.
Jon Skeet
+2  A: 

How are you getting this list of URLS?

If your commandline app is crawling the website for links, and you are only looking for a single item, building a dictionary is pointless. It will take at least as long to build the dict as it would to just check as you go! eg, just search as:

for link in mysite.getallLinks():
    if link[0] == firstletter:
        print link

If you are going to be doing multiple searches (rather than just a single commandline parameter), then it might be worth building a dictionary using something like:

import collections
d=collections.defaultdict(list)
for link in mysite.getallLinks():
    d[link[0]].append(link)             # Dict of first letter -> list of links

# Print all links starting with firstletter
for link in d[firstletter]:
    print link

Though given that there are just 26 buckets, it's not going to make that much of a difference.

Brian
A: 

Since it sounds like you're only talking about 26 total items, you probably don't have to worry too much about efficiency. Anything you come up with should be fast enough.

In general, I recommend trying to use the data structure that is the best approximation of your problem domain. For example, it sounds like you are trying to map letters to URLs. E.g., this is the "A" url and this is the "B" url. In that case, a mapping data structure like a dict sounds appropriate:

html_files = {
    'a': '/extensionA.html',
    'b': '/extensionB.html',
    'c': '/extensionC.html',
}

Although in this exact example you could actually cheat it and skip the data structure altogether -- '/extension%s.html' % letter.upper() :)

Ryan Bright