views:

2505

answers:

18

How would a go about making a program where the user enters a string, and the program generates a list of words beginning with that string?

Ex:
User: "abd"
Program:abdicate, abdomen, abduct...

Thanks!


Edit: I'm using python, but I assume that this is a fairly language-independent problem.

+2  A: 
var words = from word in dictionary
            where word.key.StartsWith("bla-bla-bla");
            select word;
dimarzionist
Doesn't look like Python. Somehow it reminds me С# LINQ 8-)
aku
I think the author means [w for w in dictionary if w.startswith(x)] :)
Torsten Marek
+1  A: 

Try using regex to search through your list of words, e.g. /^word/ and report all matches.

Chris Bunch
A: 

I thought of that, but I'm afraid it would be inefficient through an entire dictionary file. Wouldn't it be?

stalepretzel
It is going to have to go through every word in the dictionary until you are using a db or something that an index can be built on. Are you using a particular language?
Arthur Thomas
This I think is a case of premature optimisation. A dictionary file for english contains 638311 entries so even fairly naive methods will chew through it. I suggest you go with the _simplest_ approach to Get It Working Now, then decide if you really need to optimise it.
freespace
+1  A: 
def main(script, name):
 for word in open("/usr/share/dict/words"):
  if word.startswith(name):
   print word,

if __name__ == "__main__":
 import sys
 main(*sys.argv)
Aaron Maenpaa
There is surely an optimisation to be had here on grounds that the dictionary is sorted. This is not a criticism of the code provided, since it may well be fast enough already. Just thought it worth mentioning.
Steve Jessop
+8  A: 

One of the best ways to do this is to use a directed graph to store your dictionary. It takes a little bit of setting up, but once done it is fairly easy to then do the type of searches you are talking about.

The nodes in the graph correspond to a letter in your word, so each node will have one incoming link and up to 26 (in English) outgoing links.

You could also use a hybrid approach where you maintain a sorted list containing your dictionary and use the directed graph as an index into your dictionary. Then you just look up your prefix in your directed graph and then go to that point in your dictionary and spit out all words matching your search criteria.

Daniel
+9  A: 

Use a trie.

Add your list of words to a trie. Each path from the root to a leaf is a valid word. A path from a root to an intermediate node represents a prefix, and the children of the intermediate node are valid completions for the prefix.

erickson
+2  A: 

If you really want to be efficient - use suffix trees or suffix arrays - wikipedia article.

Your problem is what suffix trees were designed to handle. There is even implementation for Python - here

Jakub Kotrla
How would suffix trees help here? Aren't they designed to efficiently search for arbitrary substrings, not only prefixes?
Rafał Dowgird
A substring tree quickly tells you whether a string contains a substring. Each tree holds one string. It's not a collection of strings. It would be an efficient way to determine whether "dom" is a substring of "abdomen", but I don't see how to adapt it to the OP's question.
erickson
+1  A: 

If you need to be really fast, use a tree:

build an array and split the words in 26 sets based on the first letter, then split each item in 26 based on the second letter, then again.

So if your user types "abd" you would look for Array[0][1][3] and get a list of all the words starting like that. At that point your list should be small enough to pass over to the client and use javascript to filter.

Sklivvz
Ummm.... Isn't that the Trie solution?
S.Lott
Yeah, but my answer is precedent.
Sklivvz
+6  A: 

If you on a debian[-like] machine,

#!/bin/bash
echo -n "Enter a word: "
read input
grep "^$input" /usr/share/dict/words

Takes all of 0.040s on my P200.

freespace
He uses Python...
Jakub Kotrla
Yes, though I wrote my answer _before_ he edited it. None the less, he can do the same in python :P
freespace
Try this: look "$input"
Chris Jester-Young
That is neat, and it would work here because the dictionary is already sorted - thanks!
freespace
+4  A: 
egrep `read input && echo ^$input` /usr/share/dict/words

oh I didn't see the Python edit, here is the same thing in python

my_input = raw_input("Enter beginning of word: ")
my_words = open("/usr/share/dict/words").readlines()
my_found_words = [x for x in my_words if x[0:len(my_input)] == my_input]
+4  A: 

If you really want speed, use a trie/automaton. However, something that will be faster than simply scanning the whole list, given that the list of words is sorted:

from itertools import takewhile, islice
import bisect

def prefixes(words, pfx):
    return list(
             takewhile(lambda x: x.startswith(pfx), 
                       islice(words, 
                              bisect.bisect_right(words, pfx), 
                              len(words)))

Note that an automaton is O(1) with regard to the size of your dictionary, while this algorithm is O(log(m)) and then O(n) with regard to the number of strings that actually start with the prefix, while the full scan is O(m), with n << m.

Torsten Marek
A: 

If your dictionary is really big, i'd suggest indexing with a python text index (PyLucene - note that i've never used the python extension for lucene) The search would be efficient and you could even return a search 'score'.

Also, if your dictionary is relatively static you won't even have the overhead of re-indexing very often.

A: 

Don't use a bazooka to kill a fly. Use something simple just like SQLite. There are all the tools you need for every modern languages and you can just do :

"SELECT word FROM dict WHERE word LIKE "user_entry%"

It's lightning fast and a baby could do it. What's more it's portable, persistent and so easy to maintain.

Python tuto :

http://www.initd.org/pub/software/pysqlite/doc/usage-guide.html

e-satis
A: 

if i input a letter then, how can i print all the words starts with that letter from a file in Python?

A: 

if it is a small file may be possible I am not sure

A: 

A linear scan is slow, but a prefix tree is probably overkill. Keeping the words sorted and using a binary search is a fast and simple compromise.

import bisect
words = sorted(map(str.strip, open('/usr/share/dict/words')))
def lookup(prefix):
    return words[bisect.bisect_left(words, prefix):bisect.bisect_right(words, prefix+'~')]

>>> lookup('abdicat')
['abdicate', 'abdication', 'abdicative', 'abdicator']
Coady
A: 

Most Pythonic solution

# set your list of words, whatever the source
words_list = ('cat', 'dog', 'banana')
# get the word from the user inpuit
user_word = raw_input("Enter a word:\n")
# create an generator, so your output is flexible and store almost nothing in memory
word_generator = (word for word in words_list if word.startswith(user_word))

# now you in, you can make anything you want with it 
# here we just list it :

for word in word_generator :
    print word

Remember generators can be only used once, so turn it to a list (using list(word_generator)) or use the itertools.tee function if you expect using it more than once.

Best way to do it :

Store it into a database and use SQL to look for the word you need. If there is a lot of words in your dictionary, it will be much faster and efficient.

Python got thousand of DB API to help you do the job ;-)

e-satis
A: 

sorry........ this is a dictionary " Nepali -Nepali Meaning -English -Myanmar" I want to attach a file but I don't know any ID of yours. Can you give me a email ID please?????????????

vinod prasad neupane