ansaurus

Question

using dictionary to assign misspelled words to its line number

Answer 1

A:

When you insert the new counter into d, you first check whether word is contained in words. Probably you wanted to check whether word is already contained in d:

if word not in d:
    d[word] = [counter]
else:
    d[word].append(counter)

The check whether the word is contained in words or line should be a separate if statement.
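Here is a minimal sketch of the two checks as separate if statements. It assumes words holds the misspelled words, line_words the words of the current line, and counter the current line number. These names and values are hypothetical, since the question's full code isn't shown here.

```python
# Hypothetical inputs: the misspelled words and the words of one line.
words = ["helo", "wrold"]
line_words = ["helo", "there", "wrold"]
counter = 5  # current line number (assumed)

d = {}
for word in line_words:
    # First check: is this one of the misspelled words?
    if word in words:
        # Second, separate check: does d already have an entry for it?
        if word not in d:
            d[word] = [counter]
        else:
            d[word].append(counter)

print(d)  # {'helo': [5], 'wrold': [5]}
```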

You could also simplify this logic with the dict's setdefault() method:

d.setdefault(word, []).append(counter)
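setdefault() returns the existing list for a key, or inserts the given default (here an empty list) and returns that. A small self-contained example with hypothetical (line number, word) pairs:

```python
d = {}
# Hypothetical (line number, word) pairs standing in for the text file.
pairs = [(1, "helo"), (3, "wrold"), (4, "helo")]
for counter, word in pairs:
    # Insert [] for a new word, then append the line number either way.
    d.setdefault(word, []).append(counter)

print(d)  # {'helo': [1, 4], 'wrold': [3]}
```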

Or you could make d a defaultdict, which simplifies the assignment even more:

from collections import defaultdict
d = defaultdict(list)
...
d[word].append(counter)
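One difference worth knowing: a plain dict raises KeyError when you read a missing key, while a defaultdict silently inserts an empty list. The KeyError: 329 mentioned in the comments is likely the first case, since d is keyed by words, so d[counter] looks up a line number that is not a key. The data below is hypothetical.

```python
from collections import defaultdict

plain = {"helo": [5, 8]}
try:
    plain["wrold"]  # reading a missing key from a plain dict
except KeyError as exc:
    print("KeyError:", exc)  # prints: KeyError: 'wrold'

d = defaultdict(list)
d["helo"].append(5)
print(d["wrold"])     # []: reading a missing key inserts an empty list
print(sorted(d))      # ['helo', 'wrold']

# get() and "in" look a key up without inserting it.
print(d.get("nope"))  # None
print("nope" in d)    # False
```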

About the general algorithm, note that at the moment you first iterate over all lines to increment the counter and then, when the counter has already reached its maximum value, start checking for misspelled words. You should probably do the checking for each line inside the loop where you increment the counter.
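A minimal sketch of doing the check inside the same loop, so counter still refers to the current line when a word is recorded. The list of lines and the misspelled set are hypothetical stand-ins for the file contents and the asker's list.

```python
from collections import defaultdict

# Hypothetical stand-ins for the file's lines and the misspelled words.
lines = ["the cat sat", "helo world", "a dog", "helo wrold"]
misspelled = {"helo", "wrold"}

d = defaultdict(list)
counter = 0
for line in lines:
    counter += 1  # line numbers start at 1, as in a text editor
    # Check this line's words now, while counter still refers to this line.
    for word in line.split():
        if word in misspelled:
            d[word].append(counter)

print(dict(d))  # {'helo': [2, 4], 'wrold': [4]}
```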

sth 2010-05-23 12:29:30

The text file is actually called soccer.txt, but I'm using sys.argv. I've only been programming for 2 months, so I'm not going to understand everything. I've changed if word not in words to if words not in d, but I still get an error: print(word, d[counter]) KeyError: 329

jad 2010-05-23 12:39:01

I have a list of incorrect words, and I want to print the line numbers where each incorrect word appears in my txt file as a set, so that it prints something like helo 5 8 (5 and 8 being the line numbers in the txt file). Any suggestions on how to do that? Please!

jad 2010-05-23 13:23:54

Answer 2

A:

From what you are doing, I suspect that the following would suit you nearly perfectly:

from collections import defaultdict

text = ("cat", "dog", "rat", "bat", "rat", "dog",
        "man", "woman", "child", "child")  # stand-in for the words in the file

d = defaultdict(list)

for lineno, word in enumerate(text):
    d[word].append(lineno)

print(d)

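In Python 2, this gives you an output of (Python 3 prints <class 'list'> instead, and recent versions list the keys in insertion order):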

defaultdict(<type 'list'>, {'bat': [3], 'woman': [7], 'dog': [1, 5],
                            'cat': [0], 'rat': [2, 4], 'child': [8, 9],
                            'man': [6]})

This simply sets up an empty default dictionary that creates an empty list for each new key you access, so you don't need to worry about creating the entry, and then uses enumerate() to work its way over the list of words, so you don't need to keep track of the line number yourself.
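Note that enumerate() counts from 0, while text editors number lines from 1. A minimal sketch for a real file, passing start=1 and splitting each line into words. io.StringIO stands in for an open file here so the example runs without soccer.txt; in practice you would iterate over open(path).

```python
import io
from collections import defaultdict

# io.StringIO behaves like an open text file; replace it with open(path).
f = io.StringIO("cat dog\nrat\ndog bat\n")

d = defaultdict(list)
for lineno, line in enumerate(f, start=1):  # first line is 1, not 0
    for word in line.split():
        d[word].append(lineno)

print(dict(d))  # {'cat': [1], 'dog': [1, 3], 'rat': [2], 'bat': [3]}
```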

As you don't have a list of correct spellings, this doesn't actually check whether the words are spelled correctly; it just builds a dictionary of all the words in the text file.

To convert the dictionary to a set of words, try:

all_words = set(d.keys())
print(all_words)

Which produces (in Python 2; Python 3 writes sets as {'bat', ...}):

set(['bat', 'woman', 'dog', 'cat', 'rat', 'child', 'man'])

Or, just to print the words:

for word in d.keys():
    print(word)
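To print each word followed by its line numbers on one line (the "helo 5 8" format asked for in the comments), join them into a string. Wrapping the list in set() drops repeats when a word occurs twice on the same line, and sorted() puts the numbers back in order. The data is hypothetical.

```python
d = {"helo": [5, 8, 8], "wrold": [12]}

output_lines = []
for word in sorted(d):
    numbers = sorted(set(d[word]))  # unique line numbers, in order
    # Join the word and its line numbers with spaces, e.g. "helo 5 8".
    output_lines.append(" ".join([word] + [str(n) for n in numbers]))

print("\n".join(output_lines))
# helo 5 8
# wrold 12
```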

Edit 3:

I think this might be the final version: it's a (deliberately) very crude, but almost complete, spell checker.

from collections import defaultdict

# Build a set of all the words we know, assuming they're one word per line
good_words = set()  # Use a set, as this will have the fastest look-up time.
with open("words.txt", "rt") as f:
    for word in f:
        good_words.add(word.strip())

bad_words = defaultdict(list)

with open("text_to_check.txt", "rt") as f:
    # For every line of text, get the line number (counting from 1, as
    # editors do) and the text.
    for line_no, line in enumerate(f, start=1):
        # Split into separate words; note there is an issue with punctuation,
        # case sensitivity, etc.
        for word in line.split():
            # If the word is not recognised, record the line where it occurred.
            if word not in good_words:
                bad_words[word].append(line_no)

At the end, bad_words will be a dictionary with the unrecognised words as its keys and, for each word, the list of line numbers where it occurred as the value.
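The comment in the loop mentions punctuation and case. One crude way to handle both, assuming it is acceptable to strip punctuation from the ends of each word and lower-case it (the entries in good_words must then be lower-case too), is a small helper. The name normalise and the sample data are hypothetical; io.StringIO stands in for text_to_check.txt.

```python
import io
import string
from collections import defaultdict

def normalise(word):
    """Lower-case a word and strip punctuation from both ends (crude)."""
    return word.strip(string.punctuation).lower()

# Hypothetical data; the real program reads words.txt and text_to_check.txt.
good_words = {"the", "cat", "sat", "on", "mat"}
text = io.StringIO("The cat sat.\nOn teh mat, the Cat!\n")

bad_words = defaultdict(list)
for line_no, line in enumerate(text, start=1):
    for raw in line.split():
        word = normalise(raw)
        # Skip tokens that were pure punctuation (they normalise to "").
        if word and word not in good_words:
            bad_words[word].append(line_no)

print(dict(bad_words))  # {'teh': [2]}
```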

Simon Callan 2010-05-23 12:56:36

I actually do have a list of correct spellings, called dictset = [], which is a dictionary of many, many words; however, I'll try this, thanks. This is what I have: a txt file of words and a list of incorrect words, and I just want to attach the line numbers of the incorrect words to each of them.

jad 2010-05-23 13:02:59

What you said worked, but I want it to print as a set. I have a set, but I can only print the words as a set.

jad 2010-05-23 13:18:00

Doing for inwords in incorrectwords: print(inwords) prints a set of my incorrect words, but how do I do that with the code you showed me? Cheers

jad 2010-05-23 13:19:35

I've just updated my post with a bit more information.

Simon Callan 2010-05-23 13:54:24

I'm thankful for what you've done, and I believe if I add one more thing it should work: instead of printing the line number of the incorrect word, I want to print the line number of the incorrect word as located in the txt file. What would I add? I tried to add if word in txtfile: ???

jad 2010-05-23 14:14:13

Updated to a minimal, but complete, example

Simon Callan 2010-05-23 18:21:36

Hey, thanks, it's working, but I already have the incorrect words created as a list, and I want to use that because I'm using sys.argv[] to open them. I have a list of words and a list of incorrect words; how can I use them instead of opening the text? Cheers

jad 2010-05-24 01:52:09

I've updated my code above. I got you into trouble: you don't need to create bad_words and good_words, because I have a list of the words in the txt file and a list of the misspelled words. How can I use them? Cheers

jad 2010-05-24 02:08:43

You've just about got it: you just need to change the "word not in text" line to use goodwords instead of text, as text is not a defined variable.

Simon Callan 2010-05-24 20:18:50
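For the setup described in the last few comments, where the misspelled words already exist as a list, a minimal sketch is a function that takes the lines of the text file and that list. The function name and arguments are hypothetical, and the example uses in-memory data rather than files named on the command line.

```python
from collections import defaultdict

def misspelled_line_numbers(lines, incorrect_words):
    """Map each incorrect word to the line numbers (from 1) where it appears."""
    incorrect = set(incorrect_words)  # set membership tests are fast
    found = defaultdict(list)
    for line_no, line in enumerate(lines, start=1):
        for word in line.split():
            if word in incorrect:
                found[word].append(line_no)
    return dict(found)

# In-memory example instead of files named on the command line.
text_lines = ["helo there", "the match", "helo agian"]
result = misspelled_line_numbers(text_lines, ["helo", "agian"])
print(result)  # {'helo': [1, 3], 'agian': [3]}
```

With files, the call might look like: with open(sys.argv[1]) as f: result = misspelled_line_numbers(f, incorrect_words). A file object yields one line at a time, so it can be passed in directly; which argv index holds which file is an assumption.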
