This is the code I have now

from collections import defaultdict
goodwords = set()
with open("soccer.txt", "rt") as f:
    for word in f.readlines():
        goodwords.add(word.strip())
badwords = defaultdict(list)
with open("soccer.txt", "rt") as f:
    for line_no, line in enumerate(f):
        for word in line.split():
            if word not in text:
                badwords[word].append(line_no)
print(badwords)

text is my text file

words is my list of misspelled words. However, I want this to print each word in my incorrect word list (words[]) together with the line numbers where it occurs, for example:

togeher 5 7

A: 

When you insert the new counter into d, you first check whether word is contained in words. You probably wanted to check whether word is already contained in d:

if word not in d:
    d[word] = [counter]
else:
    d[word].append(counter)

The check if the word is contained in words or line should be a separate if.
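As a minimal sketch of those two separate checks (the sample data below is made up, and it assumes words holds the misspelled words and lines holds the text already split into lines):

```python
# Hypothetical sample data standing in for the question's inputs.
words = {"togeher", "helo"}  # the misspelled words
lines = ["helo world", "we play togeher", "helo again"]

d = {}
for counter, line in enumerate(lines, start=1):
    for word in line.split():
        # First check: is this one of the misspelled words?
        if word in words:
            # Second, separate check: has it been recorded before?
            if word not in d:
                d[word] = [counter]
            else:
                d[word].append(counter)

print(d)
```

Here counter is the 1-based line number, so each misspelled word ends up mapped to the list of lines it appears on.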

You could also simplify this logic with the dict's setdefault() method:

d.setdefault(word, []).append(counter)

Or you can make d a defaultdict, which simplifies the assignment even more:

from collections import defaultdict
d = defaultdict(list)
...
d[word].append(counter)

About the general algorithm: note that at the moment you first iterate over all lines to increment the counter and only then, when the counter has already reached its maximum value, start checking for misspelled words. You should probably do the checking for each line inside the loop where you increment the counter.
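A rough sketch of that restructuring might look like the following. The function name find_misspelled and the sample inputs are hypothetical, chosen only for illustration; the point is that the counter is incremented and the line is checked in the same loop iteration.

```python
from collections import defaultdict

def find_misspelled(lines, words):
    """Return {word: [line numbers]} for each word of `words` found in `lines`."""
    d = defaultdict(list)
    counter = 0
    for line in lines:
        counter += 1                  # advance the line counter...
        for word in line.split():
            if word in words:         # ...and check this line right away
                d[word].append(counter)
    return d

# Made-up sample input: three lines of text and two misspelled words.
result = find_misspelled(["a helo", "fine", "helo togeher"], {"helo", "togeher"})
print(dict(result))
```

Because the check happens while counter still refers to the current line, each word is recorded against the line it was actually found on rather than the last line of the file.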

sth
The text file is actually called soccer.txt, but I'm using sys.argv. I've only been programming for 2 months, so I'm not going to understand everything. I've changed "if word not in words" to "if words not in d", but I still get an error: print(word, d[counter]) KeyError: 329
jad
I have a list of incorrect words and want to collect, for each incorrect word, the line numbers where it appears in my txt file into a set, so that it prints out "helo 5 8" (5 and 8 being the line numbers in the txt file). Any suggestions on how to do that, please?
jad
A: 

From what you are doing, I suspect that the following would suit you near perfectly:

from collections import defaultdict

text = ( "cat", "dog", "rat", "bat", "rat", "dog",
         "man", "woman", "child", "child")

d = defaultdict(list)

for lineno, word in enumerate(text):
    d[word].append(lineno)

print(d)

This gives you an output of:

defaultdict(<type 'list'>, {'bat': [3], 'woman': [7], 'dog': [1, 5],
                            'cat': [0], 'rat': [2, 4], 'child': [8, 9],
                            'man': [6]})

This simply sets up an empty default dictionary that creates a list for each item you access, so that you don't need to worry about creating the entry, and then enumerates its way over the list of words, so you don't need to keep track of the line number.

As you don't have a list of correct spellings, this doesn't actually check if the words are correctly spelled, just builds a dictionary of all the words in the text file.

To convert the dictionary to a set of words, try:

all_words = set(d.keys())
print(all_words)

Which produces:

set(['bat', 'woman', 'dog', 'cat', 'rat', 'child', 'man'])

Or, just to print the words:

for word in d:
    print(word)

Edit 3:

I think this might be the final version: It's a (deliberately) very crude, but almost complete spell checker.

from collections import defaultdict

# Build a set of all the words we know, assuming they're one word per line
good_words = set() # Use a set, as this will have the fastest look-up time.
with open("words.txt", "rt") as f:
    for word in f.readlines():
        good_words.add(word.strip())

bad_words = defaultdict(list)

with open("text_to_check.txt", "rt") as f:
    # For every line of text, get the line number, and the text.
    for line_no, line in enumerate(f):
        # Split into separate words - note there is an issue with punctuation,
        # case sensitivity, etc.
        for word in line.split():
            # If the word is not recognised, record the line where it occurred.
            if word not in good_words:
                bad_words[word].append(line_no)

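At the end, bad_words will be a dictionary with the unrecognised words as the keys, and the list of line numbers where each word occurred as the matching value. Note that enumerate() counts from 0 by default, so the first line of the file is recorded as line 0.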
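To print the result in the "togeher 5 7" style asked for in the question, something along these lines could work. This is only a sketch: check_lines and format_report are hypothetical helper names, the sample data is invented, and it uses enumerate(..., start=1) so the line numbers are 1-based rather than 0-based.

```python
from collections import defaultdict

def check_lines(lines, good_words):
    """Map each unrecognised word to the 1-based line numbers where it occurs."""
    bad_words = defaultdict(list)
    for line_no, line in enumerate(lines, start=1):
        for word in line.split():
            if word not in good_words:
                bad_words[word].append(line_no)
    return bad_words

def format_report(bad_words):
    """Build one 'word n1 n2 ...' string per unrecognised word, sorted by word."""
    return ["%s %s" % (word, " ".join(str(n) for n in line_nos))
            for word, line_nos in sorted(bad_words.items())]

# Made-up sample data; a real run would pass an open file and the known-word set.
sample_lines = ["we play togeher", "helo there", "togeher again"]
sample_good = {"we", "play", "there", "again"}

for row in format_report(check_lines(sample_lines, sample_good)):
    print(row)
```

An open file object can be passed as lines directly, since iterating over a file yields one line at a time.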

Simon Callan
I actually do have a list of correct spellings called dictset = [], which is a dictionary of many, many words. However, I'll try this, thanks. This is what I have: a txt file of words and a list of incorrect words, and I just want to attach the line numbers to the incorrect words.
jad
What you said worked, but I want it to print as a set. I have a set, but I can only print the words as a set.
jad
for inwords in incorrectwords: print(inwords) This prints a set of my incorrect words, but how do I do that with the code you showed me? Cheers.
jad
I've just updated my post with a bit more information.
Simon Callan
I'm thankful for what you've done, and I believe that if I add one more thing it should work. Instead of printing the line number of the incorrect word, I want to print the line number of the incorrect word as located in the txt file. What would I add? I tried to add "if word in txtfile:".
jad
Updated to a minimal, but complete, example
Simon Callan
Hey, thanks, it's working, but I already have the incorrect words as a list that has been created. I want to use that because I'm using sys.argv[] to open them. I have a list of words and a list of incorrect words; how can I use them instead of opening the text? Cheers.
jad
I've updated my code above. I got you into trouble: you don't need to create bad words and goodwords, because I already have a list of the words in the txt file and a list of the misspelled words. How can I use them? Cheers.
jad
You've just about got it - you just need to change the "word not in text" line to use goodwords instead of text, as text is not a defined variable.
Simon Callan