Hello,

I have a simple text file with several thousand words, each on its own line, e.g.

aardvark
hello
piper

I use the following code to load the words into a set (I need the list of words to test membership, so set is the data structure I chose):

my_set = set(open('filename.txt'))

The above code produces a set with the following entries (each word is followed by a space and a newline character):

("aardvark \n", "hello \n", "piper \n")

What's the simplest way to load the file into a set but get rid of the space and \n?

Thanks

+2  A: 
my_set = set(map(str.strip, open('filename.txt')))
RichieHindle
This solution removes the leading and trailing spaces and newline characters by applying the str.strip method to each line of filename.txt.
Wesley
+13  A: 

The strip() method of strings removes whitespace from both ends.

set(line.strip() for line in open('filename.txt'))
Paul Hankin
This is superior to the map solution if you have a large file, because map will load the entire file into memory as a list of lines, which will then be discarded (itertools.imap can fix that, though).
Ryan Ginstrom
@Ryan: that is true for Python <= 2.6, but in 3.0 map returns an iterator.
Stephan202
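A minimal sketch of the point made in the comments above, assuming Python 3. The io.StringIO object is only a stand-in for open('filename.txt') so the snippet runs without a file on disk; the sample words come from the question.

```python
import io

# Stand-in for open('filename.txt'); iterating it yields one line at a time.
lines = io.StringIO("aardvark \nhello \npiper \n")

# In Python 3, map returns a lazy map object rather than a list,
# so no intermediate list of stripped lines is built.
stripped = map(str.strip, lines)
is_lazy = not isinstance(stripped, list)

# set() pulls the stripped lines from the map object one by one.
my_set = set(stripped)
```

Under Python 2, the same map call would build a full list first, which is the memory cost the comment describes.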
+1  A: 

To remove only the trailing whitespace:

set(map(str.rstrip, open('filename.txt')))
Unknown
If the file is fairly large, this method is potentially faster, since it avoids the extra isspace() checks on the left side of each string.
John T
Yes, the author only specified spaces on the right hand side, so it made sense to do rstrip instead of strip or split.
Unknown
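For comparison, here is a small sketch of how strip, rstrip, and rstrip('\n') differ. The sample line, with made-up leading spaces, is an assumption for illustration; the lines in the question only have trailing whitespace.

```python
# Hypothetical line with leading spaces, a trailing space, and a newline.
line = "  hello \n"

both_ends = line.strip()           # removes whitespace on both sides
right_only = line.rstrip()         # removes trailing whitespace only
newline_only = line.rstrip("\n")   # removes only the trailing newline
```

For the file in the question, strip() and rstrip() give the same words, while rstrip('\n') would leave the trailing space in place.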
+2  A: 

If your words don't contain spaces (i.e. one word per line), or if you may have multiple words per line, just use:

words = set(open('filename.txt').read().split())
Anurag Uniyal
So this would work if I have one word per line and also if I have multiple words per line? (assuming if I have a line like "hello bye" I want "hello" and "bye" to be two separate words in the set)
Roee Adler
Yes, it will basically split on spaces, newlines, tabs, etc., so you can have all the words on a single line, on multiple lines, or mixed.
Anurag Uniyal
And don't worry about loading the whole file into memory unless you have a very, very big file, which I doubt. A few MB is perfectly fine, and this will be the fastest.
Anurag Uniyal
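A short sketch of the split() behavior discussed above. The text value is a made-up stand-in for the result of open('filename.txt').read(), mixing one word per line, two words on a line, and a tab.

```python
# Hypothetical file contents: trailing spaces, two words on one line, a tab.
text = "aardvark \nhello bye\n\tpiper\n"

# With no arguments, split() breaks on any run of whitespace
# (spaces, tabs, newlines) and never returns empty strings.
words = set(text.split())
```

Because runs of whitespace are collapsed, the trailing space before each newline does not produce an empty entry in the set.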
A: 
with open("filename.txt") as f:
    mySet = set(map(str.rstrip, f))

If you want to use this in Python 2.5, you need:

from __future__ import with_statement
Matt G
I think you have a syntax error here: set(map(str.rstrip('\n') str, f)
mtasic
Thanks for catching that, I've corrected it.
Matt G
A: 
with open("filename.txt") as f:
    s = set([line.rstrip('\n') for line in f])
mtasic
You don't need a list comprehension there.
SilentGhost
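Putting the suggestions from this thread together, one possible sketch is a with block plus a generator expression. The temporary file is only there so the snippet runs on its own; in practice you would open your real filename.txt instead.

```python
import os
import tempfile

# Write a throwaway word list (sample words from the question).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("aardvark \nhello \npiper \n")
    path = tmp.name

# The with block closes the file; the generator expression passes
# stripped lines straight to set() without building a list first.
with open(path) as f:
    words = set(line.strip() for line in f)

os.remove(path)  # clean up the throwaway file
```

Membership tests such as "hello" in words then work on the cleaned entries.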