views:

112

answers:

2

I've been teaching myself Python at my new job, and really enjoying the language. I've written a short class to do some basic data manipulation, and I'm pretty confident about it.

But old habits from my structured/modular programming days are hard to break, and I know there must be a better way to write this. So, I was wondering if anyone would like to take a look at the following, and suggest some possible improvements, or put me on to a resource that could help me discover those for myself.

A quick note: The RandomItems root class was written by someone else, and I'm still wrapping my head around the itertools library. Also, this isn't the entire module - just the class I'm working on, and it's prerequisites.

What do you think?

import itertools
import urllib2
import random
import string

class RandomItems(object):
    """This is the root class for the randomizer subclasses. These
        are used to generate arbitrary content for each of the fields
        in a csv file data row. The purpose is to automatically generate
        content that can be used as functional testing fixture data.
    """
    def __iter__(self):
        while True:
            yield self.next()

    def slice(self, times):
        return itertools.islice(self, times)

class RandomWords(RandomItems):
    """Obtain a list of random real words from the internet, place them
        in an iterable list object, and provide a method for retrieving
        a subset of length 1-n, of random words from the root list.
    """
    def __init__(self):
        urls = [
            "http://dictionary-thesaurus.com/wordlists/Nouns%285,449%29.txt",
            "http://dictionary-thesaurus.com/wordlists/Verbs%284,874%29.txt",
            "http://dictionary-thesaurus.com/wordlists/Adjectives%2850%29.txt",
            "http://dictionary-thesaurus.com/wordlists/Adjectives%28929%29.txt",
            "http://dictionary-thesaurus.com/wordlists/DescriptiveActionWords%2835%29.txt",
            "http://dictionary-thesaurus.com/wordlists/WordsThatDescribe%2886%29.txt",
            "http://dictionary-thesaurus.com/wordlists/DescriptiveWords%2886%29.txt",
            "http://dictionary-thesaurus.com/wordlists/WordsFunToUse%28100%29.txt",
            "http://dictionary-thesaurus.com/wordlists/Materials%2847%29.txt",
            "http://dictionary-thesaurus.com/wordlists/NewsSubjects%28197%29.txt",
            "http://dictionary-thesaurus.com/wordlists/Skills%28341%29.txt",
            "http://dictionary-thesaurus.com/wordlists/TechnicalManualWords%281495%29.txt",
            "http://dictionary-thesaurus.com/wordlists/GRE_WordList%281264%29.txt"
        ]
        self._words = []
        for url in urls:
            urlresp = urllib2.urlopen(urllib2.Request(url))
            self._words.extend([word for word in urlresp.read().split("\r\n")])
        self._words = list(set(self._words)) # Removes duplicates
        self._words.sort() # sorts the list

    def next(self):
        """Return a single random word from the list
        """
        return random.choice(self._words)

    def get(self):
        """Return the entire list, if needed.
        """
        return self._words

    def wordcount(self):
        """Return the total number of words in the list
        """
        return len(self._words)

    def sublist(self,size=3):
        """Return a random segment of _size_ length. The default is 3 words.
        """
        segment = []
        for i in range(size):
            segment.append(self.next())
        #printable = " ".join(segment)        
        return segment

    def random_name(self):
        """Return a string-formatted list of 3 random words.
        """
        words = self.sublist()
        return "%s %s %s" % (words[0], words[1], words[2])

def main():
    """Just to see it work...
    """
    wl = RandomWords()
    print wl.wordcount()
    print wl.next()
    print wl.sublist()
    print 'Three Word Name = %s' % wl.random_name()
    #print wl.get()

if __name__ == "__main__":
    main()
+6  A: 

Here are my five cents:

  • Constructor should be called __init__.
  • You could abolish some code by using random.sample, it does what your next() and sublist() does but it's prepackaged.
  • Override __iter__ (define the method in your class) and you can get rid of RandomIter. You can read more about at it in the docs (note Py3K, some stuff may not be relevant for lower version). You could use yield for this which as you may or may not know creates a generator, thus wasting little to no memory.
  • random_name could use str.join instead. Note that you may need to convert the values if they are not guaranteed to be strings. This can be done through [str(x) for x in iterable] or in-built map.
Skurmedel
Excellent suggestions, thanks very much for taking the time. Yield was something I'd not even thought about, and I'm embarrassed to admit I'd overlooked random.sample.
Greg Gauthier
+5  A: 

First knee-jerk reaction: I would offload your hard-coded URLs into a constructor parameter passed to the class and perhaps read from configuration somewhere; this will allow for easier change without necessitating a redeploy.

The drawback of this is that consumers of the class have to know where those URLs are stored... so you could create a companion class whose only job is to know what the URLs are (i.e. in configuration, or even hard-coded) and how to get them. You could allow the consumer of your class to provide the URLs, or if they are not provided, the class could hit up the companion class for the URLs.

Randolpho
Sorry, I tend to fall back on SOLID, regardless of language. I can't think of anything Python-specific that would make your class more Pythonic.
Randolpho
Still, it's a good suggestion :)
Skurmedel
Moving the hard-coded URLs out to a separate class is something I should have remembered from my old java course. Thanks for pointing that out!
Greg Gauthier
@Greg Guathier: no problems. :)
Randolpho