Coming to Python from Java, I've been told that factories are not Pythonic. Thus, I'm looking for the Python way to do something like the following. (I'm oversimplifying my goal so that I don't have to describe my entire program, which is very complicated.)

My script will read in names of people (along with some information about them) and, from this, construct objects of type Person. The names may be repeated, and I only want one Person instance per name. These People may also belong to subclasses Man and Woman.

One way to do this would be to create a PersonFactory which would either return a newly instantiated Man or Woman or a reference to the previously instantiated Man/Woman with the same name. The other would be to create a set of all Person objects and check each time for the presence of a Person with the given name before instantiating a new object. Neither approach strikes me as Pythonic, though. The first seems a bit too cumbersome for Python (creating a whole class just to handle creation of another object? Really?) and the second will get expensive quickly, as I have a lot of names to process.

+1  A: 
class Person(object):
    pass  # ...

class Man(Person):
    pass  # ...

class Woman(Person):
    pass  # ...

constructors = {
    'male': Man,
    'female': Woman,
    None: Person,
}

people = {}

# in processing loop
if person.name not in people:
    people[person.name] = constructors[person.gender]()
person_object = people[person.name]

Since Python allows you to do things like storing class types in a dict, you don't need a factory; you can just look up a type and instantiate it.
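For anyone who wants to run it, here is a self-contained sketch of the same lookup. It assumes the input arrives as (name, gender) pairs and that each class takes the name in its constructor; the `records` list and the `name` attribute are made up for illustration and are not part of the original program.

```python
class Person(object):
    def __init__(self, name):
        self.name = name

class Man(Person):
    pass

class Woman(Person):
    pass

# Map a gender code to the class to instantiate; None falls back to Person.
constructors = {
    'male': Man,
    'female': Woman,
    None: Person,
}

people = {}  # name -> the single instance created for that name

# Hypothetical input records: (name, gender) pairs, with a repeated name.
records = [('Alice', 'female'), ('Bob', 'male'), ('Alice', 'female'), ('Sam', None)]

for name, gender in records:
    if name not in people:
        # The class is looked up and called only when the name is new.
        people[name] = constructors[gender](name)
    person_object = people[name]

print(len(people))                     # 3
print(type(people['Alice']).__name__)  # Woman
```

The dict lookup on `people` is constant time on average, so the repeated-name check stays cheap even with many names.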

Amber
This calls the constructor every time and then throws away most of the results -- what an utter, total waste of resources (`setdefault` often encourages just this kind of wanton wastefulness).
Alex Martelli
True, Alex; it'd be nice if setdefault could be passed a callable instead and only evaluate it if the value isn't set.
Amber
@Amber: Use a collections.defaultdict instead. This does what you are asking for - takes a function that is only evaluated if the key is missing.
Dave Kirby
@Dave: the problem with that is that it doesn't allow you to specify arguments to that function. So in this case, where we're trying to be able to check a value to determine what type to add, we don't have a way to pass in that value.
Amber
@Amber - good point. It would not be difficult to write a variant of defaultdict that took the key as a parameter - just override the `__missing__` method. I was quite surprised that they did not include that in the standard lib when they added defaultdict.
Dave Kirby
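A minimal sketch of the variant described in the comment above, assuming Python 3. The class name `KeyedDefaultDict` and the `make_label` factory are hypothetical, and this version subclasses plain `dict` rather than `defaultdict`; it relies only on the documented behaviour that `dict.__getitem__` calls `__missing__` when a key is absent.

```python
class KeyedDefaultDict(dict):
    """Like collections.defaultdict, but the factory receives the missing key."""

    def __init__(self, factory, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.factory = factory

    def __missing__(self, key):
        # Called by dict.__getitem__ only when the key is absent.
        value = self[key] = self.factory(key)
        return value

calls = []

def make_label(name):
    # Hypothetical factory; records each call so we can see when it runs.
    calls.append(name)
    return 'person:' + name

labels = KeyedDefaultDict(make_label)
labels['Alice']
labels['Alice']
print(calls)  # ['Alice']
```

To pass more than one value to the factory, one option is to use a tuple such as `(name, gender)` as the key.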
+8  A: 

I don't think factories are un-Pythonic. You don't need a whole class, though. One big difference between Java and Python is that in Python you can have code outside of classes. So you might want to create a factory function. Or you can make the factory be a class method on the Person class:

class Person:

    name_map = {}

    @classmethod
    def person_from_name(cls, name):
        if name not in cls.name_map:
            cls.name_map[name] = cls(name)
        return cls.name_map[name]

    def __init__(self, name):
        self.name = name  # etc.

Often the same patterns are at work in Python code as in Java, but we don't make as big a deal of it. In Java, you'd have a whole new class, which would imply a whole new .java file, and you'd need to make it a singleton, etc, etc. Java seems to breed this sort of complexity. A simple class method will do, so just use it.
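A usage sketch of the class method with subclasses, assuming `__init__` just stores the name (the `Man` and `Woman` bodies here are placeholders). Because `name_map` is defined on `Person`, every subclass shares the same registry, so the class used for the first call decides the type of the stored instance.

```python
class Person:
    name_map = {}  # defined once on Person, shared by all subclasses

    @classmethod
    def person_from_name(cls, name):
        if name not in cls.name_map:
            # cls is whichever class the method was called on.
            cls.name_map[name] = cls(name)
        return cls.name_map[name]

    def __init__(self, name):
        self.name = name

class Man(Person):
    pass

class Woman(Person):
    pass

tom = Man.person_from_name('Tom')
same_tom = Person.person_from_name('Tom')
mary = Woman.person_from_name('Mary')

print(tom is same_tom)      # True
print(type(tom).__name__)   # Man
print(type(mary).__name__)  # Woman
```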

Ned Batchelder
I think you mean 'class method' ;)
aaronasterling
oops, yes, fixed.
Ned Batchelder
Your code as it stands won't work for subclassing - you'd want to use `= cls(name)` instead.
Amber
thanks, also fixed!
Ned Batchelder
I'd just call it `from_name` (so it's called as `Person.from_name` or `SubclassOfPerson.from_name`). But otherwise the best solution, +1
delnan
I finally see where class methods are useful.
Beau Martínez
+1  A: 

A free-standing function def PersonFactory(name, gender): is fine, though packaging it up as a classmethod, as @Ned suggests, shouldn't hurt (in this particular case it won't help much either, since the exact class of person to instantiate must vary). I think the cleanest implementation is in fact a free-standing function, just because I prefer a classmethod to return instances of the class it's called on (rather than of some other class) -- but this is a stylistic point that cannot be said to be sharply defined either way.

I'd code it (with some assumptions which I hope are clear, e.g. the gender is coded as M or F and if not specified is heuristically inferred from the name, &c):

def gender_from_name(name): ...

person_by_name = {}

class_by_gender = {'M': Man, 'F': Woman}

def person_factory(name, gender=None):
  p = person_by_name.get(name)
  if p is None:
    if gender is None:
      gender = gender_from_name(name)
    p = person_by_name[name] = class_by_gender[gender](name)
  return p
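
A runnable sketch of the function above with the missing pieces filled in. The class bodies and the lookup table inside `gender_from_name` are placeholders invented for illustration; a real heuristic would look quite different.

```python
class Person:
    def __init__(self, name):
        self.name = name

class Man(Person):
    pass

class Woman(Person):
    pass

def gender_from_name(name):
    # Hypothetical heuristic, for illustration only: a tiny first-name table.
    known = {'Tom': 'M', 'Mary': 'F'}
    return known[name.split()[0]]

person_by_name = {}
class_by_gender = {'M': Man, 'F': Woman}

def person_factory(name, gender=None):
    p = person_by_name.get(name)
    if p is None:
        if gender is None:
            gender = gender_from_name(name)
        # Instantiate only when the name has not been seen before.
        p = person_by_name[name] = class_by_gender[gender](name)
    return p

a = person_factory('Tom Jones')        # gender inferred from the name
b = person_factory('Tom Jones', 'M')   # already registered, same object
c = person_factory('Alex Smith', 'F')  # gender given explicitly
print(a is b, type(a).__name__, type(c).__name__)  # True Man Woman
```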
Alex Martelli
+1  A: 

The place to put a "no two objects with same key" registration is in __new__, like this:

class Person(object):
    person_registry = {}
    mens_names = set('Tom Dick Harry'.split())
    womens_names = set('Mary Linda Susan'.split())
    gender = "?"
    def __new__(cls, *args):
        if cls is Person:
            fname, lname = args[0].split()
            key = (lname, fname)
            if key in Person.person_registry:
                return Person.person_registry[key]

            if fname in Person.mens_names:
                return Man(*args)
            if fname in Person.womens_names:
                return Woman(*args)
        # Subclasses, and names of unknown gender, get a fresh instance.
        # object.__new__ must not be passed the constructor arguments.
        return object.__new__(cls)

    def __init__(self, name):
        fname, lname = name.split()
        Person.person_registry[(lname, fname)] = self

class Man(Person):
    gender = "M"

class Woman(Person):
    gender = "W"

p1 = Person("Harry Turtledove")
print(p1.__class__.__name__, p1.gender)

p2 = Person("Harry Turtledove")

print(p1 is p2)

prints:

Man M
True

I also took a stab at your Man/Woman distinction, but I'm not thrilled with it.
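
One thing to keep in mind with this approach: Python calls `__init__` on whatever `__new__` returns, as long as it is an instance of the class, so `__init__` runs again on a registry hit. A minimal sketch of that behaviour, using a hypothetical `Cached` class as a stand-in for Person:

```python
class Cached(object):
    registry = {}
    init_calls = 0

    def __new__(cls, key):
        # Return the registered instance if there is one.
        if key in cls.registry:
            return cls.registry[key]
        obj = object.__new__(cls)
        cls.registry[key] = obj
        return obj

    def __init__(self, key):
        # Runs on every Cached(key) call, including registry hits.
        Cached.init_calls += 1
        self.key = key

a = Cached('x')
b = Cached('x')
print(a is b)             # True
print(Cached.init_calls)  # 2
```

If `__init__` does anything expensive or resets state, it may be worth guarding it, or moving the setup into `__new__`.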

Paul McGuire