ansaurus

Question

Python - Converting CSV to Objects - Code Design

Answer 1

+2 A:

Looks fine to me. Good job. How often are you going to run this script? Most of your questions are moot if this is a one-off thing.

1. I like the way Employees.clean_all_phone_numbers() delegates to Employee.clean_phone_number().
2. You really should be using an index (dictionary) here. You can index each employee by hrid as you create them, in O(n), and then look them up in O(1).
   - But only do this if you ever have to run the script again...
   - Just get into the habit of using dictionaries. They are painless and make code easier to read. Whenever you write a lookup_* method, you probably just want to index a dictionary.
3. Not sure. I like explicitly setting state, but this is actually bad design: clean_phone_number() should do that. Each Employee should be responsible for its own state.
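A minimal sketch of point 2, assuming every employee has a unique, non-empty `hrid` (later comments show that assumption does not always hold). The `Employee` and `Employees` classes here are simplified, hypothetical stand-ins for the ones in the question; the collection just holds a plain dict as an attribute, so there is no need to subclass `dict`.

```python
from typing import Dict, Iterable, Optional


class Employee:
    # Hypothetical minimal record; the real class has more fields.
    def __init__(self, hrid: str, name: str) -> None:
        self.hrid = hrid
        self.name = name


class Employees:
    """Holds the employees plus a dict index keyed on hrid (composition, not a dict subclass)."""

    def __init__(self, employees: Iterable[Employee]) -> None:
        self.employees = list(employees)
        # Built once, in a single O(n) pass, when the collection is created.
        self.by_hrid: Dict[str, Employee] = {e.hrid: e for e in self.employees}

    def lookup(self, hrid: str) -> Optional[Employee]:
        # Average O(1) dict lookup instead of scanning the list; None if unknown.
        return self.by_hrid.get(hrid)
```

Usage: `Employees([Employee("1001", "Alice"), Employee("1002", "Bob")]).lookup("1002")` returns Bob's object, while an unknown hrid returns `None`.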

Daren Thomas 2010-06-03 07:29:22

Thanks for the quick response. It'll be run every week or so; the input file changes a bit. We can get deltas, but there are issues with those, and it's apparently easier just to rewrite the whole file. Regarding point 2, what exactly did you mean here? Originally I was using dicts (see http://stackoverflow.com/questions/2901872/python-checking-for-membership-inside-nested-dict), but I moved to a class-based design. Can you add a hashmap/dict to a class? Subclass dict? For point 3, are you saying that from a design POV I shouldn't return anything, but should have clean_phone_number set the value itself?

victorhooi 2010-06-03 08:01:44

Your `lookup_all_supervisors` uses a nested loop to find the supervisor for each employee. The nested loop should just be a lookup in a dictionary of supervisors that you can create when reading the employees (single pass) or at a later time (in a second pass). This brings the cost of assigning supervisors down from O(n^2) to O(n).
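One way the second-pass version could look, as a sketch. The `supervisor_hrid` and `supervisor` attributes are hypothetical names, not taken from the question's code; blank hrids are left out of the index so an empty supervisor field can never match anyone.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Employee:
    # Hypothetical fields: the employee's own ID and the ID of their supervisor.
    hrid: str
    supervisor_hrid: str = ""
    supervisor: Optional["Employee"] = None


def assign_supervisors(employees: List[Employee]) -> None:
    """Second pass: one dict build plus one lookup per employee, O(n) overall."""
    by_hrid: Dict[str, Employee] = {e.hrid: e for e in employees if e.hrid}
    for emp in employees:
        # dict.get returns None when the supervisor ID is blank or unknown.
        emp.supervisor = by_hrid.get(emp.supervisor_hrid)
```

Compared with the nested loop, each employee now costs one hash lookup instead of a scan over all employees.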

Daren Thomas 2010-06-03 13:21:48

Ah, yes, and regarding point 3, exactly that: your solution updates each employee with what that same employee thinks is a clean phone number. Instead, just tell the employee to clean the friggin phone number! Let the employee manage his own state; outside objects shouldn't be messing with other objects' state!
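A sketch of the shape described above: the collection tells each employee to clean up, and the employee updates its own attribute instead of returning a value for someone else to assign. Method names follow the thread; the digits-only cleaning rule is a hypothetical placeholder for whatever the real rule is.

```python
from typing import List


class Employee:
    def __init__(self, telephoneNumber: str) -> None:
        self.telephoneNumber = telephoneNumber

    def clean_phone_number(self) -> None:
        """Mutate this employee's own state; nothing is returned to the caller."""
        # Hypothetical cleaning rule: keep only the digits.
        self.telephoneNumber = "".join(ch for ch in self.telephoneNumber if ch.isdigit())


class Employees:
    def __init__(self, employees: List[Employee]) -> None:
        self.employees = employees

    def clean_all_phone_numbers(self) -> None:
        # Just tell each employee to clean up; no assignment from outside.
        for emp in self.employees:
            emp.clean_phone_number()
```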

Daren Thomas 2010-06-03 13:23:10

@Daren Thomas: The issue with using a dict is that hrid, the ID number that the supervisor links to, is actually optional. I modified the code to use a dict instead of a list, and it introduced a new bug, where every user with an empty hrid field actually overrode the previous one, so we ended up missing users. Do you know of a workaround?

victorhooi 2010-06-04 02:42:39

@Daren Thomas: Ignore my last comment. I'm now indexing users with missing ID numbers by their email instead. You won't be able to find those users with an ID lookup, but it means they'll still be in the dict when you iterate over it.
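A small sketch of that workaround, assuming `hrid` and `email` are plain string attributes, that emails are unique, and that an email can never collide with an hrid value. Employees with an empty `hrid` are keyed on their email, so they no longer overwrite each other under the shared empty key.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Employee:
    hrid: str
    email: str


def index_employees(employees: List[Employee]) -> Dict[str, Employee]:
    """Key by hrid when present, otherwise fall back to the email address."""
    index: Dict[str, Employee] = {}
    for emp in employees:
        # An empty hrid is falsy, so `or` picks the email instead.
        key = emp.hrid or emp.email
        index[key] = emp
    return index
```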

victorhooi 2010-06-04 06:41:54

Answer 2

+1 A:

- You should close your files after reading them.
- I suggest moving all compiled re's to the top level (otherwise you compile them on every call).
- `if self.telephoneNumber is None or self.telephoneNumber == '':` can easily be rewritten as `if not self.telephoneNumber:`
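The second and third points together, as a sketch with a hypothetical digits-only pattern. Python's `re` module also caches recently compiled patterns, so the saving is mainly the per-call cache lookup, but a module-level constant still makes the intent clearer.

```python
import re

# Compiled once, at import time, instead of on every call.
# Hypothetical pattern: matches every non-digit character.
NON_DIGIT_RE = re.compile(r"\D")


class Employee:
    def __init__(self, telephoneNumber=None):
        self.telephoneNumber = telephoneNumber

    def clean_phone_number(self):
        # `not` covers both None and the empty string.
        if not self.telephoneNumber:
            return
        self.telephoneNumber = NON_DIGIT_RE.sub("", self.telephoneNumber)
```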

Guard 2010-06-03 09:36:31

@Guard: Thanks for the tips. How would I close the files, though? I don't have an actual reference to the file object; it's opened inline as part of the csv.DictReader line. I'll move the re.compile calls to instance variables on Employee; is that optimal, or should they be module-level? And yeah, I'll change the None/== line. Thanks again.

victorhooi 2010-06-04 00:04:30

Change `csv.DictReader(open(input_file), ...)` to `f = open(input_file)`, then `csv.DictReader(f, ...)`, and call `f.close()` once you have finished reading. Class-level variables are best; they are computed only once, when the class is defined.
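A sketch that combines both suggestions. It uses a `with` block in place of the explicit `f.close()` (a swapped-in idiom that also closes the file if an exception is raised) and a class-level compiled pattern shared by all instances. The `telephoneNumber` column name and the digits-only rule are assumptions.

```python
import csv
import re
from typing import List


class Employee:
    # Class-level attribute: compiled once, when the class statement executes,
    # and shared by every instance.
    NON_DIGIT_RE = re.compile(r"\D")

    def __init__(self, row: dict) -> None:
        # Hypothetical column name; missing or empty values become "".
        self.telephoneNumber = row.get("telephoneNumber") or ""

    def clean_phone_number(self) -> None:
        self.telephoneNumber = self.NON_DIGIT_RE.sub("", self.telephoneNumber)


def load_employees(input_file: str) -> List[Employee]:
    # The with-block closes the file even if parsing raises an exception.
    with open(input_file, newline="") as f:
        reader = csv.DictReader(f)
        # Build the objects inside the block: DictReader reads lazily,
        # so the file must still be open while iterating.
        return [Employee(row) for row in reader]
```

Note that closing the file before iterating the reader would fail, which is why the list is built inside the block.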

Guard 2010-06-04 07:12:59

@Guard: Thanks for your help. I've already upvoted your answer and I'd like to mark it as accepted as well, but it seems I can only tick one at a time. You wouldn't happen to know a way around that?

victorhooi 2010-06-08 02:18:49

You can pick only one answer, AFAIK. Never mind, happy coding!

Guard 2010-06-08 11:21:33
