ansaurus

Question

Python - Checking for membership inside nested dict

Answer 1

+1 A:

You probably will need to do some iteration to get the data. I assume you don't want an extra dict that can get out of date, so it won't be worth it trying to store everything keyed on internal ids.

Try this on for size:

def lookup_supervisor(manager_internal_id, employees):
    if manager_internal_id is not None and manager_internal_id != "":
        manager_dir_ids = [dir_id for dir_id, emp_data in employees.items() if emp_data.get('internal_id') == manager_internal_id]
        assert len(manager_dir_ids) <= 1  # internal ids are expected to be unique
        if len(manager_dir_ids) == 1:
            return manager_dir_ids[0]
    return None  # no manager id given, or no employee matched it

def tidy_data(employees):
    # Copy each supervisor's contact fields onto the employee's own record.
    for emp_data in employees.values():
        manager_dir_id = lookup_supervisor(emp_data.get('manager_internal_id'), employees)
        for field, sup_key in [('Email', 'mail'), ('FirstName', 'givenName'), ('Surname', 'sn')]:
            emp_data['Supervisor' + field] = employees[manager_dir_id][sup_key] if manager_dir_id is not None else 'Supervisor Not Found'

And you're definitely right that a class is the answer for passing employees around. In fact, I'd recommend against storing the 'Supervisor' keys in the employee dict, and suggest instead getting the supervisor dict fresh whenever you need it, perhaps with a get_supervisor_data method.
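As a rough sketch of what that could look like: the Employees class name, its records attribute and the sample data below are all made up for illustration, so adapt them to whatever your real class holds. The idea is just that the supervisor is looked up on demand instead of being copied into each employee dict.

```python
class Employees:
    """Hypothetical container: `records` maps directory id -> employee dict."""

    def __init__(self, records):
        self.records = records

    def get_supervisor_data(self, dir_id):
        """Return the supervisor's dict for the given employee, or None."""
        manager_internal_id = self.records[dir_id].get('manager_internal_id')
        if not manager_internal_id:
            return None
        # Scan on every call so the answer can never go stale.
        for candidate in self.records.values():
            if candidate.get('internal_id') == manager_internal_id:
                return candidate
        return None


# Made-up sample data, for illustration only.
staff = Employees({
    'jsmith': {'internal_id': '1001', 'manager_internal_id': '', 'mail': 'jsmith@example.com'},
    'bjones': {'internal_id': '1002', 'manager_internal_id': '1001', 'mail': 'bjones@example.com'},
})
supervisor = staff.get_supervisor_data('bjones')
print(supervisor['mail'] if supervisor else 'Supervisor Not Found')
```

The scan is linear in the number of employees, which is the price of not keeping a second index around.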

Your new OO version all looks reasonable except for the changes I already mentioned and some tweaks to clean_phone_number.

import re

def clean_phone_number(self, original_telephone_number):
    # Expected shape: +CC(A)NNNN-NNNN, tolerating an extra zero in the area code or a missing hyphen
    phone_re = re.compile(r'^\+(?P<intl_prefix>\d{2})\((?P<extra_zero>0?)(?P<area_code>\d)\)(?P<local_first_half>\d{4})(?P<hyph>-?)(?P<local_second_half>\d{4})')
    result = phone_re.search(original_telephone_number)
    if result is None:
        return '', "Number didn't match format. Original text is: " + original_telephone_number
    msg = ''
    if result.group('extra_zero'):
        msg += 'Extra zero in area code - ask user to remediate. '
    if not result.group('hyph'):    # Note: can have both errors at once
        msg += 'Missing hyphen in local component - ask user to remediate. '
    return '0' + result.group('area_code') + result.group('local_first_half') + result.group('local_second_half'), msg

You could definitely make an individual object for each employee, but seeing how you're using the data and what you need from it, I'm guessing it wouldn't have that much payoff.

Mu Mind 2010-05-25 04:45:07

@ Mu Mind: I've added a more complete sample of the code above, now in a class (although possibly badly designed) - I'm hoping there's a cleaner solution for it now? Thanks for your answer - also, I wasn't aware you could do assignments inside an if clause in Python (line 3 in your sample)?

victorhooi 2010-05-25 07:08:28

Yeah, you definitely can do assignments in an if clause. They're even accessible from outside the if block, after it executes. Python scopes names per function (or class, or module), not per block, so an if block doesn't get a scope of its own.
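A tiny illustration of that, with made-up function names. The second function shows the catch: if the branch that does the assignment never runs, the name simply doesn't exist yet.

```python
def describe(value):
    if value > 0:
        label = 'positive'       # assigned inside the if block
    else:
        label = 'non-positive'
    # `label` is still visible here: the if block did not open a new scope.
    return label


def risky(value):
    if value > 0:
        label = 'positive'
    # When value <= 0 the assignment above never ran, so this line
    # raises UnboundLocalError.
    return label


print(describe(5))
print(describe(-1))
```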

Mu Mind 2010-05-25 18:15:49

Interesting, shall have to try that =). Btw, I pasted some cleaned up code above, not sure if that affects your answer? Also, upvoted you. Thanks.

victorhooi 2010-05-26 03:06:02

Answer 2

+1 A:

My python skills are poor, so I am far too ignorant to write out what I have in mind in any kind of reasonable time. But I do know how to do OO decomposition.

Why does the Employees class do all the work? There are several types of things that your monolithic Employees class does:

Read and write data from a file - aka serialization
Manage and access data from individual employees
Manage relationships between employees

I suggest that you create a class to handle each task group listed.

Define an Employee class to keep track of employee data and handle field processing/tidying tasks.

Use the Employees class as a container for employee objects. It can handle tasks like tracking down an Employee's supervisor.

Define a virtual base class EmployeeLoader to define an interface (load, store, ?? ). Then implement a subclass for CSV file serialization. (The virtual base class is optional--I'm not sure how Python handles virtual classes, so this may not even make sense.)
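For what it's worth, the closest thing Python has to a virtual base class is an abstract base class from the standard abc module. Here is a minimal sketch of the interface under that assumption. The load/store signatures, the list-of-dicts representation and the sample data are my guesses, not something taken from the question.

```python
import csv
import os
import tempfile
from abc import ABC, abstractmethod


class EmployeeLoader(ABC):
    """Interface only: subclasses decide the storage format."""

    @abstractmethod
    def load(self):
        """Return a list of employee dicts."""

    @abstractmethod
    def store(self, employees):
        """Persist a list of employee dicts."""


class EmployeeCSVLoader(EmployeeLoader):
    def __init__(self, filename):
        self.filename = filename

    def load(self):
        with open(self.filename, newline='') as handle:
            return list(csv.DictReader(handle))

    def store(self, employees):
        # Assumes a non-empty list whose dicts all share the same keys.
        fieldnames = list(employees[0].keys())
        with open(self.filename, 'w', newline='') as handle:
            writer = csv.DictWriter(handle, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(employees)


# Round trip through a temporary file, using made-up data.
path = os.path.join(tempfile.mkdtemp(), 'employees.csv')
loader = EmployeeCSVLoader(path)
loader.store([{'mail': 'jsmith@example.com', 'sn': 'Smith'}])
reloaded = loader.load()
print(reloaded)
```

Trying to instantiate EmployeeLoader directly raises TypeError, which is how Python enforces that subclasses implement both methods.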

So:

Create an instance of EmployeeCSVLoader with a file name to work with.
The loader can then build an Employees object and parse the file.
As each record is read, a new Employee object will be created and stored in the Employees object.
Now ask the Employees object to populate supervisor links.
Iterate over the Employees object's collection of employees and ask each one to tidy itself.
Finally, let the serialization object handle updating the data file.
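The steps above might look roughly like this in Python. Treat it as a sketch under assumptions: the method names (add, link_supervisors, tidy), the whitespace-stripping tidy step, the load_employees helper and the sample CSV are all invented for illustration, and the final write-back step is left out to keep it short.

```python
import csv
import io


class Employee:
    def __init__(self, data):
        self.data = dict(data)
        self.supervisor = None   # filled in later by Employees.link_supervisors

    def tidy(self):
        # Placeholder tidy step: strip stray whitespace from every field.
        self.data = {key: value.strip() for key, value in self.data.items()}


class Employees:
    def __init__(self):
        self.by_internal_id = {}

    def add(self, employee):
        self.by_internal_id[employee.data['internal_id']] = employee

    def link_supervisors(self):
        for employee in self.by_internal_id.values():
            manager_id = employee.data.get('manager_internal_id')
            # .get returns None when there is no such manager.
            employee.supervisor = self.by_internal_id.get(manager_id)

    def __iter__(self):
        return iter(self.by_internal_id.values())


def load_employees(handle):
    """Stand-in for the loader: build an Employees object from CSV text."""
    employees = Employees()
    for row in csv.DictReader(handle):
        employees.add(Employee(row))
    return employees


# Made-up data; an in-memory file stands in for the real CSV.
SAMPLE = (
    "internal_id,manager_internal_id,mail\n"
    "1001,,jsmith@example.com\n"
    "1002,1001, bjones@example.com \n"
)
employees = load_employees(io.StringIO(SAMPLE))
employees.link_supervisors()
for employee in employees:
    employee.tidy()
```

Each class here has one job: Employee cleans its own fields, Employees knows how records relate, and the loader function is the only part that knows about CSV.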

Why is this design worth the effort?

It makes things easier to understand. It is easier to create clean, consistent APIs for smaller, task-focused objects.

If you find that you need an XML serialization format, it becomes trivial to add the new format. Subclass your virtual loader class to handle the XML parsing/generation. Now you can seamlessly move between CSV and XML formats.

In summary, use objects to simplify and structure your data. Section off common data and behaviors into separate classes. Keep each class tightly focused on a single type of ability. If your class is a collection, accessor, factory, and kitchen sink all at once, its API can never be usable: it will be too big and loaded with dissimilar groups of methods. But if your classes stay on topic, they will be easy to test, maintain, use, reuse, and extend.

daotoad 2010-05-25 09:19:58

@daotoad: You made a good point, I'm splitting off Employee into its own separate class. However, I'm still not sure of the best way to read in the attributes from the CSV file. csv.DictReader returns a dictionary, with the column names as keys. Is there a way in Python to create instance attributes for each Employee based on those column names? Secondly, I'm not sure about the EmployeeLoader/virtual class part, not sure how this works in Python - does anybody else know?

victorhooi 2010-05-26 00:23:46

Not sure about the best way to approach class construction. Still working through the tutorial. But you should be able to make a class method that takes a dict and uses it to set the attributes of your employee class. As for serialization, don't worry about a base class for now, and just think about the interface it would have. Then define `load` and `store` methods. You'll end up with code like `employees = EmployeeCSVLoader.load('data_file.csv')`. The load method will call the csv reader to get each dict. Pass the dicts to the `new_from_dict` method.
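Something along these lines should work for `new_from_dict`, using the built-in setattr. This is only a sketch: the column names and the sample row are made up, and it assumes every column name is a valid Python identifier (otherwise you would have to read the value back with getattr).

```python
import csv
import io


class Employee:
    @classmethod
    def new_from_dict(cls, row):
        """Build an Employee whose attributes mirror the dict's keys."""
        employee = cls()
        for column, value in row.items():
            setattr(employee, column, value)
        return employee


# Made-up CSV text; an in-memory file stands in for the real data file.
SAMPLE = "givenName,sn,mail\nJane,Smith,jsmith@example.com\n"
people = [Employee.new_from_dict(row) for row in csv.DictReader(io.StringIO(SAMPLE))]
print(people[0].givenName, people[0].sn)
```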

daotoad 2010-05-26 02:48:06

Sorry I don't have enough knowledge to give you good code. It's frustrating for me because I see exactly the design, but I don't have the skill level to render it yet. Do you want a Perl example?

daotoad 2010-05-26 02:50:16

Well, yeah, a Perl example might be fine, and it would give me a good direction. I'm sure the other people here can make it more Pythonic =). Upvoted you, btw.

victorhooi 2010-05-26 03:05:05
