For my project, the role of the Lecturer (defined as a class) is to offer projects to students. Project itself is also a class. I have some global dictionaries, keyed by the unique numeric ids for lecturers and projects, that map to objects.

Thus for the "lecturers" dictionary (currently):

lecturer[id] = Lecturer(lec_name, lec_id, max_students)

I'm currently reading in a white-space delimited text file that has been generated from a database. I have no direct access to the database so I haven't much say on how the file is formatted. Here's a fictionalised snippet that shows how the text file is structured. Please pardon the cheesiness.

0001 001 "Miyamoto, S." "Even Newer Super Mario Bros"
0002 001 "Miyamoto, S." "Legend of Zelda: Skies of Hyrule"
0003 002 "Molyneux, P." "Project Milo"
0004 002 "Molyneux, P." "Fable III"
0005 003 "Blow, J." "Ponytail"

The structure of each line is basically proj_id, lec_id, lec_name, proj_name.

Now, I'm currently reading the relevant data into the relevant objects. Thus proj_id is stored in the Project class, whereas lec_name is stored in the Lecturer class, and so on. The Lecturer and Project classes are not currently related.

However, as I read in each line from the text file, I also wish to read the project offered by the lecturer into the Lecturer class; I'm already reading the proj_id into the Project class. I'd like to create an attribute in Lecturer called offered_proj, which should be a set or list of the projects offered by that lecturer. Thus whenever, for a line, I read in a new project under the same lec_id, offered_proj will be updated with that project. If I wanted to display a list of projects offered by a lecturer, I'd ideally just want to use print lecturers[lec_id].offered_proj.

My Python isn't great and I'd appreciate it if someone could show me a way to do that. I'm not sure if it's better as a set or a list, as well.

Update

After the advice from Alex Martelli and Oddthinking I went back and made some changes and tried to print the results.

Here's the code snippet:

for line in csv_file:
    proj_id = int(line[0])
    lec_id = int(line[1])
    lec_name = line[2]
    proj_name = line[3]
    projects[proj_id] = Project(proj_id, proj_name)
    lecturers[lec_id] = Lecturer(lec_id, lec_name)
    if lec_id in lecturers.keys():
        lecturers[lec_id].offered_proj.add(proj_id)
    print lec_id, lecturers[lec_id].offered_proj

The print lecturers[lec_id].offered_proj line prints the following output:

001 set([0001])
001 set([0002])
002 set([0003])
002 set([0004])
003 set([0005])

It basically feels like the set is being overwritten or somesuch. So if I try to print for a specific lecturer with print lec_id, lecturers[001].offered_proj, all I get is the last proj_id that has been read in.

+4  A: 

A set is better since you don't care about order and have no duplicates.

You can parse the file easily with the csv module (with a delimiter of ' ').
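As a minimal sketch of that parsing step (Python 3 syntax; the sample lines are copied from the question, and io.StringIO is only a stand-in for the real file, whose name isn't given):

```python
import csv
import io

# Stand-in for the real text file; the lines come from the question's sample.
sample = io.StringIO(
    '0001 001 "Miyamoto, S." "Even Newer Super Mario Bros"\n'
    '0003 002 "Molyneux, P." "Project Milo"\n'
)

# With a space delimiter, the default quotechar '"' keeps quoted names
# that contain spaces and commas together as single fields.
rows = list(csv.reader(sample, delimiter=' '))

for proj_id, lec_id, lec_name, proj_name in rows:
    print(proj_id, lec_id, lec_name, proj_name)
```

Every field comes back as a string, so the ids still need int() if you want numeric keys.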

Once you have the lec_name you must check if that lecturer's already known; for that purpose, keep a dictionary from lec_name to lecturer objects (that's just another reference to the same lecturer object which you also refer to from the lecturer dictionary). On finding a lec_name that's not in that dictionary you know it's a lecturer not previously seen, so make a new lecturer object (and stick it in both dicts) in that case only, with an empty set of offered courses. Finally, just .add the course to the current lecturer's offered_proj. It's really a pretty smooth flow.

Have you tried implementing this flow? If so, what problems have you had? Can you show us the relevant code -- should be a dozen lines or so, at most?
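For reference, a minimal sketch of that flow in Python 3 syntax. The Lecturer class here is a hypothetical stand-in for yours, by_name is an illustrative name for the second dictionary, and the rows are assumed to be already parsed and converted:

```python
class Lecturer:
    """Hypothetical stand-in for the question's Lecturer class."""
    def __init__(self, lec_id, lec_name):
        self.lec_id = lec_id
        self.lec_name = lec_name
        self.offered_proj = set()   # starts empty for every new lecturer

lecturers = {}   # lec_id   -> Lecturer
by_name = {}     # lec_name -> the same Lecturer object (illustrative name)

# Already-parsed rows in the question's order: proj_id, lec_id, lec_name, proj_name.
rows = [
    (1, 1, "Miyamoto, S.", "Even Newer Super Mario Bros"),
    (2, 1, "Miyamoto, S.", "Legend of Zelda: Skies of Hyrule"),
    (3, 2, "Molyneux, P.", "Project Milo"),
]

for proj_id, lec_id, lec_name, proj_name in rows:
    if lec_name not in by_name:
        # First time this lecturer appears: create once, register in both dicts.
        lec = Lecturer(lec_id, lec_name)
        by_name[lec_name] = lec
        lecturers[lec_id] = lec
    by_name[lec_name].offered_proj.add(proj_id)

print(sorted(lecturers[1].offered_proj))   # [1, 2]
```

Both dictionaries point at the same objects, so a project added through one is visible through the other.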

Edit: since the OP has posted code now, I can spot the bug -- it's here:

lecturers[lec_id] = Lecturer(lec_id, lec_name)
if lec_id in lecturers.keys():
    lecturers[lec_id].offered_proj.add(proj_id)

This unconditionally creates a new lecturer object (trampling over the old one in the lecturers dict, if any), so of course the previous set gets tossed away. What you need is to check first and create only if needed! (Also, a minor bug: don't check in ... .keys(), which is horribly inefficient because in Python 2 it builds a list of keys and scans it; just check for presence in the dict.) As follows:

if lec_id in lecturers:
    thelec = lecturers[lec_id]
else:
    thelec = lecturers[lec_id] = Lecturer(lec_id, lec_name)
thelec.offered_proj.add(proj_id)

You could express this in several different ways, but I hope this is clear enough. Just for completeness, the way I would normally phrase it (to avoid two lookups into the dictionary) is as follows:

thelec = lecturers.get(lec_id)
if thelec is None:
    thelec = lecturers[lec_id] = Lecturer(lec_id, lec_name)
thelec.offered_proj.add(proj_id)
Alex Martelli
Ah thanks. I actually am using `csv.reader` with `delimiter = ' '`. I didn't want to add that because I thought the question was getting a bit too verbose.
Az
OK, so you already have what I suggested in the first paragraph;-). The second paragraph should still be useful.
Alex Martelli
Tried pasting some code in here but it kind of failed miserably. I did what Oddthinking suggested: added the set creation in the `Lecturer` class. As I read in the parsed file, I'm setting: `proj_id = int(line[0])`. I perform similar assignments for the other elements. Then I read all the relevant data into the global dictionaries: `projects[proj_id] = Project(proj_id, proj_name)` and `lecturers[lec_id] = Lecturer(lec_id, lec_name)`. After that I have an `if` statement that checks if the lecturer is actually in the dictionary and then it tries to add the project.
Az
The problem is that when I call `lecturers[lec_id].offered_proj.add(proj_id)` I don't get a set of projects. I get a new set every time, i.e., it doesn't seem to be adding to the original set.
Az
@Az, let me edit the answer to show where your bug is.
Alex Martelli
I figured out what I was doing wrong and answered myself since I couldn't figure out how to post the code in the comments. Thanks for the help though!
Az
PS - Thanks for the debug idea too. Will use that! The `lecturers.get(lec_id)` you used is similar to `if lec_id in lecturers` I take it? Any key differences?
Az
@Az, don't add answers or comments to your questions: edit your original questions instead. `.get` does a single look-up and therefore can be about twice as fast as `if...in` plus indexing, which does two lookups.
Alex Martelli
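To check the lookup cost on your own interpreter, here is a small timing sketch in Python 3 syntax. The function names and dictionary contents are illustrative only, and the measured ratio varies between Python versions, so treat the printed numbers as a local measurement rather than a fixed rule:

```python
import timeit

# Illustrative dictionary; the values don't matter for the timing.
lecturers = {i: object() for i in range(1000)}

def with_in(key=500):
    # Two dictionary operations: a membership test, then indexing.
    if key in lecturers:
        return lecturers[key]
    return None

def with_get(key=500):
    # One dictionary operation; returns None when the key is missing.
    return lecturers.get(key)

print("in + index:", timeit.timeit(with_in, number=100000))
print(".get      :", timeit.timeit(with_get, number=100000))
```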
+1  A: 

Sets are useful when you want to guarantee you only have one instance of each item. They are also faster than a list at calculating whether an item is present in the collection.

Lists are faster at adding items, and also have an ordering.
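A quick illustration of the difference, as a sketch with made-up project ids (Python 3 syntax):

```python
offered_list = []
offered_set = set()

for proj_id in (3, 1, 3, 2):   # project 3 is deliberately repeated
    offered_list.append(proj_id)
    offered_set.add(proj_id)

print(offered_list)          # keeps insertion order and the duplicate
print(sorted(offered_set))   # duplicate dropped; sorted only for display
print(2 in offered_set)      # membership test, a hash lookup on average
```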

This sounds like you would like a set. You sound like you are very close already.

In Lecturer.__init__, add a line:

self.offered_proj = set()

That will make an empty set.

When you read in the project, you can simply add to that set:

lecturer.offered_proj.add(project)

And you can print, just as you suggest (although you may like to pretty it up.)
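Putting those pieces together, a minimal sketch in Python 3 syntax. The constructor arguments follow the question's later snippet, and describe is a hypothetical helper added only to show one way of prettying up the output:

```python
class Lecturer:
    def __init__(self, lec_id, lec_name):
        self.lec_id = lec_id
        self.lec_name = lec_name
        self.offered_proj = set()   # one fresh, empty set per lecturer

    def describe(self):
        # Hypothetical helper: sort the ids so the printed order is stable.
        ids = ", ".join(str(p) for p in sorted(self.offered_proj))
        return "%s offers: %s" % (self.lec_name, ids)

lecturer = Lecturer(1, "Miyamoto, S.")
lecturer.offered_proj.add(1)
lecturer.offered_proj.add(2)
lecturer.offered_proj.add(2)   # adding the same project again has no effect
print(lecturer.describe())
```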

Oddthinking
Thanks for the tip. I'm currently trying it out.
Az
A: 

Thanks for the help Alex and Oddthinking! I think I've figured out what was going on:

I modified the code snippet that I added to the question. Basically, every time it read the line I think it was recreating the lecturer object. Thus I put in another if statement that checks if lec_id already exists in the dictionary. If it does, then it skips the object creation and simply moves onto adding projects to the offered_proj set.

The change I made is:

if not lec_id in lecturers.keys():
    projects[proj_id] = Project(proj_id, proj_name)
    lecturers[lec_id] = Lecturer(lec_id, lec_name)
lecturers[lec_id].offered_proj.add(proj_id)

I only recently discovered the concept behind if not thanks to my friend Samir.

Now I get the following output:

001 set([0001])
001 set([0001, 0002])
002 set([0003])
002 set([0003, 0004])
003 set([0005])

If I print for a chosen lec_id I get the fully updated set. Glee.

Az
Excellent news. You can abbreviate that first line to be "if lec_id not in lecturers:"
Oddthinking
Ooh, wait a moment. You are only creating the project if the lecturer is new. The second line of code shouldn't be inside the if-statement. It should always be run.
Oddthinking
Ah, great spot. I'll update my project. Thanks!
Az