views:

154

answers:

1

Hi there, back again with some more SQLAlchemy shenanigans.

Let me step through this.

My table is now set up as so:

engine = create_engine('sqlite:///:memory:', echo=False)
metadata = MetaData()
students_table = Table('studs', metadata,
    Column('sid', Integer, primary_key=True),
    Column('name', String),
    Column('preferences', Integer),
    Column('allocated_rank', Integer),
    Column('allocated_project', Integer)
)
metadata.create_all(engine)
mapper(Student, students_table)  

Fairly simple, and for the most part I've been enjoying the ability to query almost any bit of information I want provided I avoid the error cases below.

The class it is mapped from is:

class Student(object):
    def __init__(self, sid, name):
        self.sid = sid
        self.name = name
        self.preferences = collections.defaultdict(set)
        self.allocated_project = None
        self.allocated_rank = 0

def __repr__(self):
    return str(self)

def __str__(self):
    return "%s %s" %(self.sid, self.name)  

Explanation: preferences is basically a set of all the projects the student would prefer to be assigned. When the allocation algorithm kicks in, a student's allocated_project emerges from this preference set.

Now if I try to do this:

for student in students.itervalues():
    session.add(student)

session.commit()

It throws two errors, one for the allocated_project column (seen below) and a similar error for the preferences column:

sqlalchemy.exc.InterfaceError: (InterfaceError) Error binding parameter 4 
- probably unsupported type. u'INSERT INTO studs (sid, name, allocated_rank, 
allocated_project) VALUES (?, ?, ?, ?, ?, ?, ?)' 
[1101, 'Muffett,M.', 1, 888 Human-spider relationships (Supervisor id: 123)]  

If I go back into my code I find that, when I'm copying the preferences from the given text files, it actually refers to the Project class which is mapped to a dictionary, using the unique project id's (pid) as keys. Thus, as I iterate through each student via their rank and to the preferences set, it adds not a project id, but the reference to the project id from the projects dictionary.

students[sid].preferences[int(rank)].add(projects[int(pid)])

Now this is very useful to me since I can find out all I want to about a student's preferred projects without having to run another check to pull up information about the project id. The form you see in the error has the object print information passed as:

return "%s %s (Supervisor id: %s)" %(self.proj_id, self.proj_name, self.proj_sup)

My questions are:

  1. I'm trying to store an object in a database field aren't I?

  2. Would the correct way then, be copying the project information (project id, name, etc) into its own table, referenced by the unique project id? That way I can just have the project id field for one of the student tables just be an integer id and when I need more information, just join the tables? So and so forth for other tables?

  3. If the above makes sense, then how does one maintain the relationship with a column of information in one table which is a key index on another table?

  4. Does this boil down into a database design problem?

  5. Are there any other elegant ways of accomplishing this?

Apologies if this is a very long-winded question. It's rather crucial for me to solve this, so I've tried to explain as much as I can, whilst attempting to show that I'm trying (key word here sadly) to understand what could be going wrong.

+2  A: 

Do you expect that SQLAlchemy magically convert your object and collection of objects to integer value? It's impossible. SQLAlchemy can store related objects in separate tables or serialized, but it doesn't have telepathic algorithms to read your mind. So you have to describe your choice explicitly.

Answers to your questions:

  1. Yes, adding to session and then committing will store your object[s] to database.
  2. Yes, storing related objects in separate table is quite common idiom. SQLAlchemy handles quite nicely so you don't need to specify joins explicitly in most cases.
  3. There is good chapter in SQLAlchemy tutorial on this topic.
  4. Storing related objects in separate table causes no database design problems. It's idiom used in most case.
  5. Using separate table is the best way for most cases. But there is also an PickleType column type which uses BLOB to store serialized object.
Denis Otkidach
One thing I've realised is that even if my table has only two columns say, `sid` and `name`, I can still get other information such as allocated_project by using (e.g.) `student.allocated_project`. It's just that adding it as a table column plays havoc. I tried using `PickleType`, but get an error along the lines of `Objects stored with PickleType when mutable=True must implement __eq__() for reliable comparison.`
Az
In a way, the previous code was actually linking the Student and Project classes using the `projects` dictionary. However, just to keep things discrete, I've just added a `allocated_proj_ref` in my Student class that becomes a Foreign Key into the `projects_table` and as such, I can do without the `allocated_project`. However I still need to use it for my allocation algorithm.
Az
The error message for `PickleType` is correct: SQLAlchemy needs to know whether object is changed and should be updated in database. So you have to provide the `__eq__` method - a way to get this information.
Denis Otkidach
Sorry about asking this, but could you explain how I should go about adding this? I tried using `PickleType(comparator=operator.eq)` but then it threw up another error, asking about 'comparator', I believe.
Az
What is the problem with `def __eq__(...)` in the class of objects you are going to store?
Denis Otkidach