ansaurus

Question

Answer 1

A:

If I'm remembering/thinking correctly, in SQLAlchemy you normally have only one object at a time that corresponds to a given database record. This is done so that SQLAlchemy can keep your Python objects in sync with the database, and vice-versa (well, not if there are concurrent DB mutations from outside Python, but that's another story). So the problem is that, if you were to copy one of these mapped objects, you'd wind up with two distinct objects that correspond to the same database record. If you change one, then they would have different values, and the database can't match both of them at the same time.

I think what you need to do is decide whether you want the database record to reflect the changes you make when you change an attribute of your copy. If so, then you shouldn't be copying the objects at all; you should just reuse the same instances.

On the other hand, if you don't want the original database record to change when you update the copy, you have another choice: should the copy become a new row in the database? Or should it not be mapped to a database record at all? In the former case, you can implement the copy operation by creating a new instance of the same class and copying over the values, pretty much the same way you created the original object. This would probably be done in the __deepcopy__() method of your SQLAlchemy mapped class. In the latter case (no mapping), you would need a separate class that has all the same fields but is not mapped using SQLAlchemy. Actually, it would probably make more sense to have your SQLAlchemy-mapped class be a subclass of this non-mapped class, and only do the mapping for the subclass.

EDIT: OK, to clarify what I meant by that last point: right now you have a Student class that's used to represent your students. What I'm suggesting is that you make Student an unmapped, regular class:

class Student(object):
    def __init__(self, sid, name, allocated_proj_ref, allocated_rank):
        self.sid = sid
        self.name = name
        # keep the values passed in rather than discarding them
        self.allocated_project = allocated_proj_ref
        self.allocated_rank = allocated_rank

and have a subclass, something like StudentDBRecord, that will be mapped to the database.

class StudentDBRecord(Student):
    def __init__(self, student):
        # Student stores the project under allocated_project
        super(StudentDBRecord, self).__init__(student.sid, student.name,
            student.allocated_project, student.allocated_rank)

# this call remains the same
students_table = Table('studs', metadata,
    Column('sid', Integer, primary_key=True),
    Column('name', String),
    Column('allocated_proj_ref', Integer, ForeignKey('projs.proj_id')),
    Column('allocated_rank', Integer)
)

# this changes
mapper(StudentDBRecord, students_table, properties={'proj' : relation(Project)})

Now you would implement your optimization algorithm using instances of Student, which are unmapped - so as the attributes of the Student objects change, nothing happens to the database. This means you can safely use copy or deepcopy as needed. When you're all done, you can change the Student instances to StudentDBRecord instances, something like

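students = ...dict with best solution...
student_records = [StudentDBRecord(s) for s in students.itervalues()]
# new objects are only persisted once they have been added to the session
session.add_all(student_records)
session.commit()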

This will create mapped objects corresponding to all your students in their optimal state and commit them to the database.
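
To illustrate why the unmapped class makes copy and deepcopy safe, here is a minimal sketch in plain Python with no SQLAlchemy involved. It assumes a Student constructor that stores its arguments; the sample names, the project number and the `candidate` variable are made up for illustration.

```python
import copy

class Student(object):
    """Plain, unmapped class: nothing here is tracked by a session."""
    def __init__(self, sid, name, allocated_project=None, allocated_rank=None):
        self.sid = sid
        self.name = name
        self.allocated_project = allocated_project
        self.allocated_rank = allocated_rank

# hypothetical starting state: students keyed by sid
students = {1: Student(1, 'Ann'), 2: Student(2, 'Bob')}

# deepcopy gives the optimizer an independent candidate solution
candidate = copy.deepcopy(students)
candidate[1].allocated_project = 42
candidate[1].allocated_rank = 1

# the original dict and its Student objects are untouched
print(students[1].allocated_project)   # None
print(candidate[1].allocated_project)  # 42
```

Only the candidate you finally keep needs to be turned into mapped records, so intermediate copies never reach the database.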

EDIT 2: So maybe that doesn't work. A quick fix would be to copy the Student constructor into StudentDBRecord and make StudentDBRecord extend object instead. That is, replace the previous definition of StudentDBRecord with this:

class StudentDBRecord(object):
    def __init__(self, student):
        self.sid = student.sid
        self.name = student.name
        self.allocated_project = student.allocated_project
        self.allocated_rank = student.allocated_rank

Or if you wanted to generalize it:

class StudentDBRecord(object):
    def __init__(self, student):
        for attr in dir(student):
            if not attr.startswith('__'):
                setattr(self, attr, getattr(student, attr))

This latter definition will copy over all non-special properties of the Student to the StudentDBRecord.
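
One caveat with dir() is that it also lists methods and class-level attributes. A possible variation, sketched here as an assumption rather than something tested against your code, iterates over vars(student) instead, which only contains the attributes stored on the instance. `StudentRecord` and `describe` are hypothetical names, and the class is not actually mapped in this sketch.

```python
class Student(object):
    def __init__(self, sid, name):
        self.sid = sid
        self.name = name

    def describe(self):
        # a method, included only to show that it is not copied
        return '%s (%s)' % (self.name, self.sid)

class StudentRecord(object):
    """Stand-in for the mapped class; no mapper is set up here."""
    def __init__(self, student):
        # vars() returns only the attributes stored on the instance
        for attr, value in vars(student).items():
            if not attr.startswith('_'):
                setattr(self, attr, value)

record = StudentRecord(Student(7, 'Ann'))
print(sorted(vars(record)))  # ['name', 'sid']
```

Skipping names that start with a single underscore also leaves out SQLAlchemy's own `_sa_instance_state` attribute, should the source object ever be a mapped instance.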

David Zaslavsky 2010-06-06 06:16:00

@David: Just updated my question with an example of one of the classes. "...if you don't want the original database record to change when you update the copy" - that's what I'm looking for. Could you describe what exactly you mean by "`you can implement the copy operation by creating a new instance of the same class and copying over the values`"? Do I have to sort of define a "localised" version of `__deepcopy__()` for my SQLA-mapped class? I think I get what you mean by the last line, but could you kindly clarify it a bit more concretely? Perhaps with an example?

Az 2010-06-06 06:51:21

What I meant was something like `def __deepcopy__(self, memo): return Student(deepcopy(self.sid, memo), deepcopy(self.name, memo), deepcopy(self.allocated_project, memo), deepcopy(self.allocated_rank, memo))` (note that you will no longer be able to use `sid` as a primary key if you do this).

David Zaslavsky 2010-06-06 08:36:19
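
Spelled out over several lines, the `__deepcopy__` idea from the comment above might look roughly like this. It is a sketch on a plain class with no mapper configured, and it assumes the constructor stores its four arguments; the sample values are invented.

```python
from copy import deepcopy

class Student(object):
    def __init__(self, sid, name, allocated_project=None, allocated_rank=None):
        self.sid = sid
        self.name = name
        self.allocated_project = allocated_project
        self.allocated_rank = allocated_rank

    def __deepcopy__(self, memo):
        # build a brand-new instance through the normal constructor,
        # deep-copying each field and passing the memo dict along
        return Student(
            deepcopy(self.sid, memo),
            deepcopy(self.name, memo),
            deepcopy(self.allocated_project, memo),
            deepcopy(self.allocated_rank, memo),
        )

original = Student(1, 'Ann', allocated_project=3, allocated_rank=2)
clone = deepcopy(original)
clone.allocated_rank = 5
print(original.allocated_rank, clone.allocated_rank)  # 2 5
```

Because the clone is created by calling the constructor, it is a separate object from the original, and changing it does not touch the original's attributes.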

@David: Have updated the question. The `__deepcopy__()` method is a bit too much for me since I'm wrapping up stuff, however, I've sort of followed your "separate class-same fields-unmapped" solution. Could you explain what you mean by subclass in this case?

Az 2010-06-06 20:11:59

@Az: See if my update helps clarify things.

David Zaslavsky 2010-06-06 20:30:58

@David: Can SQLA deal with inheritance?

Az 2010-06-06 21:02:07

@Az: Sure, there's even a section in the documentation about how to map multiple classes that are related in a common hierarchy. So I don't expect that it would have any trouble doing this, where the superclass is unmapped and the subclass is mapped. But I haven't actually tested the code so I couldn't tell you for sure.

David Zaslavsky 2010-06-06 21:57:56

@David: Okidokey, I'll give it a try and let you know if it works (and if so, then finally get that green tickmark going) :)

Az 2010-06-06 22:20:07

@David: Oh no... I got an `Error`.

Az 2010-06-07 20:02:27

@David: Tried the generalised `StudentDBRecord`. Mapped that with SQLA, did `student_records = [StudentDBRecord(s) for s in best_node[1].itervalues()]` (where `best_node[1]` is the optimal `students` dictionary) followed by `session.commit()`. Then I tried to print the contents of `StudentDBRecord` and got (drumroll)... a `0`.

Az 2010-06-08 00:30:56

@Az: Weird, even if it didn't work you shouldn't just be getting `0` when you print it. Try adding a `__str__` method to `StudentDBRecord`.

David Zaslavsky 2010-06-08 00:48:56
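
A minimal sketch of that suggestion follows. It defines `__repr__` rather than `__str__`, which is a choice made here: printing a list uses `__repr__` for its elements, and `str()` falls back to `__repr__` when no `__str__` is defined. The two-argument constructor is a simplified, hypothetical stand-in for the real class.

```python
class StudentDBRecord(object):
    # simplified stand-in: the real class takes a Student and is mapped
    def __init__(self, sid, name):
        self.sid = sid
        self.name = name

    def __repr__(self):
        return 'StudentDBRecord(sid=%r, name=%r)' % (self.sid, self.name)

rec = StudentDBRecord(1, 'Ann')
print(rec)    # StudentDBRecord(sid=1, name='Ann')
print([rec])  # [StudentDBRecord(sid=1, name='Ann')]
```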

@David: I'm using a combination of techniques to get `deepcopy` and SQLA ORM'ing working, so many thanks for the different ideas that you've explored in your answer :)

Az 2010-06-09 03:26:11

Answer 2

+2 A:

Here is another option, but I'm not sure it's applicable to your problem:

1. Retrieve the objects from the database along with all needed relations. You can either pass lazy='joined' or lazy='subquery' to the relations, call the options(eagerload(relation_property)) method of the query, or just access the required properties to trigger their load.
2. Expunge the object from the session. Lazy loading of the object's properties won't be supported from this point on.
3. Now you can safely modify the object.
4. When you need to update the object in the database, merge it back into the session and commit.

Update: Here is a proof-of-concept code sample:

from sqlalchemy import *
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker, relation, eagerload

metadata  = MetaData()
Base = declarative_base(metadata=metadata, name='Base')

class Project(Base):
    __tablename__ = 'projects'
    id = Column(Integer, primary_key=True)
    name = Column(String)


class Student(Base):
    __tablename__ = 'students'
    id = Column(Integer, primary_key=True)
    project_id = Column(ForeignKey(Project.id))
    project = relation(Project,
                       cascade='save-update, expunge, merge',
                       lazy='joined')

engine = create_engine('sqlite://', echo=True)
metadata.create_all(engine)
session = sessionmaker(bind=engine)()

proj = Project(name='a')
stud = Student(project=proj)
session.add(stud)
session.commit()
session.expunge_all()
assert session.query(Project.name).all()==[('a',)]

stud = session.query(Student).first()
# Use options() method if you didn't specify lazy for relations:
#stud = session.query(Student).options(eagerload(Student.project)).first()
session.expunge(stud)

assert stud not in session
assert stud.project not in session

stud.project.name = 'b'
session.commit() # Stores nothing
assert session.query(Project.name).all()==[('a',)]

stud = session.merge(stud)
session.commit()
assert session.query(Project.name).all()==[('b',)]

Denis Otkidach 2010-06-07 17:09:40

@Denis: I'm not very experienced with SQLA... would it be possible to give me a more specific example of this method?

Az 2010-06-07 23:48:11

@Denis: I've got it working now with a combination of a mapped and an unmapped class. Not as elegant, but slightly easier for me to keep track of. However, I'll endeavour to give your solution a go (and keep it for reference) in the future. Thanks for the help :)

Az 2010-06-09 03:24:11


Help with copy and deepcopy in Python
