views:

165

answers:

2

Hi there,

I think I tried to ask for far too much in my previous question so apologies for that. Let me lay out my situation in as simple a manner as I can this time.

Basically, I've got a bunch of dictionaries that reference my objects, which are in turn mapped using SQLAlchemy. All fine with me. However, I want to make iterative changes to the contents of those dictionaries. The problem is that doing so will change the objects they reference---and using copy.copy() does no good since it only copies the references contained within the dictionary. Thus even if copied something, when I try to, say print the contents of the dictionary, I'll only get the latest updated values for the object.

This is why I wanted to use copy.deepcopy() but that does not work with SQLAlchemy. Now I'm in a dilemma since I need to copy certain attributes of my object before making said iterative changes.

In summary, I need to use SQLAlchemy and at the same time make sure I can have a copy of my object attributes when making changes so I don't change the referenced object itself.

Any advice, help, suggestions, etc.?


Edit: Have added some code.

class Student(object):
    def __init__(self, sid, name, allocated_proj_ref, allocated_rank):
        self.sid = sid
        self.name = name
        self.allocated_proj_ref = None
        self.allocated_rank = None

students_table = Table('studs', metadata,
    Column('sid', Integer, primary_key=True),
    Column('name', String),
    Column('allocated_proj_ref', Integer, ForeignKey('projs.proj_id')),
    Column('allocated_rank', Integer)
)

mapper(Student, students_table, properties={'proj' : relation(Project)})

students = {}

students[sid] = Student(sid, name, allocated_project, allocated_rank)

Thus, the attributes that I will be changing are the allocated_proj_ref and allocated_rank attributes. The students_table is keyed using the unique student ID (sid).


Question

I'd want to persist the attributes I change above -- I mean, that's basically why I decided to use SQLA. However, the mapped object will be changing, which is not recommended. Thus, if I make the changes to doppelgänger, unmapped object... can I take those changes and update the fields/table for the mapped object.

In a sense I'm following David's secondary solution where I create another version of the Class that isn't mapped.


I tried using the StudentDBRecord solution mentioned below but got an error!

File "Main.py", line 25, in <module>
    prefsTableFile = 'Database/prefs-table.txt')
File "/XXXX/DataReader.py", line 158, in readData
readProjectsFile(projectsFile)
File "/XXXX/DataReader.py", line 66, in readProjectsFile
supervisors[ee_id] = Supervisor(ee_id, name, original_quota, loading_limit)
File "<string>", line 4, in __init__
raise exc.UnmappedClassError(class_)
sqlalchemy.orm.exc.UnmappedClassError: Class 'ProjectParties.Student' is not mapped

Does this mean that Student must be mapped?


Health warning!

Someone pointed out a really good additional issue here. See, even if I'm calling copy.deepcopy() on a non-mapped object, in this case, let's assume it's the students dictionary I've defined above, deepcopy makes a copy of everything. My allocated_proj_ref is actually a Project object, and I've got a corresponding projects dictionary for that.

So I deepcopy both students and projects -- which I am -- he says I'll have cases where the students's allocated_proj_ref attribute will have issues with matching with instances in the projects dictionary.

Thus, I take it that I'll have to redefine/override (that's what it's called isn't it?) deepcopy in each Class using def __deecopy__(self, memo): or something like that?


I'd I'd like to override __deepcopy__ such that it ignores all the SQLA stuff (which are <class 'sqlalchemy.util.symbol'> and <class 'sqlalchemy.orm.state.InstanceState'>) but copy everything else that's part of the a mapped class.

Any suggestions, please?

A: 

If I'm remembering/thinking correctly, in SQLAlchemy you normally have only one object at a time that corresponds to a given database record. This is done so that SQLAlchemy can keep your Python objects in sync with the database, and vice-versa (well, not if there are concurrent DB mutations from outside Python, but that's another story). So the problem is that, if you were to copy one of these mapped objects, you'd wind up with two distinct objects that correspond to the same database record. If you change one, then they would have different values, and the database can't match both of them at the same time.

I think what you may need to do is decide whether you want the database record to reflect the changes you make when you change an attribute of your copy. If so, then you shouldn't be copying the objects at all, you should just be reusing the same instances.

On the other hand, if you don't want the original database record to change when you update the copy, you have another choice: should the copy become a new row in the database? Or should it not be mapped to a database record at all? In the former case, you can implement the copy operation by creating a new instance of the same class and copying over the values, pretty much the same way you created the original object. This would probably be done in the __deepcopy__() method of your SQLAlchemy mapped class. In the latter case (no mapping), you would need a separate class that has all the same fields but is not mapped using SQLAlchemy. Actually, it would probably make more sense to have your SQLAlchemy-mapped class be a subclass of this non-mapped class, and only do the mapping for the subclass.

EDIT: OK, to clarify what I meant by that last point: right now you have a Student class that's used to represent your students. What I'm suggesting is that you make Student an unmapped, regular class:

class Student(object):
    def __init__(self, sid, name, allocated_proj_ref, allocated_rank):
        self.sid = sid
        self.name = name
        self.allocated_project = None
        self.allocated_rank = None

and have a subclass, something like StudentDBRecord, that will be mapped to the database.

class StudentDBRecord(Student):
    def __init__(self, student):
        super(StudentDBRecord, self).__init__(student.sid, student.name,
            student.allocated_proj_ref, student.allocated_rank)

# this call remains the same
students_table = Table('studs', metadata,
    Column('sid', Integer, primary_key=True),
    Column('name', String),
    Column('allocated_proj_ref', Integer, ForeignKey('projs.proj_id')),
    Column('allocated_rank', Integer)
)

# this changes
mapper(StudentDBRecord, students_table, properties={'proj' : relation(Project)})

Now you would implement your optimization algorithm using instances of Student, which are unmapped - so as the attributes of the Student objects change, nothing happens to the database. This means you can safely use copy or deepcopy as needed. When you're all done, you can change the Student instances to StudentDBRecord instances, something like

students = ...dict with best solution...
student_records = [StudentDBRecord(s) for s in students.itervalues()]
session.commit()

This will create mapped objects corresponding to all your students in their optimal state and commit them to the database.

EDIT 2: So maybe that doesn't work. A quick fix would be to copy the Student constructor into StudentDBRecord and make StudentDBRecord extend object instead. That is, replace the previous definition of StudentDBRecord with this:

class StudentDBRecord(object):
    def __init__(self, student):
        self.sid = student.sid
        self.name = student.name
        self.allocated_project = student.allocated_project
        self.allocated_rank = student.allocated_rank

Or if you wanted to generalize it:

class StudentDBRecord(object):
    def __init__(self, student):
        for attr in dir(student):
            if not attr.startswith('__'):
                setattr(self, attr, getattr(student, attr))

This latter definition will copy over all non-special properties of the Student to the StudentDBRecord.

David Zaslavsky
@David: Just updated my question with an example of one of the Classes. "...if you don't want the original database record to change when you update the copy" - that's what I'm looking for. Could you describe what exactly you mean by "`you can implement the copy operation by creating a new instance of the same class and copying over the values`"? Do I have sort of define a "localised" version of `__deepcopy__()` for my SQLA-mapped class? I think I get what you mean by the last line but could you kindly clarify it a bit more concretely? Perhaps an example?
Az
What I meant was something like `def __deepcopy__(self, memo): return Student(deepcopy(self.sid, memo), deepcopy(self.name, memo), deepcopy(self.allocated_project, memo), deepcopy(self.allocated_rank, memo))` (note that you will no longer be able to use `sid` as a primary key if you do this).
David Zaslavsky
@David: Have updated the question. The `__deepcopy__()` method is a bit too much for me since I'm wrapping up stuff, however, I've sort of followed your "separate class-same fields-unmapped" solution. Could you explain what you mean by subclass in this case?
Az
@Az: See if my update helps clarify things.
David Zaslavsky
@David: Can SQLA deal with inheritance?
Az
@Az: Sure, there's even a section in the documentation about how to map multiple classes that are related in a common hierarchy. So I don't expect that it would have any trouble doing this, where the superclass is unmapped and the subclass is. But I haven't actually tested the code so I couldn't tell you for sure.
David Zaslavsky
@David: Okidokey, I'll give it a try and let you know if it works (and if so, then finally get that green tickmark going) :)
Az
@David: Oh no... I got an `Error`.
Az
@David: Tried the generalised `StudentDBRecord`. Mapped that with SQLA, did `student_records = [StudentDBRecord(s) for s in best_node[1].itervalues()]` (where `best_node[1]` is the optimal `students` dictionary) followed by `session.commit()`. Then I tried to print the contents of `StudentDBRecord` and got (drumroll)... a `0`.
Az
@Az: Weird, even if it didn't work you shouldn't just be getting `0` when you print it. Try adding a `__str__` method to `StudentDBRecord`.
David Zaslavsky
@David: I'm using a combination of techniques to get `deepcopy` and SQLA ORM'ing working, so many thanks for the different ideas that you've explored in your answer :)
Az
+2  A: 

Here is another option, but I'm not sure it's applicable to your problem:

  1. Retrieve objects from database along with all needed relations. You can either pass lazy='joined' or lazy='subquery' to relations, or call options(eagerload(relation_property) method of query, or just access required properties to trigger their load.
  2. Expunge object from session. Lazy loading of object properties won't be supported from this point.
  3. Now you can safely modify object.
  4. When you need to update the object in the database you have to merge it back into session and commit.

Update: Here is prove of concept code sample:

from sqlalchemy import *
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker, relation, eagerload

metadata  = MetaData()
Base = declarative_base(metadata=metadata, name='Base')

class Project(Base):
    __tablename__ = 'projects'
    id = Column(Integer, primary_key=True)
    name = Column(String)


class Student(Base):
    __tablename__ = 'students'
    id = Column(Integer, primary_key=True)
    project_id = Column(ForeignKey(Project.id))
    project = relation(Project,
                       cascade='save-update, expunge, merge',
                       lazy='joined')

engine = create_engine('sqlite://', echo=True)
metadata.create_all(engine)
session = sessionmaker(bind=engine)()

proj = Project(name='a')
stud = Student(project=proj)
session.add(stud)
session.commit()
session.expunge_all()
assert session.query(Project.name).all()==[('a',)]

stud = session.query(Student).first()
# Use options() method if you didn't specify lazy for relations:
#stud = session.query(Student).options(eagerload(Student.project)).first()
session.expunge(stud)

assert stud not in session
assert stud.project not in session

stud.project.name = 'b'
session.commit() # Stores nothing
assert session.query(Project.name).all()==[('a',)]

stud = session.merge(stud)
session.commit()
assert session.query(Project.name).all()==[('b',)]
Denis Otkidach
@Denis: I'm not very experienced with SQLA... would it be possible to give me a more specific example of this method?
Az
@Denis: I've got it working now with a combination of a mapped and an unmapped class. Not as elegant slightly easier to keep track of for me. However, I endeavour to give your solution a go (and keep it for reference) in the future. Thanks for the help :)
Az