Consider this simplified application domain:

  • Criminal Investigative database
  • Person is anyone involved in an investigation
  • Report is a bit of info that is part of an investigation
  • A Report references a primary Person (the subject of an investigation)
  • A Report has accomplices who are secondarily related (and could certainly be primary in other investigations or reports)
  • These classes have ids that are used to store them in a database, since their info can change over time (e.g. we might find new aliases for a person, or add persons of interest to a report)

If these are stored in some sort of database and I wish to use immutable objects, there seems to be an issue regarding state and referencing.

Suppose that I change some metadata about a Person. Since my Person objects are immutable, I might have some code like:

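import java.util.UUID

class Person(
    val id: UUID,
    val aliases: List[String],
    val reports: List[Report]) {

  /** Returns a new Person with the alias prepended; this instance is unchanged. */
  def addAlias(name: String): Person = new Person(id, name :: aliases, reports)
}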

So that my Person with a new alias becomes a new object, also immutable. If a Report refers to that person, but the alias was changed elsewhere in the system, my Report now refers to the "old" person, i.e. the person without the new alias.

Similarly, I might have:

class Report(val id: UUID, val content: String) {
  /** Adding more info to our report: returns a new Report with the same id */
  def updateContent(newContent: String): Report = new Report(id, newContent)
}

Since these objects don't know who refers to them, it's not clear to me how to let all the "referrers" know that there is a new object available representing the most recent state.

This could be done by having all objects "refresh" from a central data store, and having all operations that create new, updated objects store them to the central data store, but this feels like a cheesy reimplementation of the underlying language's referencing. That is, it would be clearer to just make these "secondary storable objects" mutable, so that if I add an alias to a Person, all referrers see the new value without doing anything.

How is this dealt with when we want to avoid mutability, or is this a case where immutability is not helpful?

+6  A: 

If X refers to Y, both are immutable, and Y changes (i.e. you replace it with an updated copy), then you have no choice but to replace X also (because it has changed, since the new X points to the new Y, not the old one).
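As a minimal sketch of that cascade (the `X` and `Y` case classes and the `CascadeDemo` object are hypothetical names, not from the question), replacing Y leaves the old X pointing at the old Y until X is rebuilt as well:

```scala
// Hypothetical two-level structure: X holds a direct reference to Y.
final case class Y(note: String)
final case class X(label: String, y: Y)

object CascadeDemo {
  val y1 = Y("old")
  val x1 = X("holder", y1)

  // "Changing" Y yields a new object; y1 itself is untouched.
  val y2 = y1.copy(note = "new")

  // x1 still points at y1. To see y2, X has to be replaced too.
  val x2 = x1.copy(y = y2)
}
```

Anything that held `x1` would in turn need replacing to see `x2`, and so on up the graph.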

This rapidly becomes a headache to maintain in highly interconnected data structures. You have three general approaches.

  • Forget immutability in general. Make the links mutable. Fix them as needed. Be sure you really do fix them, or you might get a memory leak (X refers to old Y, which refers to old X, which refers to older Y, etc.).
  • Don't store direct links, but rather ID codes that you can look up (e.g. a key into a hash map). You then need to handle the lookup failure case, but otherwise things are pretty robust. This is a little slower than the direct link, of course.
  • Change the entire world. If something is changed, everything that links to it must also be changed (and performing this operation simultaneously across a complex data set is tricky, but theoretically possible, or at least the mutable aspects of it can be hidden e.g. with lots of lazy vals).

Which is preferable depends on your rate of lookups and updates, I expect.
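For the second approach, a rough sketch might look like the following. All names here (`PersonEntry`, `ReportEntry`, `Store`, `StoreDemo`) are assumptions made up for illustration, and the "database" is just a pair of immutable maps:

```scala
import java.util.UUID

// Hypothetical shapes: entities hold the IDs of related entities, not references.
final case class PersonEntry(id: UUID, aliases: List[String], reportIds: List[UUID])
final case class ReportEntry(id: UUID, content: String, primaryId: UUID)

// The whole "world" is one immutable value; an update yields a new Store.
final case class Store(people: Map[UUID, PersonEntry], reports: Map[UUID, ReportEntry]) {

  def addAlias(personId: UUID, name: String): Store =
    people.get(personId) match {
      case Some(p) =>
        copy(people = people.updated(personId, p.copy(aliases = name :: p.aliases)))
      case None =>
        this // lookup failure: unknown ID, nothing to update
    }

  // Resolving a link is a lookup that can fail, hence Option.
  def primaryOf(reportId: UUID): Option[PersonEntry] =
    reports.get(reportId).flatMap(r => people.get(r.primaryId))
}

object StoreDemo {
  val pid: UUID = UUID.randomUUID()
  val rid: UUID = UUID.randomUUID()

  val before: Store = Store(
    Map(pid -> PersonEntry(pid, List("Al"), List(rid))),
    Map(rid -> ReportEntry(rid, "initial notes", pid)))

  // Only the people map is replaced; no ReportEntry is rebuilt.
  val after: Store = before.addAlias(pid, "Big Al")
}
```

Here the report is never touched when the person gains an alias: resolving `primaryOf` against `after` finds the updated person, while `before` still describes the earlier state.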

Rex Kerr
in most cases the second option is best (use ID as link, not address)
Javier
@Javier Agreed.
Rex Kerr
Isn't option 2 replicating what the language runtime is already doing via object references? What is gained by this? The language runtime probably better implements ref management than me.
davetron5000
@dave: No, it's a fundamentally different way to think about the problem. One way is, "X has a Y over there". The other is, "X has a name of something that I hope is a Y". Immutability says, "Whatever Z has now, Z will always have." If Y is immutable and is replaced then _X no longer has the right one_. But X still has its name, and maybe something else has that name now (and that thing with the same name is what you want). This decouples updates to X and Y, which is very useful if you actually have many many different things all of which use each other.
Rex Kerr
So, what is the advantage of doing that over allowing the runtime to do it for me? By just having my persisted objects mutable, I get all of that basically for free.
davetron5000
+1: option #2 seems to work best. Cyclic dependencies are impossible to implement with eagerly evaluated, immutable data structures. It's much easier to store the IDs of the reports a Person is linked to, then look up those reports whenever you need them.
Juliet
@dave: You only have to change one hash map when you change, say, a Report, but if you have mutable data, you have to change _every_ Person (and any other objects) that refer to it. That's not free. That's potentially a lot of logic to get right. Throw in some threads, and now you have synchronization logic as well as update logic to get right. Ick!
Rex Kerr
+3  A: 

I think you are trying to square the circle. Person is immutable, the list of Reports on a Person is part of the Person, and the list of Reports can change.

Would it be possible for an immutable Person to have a reference to a mutable PersonRecord that keeps things like Reports and Aliases?

Malvolio
1) Person is not necessarily immutable, e.g. a person's name can change and the application has to handle it somehow. 2) There is no point in referring to mutable objects from immutable ones. The main advantage of immutable objects is the referential transparency of expressions built from pure functions and immutable objects, i.e. such expressions always return the same value, cause no changes in state, and do not depend on any changes in state. That makes them reliable and thread-safe. Once an immutable object refers to a mutable one, it is no longer referentially transparent. So there is no point.
Alexey
If (1) is the case, his problem goes away. What you say under (2) is convincing EXCEPT consider the other proposed solution: "Don't store direct links, but rather ID". In that case, a Report wouldn't point to a Person but refer to Person 341, for which a currently valid and immutable PersonDescription exists. Acceptable, but if you use the supposedly immutable Report, it's possible a new PersonDescription would be generated and poof, your referential transparency is gone again. Maybe a transactional model would be better, but that has its problems too.
Malvolio
+2  A: 

I suggest you read how people deal with this problem in Clojure and Akka. Read about software transactional memory (STM). And some of my thoughts...

Immutability does not exist for its own sake. Immutability is an abstraction; it does not "exist" in nature. The world is mutable and permanently changing. So it's quite natural for data structures to be mutable: they describe the state of a real or simulated object at a given moment in time. And it looks like OOP rules here. At the conceptual level, the problem with this attitude is that an object in RAM != the real object: the data can be inaccurate, it arrives with a delay, etc.

So for the most trivial requirements you can go with everything mutable: persons, reports, etc. Practical problems will arise when:

  1. data structures are modified from concurrent threads
  2. users provide conflicting changes for the same objects
  3. a user provides invalid data and the change has to be rolled back

With a naive mutable model you will quickly end up with inconsistent data and a crashing system. Mutability is error-prone, immutability is impossible. What you need is a transactional view of the world. Within a transaction the program sees an immutable world, and the STM manages changes so that they are applied in a consistent and thread-safe way.
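To give a feel for the idea without a real STM library, here is a much simpler stand-in: a single `AtomicReference` holding an immutable snapshot, updated with a compare-and-set retry loop. This is not STM (it covers one reference only, with no composable transactions), and `World`, `WorldRef` and `WorldDemo` are hypothetical names for illustration:

```scala
import java.util.concurrent.atomic.AtomicReference
import scala.annotation.tailrec

// Hypothetical immutable snapshot: alias lists keyed by person ID.
final case class World(aliases: Map[Int, List[String]])

final class WorldRef(initial: World) {
  private val ref = new AtomicReference[World](initial)

  // Readers get an immutable snapshot that never changes under them.
  def snapshot: World = ref.get()

  // Run `tx` against a snapshot and commit only if nobody else committed meanwhile.
  // Left(reason) aborts without touching shared state (the "rollback" case).
  @tailrec
  def transact(tx: World => Either[String, World]): Either[String, World] = {
    val seen = ref.get()
    tx(seen) match {
      case Left(err) => Left(err)
      case Right(next) =>
        if (ref.compareAndSet(seen, next)) Right(next)
        else transact(tx) // conflicting commit: retry on the fresh snapshot
    }
  }
}

object WorldDemo {
  // A transaction body: validates its input, then builds the next snapshot.
  def addAlias(id: Int, name: String): World => Either[String, World] = w =>
    if (name.trim.isEmpty) Left("alias must not be empty")
    else Right(World(w.aliases.updated(id, name :: w.aliases.getOrElse(id, Nil))))
}
```

A caller would write something like `ref.transact(WorldDemo.addAlias(1, "Big Al"))`. Invalid input leaves the shared state untouched, and concurrent writers are retried rather than overwriting each other.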

Alexey