views:

469

answers:

4

I have a data structure made of Jobs each containing a set of Tasks. Both Job and Task data are defined in files like these:

jobs.txt:
JA
JB
JC

tasks.txt:
JB  T2
JA  T1
JC  T1
JA  T3
JA  T2
JB  T1 

The process of creating objects is the following:
- read each job, create it and store it by id
- read task, retrieve job by id, create task, store task in the job

Once the files are read this data structure is never modified. So I would like that tasks within jobs would be stored in an immutable set. But I don't know how to do it in an efficient way. (Note: the immutable map storing jobs may be left immutable)

Here is a simplified version of the code:

class Task(val id: String) 

class Job(val id: String) {
    val tasks = collection.mutable.Set[Task]() // This sholud be immutable
}

val jobs = collection.mutable.Map[String, Job]() // This is ok to be mutable

// read jobs
for (line <- io.Source.fromFile("jobs.txt").getLines) { 
    val job = new Job(line.trim)
    jobs += (job.id -> job)
}

// read tasks
for (line <- io.Source.fromFile("tasks.txt").getLines) {
    val tokens = line.split("\t")
    val job = jobs(tokens(0).trim)
    val task = new Task(job.id + "." + tokens(1).trim)
    job.tasks += task
}

Thanks in advance for every suggestion!

A: 

One option here is to have some mutable but transient configurer class along the lines of the MutableMap above but then pass this through in some immutable form to your actual class:

val jobs: immutable.Map[String, Job] = {
  val mJobs = readMutableJobs
  immutable.Map(mJobs.toSeq: _*)
}

Then of course you can implement readMutableJobs along the lines you have already coded

oxbow_lakes
Sorry I was'n clear enough: the jobs Map is ok to be mutable, it is the tasks Set within the single Job which should be immutable (I have edited my question)
Filippo Tabusso
I think it's fair to say that you could make the same approach work on the mutable/immutable tasks as it has worked on the map of jobs! For example, having a `Job` constructor take an immutable copy of the tasks as it is created
oxbow_lakes
+1  A: 

You could always delay the object creation until you have all the data read in from the file, like:

case class Task(id: String) 
case class Job(id: String, tasks: Set[Task])

import scala.collection.mutable.{Map,ListBuffer}
val jobIds = Map[String, ListBuffer[String]]()

// read jobs
for (line <- io.Source.fromFile("jobs.txt").getLines) { 
    val job = line.trim
    jobIds += (job.id -> new ListBuffer[String]())
}

// read tasks
for (line <- io.Source.fromFile("tasks.txt").getLines) {
    val tokens = line.split("\t")
    val job = tokens(0).trim
    val task = job.id + "." + tokens(1).trim
    jobIds(job) += task
}

// create objects
val jobs = jobIds.map { j =>
    Job(j._1, Set() ++ j._2.map { Task(_) })
}

To deal with more fields, you could (with some effort) make a mutable version of your immutable classes, used for building. Then, convert as needed:

case class Task(id: String)
case class Job(val id: String, val tasks: Set[Task])
object Job {
    class MutableJob {
        var id: String = ""
        var tasks = collection.mutable.Set[Task]()
        def immutable = Job(id, Set() ++ tasks)
    }
    def mutable(id: String) = {
        val ret = new MutableJob
        ret.id = id
        ret
    }
}

val mutableJobs = collection.mutable.Map[String, Job.MutableJob]() 

// read jobs
for (line <- io.Source.fromFile("jobs.txt").getLines) { 
    val job = Job.mutable(line.trim)
    jobs += (job.id -> job)
}

// read tasks
for (line <- io.Source.fromFile("tasks.txt").getLines) {
    val tokens = line.split("\t")
    val job = jobs(tokens(0).trim)
    val task = Task(job.id + "." + tokens(1).trim)
    job.tasks += task
}

val jobs = for ((k,v) <- mutableJobs) yield (k, v.immutable)
Mitch Blevins
Thanks. Your solution is ok for the example I posted, but in the real case both Job and Task have more fields than just ids. For example Job has also a due date (Date) and Task has a length (Int), and so on...
Filippo Tabusso
Thanks again, this was the solution I thought about when I faced the problem for the first time. However, in my opinion, it requires too much extra code that means more bugs, maintenance, ...
Filippo Tabusso
+2  A: 

The most efficient way to do this would be to read everything into mutable structures and then convert to immutable ones at the end, but this might require a lot of redundant coding for classes with a lot of fields. So instead, consider using the same pattern that the underlying collection uses: a job with a new task is a new job.

Here's an example that doesn't even bother reading the jobs list--it infers it from the task list. (This is an example that works under 2.7.x; recent versions of 2.8 use "Source.fromPath" instead of "Source.fromFile".)

object Example {
  class Task(val id: String) {
    override def toString = id
  }

  class Job(val id: String, val tasks: Set[Task]) {
    def this(id0: String, old: Option[Job], taskID: String) = {
      this(id0 , old.getOrElse(EmptyJob).tasks + new Task(taskID))
    }
    override def toString = id+" does "+tasks.toString
  }
  object EmptyJob extends Job("",Set.empty[Task]) { }

  def read(fname: String):Map[String,Job] = {
    val map = new scala.collection.mutable.HashMap[String,Job]()
    scala.io.Source.fromFile(fname).getLines.foreach(line => {
      line.split("\t") match {
        case Array(j,t) => {
          val jobID = j.trim
          val taskID = t.trim
          map += (jobID -> new Job(jobID,map.get(jobID),taskID))
        }
        case _ => /* Handle error? */
      }
    })
    new scala.collection.immutable.HashMap() ++ map
  }
}

scala> Example.read("tasks.txt")
res0: Map[String,Example.Job] = Map(JA -> JA does Set(T1, T3, T2), JB -> JB does Set(T2, T1), JC -> JC does Set(T1))

An alternate approach would read the job list (creating jobs as new Job(jobID,Set.empty[Task])), and then handle the error condition of when the task list contained an entry that wasn't in the job list. (You would still need to update the job list map every time you read in a new task.)

Rex Kerr
I like this approach. But I'd just write an `addTask` method that returns a new `Job` with the same data, plus the new task. It would change the logic a bit, but, as it is, `Job` seems to know way too much about how it is going to be initialized. :-)
Daniel
I did it this way to highlight the replacement of the old job with a new one, which seemed to me to be the key concept here. But I agree that an `addTask` somewhere would be better. There are lots of places one could argue for (should it take an `Option[Job]`, or be a closure around the mutable map?).
Rex Kerr
Thanks, I like this solution for the idea of the Job creating the new Job (by constructor or addTask method). I'm still very new to scala (I'm coming from java) and I'm not sure yet if, in a case like this, immutability is worth the cost of having many object creation since for me performance is quite important (in the real case I have more the 2 classes, with complex links between them and thousand of objects).
Filippo Tabusso
If performance is important, read into mutable structures, then switch to immutable ones. For example, you could have a `JobUnderConstruction extends Job` that has an extra `taskAccumulator` that is mutable, and a `toJob` method that returns a `Job` at the end.
Rex Kerr
+1  A: 

I did a feel changes for it to run on Scala 2.8 (mostly, fromPath instead of fromFile, and () after getLines). It may be using a few Scala 2.8 features, most notably groupBy. Probably toSet as well, but that one is easy to adapt on 2.7.

I don't have the files to test it, but I changed this stuff from val to def, and the type signatures, at least, match.

class Task(val id: String)  
class Job(val id: String, val tasks: Set[Task])

// read tasks 
val tasks = (
  for {
    line <- io.Source.fromPath("tasks.txt").getLines().toStream
    tokens = line.split("\t") 
    jobId = tokens(0).trim
    task = new Task(jobId + "." + tokens(1).trim) 
  } yield jobId -> task
).groupBy(_._1).map { case (key, value) => key -> value.map(_._2).toSet }

// read jobs 
val jobs = Map() ++ (
  for {
    line <- io.Source.fromPath("jobs.txt").getLines()
    job = new Job(line.trim, tasks(line.trim))
  } yield job.id -> job
)
Daniel