views:

300

answers:

5

This is for a small scheduling app. I need an algorithm to efficiently compare two "schedules", find differences, and update only the data rows which have been changed, as well as entries in another table having this table as a foreign key. This is a big question, so I'll say right away I'm looking for either general advice or specific solutions.

EDIT: As suggested, I have significantly shortened the question.

In one table, I associate resources with a span of time when they are used.

I also have a second table (Table B) which uses the ID from Table A as a foreign key.

The entry from Table A corresponding to Table B will have a span of time which subsumes the span of time from Table B. Not all entries in Table A will have an entry in Table B.

I'm providing an interface for users to edit the resource schedule in Table A. They basically provide a new set of data for Table A that I need to treat as a diff from the version in the DB.

If they completely remove an object from Table A that is pointed to by Table B, I want to remove the entry from Table B as well.

So, given the following 3 sets:

  • The original objects from Table A (from the DB)
  • The original objects from Table B (from the DB)
  • The edited set of objects from Table A (from the user, so no unique IDs)

I need an algorithm that will:

  • Leave rows in Table A and Table B untouched if no changes are needed for those objects.
  • Add rows to Table A as needed.
  • Remove rows from Table A and Table B as needed.
  • Modify rows in Table A and Table B as needed.

Just sorting the objects into an arrangement where I can apply the appropriate database operations is more than adequate for a solution.

Again, please answer as specifically or generally as you like, I'm looking for advice but if someone has a complete algorithm that would just make my day. :)

EDIT: In response to lassvek, I am providing some additional detail:

Table B's items are always contained entirely within Table A items, not merely overlapping.

Importantly, Table B's items are quantized so they should fall either entirely within or entirely outside. If this doesn't happen, then I have a data integrity error that I'll have to handle separately.

For example (to use a shorthand):

Table A
ID Resource    Start         End
01 Resource A  10/6 7:00AM   10/6 11:00AM
02 Resource A  10/6 1:00PM   10/6 3:00PM

Table B
ID Table_A_ID  Start         End
01 02          10/6 1:00PM   10/6 2:00PM

So I want the following behaviours:

  • If I remove ID 02 from table A, or shorten it to 2:00PM - 3:00PM, I should remove ID 01 from Table B.
  • If I extend Table A ID 01 to where it ends at 1:00PM, these two entries should be merged together into one row, and Table B ID 01 should now point to table A ID 01.
  • If I remove 8:00AM-10:00AM from Table A ID 01, that entry should be split into two entries: One for 7:00AM-8:00AM, and a new entry (ID 03) for 10:00AM-11:00AM.
+1  A: 

You post is almost in the "too long; didnt read" category - shortening it will probably give you more feedback.

Anyway, on topic: you can try lookin into a thing called "Interval Algebra"

ADEpt
Thanks, ADEpt, I think you're right. I'll give it a shot.
Adam Bellaire
+1  A: 

As I understand you, your users can only directly affect table A. Assuming you are programming in C#, you could use a simple ADO.Net DataSet to manage modifications to table A. The TableAdapter knows to leave untouched rows alone and to handle new, modified and deleted rows appropriately.

In addition you should define a cascading delete in order to automatically remove corresponding objects in table B.

The only case that is not handled this way is if a timespan in table A is shortened s.t. it does not subsume the corresponding record in Table B anymore. You could simply check for that case in an update stored procedure or alternatively define an update-trigger on table A.

Manu
+1  A: 

It seems to me like any algorithm for this will be involve a pass through NewA, matching ResourceID, StartTime, and EndTime, and keeping track of which elements from OldA get hit. Then you have two sets of non-matching data, UnmatchedNewA and UnmatchedOldA.

The simplest way I can think of to proceed is to basically start over with these: Write all of UnmatchedNewA to the DB, transfer elements of B from UnmatchedOldA into New A keys (just generated) where possible, deleting when not. Then wipe out all of UnmatchedOldA.

If there are a lot of changes, this is certainly not an efficient way to proceed. In cases where the size of the data is not overwhelming, though, I prefer simplicity to clever optimization.


It's impossible to know whether this final suggestion makes any sense without more background, but on the off chance that you didn't think of it this way:

Instead of passing the entire A collection back and forth, could you use event listeners or something similar to update the data model only where changes ARE needed? This way, the objects being altered would be able to determine which DB operations are required on the fly.

grossvogel
Thanks, grossvogel, that was actually the approach that I was considering (wiping out old set/writing new set). It didn't seem terribly efficient, hence the question here, but there is something to be said for an approach that's easily explained. I may end up doing it this way.
Adam Bellaire
+2  A: 

I suggest you decouple your questions into two separate ones: The first should be something like: "How do I reason about resource scheduling, when representing a schedule atom as a resource with start time and end time?" Here, ADept's suggestion to use interval algebra seems fitting. Please see The Wikipedia entry 'Interval Graph' and The SUNY algorithm repository entry on scheduling. The second question is a database question: "Given an algorithm which schedules intervals and indicate whether two intervals overlap or one is contained in another, how do I use this information to manage a database in the given schema?" I believe that once the scheduling algorithm is in place, the database question will be much easier to solve. HTH, Yuval

Yuval F
+4  A: 

I have worked extensively with periods, but I'm afraid I don't understand entirely how table A and B work together, perhaps it's the word subsume that I don't understand.

Can you give some concrete examples of what you want done?

Do you mean that timespans recorded in table A contains entirely timespans in table B, like this?

|---------------- A -------------------|
    |--- B ----|      |--- B ---|

or overlaps with?

    |---------------- A -------------------|
|--- B ----|                        |--- B ---|

or the opposite way, timespans in B contains/overlaps with A?

Let's say it's the first one, where timespans in B are inside/the same as the linked timespan in table A.

Does this mean that:

* A removed A-timespan removes all the linked timespans from B
* An added A-timespan, what about this?
* A shortened A-timespan removes all the linked timespans from B that now falls outside A
* A lenghtened A-timespan, will this include all matching B-timespans now inside?

Here's an example:

|-------------- A1 --------------|    |-------- A2 --------------|
  |---- B1 ----|  |----- B2 ---|       |---- B3 ----|  |-- B4 --|

and then you lengthen A1 and shorten and move A2, so that:

|-------------- A1 ---------------------------------|  |--- A2 --|
  |---- B1 ----|  |----- B2 ---|       |---- B3 ----|  |-- B4 --|

this means that you want to modify the data like this:

1. Lengthen (update) A1
2. Shorten and move (update) A2
3. Re-link (update) B3 from A2 to A1 instead

how about this modification, A1 is lengthened, but not enough to contain B3 entirely, and A2 is moved/shortened the same way:

|-------------- A1 -----------------------------|      |--- A2 --|
  |---- B1 ----|  |----- B2 ---|       |---- B3 ----|  |-- B4 --|

Since B3 is now not entirely within either A1 or A2, remove it?

I need some concrete examples of what you want done.


Edit More questions

Ok, what about:

|------------------ A -----------------------|
  |------- B1 -------|  |------- B2 ------|
                           |---|                   <-- I want to remove this from A

What about this?

Either:

|------------------ A1 ----|   |---- A2 -----|
  |------- B1 -------|  |B3|   |--- B2 ---|

or:

|------------------ A1 ----|   |---- A2 -----|
  |------- B1 -------|

To summarize how I see it, with questions, so far:

  • You want to be able to do the following operations on A's
    • Shorten
    • Lengthen
    • Combine when they are adjacent, combining two or more into one
    • Punch holes in them by removing a period, and thus splitting it
  • B's that are still contained within an A after the above update, relink if necessary
  • B's that were contained, but are now entirely outside, delete them
  • B's that were contained, but are now partially outside, Edit: Delete these, ref data integrity
  • For all the above operations, do the least minimum work necessary to bring the data in line with the operations (instead of just removing everything and inserting anew)

I'll work on an implementation in C# that might work when I get home from work, I'll come back with more later tonight.


Edit Here's a stab at an algorithm.

  1. Optimize the new list first (ie. combine adjacent periods, etc.)
  2. "merge" this list with the master periods in the database in the following way:
    1. keep track of where in both lists (ie. new and existing) you are
    2. if the current new period is entirely before the current existing period, add it, then move to the next new period
    3. if the current new period is entirely after the current existing period, remove the existing period and all its child periods, then move to the next existing period
    4. if the two overlap, adjust the current existing period to be equal to the new period, in the following way, then move on to the next new and existing period
      1. if new period starts before existing period, simply move the start
      2. if new period starts after existing period, check if any child periods are in the difference-period, and remember them, then move the start
      3. do the same with the other end
  3. with any periods you "remembered", see if they needs to be relinked or deleted

You should create a massive set of unit tests and make sure you cover all combinations of modifications.

Lasse V. Karlsen
Hi lassevk, thanks for the response. Yes, it is the first scenario (B is totally contained within A). I'll try to add a concrete example to my request.
Adam Bellaire
lassevk, I regret that I have but one upvote to give for your well-thought out response. I'll probably try to implement this tomorrow morning!
Adam Bellaire
No worries .. I have one too :)
Learning