I have two tables, pages and revisions. Revisions has a foreign key to a page. The content of a page is the latest entry in the revisions table for that page. The revisions are full copies of the content, not deltas.

As an experiment, I would like to visualize the revision state of the current revision. If text is new in the current revision, don't do anything. If it is from a recent revision, give it a green background color. If it's very old, give it a red background color. In between, orange. A heat map diff of the age of the content, so to speak.

My question is: How can I extract this data from the revisions of a page? Pointers to literature would be just as useful as actual code solving this problem.

Not really relevant, but just in case: It's for a Ruby project, Ruby on Rails in fact. Here's the project, on GitHub.

Update: here's an example test case, written in Ruby. http://pastie.org/631604

+2  A: 

One quick way to do it is to get the successive versions of the page and run them through the diff utility to get deltas, so you know what to color how. You could of course reinvent the code that goes from two complete pages and finds which bits they have in common, but it's going to be faster to reuse existing code.
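As a rough illustration of the "find which bits they have in common" step, here is a minimal pure-Ruby sketch. It assumes the revisions are available as an array of strings, oldest first, and it works on word tokens rather than lines. The names `tokenize`, `lcs_pairs` and `token_origins` are made up for this example, and the plain dynamic-programming LCS is a stand-in for whatever `diff` does internally, not a description of it.

```ruby
# Split text into words and whitespace runs, so joining the tokens
# reproduces the original text exactly.
def tokenize(text)
  text.scan(/\s+|\S+/)
end

# Longest common subsequence of two token arrays, returned as index
# pairs [i, j] with a[i] == b[j]. Classic O(n*m) table, fine for a sketch.
def lcs_pairs(a, b)
  lengths = Array.new(a.size + 1) { Array.new(b.size + 1, 0) }
  (a.size - 1).downto(0) do |i|
    (b.size - 1).downto(0) do |j|
      lengths[i][j] =
        if a[i] == b[j]
          lengths[i + 1][j + 1] + 1
        else
          [lengths[i + 1][j], lengths[i][j + 1]].max
        end
    end
  end

  pairs = []
  i = j = 0
  while i < a.size && j < b.size
    if a[i] == b[j]
      pairs << [i, j]
      i += 1
      j += 1
    elsif lengths[i + 1][j] >= lengths[i][j + 1]
      i += 1
    else
      j += 1
    end
  end
  pairs
end

# revisions: full texts, oldest first (an assumption of this sketch).
# Returns [[token, origin], ...] for the newest revision, where origin is
# the index of the revision in which that token first appeared.
def token_origins(revisions)
  prev_tokens = []
  prev_origins = []
  revisions.each_with_index do |text, rev_index|
    tokens = tokenize(text)
    origins = Array.new(tokens.size, rev_index) # new unless matched below
    lcs_pairs(prev_tokens, tokens).each do |i, j|
      origins[j] = prev_origins[i] # token survived, keep its older origin
    end
    prev_tokens = tokens
    prev_origins = origins
  end
  prev_tokens.zip(prev_origins)
end
```

The age of a token is then the index of the newest revision minus its origin, so tokens with age 0 are new and can be left uncolored. The quadratic table gets slow on long pages; an existing LCS implementation could replace `lcs_pairs` without changing the rest.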

redtuna
How would you pass something to `diff` that isn't stored as files? The revisions come from the DB, and afaik `diff` can't take input from STDIN.
August Lilleaas
Hi, I think you could work with temporary files (as the inputs to the *diff* command)
ATorras
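A minimal sketch of the temporary-file idea in Ruby, assuming an external `diff` program is on the PATH and that it accepts `-u` for unified output. The method name `diff_revisions` is hypothetical; `Tempfile.create` and `Open3.capture2` are from Ruby's standard library.

```ruby
require "tempfile"
require "open3"

# Writes two revision bodies to temporary files and returns the unified
# diff printed by the external `diff` program (assumed to be installed).
def diff_revisions(old_text, new_text)
  Tempfile.create("old_revision") do |old_file|
    Tempfile.create("new_revision") do |new_file|
      old_file.write(old_text)
      old_file.flush
      new_file.write(new_text)
      new_file.flush
      # diff exits with status 1 when the inputs differ, so a non-zero
      # status is not treated as a failure here.
      output, _status = Open3.capture2("diff", "-u", old_file.path, new_file.path)
      output
    end
  end
end
```

The block form of `Tempfile.create` removes both files once the block returns, so nothing is left behind on disk.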
Ah, so I would have to create tempfiles. How can I use diffs to visualize the content age, though?
August Lilleaas
+2  A: 
cetnar
Good idea, but I'd rather store it in a DB so that scaling the system is easier.
August Lilleaas
It's pretty unlikely that storing the data in a DB would be appreciably more scalable than storing it in any reasonable VCS. This is the sort of work a VCS is specifically designed to handle, whereas a relational (I'm assuming) DB is designed to support ad-hoc querying, etc. A more likely source of problems would be integrating the DB-backed part of the app with the part that talks to the VCS. For ideas on how to manage that, you could look at projects like django-vcs: http://code.google.com/p/django-vcs/
Hank Gay
Can svn/git/[other vcs] help me visualize the content age, though?
August Lilleaas
Yes. The output of the svn blame command is simple to parse, and it gives you all the data you need (revision and content).
cetnar
Isn't `svn blame` a per-line kind of thing, similar to `git blame`? In order to visualize this, I need something that doesn't care about lines. Don't I?
August Lilleaas
+1  A: 

You can use any DVCS to achieve that. I'd recommend git. It will be even better than using a DB.

Kane
Got any examples on how I could use Git to do this? I can't store the content in git, has to be stored in a DB. I could pass content to git, though.
August Lilleaas
You can just store it in both the DB and git. The DB will contain the latest revision of the text. In git you can store a file named after the ID from the DB, for example, and every time you overwrite it, issue a git commit. This way you'll be able to get the history/diff of every file, and even of every line in the file.
Kane
I can't think of any git commands that visualize the content age, though. Can git actually do this?
August Lilleaas
+2  A: 

One thing. Heat implies activity or energy, so I would flip your colors around so that the most recent are red (hot) and the older text is blue/green (cooled off).
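A small sketch of that color mapping in Ruby, assuming each piece of text already has an age measured in revisions (0 for the newest). The `heat_color` name, the HSL hue range and the lightness value are illustrative choices, not anything prescribed above.

```ruby
# Maps an age (0 = newest) to a CSS background color. Hue runs from
# 0 (red, hot) for the newest text to 240 (blue, cooled off) for the oldest.
def heat_color(age, max_age)
  return "hsl(0, 100%, 85%)" if max_age <= 0 # only one revision: all hot

  ratio = (age.to_f / max_age).clamp(0.0, 1.0)
  hue = (ratio * 240).round
  "hsl(#{hue}, 100%, 85%)"
end
```

Text halfway between newest and oldest lands on green, which matches the blue/green "cooled off" end of the scale. If brand-new text should stay unstyled, as in the question, the caller can skip `heat_color` for age 0.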

OG
Good point, thanks :)
August Lilleaas
+3  A: 

Update: [ long and slightly off-topic answer on longest-common-subsequence deleted ]

I've integrated my Hunt-McIlroy algorithm subsequence finder with your test case, which now passes. I made various modifications to your test case; see it here at pastie.org. Likewise, here is the rdiff module. Here is my svn log explaining why your test case was changed.

DigitalRoss
http://pastie.org/639057
DigitalRoss
I have experimented with the `Diff::LCS` gem as well, to get the diffs between the various revisions. What I haven't figured out is a way to use the diffs, though. I guess the output of your LCS class is similar to the output of the `Diff::LCS` gem?
August Lilleaas
Oh, heh, I didn't know about that. It looks like we have done something similar. I would have extended my results to your nicely written test case if I had more time, but I started on this question just a few hours before the deadline.
DigitalRoss
I guess you could always ask another question... :-)
DigitalRoss
Super-awesome! Going afk for a few days, I'll take a look at this when I get back. Thanks!
August Lilleaas