With the release of database dumps, this has essentially turned into a problem of generating an activity.xml file from the database contents. This wiki page on the codeswarm project site details the typical process of generating these files from common version control systems (most notably SVN, CVS, VSS, Mercurial, and MediaWiki). I've had a brief look at the code (specifically the convert_logs.py file), and in fact the conversion code seems quite simple. The format of the activity.xml file itself, which is all you need to generate the end results (i.e. cool-looking videos), is also very straightforward.
Here's an example of the activity.xml file I generated from the sample svn_log.txt file. I've pasted only a portion, since the entire contents are rather long.
    <?xml version="1.0"?>
    <file_events>
        <event date="1213658962000" filename="/branches" author="(no author)" />
        <event date="1213658962000" filename="/tags" author="(no author)" />
        <event date="1213658962000" filename="/trunk" author="(no author)" />
        <event date="1213867405000" filename="/prototype" author="michael.ogawa" />
        <event date="1213867405000" filename="/trunk/prototype" author="michael.ogawa" />
        <event date="1213867405000" filename="/trunk/prototype/ColorAssigner.pde" author="michael.ogawa" />
        <event date="1213867405000" filename="/trunk/prototype/ColorBins.pde" author="michael.ogawa" />
        <event date="1213867405000" filename="/trunk/prototype/Edge.pde" author="michael.ogawa" />
        <event date="1213867405000" filename="/trunk/prototype/FileEvent.pde" author="michael.ogawa" />
        <event date="1213867405000" filename="/trunk/prototype/FileNode.pde" author="michael.ogawa" />
        <event date="1214050286000" filename="/trunk/prototype/code_swarm.pde" author="[email protected]" />
        <event date="1214050286000" filename="/trunk/prototype/data/code_swarm-repository.xml" author="[email protected]" />
        <event date="1214053719000" filename="/trunk/convert_logs" author="[email protected]" />
        <event date="1214053719000" filename="/trunk/convert_logs/README" author="[email protected]" />
        <event date="1214053719000" filename="/trunk/convert_logs/convert_logs.py" author="[email protected]" />
    </file_events>
(The date attribute appears to be a Unix timestamp expressed in milliseconds: 1213658962000 corresponds to mid-June 2008.)
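As a quick check of the timestamp interpretation, the sketch below treats the value as milliseconds since the Unix epoch and converts in both directions. The class and method names (EventDate, FromActivityDate, ToActivityDate) are made up for illustration, and DateTimeOffset.FromUnixTimeMilliseconds assumes .NET Framework 4.6 or later.

```csharp
using System;

public static class EventDate
{
    // Interprets an activity.xml "date" value as milliseconds since 1970-01-01 UTC.
    // (Assumption based on the sample values, not on codeswarm documentation.)
    public static DateTimeOffset FromActivityDate(long milliseconds)
    {
        return DateTimeOffset.FromUnixTimeMilliseconds(milliseconds);
    }

    // Converts a timestamp back to the millisecond form seen in the sample file.
    public static long ToActivityDate(DateTimeOffset timestamp)
    {
        return timestamp.ToUnixTimeMilliseconds();
    }
}
```

Calling EventDate.FromActivityDate(1213658962000) yields 16 June 2008, 23:29:22 UTC, which is plausible for the first commits of the codeswarm repository.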
Now, I haven't yet investigated what the full extent of the format for this XML file is, but it would seem quite easy to generate from any suitable data source. (Indeed, it appears that these flashy videos can be generated with no more information.) Certainly, I see no reason why one would need to take the approach of mocking a VCS. The nature of StackOverflow content seems pretty well-suited to the format/codeswarm in general, so compatibility is not likely to be an issue in my opinion. Indeed, there already exists a MediaWiki converter.
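To illustrate how little is needed, here is a minimal sketch that writes the same three attributes from an arbitrary in-memory list of records. The ActivityXml class and its tuple shape are hypothetical, and the sketch assumes that date, filename, and author are the only attributes codeswarm needs, which I have inferred from the sample alone.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Xml;

public static class ActivityXml
{
    // Writes one <event> per (time, filename, author) record, mirroring the sample above.
    public static void Write(TextWriter output,
        IEnumerable<(DateTimeOffset Time, string FileName, string Author)> events)
    {
        var settings = new XmlWriterSettings { Indent = true };
        using (var writer = XmlWriter.Create(output, settings))
        {
            writer.WriteStartDocument();
            writer.WriteStartElement("file_events");
            foreach (var e in events)
            {
                writer.WriteStartElement("event");
                // Dates are written as milliseconds since the Unix epoch.
                writer.WriteAttributeString("date",
                    e.Time.ToUnixTimeMilliseconds().ToString(CultureInfo.InvariantCulture));
                writer.WriteAttributeString("filename", e.FileName);
                writer.WriteAttributeString("author", e.Author);
                writer.WriteEndElement();
            }
            writer.WriteEndElement();
            writer.WriteEndDocument();
        }
    }
}
```

Any source that can produce a timestamp, a name, and an author per event (a database query, a CSV file, a web API) could feed this method.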
So yeah, I really don't see that this should be too difficult a project. The idea of representing StackOverflow questions with codeswarm does quite intrigue me, so I am actually thinking about spending a bit of time writing a converter that takes the StackOverflow database dump (or a subset thereof) and converts it to the activity.xml format. If you haven't yet attempted anything yourself, please let me know, and I would be glad to at least create a quick and dirty converter (probably in C#).
Update
Here's my code in C# that generates the activity.xml output file that codeswarm uses. I have verified that the format is correct, but haven't yet gotten around to checking that the video generates correctly. (I had to install the JDK because even that wasn't on my machine.)
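    using System;
    using System.Globalization;
    using System.Xml;

    public class DumpConverter
    {
        private static readonly DateTime UnixEpoch =
            new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc);

        // Raised after each post is written; the int is the running post count.
        public event Action<object, int> ProgressChanged;

        // Streams <row> elements from posts.xml and writes one <event> per post.
        public void ConvertToLog(XmlWriter outputWriter, XmlReader postsReader)
        {
            outputWriter.WriteStartDocument();
            outputWriter.WriteStartElement("file_events");
            int numPostsRead = 0;
            while (postsReader.Read())
            {
                // Only <row> elements carry post data; skip everything else.
                if (postsReader.NodeType != XmlNodeType.Element || postsReader.Name != "row")
                    continue;

                // Assumes the dump stores CreationDate in UTC.
                var postDate = DateTime.Parse(postsReader["CreationDate"],
                    CultureInfo.InvariantCulture,
                    DateTimeStyles.AssumeUniversal | DateTimeStyles.AdjustToUniversal);
                // These attributes may be missing from a row; write an empty value then.
                var postFileName = postsReader["Title"] ?? string.Empty;
                var postAuthor = postsReader["LastEditorDisplayName"] ?? string.Empty;

                outputWriter.WriteStartElement("event");
                // Milliseconds since the Unix epoch, as a long so the value cannot overflow.
                var millis = (long)(postDate - UnixEpoch).TotalMilliseconds;
                outputWriter.WriteAttributeString("date",
                    millis.ToString(CultureInfo.InvariantCulture));
                outputWriter.WriteAttributeString("filename", postFileName);
                outputWriter.WriteAttributeString("author", postAuthor);
                outputWriter.WriteEndElement();

                if (ProgressChanged != null)
                    ProgressChanged(this, ++numPostsRead);
            }
            outputWriter.WriteEndElement();
            outputWriter.WriteEndDocument();
        }
    }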
And a sample program that uses the DumpConverter class:
    // Requires: using System; using System.IO; using System.Xml;
    static void Main(string[] args)
    {
        // Expects the directory containing posts.xml as the only argument.
        if (args.Length < 1)
            return;
        var inputPath = args[0];
        using (var outputWriter = XmlWriter.Create(Path.Combine(inputPath, "activity.xml")))
        using (var postsReader = XmlReader.Create(Path.Combine(inputPath, "posts.xml")))
        {
            var dumpConverter = new DumpConverter();
            string lastStatus = string.Empty;
            Console.Write("Posts converted: ");
            dumpConverter.ProgressChanged += (sender, postsConverted) =>
            {
                // Backspace over the previous count, then print the new one.
                Console.Write(new string('\b', lastStatus.Length));
                lastStatus = postsConverted.ToString("#,#0");
                Console.Write(lastStatus);
            };
            dumpConverter.ConvertToLog(outputWriter, postsReader);
            Console.WriteLine();
        }
    }
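For anyone who wants to sanity-check the generated file before feeding it to codeswarm, something along these lines might do. It is only a rough sketch: the ActivityCheck class is hypothetical, and the rules it enforces (a file_events root, a numeric date, and filename and author attributes on every event) are inferred from the sample output above rather than from any codeswarm specification.

```csharp
using System;
using System.Xml.Linq;

public static class ActivityCheck
{
    // Returns the number of <event> elements, or throws if one looks malformed.
    public static int CountValidEvents(string activityXml)
    {
        var root = XDocument.Parse(activityXml).Root;
        if (root == null || root.Name.LocalName != "file_events")
            throw new FormatException("Root element should be <file_events>.");

        int count = 0;
        foreach (var ev in root.Elements("event"))
        {
            // Each event needs a numeric date plus filename and author attributes.
            if (!long.TryParse((string)ev.Attribute("date"), out _))
                throw new FormatException("Event " + count + " has a non-numeric date.");
            if (ev.Attribute("filename") == null || ev.Attribute("author") == null)
                throw new FormatException("Event " + count + " is missing filename or author.");
            count++;
        }
        return count;
    }
}
```

For a full dump the file is large, so loading it with XDocument may use a lot of memory; running the check on a truncated copy is probably enough to catch format mistakes.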
I'll let you know if I can actually get the video rendered in codeswarm now, though once you generate the activity data (which doesn't take terribly long), it should be trivial provided that you have the correct environment and config file set up.
Hope that helps!