views:

1159

answers:

2

Hello,

I am doing analysis on schema evolution and have really fallen short of real world database schemas that have evolved a bit and stored in version control. Mediawiki (http://svn.wikimedia.org/svnroot/mediawiki/trunk/phase3/maintenance/tables.sql) seems to be a very good candidate so far. I've considered Bugzilla, but they don't seem to have .sql files in the trunk as in Mediawiki. Perl scripts have to be executed which builds the database schema/tables - so instead of getting only one file as in mediawiki(tables.sql), I now essentially have to download entire Bugzilla folder for different revisions to understand schema changes.

Question 1: I am wondering if any of you have known or worked with large database schemas that are open source, under revision control and changes large enough to make schema evolution analysis (preferrably have DDL scripts - .sql files).

Question 2: Also, have you come across any tools that provides analysis (table/column count, patch between revisions, etc.) on .sql files (DDL scripts). All the tools that I found require the .sql files to be loaded into a database before doing any analysis on them.

For both the questions, MySQL or Postgres is preferred as the database.

I aprreciate any reponse from fellow members of this community.

Thank you,

Venkatesh Mandalapa

+1  A: 

http://www.databaseanswers.org/ - contains a number of examples for database schemas.

HyperCas
A: 

Hi, check out our work on http://schemaevolution.org, there is some interesting research made at UCLA on schema evolution, we release pre-collected schema evolution histories (from mediawiki, Ensembl Genetic DB and several others), we provide statistical analysis of those cases, and there is also a tool available to automate the process of collection and analysis of the schema evolution histories.

bye,

Carlo

Carlo A. Curino, PhD

MIT - CSAIL

The bane of my existence is doing things

I know the computer could do for me.

Carlo Curino