views:

2015

answers:

13

I want to do a simple versioning system but i don't have ideas on how to structure my datas, and my code.

Here is a short example:

  1. User logs in
  2. User has two options when uploading a file:
    • Submit a new file
    • Submit a new version of a file

Users should be able to see the tree. (the different version) The tree can only be up to 2 levels:

|
|--File_A_0
 \--File_A_1
 \--File_A_2
 \--File_A_3
 \--File_A_4

There are also 2 types of file, a final (which is the latest approved version) and a draft version (which the latest uploaded file) The file will be physically stored on the server. Each files are owned by a user (or more) and only one group.

Edit: Groups represent a group of document, document could only be owned by ONE group at once. Users do NOT depend on groups.

Begin edit:

Here is what i did, but it is not really efficient !

id_article | relative_group_id | id_group | title | submited | date | abstract | reference | draft_version | count | status

id_draft | id_file | version | date

But it's difficult to manage, to extend. I think it's because the group paramater...

End edit

So the questions are:

  • How can i schematize my database ?
  • What kind of infos should be usefull to version this work ?
  • What kind of structure for the folders, files ?
  • What kind of tips, hints do you have to do this kind of work ?

(The application is developped with PHP and Zend Framework, database should be mysql or postgresql)

+16  A: 

You may get inspiration from there.


Concerning your comment :

As for a database structure, you may try this kind of structure (MySQL sql) :

CREATE TABLE `Users` (
       `UserID` INT NOT NULL AUTO_INCREMENT
     , `UserName` CHAR(50) NOT NULL
     , `UserLogin` CHAR(20) NOT NULL
     , PRIMARY KEY (`UserID`)
);

CREATE TABLE `Groups` (
       `GroupID` INT NOT NULL AUTO_INCREMENT
     , `GroupName` CHAR(20) NOT NULL
     , PRIMARY KEY (`GroupID`)
);

CREATE TABLE `Documents` (
       `DocID` INT NOT NULL AUTO_INCREMENT
     , `GroupID` INT NOT NULL
     , `DocName` CHAR(50) NOT NULL
     , `DocDateCreated` DATETIME NOT NULL
     , PRIMARY KEY (`DocID`)
     , INDEX (`GroupID`)
     , CONSTRAINT `FK_Documents_1` FOREIGN KEY (`GroupID`)
                  REFERENCES `Groups` (`GroupID`)
);

CREATE TABLE `Revisions` (
       `RevID` INT NOT NULL AUTO_INCREMENT
     , `DocID` INT
     , `RevUserFileName` CHAR(30) NOT NULL
     , `RevServerFilePath` CHAR(255) NOT NULL
     , `RevDateUpload` DATETIME NOT NULL
     , `RevAccepted` BOOLEAN NOT NULL
     , PRIMARY KEY (`RevID`)
     , INDEX (`DocID`)
     , CONSTRAINT `FK_Revisions_1` FOREIGN KEY (`DocID`)
                  REFERENCES `Documents` (`DocID`)
);

CREATE TABLE `M2M_UserRev` (
       `UserID` INT NOT NULL
     , `RevID` INT NOT NULL
     , INDEX (`UserID`)
     , CONSTRAINT `FK_M2M_UserRev_1` FOREIGN KEY (`UserID`)
                  REFERENCES `Users` (`UserID`)
     , INDEX (`RevID`)
     , CONSTRAINT `FK_M2M_UserRev_2` FOREIGN KEY (`RevID`)
                  REFERENCES `Revisions` (`RevID`)
);

Documents is a logical container, and Revisions contains actual links to the files. Whenever a person updates a new file, create an entry in each of these tables, the one in Revisions containing a link to the one inserted in Documents.

The table M2M_UserRev allows to associate several users to each revision of a document.

When you update a document, insert only in Revisions, with alink to the corresponding Document. To know which document to link to, you may use naming conventions, or asking the user to select the right document.

For the file system architecture of your files, it really doesn't matter. I would just rename my files to something unique before they are stored on the server, and keep the user file name in the database. Just store the files renamed in a folder anywhere, and keep the path to it in the database. This way, you know how to rename it when the user asks for it. You may as well keep the original name given by the user if you are sure it will be unique, but I wouldn't rely on it too much. You may soon see two different revisions having the same name and one overwriting the other on your file system.

subtenante
As soon as I saw the title of this question I was going to link to this!
Greg Rogers
Thanks for the link there are some usefull informations, but nothing on the database structure which with i am more insterested in.
Boris Guéry
I edited my post to be accurate on the group question.Thank you for the nice answer, i'll take a look !
Boris Guéry
A: 

Check out ProjectPier (originally ActiveCollab). It has a system similar to this and you could look at their source.

Ryan Schumacher
It's big app, i think my project is simpler.
Boris Guéry
I wasn't talking about the whole product, but specifically their version control system. Since it is open source you could look at that portion of their code.
Ryan Schumacher
Indeed, but what i mean is it's too be a bit strong for what i want to do.. i would have liked to better look at a simple system as mine will be ! But thanks anyway.
Boris Guéry
+2  A: 

For a database schema, you likely need two sets of information, files and file versions. When a new file is stored an initial version is created as well. The latest approved version would have to be stored explicitly, while the newest version can be selected from the versions table (either by finding the highest version related to the file, or the newest date if you store when they are created)

files(id,name,approved_version)
file_versions(id,fileId)

file versions could then be stored using their ids (eg., '/fileId/versionId' or '/fileId/versionId_fileName') on the server, with their original name stored in the database.

GApple
Interesting, close to what i did.But you forgot the group paramater.All files are owned by a group and one or more user.
Boris Guéry
A: 

Uploading files is to 1990-ty =) Look at Google Wave! You can just build your entire application around their 'version control' framework.

Alexander Kosenkov
+11  A: 

Database schema


To keep it exremely simple, I would choose the following database design. I'm separating the "file" (same as a filesystem file) concept from the "document" (the gerarchic group of documents) concept.

User entity:

  • userId
  • userName

Group entity:

  • groupId
  • groupName

File entity:

  • fileId (a sequence)
  • fileName (the name the user gives to the file)
  • filesystemFullPath
  • uploadTime
  • uploaderId (id of the uploader User)
  • ownerGroupId

Document entity:

  • documentId
  • parentDocumentId
  • fileId
  • versionNumber
  • creationTime
  • isApproved

Every time a new file is uploaded, a "File" record is created, and also a new "Document". If it's the first time that file is uploaded, parentDocumentId for that document would be NULL. Otherwise, the new document record would point to the first version.

The "isApproved" field (boolean) would handle the document being a draft or an approved revision.
You get the latest draft of a document simply ordering descending by version number or upload time.

Hints


From how you describe the problem, you should analyze better those aspects, before moving to database schema design:

  • which is the role of the "group" entity?
  • how are groups/users/files related?
  • what if two users of different groups try to upload the same document?
  • will you need folders? (probably you will; my solution is still valid, giving a type, "folder" or "document", to the "document" entity)

Hope this helps.

friol
Nice answer .
subtenante
Indeed, kind of answer i was looking for !
Boris Guéry
+8  A: 

Might an existing version-control solution work better than rolling your own? Subversion can be made to do most of what you want, and it's right there.

David Thornley
It could be insteresting, but how can i impletement my web project to work with svn ?
Boris Guéry
Easier than you think. There is already a web interface to SVN to retrieve files of different revisions. All you need to program is to handle the file upload and then use the command line svn client to commit to repository.
Milan Babuškov
Also, for example, there is pysvn. If you use Pylons or Django, you can do all the operations you need without shelling out or spawning.
Christopher
+4  A: 

Creating a rich data structure in a traditional relational database such as MySQL can often be difficult, and there are much better ways of going about it. When working with a path based data structure with a hierarchy I like to create a flat-file based system that uses a data-serialization format such as JSON to store information about a specific file, directory or an entire repository.

This way you can use current available tools to navigate and manipulate the structure easily, and you can read, edit and understand the structure easily. XML is good for this too - it's slightly more verbose than JSON but easy to read and good for messaging and other XML-based systems too.

A quick example. If we have a repository that has a directory and three files. Looking at it front on it will look like this:

/repo
  /folder
    code.php
  file.txt
  image.jpg

We can have a metadata folder, which contains our JSON files, hidden from the OS, at the root of each directory, which describe that directory's contents. This is how traditional versioning systems work, except they use a custom language instead of JSON.

/repo
  */.folderdata*
  /code
    */.folderdata*
    code.php
  file.txt
  image.jpg

Each .folderdata folder could contain it's own structure that we can use to organize the folder's data properly. Each .folderdata folder could then be compressed to save disk space. If we look at the .folderdata folder inside the /code directory:

*/.folderdata*
  /revisions
    code.php.r1
    code.php.r2
    code.php.r3
  folderstructure.json
  filerevisions.json

The folder structure defines the structure of our folder, where the files and folders are in relation to one another etc. This could look something like this:

{
  '.':        'code',
  '..':       'repo',
  'code.php': {
    'author_id': 11543,
    'author_name': 'Jamie Rumbelow',
    'file_hash': 'a26hb3vpq22'
    'access': 'public'
  }
}

This allows us to associate metadata about that file, check for authenticity and integrity, keep persistent data, specify file attributes and do much more. We can then keep information about specific revisions in the filerevisions.json file:

{
  'code.php': [
    1: {
      'commit': 'ah32mncnj654oidfd',
      'commit_author_id': 11543,
      'commit_author_name': 'Jamie Rumbelow',
      'commit_message': 'Made some changes to code.php',
      'additions': 2,
      'subtractions': 4
    },
    2: {
      'commit': 'ljk4klj34khn5nkk5',
      'commit_author_id': 18676,
      'commit_author_name': 'Jo Johnson',
      'commit_message': 'Fixed Jamie\'s bad code!',
      'additions': 2,
      'subtractions': 0
    },
    3: {
      'commit': '77sdnjhhh4ife943r',
      'commit_author_id': 11543,
      'commit_author_name': 'Jamie Rumbelow',
      'commit_message': 'Whoah, showstopper found and fixed',
      'additions': 8,
      'subtractions': 5
    },
  ]
}

This is a basic outline plan for a file versioning system - I like this idea and how it works, and I've used JSON in the past to great effect with rich datastructures like this. This sort of data just isn't suitable for a relational database such as MySQL - as you get more revisions and more files the database will grow bigger and bigger, this way you can stagger the revisions across multiple files, keep backups of everything, make sure you have persistent data across interfaces and platforms etc.

Hope this has given you some insight, and hopefully it'll provide some food for thought for the community too!

Jamie Rumbelow
+36  A: 

For God's sake, don't. You really don't want to go down this road.

Stop and think about the bigger picture for a moment. You want to keep earlier versions of documents, which means that at some point, somebody is going to want to see some of those earlier versions, right? And then they are going to ask, "What's the difference between version 3 and version 7"? And then they are going to say, "I want to roll back to version 3, but keep some of the changes that I put in version 5, ummm, ok?"

Version control is non-trivial, and there's no need to reinvent the wheel-- there are lots of viable version control systems out there, some of them free, even.

In the long run, it will be much easier to learn the API of one of these systems, and code a web front-end that offers your users the subset of features they are looking for (now.)

You wouldn't code a text editor for your users, would you?

Michael Dorfman
"You wouldn't code a text editor", why not? It's a nice exercise :)
hasen j
"You wouldn't code a text editor", why not? If they pay for it
subtenante
Coding a text editor is a nice exercise, and everybody should do it (and write a compiler, too) on their own dime.I wouldn't code a text editor for a customer, even if they paid for it-- because it wouldn't be a good use of my time or their money, and that's not the way to build long-term relationships. (Besides, why code something you can only sell once, if you have the option of coding something with more re-sale value?)
Michael Dorfman
You have a lot of perfectly good reasons, and I second all of them, but you are very lucky to be able to choose what you code. The fact is that sometimes, people don't have that much freedom, and code what they are told to. Besides, I happen to be doing exactly this kind of system for a DMS, where integrating a real control version system would be an overkill.
subtenante
Well, *somebody* is choosing what you code. If it's not you, then it's up to you to make the technical case to your superiors.As for document management systems-- those, too, are available off-the-shelf with APIs for customization. I'd recommend against custom-coding the engine for a one-off project; that's not a winning move in terms of cost/benefit ratio.
Michael Dorfman
A: 

It's not as simple as it looks. Read this article by Eric Sink on the implications of storage of storing these files.

Perhaps a better question would be what sort of files are being loaded, and do they lend themselves well to versioning (like text files)

Conrad
+1  A: 

Start from an existing content management system, done in PHP and MySQL if those are your requirements, such as eZ Publish, or Knowledgetree. For rapid testing of these applications, Bitnami provides quick-to-install "stacks" of these as well (WAMP-stacks on steroids).

Then you can tailor these applications to your organizations needs, and stay up-to-date with the changes upstream.

maxwellb
+1  A: 

As an alternative to my previous post, if you think a hierarchical structure would be best, you may want to use flat-file storage, and expose an API through a Web service.

The server would have its data root directory, and you can store groups (of files) in folders, with a root meta-data entry in each folder. (XML perhaps?)

Then you can use an existing revision control tool wrapped in an API, or roll your own, keeping revisions of files in a revisions folder underneath the item in the folder. Check for revisions, and do file I/O with file I/O commands. Expose the API to the Web application, or other client application, and let the server determine file permissions and user mapping through the XML files.

Migrate servers? Zip and copy. Cross platform? Zip and copy. Backup? Zip and copy.

It's the flat-file back-end that I love about Mercurial DVCS, for example.

Of course, in this little example, the .rev files, could have dates, times, compression, etc, etc, defined in the revisions.xml file. When you want to access one of these revisions, you expose an AccessFile() method, which your server application will look at the revisions.xml, and determine how to open that file, whether access is granted, etc.

So you have

DATA
| + ROOT
| | . metadata.xml
| | |
| | + REVISIONS
| | | . revisionsdata.xml
| | | . documenta.doc.00.rev
| | | . documenta.doc.01.rev
| | | . documentb.ppt.00.rev
| | | . documentb.ppt.03.rev
| | |___
| | |
| | . documenta.doc
| | . documentb.ppt
| | |
| | + GROUP_A
| | | . metadata.xml
| | | |
| | | + REVISIONS
| | | | . revisionsdata.xml
| | | | . documentc.doc.00.rev
| | | | . documentc.doc.01.rev
| | | | . documentd.ppt.00.rev
| | | | . documentd.ppt.03.rev
| | | |___
| | |
| | | . documentc.doc
| | | . documentd.ppt
| | |___
| | |
| | + GROUP_B
| | | . metadata.xml
| | |___
| |
| |___
|
|___
maxwellb
+1  A: 

I recently built a simple versioning system for some static data entities. The requirement was to have an 'Active' version and 0 or 1 'pending' versions.

In the end, my versioned entity had the following attributes relevant to versioning.

VersionNumber (int/long) ActiveVersionFlag (boolean)

Where:-

  • only 1 entity can be ActiveVersionFlag = 'Y'
  • only 1 entity can be Version number > the 'Active' version (i.e. the 'pending' version)

The kind of operations I allowed were

Clone current version.

  • Fail if there is already a version > the Versionnumber of the 'Active' version
  • Copy all of the data to the new version
  • increment the version number by one

Activate Pending Version

  • Fail if the specified version is not the 'Active' version + 1
  • find the 'Active' version and set its ActiveVersionFlag to 'N'
  • set the ActiveVersionFlag of the 'pending' version to 'Y'

Delete Pending Version

  • Delete the Pending Entity

This was reasonably successfull and my users now clone and activate all the time :)

Michael

Michael Dausmann
A: 

I think this describes the perfect system for versioning

http://tom.preston-werner.com/2009/05/19/the-git-parable.html

Ram