I work at a large university, and most of my department's backup requirements are met by central network services. However, many of the users have collections of large files, such as medical imaging scans, which exceed the central storage available to them.
I am seeking to provide an improved backup solution for departmental resources and have set up a Linux server where staff can deposit these collections. However, I can foresee the server's storage being swamped by large collections of files that are rarely accessed. I have a system in mind to deal with this, but I want to make sure I am not reinventing the wheel.
My concept:
- Users copy files to the server.
- Scheduled jobs keep a complete, up-to-date copy of all files on a separate storage mechanism (a 1TB external drive is presently earmarked for this); a rough sketch of this job follows the list.
- Files that have not been accessed for some time are cleared from the server but remain on the storage drive, keeping plenty of headroom in the live environment.
- A simple interface (probably web-based) gives users a list of all their files, from which they can request the ones they need; requested files are copied from the storage drive back to the live server, and an email notification is sent once the copy has completed (also sketched below).
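To make the idea concrete, here is a minimal Python sketch of the kind of scheduled mirror-and-evict job I have in mind. The paths, the 90-day threshold, and the reliance on file access times are placeholder assumptions for illustration, not a working deployment:

```python
#!/usr/bin/env python3
"""Nightly job (hypothetical): mirror the live area to the archive drive,
then evict live files that have not been accessed for 90 days."""

import os
import shutil
import time

LIVE = "/srv/deposits"      # assumed live area on the server
ARCHIVE = "/mnt/archive"    # assumed mount point of the 1TB drive
MAX_IDLE_DAYS = 90          # assumed eviction threshold


def mirror(live: str, archive: str) -> None:
    """Copy anything new or changed from the live area into the archive."""
    for root, _dirs, files in os.walk(live):
        rel = os.path.relpath(root, live)
        dest_dir = os.path.join(archive, rel)
        os.makedirs(dest_dir, exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            dest = os.path.join(dest_dir, name)
            # Re-copy only if the archive copy is missing or older.
            if not os.path.exists(dest) or os.path.getmtime(src) > os.path.getmtime(dest):
                shutil.copy2(src, dest)


def evict(live: str, archive: str, max_idle_days: int) -> None:
    """Remove live files that have an archive copy and have not been
    accessed within the threshold."""
    cutoff = time.time() - max_idle_days * 86400
    for root, _dirs, files in os.walk(live):
        rel = os.path.relpath(root, live)
        for name in files:
            src = os.path.join(root, name)
            archived = os.path.join(archive, rel, name)
            if os.path.getatime(src) < cutoff and os.path.exists(archived):
                os.remove(src)


if __name__ == "__main__":
    mirror(LIVE, ARCHIVE)
    evict(LIVE, ARCHIVE, MAX_IDLE_DAYS)
```

In practice I would probably run something like this from cron each night, and a tool such as rsync could replace the hand-rolled mirror step. I am also aware that relying on access times assumes the filesystem is not mounted with noatime.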
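And a similarly rough sketch of the restore-and-notify step that the web interface would trigger; the mail relay, addresses, and example file path are all invented for illustration:

```python
#!/usr/bin/env python3
"""Restore-request handler (hypothetical): copy a requested file from the
archive back to the live area and email the requester when it is ready."""

import os
import shutil
import smtplib
from email.message import EmailMessage

ARCHIVE = "/mnt/archive"          # assumed archive mount point
LIVE = "/srv/deposits"            # assumed live area
SMTP_HOST = "smtp.example.ac.uk"  # assumed departmental mail relay


def restore_and_notify(relative_path: str, requester: str) -> None:
    """Copy one archived file back into the live area, then notify by email."""
    src = os.path.join(ARCHIVE, relative_path)
    dest = os.path.join(LIVE, relative_path)
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    shutil.copy2(src, dest)

    msg = EmailMessage()
    msg["Subject"] = f"File restored: {relative_path}"
    msg["From"] = "backup-server@example.ac.uk"
    msg["To"] = requester
    msg.set_content(f"Your requested file is now available at {dest}.")
    with smtplib.SMTP(SMTP_HOST) as smtp:
        smtp.send_message(msg)


if __name__ == "__main__":
    # Example invocation (hypothetical file and address):
    restore_and_notify("scans/example_scan.nii.gz", "researcher@example.ac.uk")
```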
This concept is based on a PACS (Picture Archiving and Communication System) that I heard about in a previous job but did not use directly. That system used a similar process of "near-line" backup to give access to a huge volume of data while allowing transmission to local machines at times that did not clog up other parts of the network. The same principle is used by many museums and academic libraries, whose total "data holdings" are much greater than what is presented on direct-access shelving.
Is there a simple open source system available that fits my requirements? Are there other systems that use a different paradigm but which might still fit my needs?