Possible Duplicates:
Which is more secure: filesystem or database?
User images - database vs. filesystem storage
store image in database or in a system file ?

I can't decide which approach to take. Can you give me some opinions? Should I store my images in the filesystem or in the database? (I would like to prevent others from stealing my images.)

When you answer this question, please include comparisons of security, performance, etc.

Thanks.


Exact Duplicate: User Images: Database or filesystem storage?
Exact Duplicate: Storing images in database: Yea or nay?
Exact Duplicate: Should I store my images in the database or folders?
Exact Duplicate: Would you store binary data in database or folders?
Exact Duplicate: Store pictures as files or in the database for a web app?
Exact Duplicate: Storing a small number of images: blob or fs?
Exact Duplicate: store image in filesystem or database?

+10  A: 

Moving your images into a database and writing the code to extract them may be more hassle than it's worth. It all comes back to the business requirements: the need to protect the images, or the requirement for performance.

I'd suggest sticking to the tried and true system of storing the filepath or directory location in the DB, and keeping the files on disk. Here's why:

  • A filesystem is easier to maintain. Some thought still has to be put into the structure and organization of the images, e.g. a directory for each customer, a subdirectory for each [Attribute-X], and another subfolder for each [Attribute-Y]. Keeping too many images in one directory (hundreds of thousands) will end up slowing down file access.
  • If the idea of storing in a DB is a counter-measure against filesystem loss (i.e. a disk goes down, or a directory is deleted by accident), then I'd counter with the suggestion that when you use source control, it's no problem to retrieve any lost/missing/deleted files.
  • If you ever need to scale and move to a content-distribution scenario, you'd have to move the images back out to the filesystem or perform a big extract to push them out to the providers.
  • It also goes with the saying: "keep structured data in a database". Microsoft has an article on Managing Unstructured Data.
  • If security is an issue to be addressed, the filesystem has a whole established structure of ACLs. Reinventing the wheel on security may be out of scope for the business requirements.
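As an illustration of the path-in-the-DB pattern recommended above (my sketch, not part of the original answer; the schema, table name, and per-customer directory layout are assumptions):

```python
import os
import sqlite3
import tempfile

# Hypothetical schema: the DB holds only metadata and the on-disk path;
# the image bytes themselves stay in the filesystem.
def save_image(db, root, customer, name, data):
    folder = os.path.join(root, customer)   # one directory per customer
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, name)
    with open(path, "wb") as f:
        f.write(data)
    db.execute("INSERT INTO images (customer, path) VALUES (?, ?)",
               (customer, path))
    return path

def load_image(db, image_id):
    # Look up the path in the DB, then read the bytes from disk.
    (path,) = db.execute("SELECT path FROM images WHERE id = ?",
                         (image_id,)).fetchone()
    with open(path, "rb") as f:
        return f.read()

root = tempfile.mkdtemp()
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE images (id INTEGER PRIMARY KEY, customer TEXT, path TEXT)")
save_image(db, root, "acme", "logo.png", b"\x89PNG fake image bytes")
```

Note that the DB row stays small: the query only ever returns a path, and the web server (or OS) does the heavy lifting of reading the file.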

A large amount of discussion on this topic can also be found in the duplicate questions linked above.

Storing your images as varbinary or a BLOB of some kind (depending on your platform) is, I'd suggest, more hassle than it's worth. The effort you'll need to expend means more code that you'll have to maintain, unit test, and defend against defects.

If your environment can support SQL Server 2008, you can have the best of both worlds with its new FILESTREAM data type.

I'd love to see studies in real-world scenarios with large user bases like Flickr or Facebook.

Again, it all goes back to your business requirements. Good luck!

p.campbell
i agree... storing in the database means an extra layer of complexity in many cases...
jle
+1 let the file system do what it's designed to do!
Soviut
A: 

Saving your files to the DB will provide some security, in that another user would need access to the DB in order to retrieve the files. But as far as efficiency goes, you pay for a SQL query for every image loaded, leaving all the load on the server side. Do yourself a favor and find a way to protect your images inside the filesystem; there are many.
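One of the many filesystem-side protections alluded to here can be sketched (my illustration, not from the answer): keep the files outside the web root and serve them through a handler that checks ownership first. `OWNERS` and `serve_image` are hypothetical names; `OWNERS` stands in for a real permission store, usually a DB table:

```python
import os
import tempfile

# Assumed permission store: filename -> owning user.
OWNERS = {"cat.jpg": "alice"}

def serve_image(image_root, user, filename):
    safe = os.path.basename(filename)  # strip "../"-style path tricks
    if OWNERS.get(safe) != user:
        return 403, b""                # not the owner: refuse
    try:
        with open(os.path.join(image_root, safe), "rb") as f:
            return 200, f.read()
    except FileNotFoundError:
        return 404, b""

# Demo setup: a private directory that the web server never exposes directly.
root = tempfile.mkdtemp()
with open(os.path.join(root, "cat.jpg"), "wb") as f:
    f.write(b"jpeg bytes")
```

Because the directory is never mapped to a URL, the only way to an image is through the ownership check.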

ONi
+4  A: 

It doesn't matter where you store them in terms of preventing "theft". If you deliver the bytestream to a browser, it can be intercepted and saved by anyone. There's no way to prevent that (I'm assuming you're talking about delivering the images to a browser).

If you're just talking about securing images on the machine itself, it also doesn't matter. The operating system can be locked down as securely as the database, preventing anyone from getting at the images.

In terms of performance (when presenting images to a browser), I personally think it'll be faster serving from a filesystem. You have to present the images in separate HTTP transactions anyway, which would almost certainly require multiple trips to the database. I suspect (although I have no hard data) that it would be faster to store the image URLs in the database which point to static images on the file system - then the act of getting an image is a simple file open by the web server rather than running code to access the database.

paxdiablo
+1  A: 

You're probably going to have to get a whole ton of "but the filesystem is a DB" answers. This isn't one of them.

The filesystem option depends on many factors; for example, does the server have write permissions to the directory? (And yes, I have seen servers where Apache couldn't write to DocumentRoot.)

If you want 100% cross-compatibility across platforms, then the Database option is the best way to go. It'll also let you store image-related metadata such as a user ID, the date uploaded, and even alternate versions of the same image (such as cached thumbnails).

On the down side, you need to write custom code to read images from the DB and serve them to the user, while any standard web server would just let you send the images as they are.

When it comes to the bottom line, though, you should just choose the option that fits your project, and your server configuration.
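The "custom code" cost mentioned in this answer is easy to see in a sketch (my illustration; the schema is an assumption): every image request becomes a query plus manual Content-Type bookkeeping that a plain web server would otherwise do for free:

```python
import sqlite3

def get_image_response(db, image_id):
    # Fetch the MIME type and bytes for one image, then build the
    # HTTP status/headers/body by hand.
    row = db.execute("SELECT mime, data FROM images WHERE id = ?",
                     (image_id,)).fetchone()
    if row is None:
        return "404 Not Found", [], b""
    mime, data = row
    headers = [("Content-Type", mime), ("Content-Length", str(len(data)))]
    return "200 OK", headers, data

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE images (id INTEGER PRIMARY KEY, mime TEXT, data BLOB)")
db.execute("INSERT INTO images (mime, data) VALUES (?, ?)",
           ("image/png", b"\x89PNG"))
```

The upside, as the answer notes, is that metadata (uploader, date, thumbnails) lives in the same transactional store as the bytes.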

MiffTheFox
A: 

If you want your application to be scalable, do not use a file system. Store them in a database.

Jack Marchetti
I'm not going to vote you down, but I think your answer is wrong. File systems can be every bit as scalable as a DBMS. I'd be interested in why you think that's the case.
paxdiablo
@Pax: Agreed. I am imagining the webfarm at Flickr reading a DB to serve all its images.
p.campbell
Perhaps I was misled then. I'm working on a major rewrite of a web application due to scaling issues; the architect recommended to the stakeholders that it be completely rewritten [the code and data model are awful], but the big issue was scaling. His main reason was that, since the images are all file based, you can't scale up with multiple servers and use load balancers.
Jack Marchetti
Maybe the location of the images is stored in a database?
Jack Marchetti
@JackM, you *can* scale up filesystem images with multiple servers - you just have to have a copy on each server. A single NFS mounted location would be as bad as a non-replicated database. In addition, having a billion images in a single directory may also cause problems but that's bad design, and no different from not indexing the images in the DB. More than likely your architects are making work for themselves :-)
paxdiablo
Sounds about right, knowing him the way i do.
Jack Marchetti
A: 

The biggest out-of-the-box advantage of a database is that it can be accessed from anywhere on the network, which is essential if you have more than one server.

If you want to access a filesystem from other machines you need to set up some kind of sharing scheme, which could add complexity and could have synchronization issues.

If you do go with storing images in the database, you can still use local (unshared) disks as caches to reduce the strain on the DB. You'd just need to query the DB for a timestamp or something to see if the file is still up-to-date (if you allow files that can change).
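This timestamp-checked cache can be sketched as follows (my illustration; the `updated_at` column and file-mtime convention are assumptions):

```python
import os
import sqlite3
import tempfile
import time

# The DB is the source of truth; a local file is the cache. A cheap
# timestamp query decides whether the cached copy is still fresh.
def cached_image(db, image_id, cache_dir):
    path = os.path.join(cache_dir, str(image_id))
    (updated_at,) = db.execute(
        "SELECT updated_at FROM images WHERE id = ?", (image_id,)).fetchone()
    # Only pull the (large) BLOB column when the cached file is stale.
    if os.path.exists(path) and os.path.getmtime(path) >= updated_at:
        with open(path, "rb") as f:
            return f.read()
    (data,) = db.execute(
        "SELECT data FROM images WHERE id = ?", (image_id,)).fetchone()
    with open(path, "wb") as f:
        f.write(data)
    os.utime(path, (updated_at, updated_at))  # mark the cache as current
    return data

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE images (id INTEGER PRIMARY KEY, updated_at REAL, data BLOB)")
db.execute("INSERT INTO images VALUES (1, ?, ?)", (time.time(), b"image bytes"))
cache = tempfile.mkdtemp()
```

The first call for an image is a miss and hits the DB; subsequent calls are served from the local disk until the row's `updated_at` moves forward.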

Thilo
+1  A: 

Store them in FileSystem, store the file path in the DB.

Of course you can make this scalable and distributed; you just need to keep the image dirs synched between servers (re JackM's point). Or use shared storage connected to multiple web frontend servers.

Anyway, the stealing part was covered in your other question, and preventing it is basically impossible. People who can view the images will always be able (with more or less work) to save them locally ... even if it means taking a screenshot, pasting into Photoshop, and saving.

Rostol
I had the same thought about security; once the image is viewed in the browser, you can theoretically kiss it goodbye. The security problem may originally have been ACLs on the filesystem, or perhaps some kind of browser authentication issues.
p.campbell
A: 

If the issue is scalability you'll take a massive loss by moving things into the database. You can round-robin webservers via DNS but adding the overhead of both a CGI process and a database lookup to each image is madness. It also makes your database that much harder to maintain and your images that much harder to process.

As to the other part of your question, you can secure access to a file as easily as a database record, but at the end of the day, as long as there is a URL that returns a file, you have limited options to prevent that URL being used (at least without making cookies and/or JavaScript compulsory).
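One of those limited options (a common pattern, not something from this answer) is a time-limited signed URL: the server hands out a token binding a filename to an expiry time, and refuses any request where the token doesn't verify. A minimal sketch with Python's `hmac`; `SECRET` and the function names are assumptions:

```python
import hashlib
import hmac
import time

SECRET = b"server-side secret"  # assumption: known only to your servers

def sign_url(filename, expires):
    # The token binds filename + expiry; changing either invalidates it.
    msg = f"{filename}:{expires}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()

def check_url(filename, expires, token, now=None):
    now = time.time() if now is None else now
    if now > expires:
        return False  # link has expired
    expected = sign_url(filename, expires)
    return hmac.compare_digest(expected, token)  # constant-time compare
```

A URL like `/img/cat.jpg?e=<expires>&t=<token>` can then be shared, but only works until the expiry, and only for that exact file.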

SpliFF
+1  A: 

For a web application I look after, we store the images in the database, but make sure they're well cached in the filesystem.

A request from one of the web server frontends for an image requires a quick memcache check to see if the image has changed in the database and, if not, serves it from the filesystem. If it has changed it fetches it from the central database and puts a copy in the filesystem.

This gives most of the advantages of storing them in the filesystem while keeping some of the advantages of database - we only have one location for all the data which makes backups easier and means we can scale across quite a few machines without issue. It also doesn't put excessive load on the database.

Colin Coghill
+1  A: 

It depends on how many images you expect to handle, and what you have to do with them. I have an application that needs to temporarily store between 100K and several million images a day. I write them in 2gb contiguous blocks to the filesystem, storing the image identifier, filename, beginning position and length in a database.

For retrieval I just keep the indices in memory, the files open read only and seek back and forth to fetch them. I could find no faster way to do it. It is far faster than finding and loading an individual file. Windows can get pretty slow once you've dumped that many individual files into the filesystem.

I suppose it could contribute to security, since the images would be somewhat difficult to retrieve without the index data.

For scalability, it would not take long to put a web service in front of it and distribute it across several machines.
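The block-file scheme described in this answer can be sketched like this (my illustration: `BlockStore` is a made-up name, a plain dict stands in for the database-backed index, and a `BytesIO` stands in for the 2 GB on-disk file):

```python
import io

class BlockStore:
    """Append images to one large file; index maps id -> (offset, length)."""

    def __init__(self, f):
        self.f = f          # one big file, kept open
        self.index = {}     # in-memory index, as the answer describes

    def put(self, image_id, data):
        self.f.seek(0, io.SEEK_END)      # append at the end of the block
        offset = self.f.tell()
        self.f.write(data)
        self.index[image_id] = (offset, len(data))

    def get(self, image_id):
        # Retrieval is a single seek plus one read: no per-image file open.
        offset, length = self.index[image_id]
        self.f.seek(offset)
        return self.f.read(length)

store = BlockStore(io.BytesIO())  # a real deployment would use a 2 GB file on disk
store.put("a", b"first image bytes")
store.put("b", b"second image bytes")
```

Deleting is just removing the index entry, as the follow-up comments note: the bytes stay in the block, unreachable, which trades disk space for speed and simplicity.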

R Ubben
I'd call this a hybrid solution, and it would address a need for security. How fast is seeking through a 2 GB file?
p.campbell
It seems quite fast. It isn't a sequential read. I'm just doing fileStream.Seek(blockOffset,SeekOrigin.Begin) and then reading the next imageLength bytes. I guess I could keep track of my current location and seek to an offset, but that would probably add more complexity than it was worth. If I wanted additional security, I could encrypt or mount a TrueCrypt volume and write the blocks there, but for a non-DMZ server, that too would be overkill.
R Ubben
I take it you never delete or modify the files once they're written? If you did, you'd end up writing your own version of a file system!
Mark Ransom
No. It is a lot easier to just delete the index entry, or if I modified one, add a new image. This is intended for temporary storage, but if I made it permanent, I would still do it that way. Speed is a lot more expensive than disk space. Besides, if this was a permanent system of record, auditors would have a problem with the fact it was possible to delete images.
R Ubben
A: 

Store files in a file server, and store primitive data in a database. While file servers (especially HTTP-based) scale well, database servers do not. Don't mix them together.

yogman
A: 

If you need to edit, manage, or otherwise maintain the images, you should store them outside the database.

Also, the filesystem has many security features that a database does not.

The database is good for storing pointers (file paths) to the actual data.

Jeff Meatball Yang