views:

858

answers:

4
+4  Q: 

AWS SimpleDB

I am building a mobile application (iPhone/Android) and want to store the application data in Amazon's SimpleDB, because we do not want to host our own server to provide these services. I have been going through the documentation, and the maximum size of an attribute value is 1024 bytes.

In our case we need to store between 1 KB and 10 KB of text data.

I was hoping to find out how other projects use SimpleDB when they have larger storage needs like ours. I read that one could store the text as files in S3 and keep pointers to those files in SimpleDB. I am not sure whether that is a good solution.

I am not sure SimpleDB is the right choice at all. Could anyone comment on what they have done, or suggest a different way to think about this problem?

+6  A: 

There are ways to store your 10k of text data, but whether they will be acceptable depends on what else you need to store and how you plan to use it.

If you need to store arbitrarily large data (especially binary data) then the S3 file pointer can be attractive. The value that SimpleDB adds in this scenario is the ability to run queries against the file metadata that you store in SimpleDB.
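A minimal sketch of the pointer idea, under stated assumptions: the function name `make_pointer_item`, the `texts/` key prefix, and the attribute names are all hypothetical, and no AWS call is made. It only shows which data would go to S3 (the body, under a key) and which small, queryable metadata would go to SimpleDB.

```python
import hashlib


def make_pointer_item(text: str, bucket: str, title: str):
    """Return (s3_key, body_bytes, simpledb_attributes) for one piece of text.

    The body would be uploaded to S3 under s3_key; the attributes would be
    stored as a SimpleDB item so the text can be found by querying metadata.
    """
    body = text.encode("utf-8")
    # Hypothetical key scheme: content hash under a "texts/" prefix.
    s3_key = "texts/" + hashlib.sha256(body).hexdigest()
    attributes = {
        "title": title,
        "s3_bucket": bucket,
        "s3_key": s3_key,
        # SimpleDB compares values as strings, so numbers are zero-padded
        # to keep range comparisons meaningful.
        "size_bytes": f"{len(body):010d}",
    }
    return s3_key, body, attributes
```

Every attribute value here stays far below the 1024-byte limit, no matter how large the text itself is.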

For text data limited to 10k I would recommend storing it directly in SimpleDB. It will easily fit in a single item, but you'll have to spread it across multiple attributes. There are basically two ways to do this, each with some drawbacks.

One way is more flexible and search-friendly but requires you to touch your data. You split your data into chunks of about 1000 bytes and store each chunk as a value of a multi-valued attribute. No ordering is imposed on multi-valued attributes, so you have to prepend each chunk with a number for ordering (e.g. 01).

Having all the text stored in one attribute makes queries easy, because the predicate needs only a single attribute name. You can add text of a different size to each item, anywhere from 1k to 200+k, and it gets handled appropriately. But be aware that your prepended chunk numbers can produce false positives in your queries (e.g. if you search for 01, every item will match).
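A minimal sketch of this first approach, assuming a two-digit prefix and a 1000-byte payload per chunk. The names `split_into_chunks` and `join_chunks` are hypothetical helpers, not part of any SimpleDB client; the returned list is what you would store as the values of one multi-valued attribute.

```python
CHUNK_BYTES = 1000   # payload per chunk; prefix + payload stays under 1024 bytes
PREFIX_WIDTH = 2     # "01", "02", ... allows up to 99 chunks


def split_into_chunks(text: str, chunk_bytes: int = CHUNK_BYTES) -> list:
    """Split text into order-prefixed chunks, measuring size in UTF-8 bytes."""
    chunks, current, current_size = [], "", 0
    for ch in text:
        size = len(ch.encode("utf-8"))
        # Start a new chunk when this character would overflow the current one.
        if current and current_size + size > chunk_bytes:
            chunks.append(current)
            current, current_size = "", 0
        current += ch
        current_size += size
    if current:
        chunks.append(current)
    if len(chunks) > 10 ** PREFIX_WIDTH - 1:
        raise ValueError("text needs more chunks than the prefix width allows")
    return [f"{i:0{PREFIX_WIDTH}d}{c}" for i, c in enumerate(chunks, start=1)]


def join_chunks(values) -> str:
    """Reassemble the text from prefixed chunks returned in any order."""
    ordered = sorted(values, key=lambda v: int(v[:PREFIX_WIDTH]))
    return "".join(v[PREFIX_WIDTH:] for v in ordered)
```

The prefix also keeps otherwise identical chunks distinct, which matters because a multi-valued attribute holds a set of values rather than a list.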

The second way to store the text within SimpleDB does not require you to place arbitrary ordering data within your text chunks. You do the ordering by placing each text chunk in a differently named attribute. For example, you could use the attribute names desc01 desc02 ... desc10 and place each chunk in the appropriate attribute. You can still do full-text search with both methods, but searches will be slower with this one, because you need to specify many predicates and SimpleDB will end up searching through a separate index for each attribute.
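A rough sketch of the second approach, assuming mostly single-byte text so that 1000 characters fit in one value. The helper names are hypothetical, and the query builder only assembles a select expression as a string to show how the predicates multiply; it does not contact SimpleDB.

```python
def to_named_attributes(text: str, base: str = "desc", chunk_chars: int = 1000) -> dict:
    """Map chunks to desc01, desc02, ... (assumes single-byte characters)."""
    pieces = [text[i:i + chunk_chars] for i in range(0, len(text), chunk_chars)]
    return {f"{base}{n:02d}": piece for n, piece in enumerate(pieces, start=1)}


def from_named_attributes(attrs: dict, base: str = "desc") -> str:
    """Rebuild the text; zero-padded names sort in chunk order."""
    names = sorted(name for name in attrs if name.startswith(base))
    return "".join(attrs[name] for name in names)


def build_search_query(domain: str, term: str, base: str = "desc", slots: int = 10) -> str:
    """Build one 'like' predicate per attribute slot, joined with 'or'."""
    escaped = term.replace("'", "''")  # single quotes are doubled inside values
    predicates = [f"`{base}{n:02d}` like '%{escaped}%'" for n in range(1, slots + 1)]
    return f"select * from `{domain}` where " + " or ".join(predicates)
```

With either approach, a search term that straddles a chunk boundary will not match, so splitting on word or sentence boundaries may be worth the extra effort.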

It may be easy to think of this type of workaround as a hack, because with databases we are used to having this kind of low-level detail handled for us inside the database. SimpleDB is specifically designed to push this sort of thing out of the database and into the client as a means of providing availability as a first-class feature.

If you found out that a relational database was splitting your text into 1k chunks to store on disk as an implementation detail, it wouldn't seem like a hack. The problem is that, in the current state of SimpleDB clients, you have to implement a lot of this data formatting yourself. Ideally a smart client would handle it for you. There just aren't any smart clients freely available yet.

Mocky
I had a nice little answer written up and was about to submit it when Mocky posted this one. Great summation, I agree with it completely. Given the speed and pricing of SimpleDB, it's definitely worth a shot, especially once you start to realize that the limitations of a traditional DB no longer apply.
Mark
Yes, great answer, thank you for that. Breaking up the data will require a lot more thought and work on my part, but I feel it will be easier than hosting a database and server. Thank you.
Peter Delaney
+1  A: 

If you are concerned about cost, you might find that it is cheaper to put the text in S3 and metadata with pointers in SimpleDB.

This is the technique I am looking to use. Good for start-ups.
objektivs
A: 

The upcoming release of Simple Savant (a C# persistence library for SimpleDB which I created) will support both attribute spanning as described by Mocky and full-text searches of SimpleDB data using Lucene.NET.

I realize you are probably not building your app in C#, but since your question is a top result when searching for SimpleDB and full-text indexing it seemed worth mentioning.

UPDATE: The Simple Savant release I mentioned above is now available.

Ashley Tate
That is perfect. This is what I need, because I did not want to manage this in my own code.
Peter Delaney
A: 

You could put the 10k text on S3, then create an attribute that has all the unique words of the 10k of text as multiple values. Then searches would be fast. No phrase searching, though.

How many values can you store in one attribute in one 'row' (item name)? I looked in the docs, but no answer popped out at me.

--Tom

Tom Andersen
OK, I figured it out. To do word-only searching on SimpleDB, create a set of all unique words (lowercased) and load as many words as will fit into 1024 bytes per attribute. For 10k of typical English text that might amount to 3 or 4 attributes. Then store the actual text in S3, with the key stored in SimpleDB. You get 256 attribute-value pairs per item with SimpleDB.
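A small sketch of the word-packing step, assuming English text and space-separated words inside each value. The function names and the tokenizing regex are illustrative choices, not anything SimpleDB prescribes, and a single word longer than the limit is not handled.

```python
import re

MAX_VALUE_BYTES = 1024  # SimpleDB limit for one attribute value


def unique_words(text: str) -> list:
    """Lowercase the text and return its distinct words in sorted order."""
    return sorted(set(re.findall(r"[a-z0-9']+", text.lower())))


def pack_words(words, max_bytes: int = MAX_VALUE_BYTES) -> list:
    """Greedily pack space-separated words into values of at most max_bytes."""
    values, current = [], ""
    for word in words:
        candidate = word if not current else current + " " + word
        if current and len(candidate.encode("utf-8")) > max_bytes:
            # The current value is full; start a new one with this word.
            values.append(current)
            current = word
        else:
            current = candidate
    if current:
        values.append(current)
    return values
```

Each returned string would become one attribute value on the item, next to the attribute that holds the S3 key.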
Tom Andersen