The official documentation states:

Raw byte size (GB) of all item IDs + 45 bytes per item + Raw byte size (GB) of all attribute names + 45 bytes per attribute name + Raw byte size (GB) of all attribute-value pairs + 45 bytes per attribute-value pair

What is the raw size of an attribute-value pair? Is it precisely the size of the value? (I would expect so, but then why is it worded "attribute-value pair"?) Or is it the size of the attribute name plus the size of the attribute value? (In that case, there would be motivation to give your attributes really short names.)

For example, what is the size of the tiny domain below?

+---------------------------------------------------------+
| Item Name/ID | "Price" attribute | "Calories" attribute |
|--------------+-------------------+----------------------|
| "apple"      | "0000.43"         | "0046"               |
| "orange"     | "0000.70"         | "0053"               |
+---------------------------------------------------------+
+1  A: 

The attribute name is only counted once. The size of your sample domain would be calculated like this:

  • Item names: (5 + 45) + (6 + 45) = 101
  • Attribute names: (5 + 45) + (8 + 45) = 103
  • Attribute values: (7 + 45) + (4 + 45) + (7 + 45) + (4 + 45) = 202
  • Total: 406 bytes
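
If you want to double-check the arithmetic, here is a minimal Python sketch of the same calculation. The names `PER_ENTRY_OVERHEAD` and `billed_size` are made up for illustration and are not part of any SimpleDB API; the sketch only assumes the 45-byte overhead quoted from the documentation.

```python
# Illustrative only: these names are hypothetical, not a SimpleDB API.
PER_ENTRY_OVERHEAD = 45  # bytes per item name, attribute name, or value (per the quoted docs)

items = {
    "apple": {"Price": "0000.43", "Calories": "0046"},
    "orange": {"Price": "0000.70", "Calories": "0053"},
}

def billed_size(strings):
    """Sum the UTF-8 byte length of each string plus the per-entry overhead."""
    return sum(len(s.encode("utf-8")) + PER_ENTRY_OVERHEAD for s in strings)

item_names = list(items)
# Each attribute name is counted once per domain, not once per item.
attribute_names = sorted({name for attrs in items.values() for name in attrs})
# Every stored value counts, one entry per attribute-value pair.
attribute_values = [value for attrs in items.values() for value in attrs.values()]

item_bytes = billed_size(item_names)
name_bytes = billed_size(attribute_names)
value_bytes = billed_size(attribute_values)
total_bytes = item_bytes + name_bytes + value_bytes

print(item_bytes, name_bytes, value_bytes, total_bytes)  # 101 103 202 406
```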

This thread on the SimpleDB forum discusses the calculation in more detail: http://developer.amazonwebservices.com/connect/thread.jspa?threadID=23527&tstart=0&messageID=96906#96906

Ashley Tate
+2  A: 

There are two different storage sizes for each domain. The base size includes only the data that you have stored and is used by the SimpleDB service when enforcing size quotas (10GB per domain, 1MB response from Select). The other size is used only for billing purposes and also includes storage used behind the scenes for indexes. All 6 of the values you need to calculate both storage numbers are available from the DomainMetadata operation.

Computing the base storage

Only three of the values are needed to calculate the base storage: ItemNamesSizeBytes, AttributeNamesSizeBytes and AttributeValuesSizeBytes. These values represent the sums of unique item name lengths, unique attribute name lengths, and all attribute value lengths. The formula for base storage is:

baseStorage = 
    ItemNamesSizeBytes + AttributeNamesSizeBytes + AttributeValuesSizeBytes

Computing the billing storage

Three additional DomainMetadata values are needed to compute the billing storage size. They are the counts ItemCount, AttributeNameCount and AttributeValueCount. These numbers represent the counts of data you have stored that correspond to index entries. Each index entry incurs a 45 byte storage charge for billing purposes only. The formula for billing storage is:

indexStorage = 45 x (ItemCount + AttributeNameCount + AttributeValueCount)
billingStorage = baseStorage + indexStorage
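
As a rough sketch, both formulas can be applied to a dictionary holding the six DomainMetadata values. The function name `storage_sizes` and the plain-dict input are assumptions for illustration; how you fetch the values depends on your client library. The example numbers are worked out by hand for the two-item fruit domain in the question.

```python
INDEX_ENTRY_BYTES = 45  # per-index-entry charge, billing only

def storage_sizes(metadata):
    """Return (base_storage, billing_storage) in bytes.

    `metadata` is assumed to be a dict keyed by the DomainMetadata field
    names; values may be ints or numeric strings.
    """
    base = (
        int(metadata["ItemNamesSizeBytes"])
        + int(metadata["AttributeNamesSizeBytes"])
        + int(metadata["AttributeValuesSizeBytes"])
    )
    index = INDEX_ENTRY_BYTES * (
        int(metadata["ItemCount"])
        + int(metadata["AttributeNameCount"])
        + int(metadata["AttributeValueCount"])
    )
    return base, base + index

# Hand-computed values for the fruit domain (2 items, 2 attribute names, 4 values).
fruit_domain = {
    "ItemCount": 2,
    "ItemNamesSizeBytes": 11,        # "apple" + "orange"
    "AttributeNameCount": 2,
    "AttributeNamesSizeBytes": 13,   # "Price" + "Calories"
    "AttributeValueCount": 4,
    "AttributeValuesSizeBytes": 22,  # 7 + 4 + 7 + 4
}

base_storage, billing_storage = storage_sizes(fruit_domain)
print(base_storage, billing_storage)  # 46 406
```

The base figure (46 bytes) is what counts against the quota; the billing figure (406 bytes) adds 45 bytes for each of the 8 index entries.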

Notes

Don't be confused by the "attribute-value pair" language. This is just meant to differentiate between attribute values that are the same but that are stored with different attribute names. For example, if you store the following two attribute-value pairs in an item: {name: "Violet", favColor: "Violet"}, you will be charged for storing both values "Violet" because they are part of different attribute-value pairs. If the documentation said that you would be charged for each unique value per item, it wouldn't be accurate for this example.
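
To make the distinction concrete, here is a small sketch that counts pairs rather than distinct values for the "Violet" item. The variable names are illustrative, and the 45-byte figure is the per-entry billing charge described above.

```python
# The item's attributes as (name, value) pairs; a list is used because
# what gets counted is each pair, not each distinct value.
pairs = [("name", "Violet"), ("favColor", "Violet")]

distinct_values = {value for _, value in pairs}
value_bytes = sum(len(value.encode("utf-8")) for _, value in pairs)
billed_value_bytes = value_bytes + 45 * len(pairs)

print(len(distinct_values))  # 1 distinct value ...
print(len(pairs))            # ... but 2 attribute-value pairs
print(value_bytes)           # 12 bytes of raw value data
print(billed_value_bytes)    # 102 bytes once the per-pair charge is added
```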

Also, all data stored in SimpleDB is stored as UTF-8 encoded byte strings. All characters that encode to multiple bytes will count as multiple bytes for all purposes (DomainMetadata responses, quota enforcement and billing).
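
A quick way to see the difference between character count and stored size is to encode a string to UTF-8 yourself. This is just a local check in Python (the helper name `stored_bytes` is made up), not a call to SimpleDB.

```python
def stored_bytes(text):
    """Length of the string once encoded as UTF-8."""
    return len(text.encode("utf-8"))

samples = ["apple", "caf\u00e9", "\u20ac"]  # "apple", "café", "€"
for sample in samples:
    # Character count on the left, UTF-8 byte count on the right.
    print(len(sample), stored_bytes(sample))

# 5 5   -> ASCII only, one byte per character
# 4 5   -> the accented e takes two bytes
# 1 3   -> the euro sign takes three bytes
```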

In addition to the character encoding, REST requests must be URL encoded over the wire. This "percent encoding" triples the size of various characters, e.g. the space character ' ' becomes '%20'. This encoding has no effect on storage size calculations. It is decoded on the SimpleDB side before storage.
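
The sketch below illustrates the point with Python's standard `urllib.parse` module: the request grows on the wire, but the decoded value is the same length as the original. The sample value is arbitrary, and the decoding step only stands in for what the service does on its side.

```python
from urllib.parse import quote, unquote

value = "red apple"
on_the_wire = quote(value, safe="")  # percent-encode everything that needs it
decoded = unquote(on_the_wire)       # stand-in for server-side decoding

print(on_the_wire)                   # red%20apple
print(len(on_the_wire))              # 11 characters in the request
print(len(decoded.encode("utf-8")))  # 9 bytes counted for storage
```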

The DomainMetadata values are sometimes served from a cache but are usually less than 24 hours old. Check the timestamp in the response to see when the values were computed. As a practical matter, this means that most of the time you won't be able to add some data and immediately see the DomainMetadata values change.
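
If you want to check how stale a response is, something like the following sketch works, assuming the Timestamp field is expressed in Unix epoch seconds (verify this against the response you actually get). The function name `metadata_age_hours` and the sample timestamp are hypothetical.

```python
from datetime import datetime, timezone

def metadata_age_hours(timestamp, now=None):
    """Hours elapsed since the metadata was computed.

    Assumes `timestamp` is Unix epoch seconds, as an int or numeric string.
    """
    computed_at = datetime.fromtimestamp(int(timestamp), tz=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return (now - computed_at).total_seconds() / 3600

# Made-up timestamp, checked against a fixed "now" two hours later.
sample_timestamp = "1225486466"
two_hours_later = datetime.fromtimestamp(1225486466 + 7200, tz=timezone.utc)
age = metadata_age_hours(sample_timestamp, now=two_hours_later)
print(age)  # 2.0
```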

Mocky