Amazon SimpleDB - Technical Overview

Structured storage was one of the missing pieces in Amazon's cloud services jigsaw puzzle (the other has to be the ability to host a site completely on EC2 without using dynamic DNS hacks) and Amazon is plugging that hole today with the launch of SimpleDB.

There is an Information Week article and the official site is at http://aws.amazon.com/simpledb

Here are the highlights as I see it

There were a bunch of things which caught my eye

Pricing

The pricing for storing data on SimpleDB is much higher than the costs on S3. Storing 1 GB of data on S3 for a month is going to cost you $0.15, while the same on SimpleDB is going to set you back $1.50, ten times as much. This suggests that Amazon is using pretty different hardware for the two services.
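To put numbers on that gap, here is a quick back-of-the-envelope sketch. It uses only the two per-GB prices quoted above and ignores transfer and box-usage charges; the function name is mine, purely for illustration.

```python
# Per-GB monthly storage prices quoted in this post (USD).
S3_PER_GB_MONTH = 0.15
SIMPLEDB_PER_GB_MONTH = 1.50


def monthly_storage_cost(gigabytes: float, price_per_gb: float) -> float:
    """Storage-only cost for one month; transfer and box usage are ignored."""
    return gigabytes * price_per_gb


if __name__ == "__main__":
    for gb in (1, 10, 100):
        s3 = monthly_storage_cost(gb, S3_PER_GB_MONTH)
        sdb = monthly_storage_cost(gb, SIMPLEDB_PER_GB_MONTH)
        print(f"{gb:>4} GB  S3 ${s3:8.2f}  SimpleDB ${sdb:8.2f}")
```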

I'm also fascinated by the idea of 'box usage'. For every query, Amazon returns the amount of 'machine time' used to execute that query. Since these queries are almost surely getting distributed over a variety of nodes, I'm curious to know how this 'machine time' is calculated.

Data Model and APIs

I love the data model for SimpleDB. I've never been a fan of relational tables and SQL, and I prefer data structures where everything's just one huge hashtable. Though SimpleDB's data model is not exactly a hashtable, it is pretty close. There are several things to like here if your programming loyalty lies with the dynamic side

All these things are possible using standard databases but would require quite a bit of work. And changing table schemas once you've piled up a decent amount of data is definitely not fun.
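To make the "almost a hashtable" idea concrete, here is a toy in-memory model. It is a sketch under my own assumptions (a domain maps item names to attributes, an attribute can hold several values, and every value is a string), and the `Domain` class is a local stand-in, not Amazon's client library.

```python
from collections import defaultdict


class Domain:
    """Toy stand-in for a domain: item name -> attribute name -> set of values."""

    def __init__(self):
        self._items = defaultdict(lambda: defaultdict(set))

    def put_attributes(self, item_name, attributes):
        # attributes is a list of (name, value) pairs; no schema is declared
        # up front, so any item can carry any attribute.
        for name, value in attributes:
            self._items[item_name][name].add(str(value))

    def get_attributes(self, item_name):
        # .get avoids creating an empty item as a side effect of reading.
        item = self._items.get(item_name, {})
        return {name: sorted(values) for name, values in item.items()}


if __name__ == "__main__":
    shirts = Domain()
    shirts.put_attributes("eID001", [("Color", "Blue"), ("Color", "Red")])
    shirts.put_attributes("eID002", [("Color", "Blue"), ("Size", "L")])
    print(shirts.get_attributes("eID001"))
    print(shirts.get_attributes("eID002"))
```

Note that the two items carry different attributes and that `Color` holds two values on the first one, which is exactly the kind of thing that takes schema work in a relational table.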

To program against the data, you have a choice of REST APIs (some very clean!) and SOAP APIs. Here's a sample REST request and response pair from the docs:


Sample Request

https://sdb.amazonaws.com/
?Action=Query
&AWSAccessKeyId=[valid access key id]
&DomainName=MyDomain
&MaxNumberOfItems=3
&NextToken=[valid next token]
&QueryExpression=%5B%27Color%27%3D%27Blue%27%5D
&SignatureVersion=1
&Timestamp=2007-06-25T15%3A03%3A09-07%3A00
&Version=2007-11-07
&Signature=2wVXB1x0NSWWETwLylZPVP%2FtqXQ%3D
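
The query string above is just percent-encoded parameters, so it can be rebuilt with Python's standard library. This is a minimal sketch: the function name and the `EXAMPLEKEY` placeholder are made up for illustration, and it deliberately leaves out `NextToken` and the `Signature` parameter, which a real request needs.

```python
from urllib.parse import quote, urlencode

ENDPOINT = "https://sdb.amazonaws.com/"


def build_query_url(domain, query_expression, max_items, access_key_id, timestamp):
    """Return an UNSIGNED Query URL; a real request must also carry a Signature."""
    params = [
        ("Action", "Query"),
        ("AWSAccessKeyId", access_key_id),
        ("DomainName", domain),
        ("MaxNumberOfItems", str(max_items)),
        ("QueryExpression", query_expression),
        ("SignatureVersion", "1"),
        ("Timestamp", timestamp),
        ("Version", "2007-11-07"),
    ]
    # quote with safe="" percent-encodes brackets, quotes, '=' and ':'.
    return ENDPOINT + "?" + urlencode(params, quote_via=quote, safe="")


if __name__ == "__main__":
    print(build_query_url(
        domain="MyDomain",
        query_expression="['Color'='Blue']",
        max_items=3,
        access_key_id="EXAMPLEKEY",  # hypothetical placeholder
        timestamp="2007-06-25T15:03:09-07:00",
    ))
```

The encoded `QueryExpression` and `Timestamp` values come out the same as in the sample request.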

Sample Response

<QueryResponse xmlns="http://sdb.amazonaws.com/doc/2007-11-07">
<QueryResult>
<ItemName>eID001</ItemName>
<ItemName>eID002</ItemName>
<ItemName>eID003</ItemName>
</QueryResult>
<ResponseMetadata>
<RequestId>c74ef8c8-77ff-4d5e-b60b-097c77c1c266</RequestId>
<BoxUsage>0.0000219907</BoxUsage>
</ResponseMetadata>
</QueryResponse>
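
Pulling the item names and the box usage out of a response like this takes only a few lines. The sketch below parses the sample response shown above with Python's standard XML parser; the function name is my own.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the xmlns attribute of the sample response.
NS = {"sdb": "http://sdb.amazonaws.com/doc/2007-11-07"}

SAMPLE_RESPONSE = """\
<QueryResponse xmlns="http://sdb.amazonaws.com/doc/2007-11-07">
<QueryResult>
<ItemName>eID001</ItemName>
<ItemName>eID002</ItemName>
<ItemName>eID003</ItemName>
</QueryResult>
<ResponseMetadata>
<RequestId>c74ef8c8-77ff-4d5e-b60b-097c77c1c266</RequestId>
<BoxUsage>0.0000219907</BoxUsage>
</ResponseMetadata>
</QueryResponse>
"""


def parse_query_response(xml_text):
    """Return (item_names, box_usage) from a QueryResponse document."""
    root = ET.fromstring(xml_text)
    items = [el.text for el in root.findall("sdb:QueryResult/sdb:ItemName", NS)]
    box_usage = float(root.findtext("sdb:ResponseMetadata/sdb:BoxUsage", namespaces=NS))
    return items, box_usage


if __name__ == "__main__":
    names, usage = parse_query_response(SAMPLE_RESPONSE)
    print(names, usage)
```

Summing the `BoxUsage` values across requests would be one way to keep a running tally of the 'machine time' an application is consuming.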

Eventual Consistency

This is going to surprise a lot of SimpleDB users (and probably cause a lot of hard-to-find bugs). Reading data from SimpleDB immediately after a write may not reflect the latest updates. SimpleDB relaxes the 'C' in ACID and doesn't promise that you'll instantly see your updates, because each update has to be propagated across all the copies of your data. Amazon may not have a choice here (see CAP Conjecture) but I don't think this is going to be popular with a lot of programmers.

Dare talks about this extensively, and as someone who writes code for a high-traffic website with lots of data flowing around, I shudder at the prospect of not being able to rely on data always being up to date. For SimpleDB developers, this is going to mean some extensive coding to make their apps resistant to stale data - something programmers traditionally never had to worry about.

Another possibility is that frameworks could take away the pain of doing this checking - this is definitely going to be an interesting place to watch.
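
As a rough idea of what that checking might look like, here is a sketch of a "poll until fresh" helper. Everything in it is hypothetical: `LaggyStore` is a toy stand-in that delays the visibility of writes, and `read_until` is my own name for the retry loop. It is not how SimpleDB or any real framework does it.

```python
import time


class LaggyStore:
    """Toy store where a write becomes visible only after `lag` seconds."""

    def __init__(self, lag):
        self.lag = lag
        self._visible = {}
        self._pending = []  # (visible_at, key, value), in write order

    def put(self, key, value):
        self._pending.append((time.monotonic() + self.lag, key, value))

    def get(self, key):
        now = time.monotonic()
        still_pending = []
        for visible_at, k, v in self._pending:
            if visible_at <= now:
                self._visible[k] = v  # the write has "propagated"
            else:
                still_pending.append((visible_at, k, v))
        self._pending = still_pending
        return self._visible.get(key)


def read_until(fetch, is_fresh, timeout=2.0, first_delay=0.01, max_delay=0.5):
    """Poll fetch() with exponential backoff; return (value, fresh_flag)."""
    deadline = time.monotonic() + timeout
    delay = first_delay
    while True:
        value = fetch()
        if is_fresh(value):
            return value, True
        if time.monotonic() + delay > deadline:
            return value, False  # give up and hand back the stale value
        time.sleep(delay)
        delay = min(delay * 2, max_delay)


if __name__ == "__main__":
    store = LaggyStore(lag=0.05)
    store.put("color", "Blue")
    print(store.get("color"))  # most likely None: the write is not visible yet
    print(read_until(lambda: store.get("color"), lambda v: v == "Blue"))
```

The catch is that the caller has to know what "fresh" means (here, the value it just wrote), which is the part a framework would have to track on your behalf.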

Query language

Unlike Google's BigTable, which eschews any and all forms of querying (probably in favor of a map-reduce type paradigm), SimpleDB supports a simple set of query operators: =, !=, <, >, <=, >=, STARTS-WITH, AND, OR, NOT, INTERSECTION and UNION. Also, queries can only execute for a maximum of 5 seconds.
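
To show how these operators might compose, here is a small sketch of an expression builder. The bracketed `['attribute' op 'value']` shape follows the sample request from the docs; the lowercase keywords, the spacing, and the helper names are my assumptions, so check them against Amazon's documentation before relying on them. Quote escaping is not handled at all.

```python
# Comparison operators listed above; lowercase spelling is an assumption.
COMPARISONS = {"=", "!=", "<", ">", "<=", ">=", "starts-with"}


def predicate(attribute, op, value):
    """Build one bracketed predicate such as ['Color' = 'Blue']."""
    if op not in COMPARISONS:
        raise ValueError(f"unsupported operator: {op!r}")
    for text in (attribute, value):
        if "'" in text or "\\" in text:
            raise ValueError("quote escaping is not handled in this sketch")
    return f"['{attribute}' {op} '{value}']"


def intersection(*predicates):
    """Items that match every predicate."""
    return " intersection ".join(predicates)


def union(*predicates):
    """Items that match at least one predicate."""
    return " union ".join(predicates)


if __name__ == "__main__":
    print(intersection(
        predicate("Color", "=", "Blue"),
        predicate("Size", "starts-with", "L"),
    ))
```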

There are several interesting properties here

The ecosystem and the competition

Amazon has built a good ecosystem around their services. Their services all work together (the same AWS keys can be used, the same X.509 certificate system, etc.). The only thing missing is the ability to statically host a site completely on Amazon. What's even more surprising to me is that these are the sort of services that you would expect Google to release, given their much talked about infrastructure. As far as Microsoft goes, the only current service I can think of that comes close is Astoria (something which I should definitely spend more time digging into).

If Amazon does as good a job with this as they did with S3 and EC2, startups are going to love this service. Instead of having to shell out a ton of money up front and having to worry about dedicated hosting and colos, you now have a pay-as-you-go database in the sky.

Update #1

This post says that SimpleDB is built on Erlang. Interesting!

Update #2

See the TechCrunch post and the Techmeme discussion here.

