At the Web 2.0 Expo from the 22nd to the 25th
Yet another Flickr change, yet another mutiny
Flickr related change - check
User revolt with protest groups - check
A blog post from Zooomr CEO Thomas Hawk, supporting the protest - check
Sometimes, I feel really sorry for Flickr. They can't seem to make *any* change without drawing the ire of some really vocal users. Here are some incidents off the top of my head:
- Flickr getting acquired by Yahoo
- Flickr moving to using Yahoo's login system
- Flickr potentially getting acquired by Microsoft
Now, I love Flickr. I think Stewart Butterfield and Caterina Fake have created something wonderful. I've gifted Pro accounts to friends and I have some valuable photos up there. But I think there's a line between 'being vocal fans' and 'harming the site's progress'. It's almost as if a lot of people have a frozen mental image of Flickr as it was when they first joined it and want to preserve that site forever, at the cost of the site growing.
"But shouldn't Flickr stick to what it does best? Photos?"
No.
Let me illustrate with a short history lesson from Microsoft's past. Microsoft was originally a languages and tools company - BillG wrote the first BASIC interpreter. If it had stuck to just doing one thing well - languages, it would have never built Windows and Office. Similarly, a decade ago, Microsoft was primarily known as a consumer company and didn't have a credible presence on the backend. SQL Server, Exchange, IIS and the Windows Server products changed all that. At each point in time, if Microsoft had stuck to what users thought it did best, it wouldn't have grown.
In my view, Flickr users are doing Flickr a disservice if they want to shoehorn it into being just an online photo site. I would rather see Flickr evolve beyond photos into a friendly online forum where people post creations, regardless of their nature. Flickr's management seems to understand this well - the 90-second limit on the videos is a master-stroke. It stops people turning Flickr into another YouTube and forces them to some extent to post original content.
The Thomas Hawk Affair
Thomas Hawk's involvement in each of these protests makes me a bit uncomfortable since I'm a fan of his photography and enjoy reading his blog. At the end of the day, he is the CEO of a company that competes with Flickr. For him to lead every Flickr user protest and claim that there is no conflict of interest is a bit...stretched. A friend compared this to Tom Anderson (hypothetically) protesting Facebook's Beacon implementation. Even though Thomas' arguments are probably being made in good faith, the fact remains that he stands to gain from Flickr users leaving to join his service.
Popfly - now using Cacheman
This has to be the first time one of my pet projects has proven remotely useful so I'm pretty pleased :-)
Slicehost
I spent a lot of my weekend moving my website from Dreamhost to Slicehost. Slicehost is a VPS provider with a platform based on Xen. This was not really due to any problem with Dreamhost (they're really good and I would recommend them instantly) but more due to how good Slicehost was. In fact, I have to say that Slicehost, at this early stage, is by far the best host I've worked with.
Shared hosts vs Virtual Private Servers vs dedicated servers
A quick primer on shared hosts vs VPS vs dedicated servers (most of you can probably skip this section - but you'll be surprised how many people confuse these).
A shared host, like the name implies, means that you get a machine that is shared. Your website will run on the same box as several other websites with some security ACLs in place to make sure you don't trample over someone else's files. Depending on the host, your access to the machine might range from just FTPing files over to a shell account where you can log in and run programs yourself. However, you'll never have root/administrator access to the box and you'll usually need to go through cPanel/some-really-ugly administrative control panel to request changes to your configuration or set up new software. These security settings also can't really protect against other types of bad behavior from individual websites. For example, if your website happens to be on the same machine as another website undergoing a Digg/Slashdot-effect, your performance will be affected as there is no real effective throttling mechanism. Most shared hosts will either kill the website hogging resources or perform some manual magic, like moving it to a different box.
A dedicated server is the other end of the spectrum - you get a full machine to yourself. Depending on which company you work with and/or how much money you pay, this might be from a low-end provider or in a colo facility or in a big datacenter along with thousands of other boxes. You don't share the machine with anyone and you can party on it to your heart's content. However, since these require a 1:1 mapping between customer and hardware box, they're typically expensive.
A VPS falls in the middle. They offer you most of the benefits of dedicated servers (full root access on the box) without the 'bad-neighbour' problems of shared hosts. They tackle the latter problem by using some form of virtualization and use a hypervisor or a virtual machine to act as a sandbox and throttle resources. In Slicehost's case, they use Xen. In the screenshot below from my slice, you can see the custom Xen kernel in action as well as the 256mb limit imposed on my slice though the actual underlying hardware probably supports multiple GBs.
Some random notes
- I picked the base 256mb $20/month slice. I was tempted to get a beefier configuration but forced myself to hold off for now. This is also more than double what I was paying Dreamhost
- Slicehost has an awesome feel of community and sincerity around it. Their documentation and wiki are great, they know their stuff and they seem genuinely *honest*. That's hard to find in the overselling-world of web hosting.
- I installed Ubuntu 7.10 on my slice. I would really like to have installed Windows Server 2008 but I doubt that Slicehost will offer that as an option anytime soon. :-)
- I'm running lighttpd as my web server. I ran nginx for some time but switched to lighttpd when I needed to get mercurial to talk to my web server over fastcgi. Nginx rocks but I was just too lazy to figure out the spawn-fcgi magic to get it to talk with mercurial (and the cgi options seem unreliable).
- After a lot of iptables and lighttpd configuration file hacking, I realized how much I had been spoiled by MMC add-ins back in Windows land. I wish there was a way to remote-connect a configuration GUI over SSH (I'm thinking of the ability to connect to remote servers through MMC).
- I can't wait for the day when I get mod_rewrite style rules right at the first attempt. Or on the tenth attempt. On a tangential note, I like nginx's mini-language inside the configuration file.
- I'm writing a Python web app and I'm bewildered by the options for running it. mod_wsgi, mod_python, fcgi, etc. For every blog post with some graphs supporting one option, I can find another saying the exact opposite. I'm also amused at how a lot of people measure server performance under load by essentially hitting the server in a tight loop. All the interesting stress related problems only show up after a few days (as I painfully found out while writing cacheman).
- Mercurial vs Git - Git's poor support for Windows was a deal breaker for me. Now I just need to find a Mercurial<->TFS bridge :-)
Running Ubuntu on Windows Server 2008 Hyper-V
If you're trying to install Ubuntu 7.10 (either server or desktop) on a VT-enabled machine, you'll probably get stuck at the "Loading..." screen. If you're interested in the gory details, check out this bug - the issue seems to be around emulation of real mode instructions and the graphics instructions that ISOLinux uses to boot.
The fix is simple - get the patch from this thread and patch your ISO images (the patch makes a small change to your isolinux.cfg).
Ubuntu is now purring away happily on my Win2k8 box :-)
Cacheman update (0.0.2)
There are no new features in this release but a *ton* of bug fixes. I had fun stomping out several race conditions that had crept into the code. I've also tested it on a bunch of diverse environments (single-core and multi-core, x86 and x64, Windows XP/Vista and Windows Server 2003/2008) so you shouldn't see some of the OS specific issues that were present in the last release.
I would like to thank Ayende for taking the trouble to find and report a lot of the bugs in the last release. Thanks Ayende!
Cacheman - a fast distributed hashtable for Windows
Cacheman is a fast, distributed hashtable for Windows, implemented purely in managed code. This is a personal side-project which I started and abandoned several months ago and picked up a few weekends ago. If you just want the binaries, scroll to the bottom of the post to get the raw, early bits (and I do mean raw and early!)
It all started with my fascination with memcached and how well it works for several heavy-traffic sites. I set off to create a bare bones hashtable with a set of requirements in mind
- Really fast and really scalable. I didn't have a firm numerical target in place but wanted to at least do a few thousand requests per second. I wound up doing a lot more than that (details in the how-stuff-works section below).
- Client agnostic, i.e., anyone should be able to take a look at the protocol and implement a client in their language/environment of choice. Though Cacheman can't work with current memcached clients, I've deliberately made the wire protocol similar so that I can make it compatible with some work.
- Ops-friendly. Put simply, it shouldn't be a pain in the rear to deploy and maintain on hundreds of production machines. James Hamilton has written extensively on this subject - and I would be overjoyed if I get around to doing half the things he talks about.
- Windows and .NET friendly. I wanted something that felt 'native' to Windows. I don't have a good justification for this - just that I work on Windows and managed code a lot and happen to like them :)
Note that memcached meets a lot of the requirements above. My biggest reason for starting from scratch was to just see 'whether I could do it' :-).
I'll walk you through a little demonstration before digging into how it works and why certain design decisions were made.
A quick tour
If you grab the binaries below, you'll see 3 binaries - the server, a console/client and a library that you can link to your own apps.
- Server:
Run CachemanServer.exe with the /? argument to see the various arguments that you can give. By default, it'll try to bind to the first IP address and listen on port 16180 and set a maximum memory size of 100mb for the items it holds. You need to run with administrative privileges the first time as it installs a few perf counters on first launch.

- Console:
Since it was a pain to write client code to test out the server, I wrote a bare-bones console. Run 'CachemanConsole.exe' and at the prompt, type 'connect <the-ip-address-the-server-is-listening-on>'. From there, you can do basic get/set/delete operations as well as run a few homegrown stress tests. Note that the times in the screenshot below are not representative of server performance - the first 5 ms set is more due to the client being run for the first time.
- Client library:
The console is actually a thin layer over CachemanAPI.dll which is where the meat of the client lives. Frankly, I haven't had much time to work on the client API and it needs quite a bit of work and polish before it can be used in a production system.
Dare Obasanjo has written extensively on what typical code using memcached in .NET looks like and the same pattern is applicable here too. Here's some sample code using the client library. Ignore the IPEndPoint stuff - the next release will let you write this in a nice config file and never have to deal with IP addresses and ports in code.
How stuff works (with a few design detours and some pretty perf graphs)
- In terms of general architecture, Cacheman is similar to memcached as opposed to other distributed hash tables (e.g., EHCache from the Java world is an in-memory cache with async replication to the server). The server listens on a socket, parses commands from the client and responds appropriately. The client itself is pretty dumb and is just a thin layer over some networking code. This means that Cacheman is neither a read-through nor a write-through cache at the moment.
- When you do a GET/SET/DELETE operation for a specified key, the client first needs to figure out which Cacheman server instance to talk to. To do that, the client does a quick FNV hash of the key and then mods that with the number of servers to get the server to talk to.
The disadvantage of this approach is that when you add a new server node or remove a server node, the cache needs to get repopulated. The fix for this is a consistent hashing algorithm which I haven't gotten around to implementing (and from the little I know, there are not too many of these algorithms out there)
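The key-to-server step and its repopulation problem can be sketched in a few lines of Python. This is only an illustration, not Cacheman's client code: the post says "FNV hash" without naming a variant, so 32-bit FNV-1a is assumed here, and the function names and server addresses are made up for the example.

```python
# Assumption: 32-bit FNV-1a. The post only says "FNV hash".
FNV_OFFSET_BASIS = 0x811C9DC5
FNV_PRIME = 0x01000193


def fnv1a_32(data: bytes) -> int:
    """Hash bytes with 32-bit FNV-1a (XOR the byte in, then multiply)."""
    h = FNV_OFFSET_BASIS
    for b in data:
        h ^= b
        h = (h * FNV_PRIME) & 0xFFFFFFFF  # keep the result in 32 bits
    return h


def pick_server(key: str, servers: list) -> str:
    """Map a key to a server by taking the hash modulo the server count."""
    return servers[fnv1a_32(key.encode("utf-8")) % len(servers)]


def fraction_remapped(keys, old_servers, new_servers) -> float:
    """Fraction of keys that land on a different server after a change."""
    moved = sum(
        1 for k in keys
        if pick_server(k, old_servers) != pick_server(k, new_servers)
    )
    return moved / len(keys)


if __name__ == "__main__":
    # Hypothetical addresses, purely for illustration.
    three = ["10.0.0.1:16180", "10.0.0.2:16180", "10.0.0.3:16180"]
    four = three + ["10.0.0.4:16180"]
    keys = [f"key{i}" for i in range(1000)]
    print(pick_server("user:42", three))
    print(fraction_remapped(keys, three, four))
```

With an evenly spread hash, going from 3 to 4 servers should move roughly three quarters of the keys, which is why the cache effectively has to be repopulated after a node is added or removed.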
The other choice is to use a central 'master cache server' which stores a lookup table of server nodes and which the clients can constantly poll. I'm not sure whether I like this too much as it seems to be a lot of added complexity and brings its own set of problems.
- The client talks to the server using a simple network protocol. My protocol looks similar to memcached's text protocol but is stripped down and simplified. I looked at a bunch of options and my choice was influenced by a few things:
- Binary protocols are a pain to debug - text protocols are simple and clean, especially if you are up at 3AM poring over a Wireshark trace, IMHO :)
- The protocol I have is small and lightweight enough that it doesn't have much parsing or network overhead.
- I wanted to make my server work with the huge set of memcached clients out there today. This is going to be a challenge since the internal implementation is vastly different.
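For a feel of what such a text protocol looks like on the wire, here is a small Python sketch of the `set` command framing from memcached's own text protocol. Cacheman's actual wire format is not shown in this post and is described only as a stripped-down relative of it, so treat this as the memcached reference point rather than Cacheman's format. The helper names are made up for the example.

```python
def encode_set(key: str, value: bytes, flags: int = 0, exptime: int = 0) -> bytes:
    """Frame a memcached-style set: a text header line, then the raw data block."""
    header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode("ascii")
    return header + value + b"\r\n"


def parse_set(buffer: bytes):
    """Parse one set command from the start of buffer.

    Returns (key, flags, exptime, value, remaining_bytes), or None when the
    buffer does not yet hold a complete command and more data is needed.
    """
    line_end = buffer.find(b"\r\n")
    if line_end == -1:
        return None  # header line not complete yet
    parts = buffer[:line_end].decode("ascii").split(" ")
    if len(parts) != 5 or parts[0] != "set":
        raise ValueError("not a set command")
    key = parts[1]
    flags, exptime, size = int(parts[2]), int(parts[3]), int(parts[4])
    data_start = line_end + 2
    data_end = data_start + size
    if len(buffer) < data_end + 2:
        return None  # data block (plus trailing CRLF) not complete yet
    if buffer[data_end:data_end + 2] != b"\r\n":
        raise ValueError("data block not terminated by CRLF")
    return key, flags, exptime, buffer[data_start:data_end], buffer[data_end + 2:]


if __name__ == "__main__":
    wire = encode_set("user:42", b"hello")
    print(wire)
    print(parse_set(wire))
```

Because the header is plain text, a command like this is readable as-is in a packet trace, and the explicit byte count lets the server tell a complete request from one that still needs more data.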
The networking layer also takes care to not do small writes. This, along with the protocol design, lets me get a perf win by setting TCP_NODELAY on my sockets.
- The most interesting part of all this was the implementation of the server. Being new to scalable servers, I wrote a few naive implementations which just didn't scale. My first implementation had a model where the server had a thread dedicated per socket. This was quite poor scalability-wise and just led to a lot of thrashing threads.
The model I have now is built around NT's IO Completion ports. Unlike the previous model, there is an M:N relationship between sockets and threads (rather than a 1:1 relationship).
When data is queued to the completion port (from the client), one of the waiting threads is woken up and it goes to work on the data. Once it is done processing (either handling the request or deciding that it needs more data), it goes back to waiting by calling GetQueuedCompletionStatus internally. This lack of thread affinity for client sockets lets me multiplex several clients to a few active threads.
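The shape of this model (many clients, a few threads, no thread tied to any one client) can be imitated in portable Python. The sketch below is only an analogy: `queue.Queue` stands in for the completion port and `Queue.get()` for the blocking wait in GetQueuedCompletionStatus. It uses no real IOCP APIs, and `run_pool` and the uppercase "handler" are invented for the example.

```python
import queue
import threading


def run_pool(events, num_workers=4):
    """Handle (client_id, payload) events with a small pool of threads.

    Returns (results, workers_used): results maps client_id to the list of
    replies produced for it, workers_used is the set of thread names that
    handled at least one event.
    """
    completions = queue.Queue()  # stand-in for the completion port
    results = {}
    workers_used = set()
    lock = threading.Lock()

    def worker():
        while True:
            item = completions.get()  # block until some client has data ready
            if item is None:          # sentinel: shut this worker down
                break
            client_id, payload = item
            reply = payload.upper()   # placeholder for "parse and handle the request"
            with lock:
                results.setdefault(client_id, []).append(reply)
                workers_used.add(threading.current_thread().name)

    threads = [
        threading.Thread(target=worker, name=f"worker-{i}")
        for i in range(num_workers)
    ]
    for t in threads:
        t.start()
    for event in events:
        completions.put(event)
    for _ in threads:
        completions.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return results, workers_used


if __name__ == "__main__":
    events = [(client, f"get key{client}") for client in range(100)]
    results, workers = run_pool(events, num_workers=4)
    print(len(results), "clients served by", len(workers), "threads")
```

Any free worker picks up the next event regardless of which client it came from, so 100 clients are served by at most 4 threads instead of 100.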
On my home machine (2.4GHz Intel Core 2 with 2GB RAM), I can push the server to around 16000 requests per second (clients and server running on the same machine). I *believe* I can get it to around 25K requests on the same box with a bit of work but anything above that is going to mean some serious work.
Cacheman comes with a nice set of perf counters for you to get at these numbers anytime

However, my code isn't entirely async. When the server sends data back to the client, it blocks on the send. This was a deliberate choice - whenever I did perf tests with async sends, I saw a noticeable dip in speed (as measured in requests processed per second). My theory here is that the hit is due to the context switch between the 'request-processing' thread and the 'socket send' thread.
- Internally, the cache items are stored in a giant dictionary. I plan on moving to a better model in the future as right now, I'm forced to take a lock over the entire store for any destructive action. Cache expiration is pretty naive at the moment - you have a choice of using either LRU or LFU to expire items from the cache (apart from the items which get thrown out due to living past their lifetime). I plan on adding more cache expiration algorithms in the future and looking into a generational model. But the current model is good enough to ensure that you don't run out of memory on your server :)
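The storage model in the last point (one big dictionary, one lock, LRU eviction under a memory cap, plus per-item lifetimes) can be sketched in Python as follows. This is a rough illustration under assumptions, not Cacheman's implementation: the class and method names are invented, size is approximated by the length of the value in bytes, and only the LRU policy is shown.

```python
import threading
import time
from collections import OrderedDict


class LruStore:
    """One dictionary guarded by one lock, evicting least recently used items."""

    def __init__(self, max_bytes, clock=time.monotonic):
        self._items = OrderedDict()  # key -> (value, expires_at), oldest first
        self._max_bytes = max_bytes
        self._used = 0
        self._lock = threading.Lock()  # a single lock over the whole store
        self._clock = clock

    def set(self, key, value: bytes, ttl=None):
        expires_at = None if ttl is None else self._clock() + ttl
        with self._lock:
            self._remove(key)  # replacing a key must not double-count its size
            self._items[key] = (value, expires_at)
            self._used += len(value)
            # Evict from the least recently used end until under the cap.
            while self._used > self._max_bytes and self._items:
                self._remove(next(iter(self._items)))

    def get(self, key):
        with self._lock:  # a read reorders the dictionary, so it locks too
            entry = self._items.get(key)
            if entry is None:
                return None
            value, expires_at = entry
            if expires_at is not None and self._clock() >= expires_at:
                self._remove(key)  # item has lived past its lifetime
                return None
            self._items.move_to_end(key)  # mark as most recently used
            return value

    def _remove(self, key):
        entry = self._items.pop(key, None)
        if entry is not None:
            self._used -= len(entry[0])


if __name__ == "__main__":
    store = LruStore(max_bytes=10)
    store.set("a", b"12345")
    store.set("b", b"12345")
    store.get("a")            # "a" becomes the most recently used item
    store.set("c", b"12345")  # over the cap, so "b" is evicted
    print(store.get("b"), store.get("a"), store.get("c"))
```

The single lock keeps the sketch simple but serializes every operation, which is the cost the post hopes to remove later with a better model.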
The Bits
There is a lot of work left to be done (a better client, wrapping the server into an NT service, making things more ops friendly, lock-free internal data structures, etc.) but I wanted to try the 'release early, release often' approach for once. Things are quite busy at work (some kick-a** Popfly features in the pipeline as usual) so only expect bug fixes over the weekend :)
Be warned - these are really early, really raw bits. Stuff will crash or not work. The next version will change everything. Demons will be pulled out of your nose. Use at your own risk! Have fun and send feedback through the comments or to mail@sriramkrishnan.com or sriramk@microsoft.com
Acknowledgements
I don't usually have an 'acknowledgements' section but I just had to have one this time. A shout out to my friends who saw very little of me these past weekends. :) And to Brad Fitzpatrick and the rest of the Memcached folks for their awesome work.
Updated 2nd March 2008 - Updated link to Cacheman_0_0_2.zip which has a ton of bug fixes