Infinitely Scalable Framework with AWS?

I've been developing a framework using EC2, S3, S3DFS (DFS = Distributed File System) along with SQLite to try and solve the database problem on Amazon Web Services.
S3DFS allows you to mount an S3 bucket on one or more EC2 instances as if it were a local filesystem. It works at the block level and has read/write caching, so it is very fast. (Note: It's free to use for development, but otherwise requires a paid commercial license.)
Since SQLite stores each database as a flat file and needs no separate database server, it is ideal for use on S3. SQLite only scales well to a point, so to get around its limitations I am giving each user their own database (each database is a separate flat file). This can sometimes be a bad idea, but SQLite allows you to attach multiple databases to one connection, essentially creating a temporary master database where you can run queries across the attached user databases at once. This will be handy for search indexing, site-wide stats, etc. To change the user databases, I will have utilities that apply schema alterations globally.
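Here is a minimal sketch of the per-user database idea, using Python's built-in sqlite3 module. It is only an illustration under stated assumptions, not the framework's actual code: the `posts` table, the file names, and the helper functions are all hypothetical. One caveat it works around is that SQLite's default build allows only 10 attached databases per connection, so the cross-user query attaches the files in batches.

```python
import os
import sqlite3
import tempfile

ATTACH_BATCH = 10  # default SQLite limit on attached databases per connection


def create_user_db(path, titles):
    """Create one per-user database file holding a small (hypothetical) posts table."""
    conn = sqlite3.connect(path)
    with conn:  # commits on success
        conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
        conn.executemany("INSERT INTO posts (title) VALUES (?)", [(t,) for t in titles])
    conn.close()


def count_posts(paths):
    """Count rows in posts across all user databases, attaching them in batches."""
    conn = sqlite3.connect(":memory:")  # the temporary "master" connection
    total = 0
    for start in range(0, len(paths), ATTACH_BATCH):
        batch = paths[start:start + ATTACH_BATCH]
        aliases = [f"u{i}" for i in range(len(batch))]
        for alias, path in zip(aliases, batch):
            conn.execute(f"ATTACH DATABASE ? AS {alias}", (path,))
        query = " UNION ALL ".join(f"SELECT COUNT(*) FROM {a}.posts" for a in aliases)
        total += sum(row[0] for row in conn.execute(query).fetchall())
        for alias in aliases:
            conn.execute(f"DETACH DATABASE {alias}")
    conn.close()
    return total


def add_column_everywhere(paths, table, column_def):
    """Apply the same ALTER TABLE ... ADD COLUMN to every user database file."""
    for path in paths:
        conn = sqlite3.connect(path)
        with conn:
            conn.execute(f"ALTER TABLE {table} ADD COLUMN {column_def}")
        conn.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        paths = [os.path.join(d, f"user{i}.db") for i in range(12)]
        for p in paths:
            create_user_db(p, ["hello", "world"])
        add_column_everywhere(paths, "posts", "body TEXT")
        print(count_posts(paths))  # 12 users x 2 posts each
```

The batching keeps the sketch within the default attach limit; a build compiled with a higher limit could use larger batches, but site-wide queries over many users would still need to loop like this.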
So, by adding EC2 instances that are all attached to the same S3 bucket, and using round-robin DNS to distribute traffic among them, you can theoretically scale a web service infinitely. Just launch a new EC2 instance, add it to the round-robin DNS pool, and you're done! Even that could be automated to respond dynamically to traffic.
Imagine the long-term cost savings from only having to build your app and architecture once and not having to go through a more complex scaling process.
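To make the rotation concrete, here is a small in-process sketch of a round-robin pool. It is not DNS code and not part of the framework; the `RoundRobinPool` class and the addresses are hypothetical stand-ins that only illustrate how requests would cycle through instances and how a newly launched instance joins the rotation.

```python
class RoundRobinPool:
    """In-process stand-in for a round-robin record set of instance addresses."""

    def __init__(self, addresses=()):
        self._addresses = list(addresses)
        self._next = 0  # index of the address to hand out next

    def add(self, address):
        """Register a newly launched instance (ignored if already present)."""
        if address not in self._addresses:
            self._addresses.append(address)

    def pick(self):
        """Return the next address in rotation."""
        if not self._addresses:
            raise LookupError("pool is empty")
        address = self._addresses[self._next % len(self._addresses)]
        self._next = (self._next + 1) % len(self._addresses)
        return address


if __name__ == "__main__":
    pool = RoundRobinPool(["10.0.0.1", "10.0.0.2"])
    print([pool.pick() for _ in range(3)])
    pool.add("10.0.0.3")  # a new instance joins the rotation
    print([pool.pick() for _ in range(3)])
```

With real DNS the rotation happens in resolvers and clients, and cached records mean a newly added address is not seen immediately, which this sketch does not model.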
Certainly you should only worry about scaling after your app is up and running, but what if you didn't have to ever worry about it? That's the goal of this framework.
This is pre-alpha of course and I'm still in the development and experimentation stage, but I wanted to share with the community and get some feedback!
Update: To follow the feedback on this topic, see: http://news.ycombinator.com/comments?id=10001
Update2: I have decided to change my strategy a bit to bypass some of the limitations inherent in SQLite and S3DFS. Rather than have the user database files sit directly on S3 via S3DFS, I will have them reside on the EC2 instance and just back them up frequently to S3 via S3DFS. This will be faster, less complex, and more stable. When I get to the scaling part of the app, I'll probably use a reverse-proxy solution and divide the user database files across multiple EC2 instances. Thanks to the guys at Lunchgeeks for their help on this!
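The revised strategy could look roughly like the sketch below, again in Python and again only as an illustration. It assumes the S3DFS mount is just a directory path passed in as `backup_dir`, and it uses SQLite's online backup API (exposed as `Connection.backup` in Python 3.7+) so the copy is consistent even while the live file is in use. The `instance_for_user` helper is a hypothetical way to divide user databases across instances by hashing the user ID; a reverse proxy could use the same mapping to route requests.

```python
import hashlib
import os
import sqlite3


def backup_user_db(live_path, backup_dir):
    """Copy a live SQLite file into backup_dir via SQLite's online backup API."""
    os.makedirs(backup_dir, exist_ok=True)
    dest_path = os.path.join(backup_dir, os.path.basename(live_path))
    src = sqlite3.connect(live_path)
    dest = sqlite3.connect(dest_path)
    try:
        src.backup(dest)  # takes a consistent snapshot of the source database
    finally:
        dest.close()
        src.close()
    return dest_path


def instance_for_user(user_id, instances):
    """Pick the instance that holds a user's database using a stable hash."""
    digest = hashlib.sha1(str(user_id).encode("utf-8")).digest()
    return instances[int.from_bytes(digest[:8], "big") % len(instances)]
```

A plain modulo mapping like this moves most users to a different instance whenever the instance list changes, so a real deployment would likely want a lookup table or consistent hashing instead.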
07 Apr 2007 Matt Jaynes
Question: what kind of query times are you seeing with S3DFS? Infinite scalability is a nice notion, but when the latency for a single query is measured in seconds, that just might not do it. Plus, how good is SQLite when it comes to concurrency?
The problem is not the use of a flat filesystem, nor the server exposing a network interface to clients. The primary problems facing the creation of a reliable, fast database on S3+EC2 are the network latency between you, EC2, and S3, and the lack of a viable append operation in S3. S3 is really good at writing objects once and then overwriting them if they change. Databases generally rely on fast appends, writing additional bytes to an open file descriptor as records are inserted, updated, and deleted. S3 is missing one of the fundamental building blocks required to build a performant database. Even if S3DFS shields you from this problem with local caching, you have now introduced another point of failure.
Good questions :) Metrics and performance are the big unknown at this point. I hope to have some metrics soon and will post them here when I do ;)
We (WeoGeo; http://www.weogeo.com) had some technical issues in scaling AWS EC2 with round robin DNS, particularly when it was combined with software that enabled auto-scaling as a function of load. The TTL issues alone became problematic.
We created a different solution that couples proxy serving and load balancing with EC2 statistics monitoring to provide a stable IP and auto-scaling capability. We have separated this solution (WeoCEO) from our mapping efforts and made it available to others in the EC2 environment (http://www.weoceo.com).