Today, the situation with databases is messy. If you want different performance properties, you often have to switch database interfaces. Several different interfaces exist for largely historical reasons; today, they can all be used to serve a similar purpose: a key-value store.
In addition to the duplication of effort needed to maintain things across different interfaces, users don't have good control over SLAs. Durability, availability, median latency, and tail latency can only be chosen from a few broad classes. S3 is really durable and available but has high latency. Memcached has high availability, no durability, and really short latencies. MySQL on EC2 has durability that's somewhere between Memcached and S3, and latency that's also in the middle. But other combinations would also be useful.
The idea
There should be a storage-as-a-service provider with two distinguishing features:
- All the different classes of storage should be accessible from the same interface, probably building off an existing one.
- The customer asks for an SLA (or rather several SLAs for different tiers of storage), and the service provider will give a price for it.
With a detailed SLA specifying durability, availability, tail latency and median latency requirements, a Sufficiently Smart Database will be able to optimize for it. By optimizing for the customer's real requirements, the service provider lowers its costs and can pass some of these savings on to customers. It should also be easier to evolve applications that use this storage service.
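To make the "customer asks for an SLA, provider quotes a price" idea a little more concrete, here is a minimal Python sketch of the matching step. The `SLA` and `Tier` types, the `quote` function, and every number in the catalog are hypothetical, made up for illustration; they are not measurements of S3, Memcached, or any other real service. A real Sufficiently Smart Database would compose configurations rather than pick from a fixed menu.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SLA:
    """What the customer asks for (hypothetical fields)."""
    durability_nines: int     # e.g. 11 means 99.999999999% of objects survive a year
    availability: float       # fraction of requests that must succeed, e.g. 0.999
    median_latency_ms: float  # upper bound on p50 read latency
    p99_latency_ms: float     # upper bound on p99 (tail) read latency


@dataclass(frozen=True)
class Tier:
    """One storage configuration the provider can run (hypothetical numbers)."""
    name: str
    durability_nines: int
    availability: float
    median_latency_ms: float
    p99_latency_ms: float
    price_per_gb_month: float

    def satisfies(self, sla: SLA) -> bool:
        # Every SLA term must be met: at least as durable/available, no slower.
        return (self.durability_nines >= sla.durability_nines
                and self.availability >= sla.availability
                and self.median_latency_ms <= sla.median_latency_ms
                and self.p99_latency_ms <= sla.p99_latency_ms)


# Made-up catalog for illustration only.
CATALOG: List[Tier] = [
    Tier("ram-only", 0, 0.999, 0.5, 2.0, 5.00),
    Tier("ram+flash-replicated", 6, 0.9999, 1.0, 10.0, 1.50),
    Tier("disk-replicated", 9, 0.999, 10.0, 200.0, 0.20),
    Tier("erasure-coded-archive", 11, 0.999, 50.0, 1000.0, 0.03),
]


def quote(sla: SLA, catalog: List[Tier] = CATALOG) -> Optional[Tier]:
    """Return the cheapest tier that meets every SLA term, or None if none does."""
    candidates = [t for t in catalog if t.satisfies(sla)]
    return min(candidates, key=lambda t: t.price_per_gb_month, default=None)
```

For example, `quote(SLA(6, 0.999, 20.0, 250.0))` picks `disk-replicated`: the RAM-only tier isn't durable enough, the flash tier qualifies but costs more, and the archive tier is too slow.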
An aside about nice interfaces
There's been a lot of innovation in database performance and architecture recently, but there's also been innovation in interfaces. Some applications make great use of the special features of Redis, various SQL dialects, CouchDB and others to add a significant amount of value over a plain key-value store.
Ideally, these interfaces would be usable on top of several SLA classes. Right now, each interface is coupled to its implementation, but if the configurable key-value store had just a few extra features (ordered indexes, triggers, transactions, etc.), it should be possible to build all these nice interfaces on top of it.
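As a rough illustration of layering a richer interface over a plain store, the sketch below assumes a toy in-memory ordered key-value store with just two of those extras, range scans and write triggers, and uses them to maintain a secondary index (look up users by email), the kind of feature a SQL dialect or a CouchDB view provides. `OrderedKV`, `email_index`, `users_by_email`, and the `user:` / `idx:email:` key layout are all hypothetical; there is no persistence and no transactions here.

```python
import bisect
from typing import Callable, Dict, Iterator, List, Optional, Tuple

# A trigger sees the store, the key written, and the old and new values (None = absent).
Trigger = Callable[["OrderedKV", str, Optional[str], Optional[str]], None]


class OrderedKV:
    """Toy ordered key-value store: sorted keys, range scans, and write triggers."""

    def __init__(self) -> None:
        self._keys: List[str] = []  # kept sorted for range scans
        self._data: Dict[str, str] = {}
        self._triggers: List[Trigger] = []

    def on_write(self, trigger: Trigger) -> None:
        self._triggers.append(trigger)

    def put(self, key: str, value: Optional[str]) -> None:
        """Store value, or delete key if value is None; then fire every trigger."""
        old = self._data.get(key)
        if value is None:
            if key in self._data:
                del self._data[key]
                self._keys.pop(bisect.bisect_left(self._keys, key))
        else:
            if key not in self._data:
                bisect.insort(self._keys, key)
            self._data[key] = value
        for trigger in self._triggers:
            trigger(self, key, old, value)

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def scan(self, start: str, end: str) -> Iterator[Tuple[str, str]]:
        """Yield (key, value) pairs with start <= key < end, in key order."""
        i = bisect.bisect_left(self._keys, start)
        while i < len(self._keys) and self._keys[i] < end:
            yield self._keys[i], self._data[self._keys[i]]
            i += 1


def email_index(kv: OrderedKV, key: str, old: Optional[str], new: Optional[str]) -> None:
    """Trigger keeping "idx:email:<email>:<id>" entries in sync with "user:<id>" records."""
    if not key.startswith("user:"):
        return  # ignore index writes themselves, which would otherwise recurse
    user_id = key[len("user:"):]
    if old is not None:
        kv.put(f"idx:email:{old}:{user_id}", None)
    if new is not None:
        kv.put(f"idx:email:{new}:{user_id}", "")


def users_by_email(kv: OrderedKV, email: str) -> List[str]:
    """Range-scan every index key under the prefix for this email."""
    prefix = f"idx:email:{email}:"
    end = prefix[:-1] + chr(ord(prefix[-1]) + 1)  # smallest key above the whole prefix
    return [k[len(prefix):] for k, _ in kv.scan(prefix, end)]
```

Because the trigger runs inside `put`, the index stays consistent with the records in this single-threaded toy; with concurrent writers, the record write and the index write would need a transaction to stay atomic.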
Some unsolved problems
There are more problems, but here are some:
- How do you take advantage of the SLA to the fullest degree?
It's not as simple as using a deadline scheduler for your disk when a single disk read might take too long: some data will have to sit in RAM, and some will have to sit in flash. Disk and flash both have cases where the latency of a request can be unexpectedly long. And how can you evaluate the reliability of existing systems to make sure you stay within the SLA?
- How do you price things?
You really want the price to accurately reflect how much things cost. But with a complicated database system, it's difficult to tease out which customer caused which costs, especially when there are at least three places where data might sit and several customers share a cluster.
- How do you explain the pricing to the customer?
They don't want to know all about this fancy database; they just want to find the cheapest thing that works.
- What's the policy when the SLA is violated?
It'd be cool if this never happened, but when it does you have to compensate your customer somehow, and compensating them too generously creates a potential for abuse.
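The pricing question can be made a little more concrete. A naive starting point is to split each resource's cost across customers in proportion to how much of it they consume. The function below is a hypothetical sketch of that model; the resource names and units are assumptions. Its weakness is exactly the one described above: it charges for the bytes and I/Os a customer uses, but not for the interference they cause, such as one tenant's burst pushing up everyone else's tail latency.

```python
from typing import Dict


def allocate_costs(cluster_cost: Dict[str, float],
                   usage: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Split each resource's cost across customers in proportion to their usage of it.

    cluster_cost: resource -> total cost for the billing period
                  (hypothetical resources such as "ram", "flash", "disk", "iops")
    usage:        customer -> resource -> amount consumed (GB-hours, I/Os, ...)
    """
    bills = {customer: 0.0 for customer in usage}
    for resource, cost in cluster_cost.items():
        total = sum(u.get(resource, 0.0) for u in usage.values())
        if total == 0:
            continue  # idle capacity: nobody to charge, so the provider absorbs it
        for customer, u in usage.items():
            bills[customer] += cost * u.get(resource, 0.0) / total
    return bills
```

With a cluster costing 100 for RAM and 30 for disk, a customer using three quarters of the RAM and a third of the disk pays 75 + 10 = 85.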
I really believe that a system like this (though maybe not in full generality) is what we'll eventually get from cloud storage providers. The question is just, who will implement it? A few possibilities:
- Google or Amazon, as an extension of their existing cloud storage offerings
- An enterprise storage company like EMC—but I have trouble believing they'll be able to set the prices low enough
- An existing database startup like Couchbase or RethinkDB
- Academia could figure out the database design—this would be a great paper
- Your startup!
How I'd implement this
Somehow, this system has to get off the ground, going from no customers and no advanced technology to lots of customers and really efficient technology. I'd work on both in parallel. At the beginning, buy some really high-performance enterprise database system with tons of flash and RAM that you run all of your customers on. Provide them with several SLAs of various prices that you think you'll eventually be able to support cost-effectively, and serve them all from your high-performance high-reliability system at a huge loss. If the customer buys a cheaper SLA, then insert delays and losses into their results so they don't start depending on the performance properties of the real implementation.
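The trick of serving cheap tiers from the expensive system while hiding its true performance could look something like the wrapper below. It is a hypothetical sketch: each read is padded up to a latency sampled from a log-normal distribution whose median is the tier's nominal median (an assumed model, not a claim about real storage latency), and "losses" are modeled only as failed requests with probability `1 - availability`; emulating data loss for a lower-durability tier is not covered. `DegradedStore` and `TierUnavailable` are made-up names, and the `sleep` parameter is injectable so the wrapper can be tested without actually waiting.

```python
import math
import random
import time
from typing import Callable, Dict, Optional


class TierUnavailable(Exception):
    """Raised when the emulated tier 'loses' a request."""


class DegradedStore:
    """Wraps a fast backing store so a cheaper tier sees only its nominal behavior."""

    def __init__(self, backend: Dict[str, str], median_latency_s: float,
                 sigma: float, availability: float, seed: Optional[int] = None,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.backend = backend
        self.mu = math.log(median_latency_s)  # a log-normal's median is exp(mu)
        self.sigma = sigma                    # larger sigma = heavier latency tail
        self.availability = availability
        self.rng = random.Random(seed)
        self.sleep = sleep

    def get(self, key: str) -> Optional[str]:
        start = time.monotonic()
        value = self.backend.get(key)  # the real, fast read
        target = self.rng.lognormvariate(self.mu, self.sigma)
        elapsed = time.monotonic() - start
        if target > elapsed:
            self.sleep(target - elapsed)  # pad the answer up to the sampled latency
        if self.rng.random() >= self.availability:
            raise TierUnavailable(key)    # inject a failure at the tier's rate
        return value
```

Because the padding is computed from the real elapsed time, customers on a cheap tier see roughly that tier's latency distribution no matter how fast the backing system actually is, so they won't start depending on the expensive hardware.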
Now that momentum is starting to build up, and there has been a confused writeup in TechCrunch about you, you can get more capital to scale this hugely loss-making model up. You can also hire people to build the real system. At that point, you'll have a lot more data on real customer requirements and the properties of their workloads. You'll have the urgency that your financial statements give you to produce this system, make it work in a real way and provide improvements. And you'll be visible enough to get good people to work for you.
This post reflects my own personal opinions and not those of my current or former employer. I have no insider knowledge of any plans or lack-of-plans in this area at either company.