One of the things that surprised me about our recent NoSQL benchmarking report was how difficult it was to do correctly. Each of the databases was designed with different assumptions and tradeoffs in mind. Designing a common baseline across even key-value stores turned out to be a real challenge once things like consistency and durability were taken into account. Different databases might offer the same consistency guarantee but with completely different side effects (for example, Cassandra’s quorum writes versus Couchbase’s write-to-master approach). How do we measure these against each other without using settings that prevent a database from being used the way it is supposed to be used? The answer was not obvious and required considerable trial and error.
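To make the contrast concrete, here is a minimal sketch of the two write paths, assuming the DataStax Java driver for Cassandra and the original Couchbase Java client. Exact class names, packages, and method signatures vary between driver versions, and the keyspace, bucket, keys, and values are made up for illustration; this is not the code used in the report.

```java
import com.datastax.driver.core.BoundStatement;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Session;
import com.couchbase.client.CouchbaseClient;
import net.spy.memcached.PersistTo;

import java.net.URI;
import java.util.Arrays;

public class WriteSemanticsSketch {
    public static void main(String[] args) throws Exception {
        // Cassandra: any replica can coordinate, but a quorum of replicas
        // must acknowledge the write before it is considered successful.
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("ycsb");
        BoundStatement write = session
                .prepare("INSERT INTO usertable (y_id, field0) VALUES (?, ?)")
                .bind("user1", "value");
        write.setConsistencyLevel(ConsistencyLevel.QUORUM);
        session.execute(write);
        cluster.close();

        // Couchbase: the write always goes to the master node for the key's
        // vbucket; PersistTo.MASTER additionally waits for it to reach disk there.
        CouchbaseClient client = new CouchbaseClient(
                Arrays.asList(URI.create("http://127.0.0.1:8091/pools")), "default", "");
        client.set("user1", 0, "value", PersistTo.MASTER).get();
        client.shutdown();
    }
}
```

Both calls return "success", but what has actually been promised about the data differs, which is exactly the comparability problem described above.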
Once we had defined the theoretical baselines, getting the products to run in the manner we expected complicated things further. For example, some databases temporarily slow replication when they encounter a traffic spike. That’s a sensible design choice, but can we really equate such durability with that of databases that don’t? What kinds of settings can help mitigate such behavior?
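One option of this kind, sketched here under the assumption of the Couchbase Java client’s observe-based durability parameters, is to make the client itself wait for replication before treating a write as successful, so that deferred replication no longer flatters the throughput numbers. The key and value are again made up, and the details differ by SDK version.

```java
import com.couchbase.client.CouchbaseClient;
import net.spy.memcached.PersistTo;
import net.spy.memcached.ReplicateTo;

import java.net.URI;
import java.util.Arrays;

public class DurableWriteSketch {
    public static void main(String[] args) throws Exception {
        CouchbaseClient client = new CouchbaseClient(
                Arrays.asList(URI.create("http://127.0.0.1:8091/pools")), "default", "");
        // Require the write to be persisted on the master node and replicated
        // to at least one replica before the client counts it as successful.
        boolean durable = client.set("user1", 0, "value",
                PersistTo.MASTER, ReplicateTo.ONE).get();
        if (!durable) {
            System.err.println("write not durable within the client timeout");
        }
        client.shutdown();
    }
}
```

Settings like this trade raw throughput for a durability guarantee that is closer to what the other systems provide, which is the kind of judgment call the report had to make repeatedly.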
Making things worse were problems in YCSB itself. By default, it simply could not generate the traffic levels we needed, even with a “no-op” database driver. We made changes that allowed it to run in a distributed fashion and aggregate the results. It also was unable to track and process faults in a sensible way. Because these databases are optimized for speed, they can fail in a variety of ways under load: sometimes they evict records, and sometimes they return an asynchronous response asking the client to retry. All of this needed to be incorporated into each database’s client driver in order to test correctly.
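As one illustration of the kind of per-database fault handling this required, here is a hypothetical retry helper of the sort each client needed. The actual changes in the report are more involved, and the generic exception handling below stands in for whatever “temporary failure, please retry” signal a given driver returns.

```java
import java.util.concurrent.Callable;

public final class RetryingCall {
    /**
     * Runs a database operation, retrying on transient failures such as a
     * temporary-failure response. Purely illustrative; the real handling
     * depends on each database's driver and error model.
     */
    public static <T> T withRetries(Callable<T> op, int maxAttempts, long backoffMillis)
            throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (Exception e) { // e.g. a driver's temporary-failure exception
                last = e;
                Thread.sleep(backoffMillis * attempt); // simple linear backoff
            }
        }
        throw last;
    }
}
```

Whether such retries should count as successes, failures, or added latency is itself a benchmarking decision, and one the stock YCSB client did not let us make consistently.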
In the end, even though this covered a fairly narrow set of databases, we were able to identify hundreds of configuration variables and their effects on performance, and to find and work around a significant number of bugs. That’s why I feel it’s worthwhile to check this report out. There are a lot of benchmark results out there that run against databases at or near default settings, but we hope our results can save companies weeks, if not months, of research and trial and error in turning those top-level numbers into meaningful conclusions.
To download and read the report in its entirety, click on the following link: http://thumbtack.net/solutions/ThumbtackWhitePaper.html.
All this raises the question of why we even bother trying to make these very different systems comparable to each other. The answer is that our customers ask us this question all the time, and we frequently encounter people who are using these systems for exactly the wrong use cases. The questions are very real, so establishing a baseline for answering them is a worthwhile endeavor.