Real-Time Bidding in Principle
Real-time bidding is pretty easy to understand in principle. Instead of handling requests according to a predefined set of rules and priorities, it lets a company customize the amount it will pay for a single impression, based on detailed information about the particular action or page view being considered. This has enabled a rich ecosystem to grow up in which requests are enriched with extra data about the user and the source of the ad request, and companies leverage programmatic buying techniques to optimize against that highly specific data.
The idea is simple, and standardized protocols like OpenRTB exist to make interoperation easier. But while speaking OpenRTB is easy, speaking OpenRTB at scale is hard.
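To make the protocol concrete, here is a minimal sketch of an OpenRTB 2.x exchange: a bid request for a single banner impression and a matching bid response. The field names follow the OpenRTB spec, but the IDs, prices, and creative markup are invented for illustration.

```python
# A minimal OpenRTB 2.x bid request for one banner impression.
# Field names follow the spec; values are illustrative only.
bid_request = {
    "id": "req-123",
    "tmax": 100,                      # exchange round-trip deadline, in ms
    "imp": [{
        "id": "1",
        "banner": {"w": 300, "h": 250},
        "bidfloor": 0.50,             # minimum CPM the exchange will accept
    }],
    "device": {"ip": "203.0.113.7"},
    "user": {"id": "u-456"},
}

def build_response(request, price_cpm):
    """Build a bid response for the first impression in the request."""
    imp = request["imp"][0]
    return {
        "id": request["id"],          # must echo the request id
        "seatbid": [{
            "bid": [{
                "id": "bid-1",
                "impid": imp["id"],   # ties this bid to that impression
                "price": price_cpm,   # CPM in the auction currency
                "adm": "<div>creative markup</div>",
            }]
        }],
    }

response = build_response(bid_request, 1.25)
```

A real bidder would consult its data and rules before choosing a price; this only shows the wire-level shape both sides agree on.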
The technical challenges stem from the fact that the amount of data involved is enormous, and the acceptable latencies are tiny. In principle, a company has 100 milliseconds to aggregate the information it needs, process its rules, and make a bid. In practice, most companies try to do this in 10 milliseconds or less. And even a modest player is handling billions of these requests per day. That requires a pretty specialized architecture.
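One common way to respect a budget that tight is to check a deadline between stages of request handling and fall back to bidding with partial data (or not bidding at all) once it is spent. This is a simplified sketch under that assumption; real bidders usually enforce the deadline in the event loop or load balancer, and the step functions here are hypothetical.

```python
import time

BUDGET_S = 0.010  # internal 10 ms processing target

def handle_bid_request(request, enrich_steps):
    """Run enrichment steps until the latency budget is spent.

    `enrich_steps` is a list of callables that each add data to the
    request context; we stop enriching as soon as the deadline passes
    and proceed with whatever data we have gathered so far.
    """
    deadline = time.monotonic() + BUDGET_S
    context = {}
    for step in enrich_steps:
        if time.monotonic() >= deadline:
            break  # out of budget: bid with partial data or pass
        context.update(step(request))
    return context
```

The design choice here is that a late answer is worth nothing: an exchange will simply discard a response that arrives after its timeout, so degrading gracefully beats finishing the work.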
An overview of what is involved in building an adtech infrastructure can be seen below:
There are a ton of systems in play, and even if we consider only the systems directly involved in servicing requests (the colorful pieces) we have a significant engineering task at hand.
What to do?
What we recommend depends a lot on what a client is trying to do. Much of the time, we simply help them integrate better with ad platforms such as AppNexus. For customers with more specific needs, such as a highly tailored custom bidder, we start with traffic estimation and scalability testing. First and foremost in that process is getting an appropriate high-speed data store in place: typically a high-performance key-value store like Aerospike, Couchbase, or Redis, or, in cases where real-time SQL analytics are needed, something like MemSQL.
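The access pattern against that store is almost always a handful of point lookups per request, for example fetching a user profile by ID. The sketch below uses a plain dict as a stand-in for a networked store like Aerospike or Redis, and the profile schema is invented for illustration; the real clients expose essentially the same get-by-key interface at sub-millisecond latency.

```python
# Stand-in for a networked key-value store (Aerospike, Redis, etc.).
# Real clients look much the same: get(key) -> value or None.
profile_store = {
    "u-456": {"segments": ["auto_intender"], "last_seen": 1700000000},
}

def lookup_user(user_id):
    """Point lookup by user ID.

    A missing profile is the normal case, not an error: the bidder
    must still decide what to do with an unknown user.
    """
    profile = profile_store.get(user_id)
    if profile is None:
        return {"segments": []}   # proceed with no profile data
    return profile
```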
After that piece is in place, there is the question of what to do with the vast amount of data being generated. The first step is generally just getting it off the front-line servers reliably and into an archive. Some amount of real-time filtering also needs to be in place, so the system can provide signals to the systems team and status reports to end users and partners.
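That filtering layer can start out very simple: a single pass over the event stream that lets everything flow on to the archive while maintaining a few rolling counters, such as bid rate and win rate, as operational signals. A minimal sketch, with an invented event schema:

```python
from collections import Counter

def summarize(events):
    """One pass over a stream of bid events, keeping lightweight
    counters as operational signals while the raw events are archived."""
    counts = Counter()
    for event in events:          # in production: a message-queue consumer
        counts["requests"] += 1
        if event.get("bid"):
            counts["bids"] += 1
        if event.get("won"):
            counts["wins"] += 1
    return counts

stats = summarize([
    {"bid": True, "won": True},
    {"bid": True, "won": False},
    {"bid": False},
])
```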
Of course, we will eventually need to figure out what to do with all that data once it’s archived somewhere, but that will be a whole separate post.
In such a rapidly evolving industry, a single blog post can’t provide more than the most cursory overview. The first step is typically an involved discovery process where the things that make each business unique are identified and the solution crafted around that. For a description of how we might do this, check out our adtech practice page at http://www.thumbtack.net/solutions/adtech.html.