Back in July, in a Behavioral Insider article, Laurie Sullivan raised what has been a vexing roadblock on the path to success in online ad targeting: how to manage the vast amounts of consumer data that cookies and ad tags can collect.
While there are many ways to deal with this thorny issue, the fact is that it's not really a question of storage and CPUs, but one of analytics. When you think about data overload, the challenge is separating what is relevant to your campaign from what is dross, and then leveraging it. But this is a very subjective process: a publisher will have quite different criteria for relevance than an individual marketer.
The article's suggestion of creating standard definitions for relevance reflects a very publisher-centric view. Standards are great when you're trying to sell impressions, but less so when you're a marketer trying to obtain unique audience segments. A standard criterion for relevance of an automotive segment, for example, might be all adults aged 18-35 who have visited an auto site in the past 30 days, but for an individual auto marketer, only those who have visited a specific competitor in the last five days may be relevant. You can see the challenge: media sellers must package broad, "least common denominator" segments of inventory for sale, while any individual marketer can have very specific target audience definitions.
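To make the contrast concrete, here is a minimal Python sketch that treats each segment definition as a predicate over a user's visit history. The `Visit` and `User` types, the site categories and the competitor domain are hypothetical illustrations, not any publisher's or marketer's actual schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

@dataclass
class Visit:
    site: str        # domain visited (hypothetical)
    category: str    # content category of the site, e.g. "auto"
    when: datetime

@dataclass
class User:
    age: int
    visits: list[Visit] = field(default_factory=list)

def standard_auto_segment(user: User, now: datetime) -> bool:
    """Broad, publisher-style definition: adults 18-35 who visited any auto site in the past 30 days."""
    cutoff = now - timedelta(days=30)
    return 18 <= user.age <= 35 and any(
        v.category == "auto" and v.when >= cutoff for v in user.visits
    )

def competitor_segment(user: User, now: datetime, competitor: str) -> bool:
    """Narrow, marketer-specific definition: visited one named competitor in the last five days, any age."""
    cutoff = now - timedelta(days=5)
    return any(v.site == competitor and v.when >= cutoff for v in user.visits)

# Usage: a 52-year-old who visited the (hypothetical) competitor two days ago.
now = datetime(2024, 1, 15)
older_shopper = User(age=52, visits=[Visit("rival-motors.example", "auto", now - timedelta(days=2))])
print(standard_auto_segment(older_shopper, now))                        # False
print(competitor_segment(older_shopper, now, "rival-motors.example"))   # True
```

The same user falls outside the packaged segment but inside the marketer's own definition, which is exactly the gap the paragraph describes.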
Since each seller has many buyers, each with very different ways of defining their relevant audience, media sellers are challenged to provide custom analytics, audiences and targeting for each client. The "brute force" approach is to build massive data stores, then do data mining. This works to an extent, but it's expensive, slow and inflexible. Cloud computing makes storage and processing power more available, but does not necessarily address flexibility or speed. The other option suggested in the article is to filter the data on the front end and dispose of what you don't need. But this gets back to the relevance issue: how do you know what you don't need, when you serve many clients?
Agency holding companies that are trying to create their own demand-side networks are just starting to wrestle with this problem. They want to buy inventory cheaply and at scale, create custom behavioral segments and sell them to clients, while capturing the strategic value (and profit) that ad networks enjoy today. But these efforts will be too expensive and inflexible to maintain if they take the traditional data warehouse approach. Creating many behavioral segments for each client could quickly become unwieldy in terms of data storage alone. The real challenge is retrieving segments from the database quickly enough to meet the demands of new real-time bidding platforms, which require responses within milliseconds.
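One common way to stay inside a millisecond budget, sketched here under stated assumptions, is to compute segment membership ahead of time and keep it in memory, so the work done at bid time is a hash lookup rather than a database query. The `SegmentIndex` class, the segment names, the bid prices and the 10 ms default budget are all hypothetical; real exchanges define their own request formats and timeouts.

```python
import time

class SegmentIndex:
    """In-memory map from user ID to the set of segment IDs that user belongs to."""

    def __init__(self):
        self._members: dict[str, set[str]] = {}

    def add(self, user_id: str, segment_id: str) -> None:
        self._members.setdefault(user_id, set()).add(segment_id)

    def segments_for(self, user_id: str) -> frozenset:
        # Average O(1) dict lookup; unknown users get an empty set.
        return frozenset(self._members.get(user_id, ()))

# Hypothetical per-segment bid prices (CPM, in dollars) for one client.
BID_PRICES = {"competitor_shoppers": 4.50, "auto_intenders": 1.20}

def handle_bid_request(index: SegmentIndex, user_id: str, budget_ms: float = 10.0):
    """Return the highest applicable bid price, or None to pass.

    Also passes if the lookup overran the time budget, on the assumption
    that a late response would be useless to the exchange.
    """
    start = time.perf_counter()
    segments = index.segments_for(user_id)
    prices = [BID_PRICES[s] for s in segments if s in BID_PRICES]
    elapsed_ms = (time.perf_counter() - start) * 1000
    if elapsed_ms > budget_ms or not prices:
        return None
    return max(prices)

# Usage: membership is loaded ahead of time, then queried per bid request.
index = SegmentIndex()
index.add("u123", "competitor_shoppers")
index.add("u123", "auto_intenders")
print(handle_bid_request(index, "u999"))  # None: user is in no known segment
```

The design choice is to move all the expensive analysis out of the bid path; only the precomputed answer is consulted while the clock is running.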
But what if buyers had a flexible tool to sift through data in real time? What if that tool had intelligence at the "point of contact" so that you did not have to collect massive amounts of data to support a separate data mining exercise? What if you could define behavioral segments on the fly?
This is the Holy Grail: the ability to mine data for relevance as it is encountered, and then set the criteria for behavioral segments based on relevant data going forward. This way, an agency or marketer can create custom behavioral segments on the fly for each client, product or audience.
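As one illustration of mining for relevance as data is encountered, the sketch below keeps running counts of how often visitors seen on each site go on to convert, then promotes sites whose conversion rate beats the overall baseline by some margin into the criteria for a new segment. The event format, the `min_lift` threshold and the minimum-sample rule are assumptions chosen for illustration; a production system would likely use more careful statistics.

```python
from collections import Counter

class RelevanceMiner:
    """Streams (site, converted) observations and tracks per-site conversion rates."""

    def __init__(self, min_visits: int = 3, min_lift: float = 1.5):
        self.visits = Counter()       # site -> number of observations
        self.conversions = Counter()  # site -> number of those observations that converted
        self.min_visits = min_visits  # ignore sites with too few observations
        self.min_lift = min_lift      # required ratio of site rate to baseline rate

    def observe(self, site: str, converted: bool) -> None:
        self.visits[site] += 1
        if converted:
            self.conversions[site] += 1

    def baseline_rate(self) -> float:
        total = sum(self.visits.values())
        return sum(self.conversions.values()) / total if total else 0.0

    def relevant_sites(self) -> set:
        """Sites whose conversion rate is at least min_lift times the overall baseline."""
        base = self.baseline_rate()
        if base == 0:
            return set()
        return {
            site
            for site, n in self.visits.items()
            if n >= self.min_visits
            and (self.conversions[site] / n) / base >= self.min_lift
        }

# Usage with a tiny hypothetical stream.
miner = RelevanceMiner()
stream = (
    [("rival-motors.example", True)] * 3
    + [("rival-motors.example", False)]
    + [("news.example", False)] * 5
    + [("news.example", True)]
)
for site, converted in stream:
    miner.observe(site, converted)
criteria = miner.relevant_sites()
print(criteria)  # {'rival-motors.example'}
```

Here the baseline is 4 conversions in 10 observations (0.4); the competitor site converts at 0.75 (lift 1.875) and qualifies, while the news site converts at about 0.17 and does not. The resulting set can then serve as the segment criteria applied to new traffic going forward.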
A toolset to accomplish this would have to be able to handle three kinds of data in order to mine for relevance: log data, the client's own data and third-party sources. It needs to bring them together for analysis in a fairly seamless fashion. Once you've done that and understand what defines value, you should be able to set the behavioral and demographic filters that process log files as they come in, identifying people who meet criteria you defined for that segment. For example, you would want to be able to define audience segments in terms of multiple behaviors and discrete demographics, and recognize any user meeting those criteria, regardless of other characteristics.
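The filtering step described above can be sketched as a small streaming evaluator: each segment rule names a set of behaviors that must all be observed plus a few discrete demographic values, and any other attributes of the user are ignored. The `SegmentRule` and `StreamingSegmenter` names, the behavior labels and the attributes are hypothetical, and segment expiry is omitted for brevity.

```python
from dataclasses import dataclass, field

@dataclass
class SegmentRule:
    name: str
    required_behaviors: frozenset  # every one of these behaviors must be observed
    demographics: dict             # attribute -> required value; other attributes are ignored

@dataclass
class UserState:
    behaviors: set = field(default_factory=set)
    attributes: dict = field(default_factory=dict)

class StreamingSegmenter:
    """Evaluates segment rules incrementally as log events arrive."""

    def __init__(self, rules):
        self.rules = list(rules)
        # Only behaviors named in some rule are worth keeping; all others are dropped at intake.
        self.tracked = frozenset().union(*(r.required_behaviors for r in self.rules))
        self.users = {}  # user_id -> UserState
        self.members = {r.name: set() for r in self.rules}

    def process(self, user_id, behavior=None, attributes=None):
        """Fold one log event into the user's state, then re-check every rule."""
        state = self.users.setdefault(user_id, UserState())
        if behavior in self.tracked:
            state.behaviors.add(behavior)
        if attributes:
            state.attributes.update(attributes)
        for rule in self.rules:
            has_behaviors = rule.required_behaviors <= state.behaviors
            has_demographics = all(
                state.attributes.get(k) == v for k, v in rule.demographics.items()
            )
            if has_behaviors and has_demographics:
                self.members[rule.name].add(user_id)

# Usage: one hypothetical rule combining two behaviors with two demographics.
rule = SegmentRule(
    name="suv_intenders",
    required_behaviors=frozenset({"viewed_suv_review", "used_payment_calculator"}),
    demographics={"region": "west", "homeowner": True},
)
seg = StreamingSegmenter([rule])
seg.process("u1", attributes={"region": "west", "homeowner": True, "age": 44})
seg.process("u1", behavior="viewed_suv_review")
seg.process("u1", behavior="read_sports")  # not in any rule, so it is discarded
seg.process("u1", behavior="used_payment_calculator")
seg.process("u2", attributes={"region": "west"})  # homeowner status unknown
seg.process("u2", behavior="viewed_suv_review")
seg.process("u2", behavior="used_payment_calculator")
print(seg.members["suv_intenders"])  # {'u1'}
```

Because the tracked behaviors are derived from the active rules, the tool stores only what some segment actually needs; "u1" qualifies despite an unrelated age attribute, while "u2" shows the same behaviors but lacks a required demographic.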
Rather than try to store all that data, you would want a very flexible tool that puts intelligence at the front end of the transaction. In database terms, you want a smart ETL (extract, transform and load) tool. Data filtering technology is part of the "extract" process. But it's the "transform" part that can be truly powerful, and it's a part many people miss. It's what digests the raw data and turns it into something useful. It is also what makes it possible to combine data sources on the fly, so that third-party data can be leveraged in real-time systems.
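Following the ETL framing, here is a toy pipeline: extract parses raw log lines and keeps only hits on relevant domains, transform joins each surviving event with the client's own data and a third-party source, and load writes qualifying cookies into a segment store. The log format, domains, field names and the in-memory dictionaries standing in for the client and third-party sources are all hypothetical.

```python
import csv
import io
from urllib.parse import urlparse

# Hypothetical raw ad-server log: timestamp, cookie ID, URL visited.
RAW_LOG = """2024-01-15T10:00:00,c1,http://rival-motors.example/models
2024-01-15T10:00:05,c2,http://news.example/sports
2024-01-15T10:01:00,c3,http://rival-motors.example/pricing
"""

# Hypothetical client (first-party) data, keyed by cookie ID.
CLIENT_DATA = {"c1": {"is_customer": False}, "c3": {"is_customer": True}}

# Hypothetical third-party data, keyed by cookie ID.
THIRD_PARTY = {"c1": {"age_band": "25-34"}, "c3": {"age_band": "35-44"}}

RELEVANT_DOMAINS = {"rival-motors.example"}

def extract(raw):
    """Extract: parse log lines and keep only hits on relevant domains."""
    for ts, cookie, url in csv.reader(io.StringIO(raw)):
        domain = urlparse(url).netloc
        if domain in RELEVANT_DOMAINS:
            yield {"ts": ts, "cookie": cookie, "domain": domain}

def transform(events):
    """Transform: enrich each event with client and third-party attributes on the fly."""
    for e in events:
        cookie = e["cookie"]
        yield {**e, **CLIENT_DATA.get(cookie, {}), **THIRD_PARTY.get(cookie, {})}

def load(records, store):
    """Load: add prospects (visitors who are not yet customers) to the segment store."""
    for r in records:
        if not r.get("is_customer", False):
            store.setdefault("competitor_prospects", set()).add(r["cookie"])
    return store

# Generators chain the stages, so each event flows through without staging the whole log.
segments = load(transform(extract(RAW_LOG)), {})
print(segments)  # {'competitor_prospects': {'c1'}}
```

The news-site hit is discarded during extraction, and "c3" is enriched during transformation and then excluded as an existing customer; only the transform step makes that join with outside data possible before anything is stored.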
Given today's technology, there's no reason for marketers to feel overwhelmed by the sheer volume of consumer data, throw up their hands, cry "uncle," and bring in expensive outside help to manage it. Instead, they can turn that storehouse into a competitive advantage by using optimization tools with built-in intelligence that collect only relevant data on the fly, based on criteria (behavioral, demographic, IP) they establish. In this way, they can tap the daunting "Mt. Data" to create powerful, hyper-targeted segments that drive conversions.