Forget Big Data; Small Data Rules!

Posted by Michael Bagalman on August 6th, 2012 at 10:46 am

Byte for byte, Small Data packs more punch than Big Data.  Small Data doesn’t make exciting headlines, but the actionable information in most databases has a declining marginal contribution as the data volume grows.  So what’s all the Big Data hype about?

OK, don’t get me wrong:  Big Data has big value.  The mapping of the human genome and the discovery of the Higgs boson both depended on being able to process and analyze massive quantities of data.  Google’s search and Amazon’s order processing both depend on Big Data technology.  Wal-Mart has a world-class technology infrastructure for supply chain management that definitely counts as Big Data.

Many businesses benefit from Big Data, but most are better off maximizing the value of Small Data before taking on Big Data projects.  This is especially true for consumer-facing initiatives such as marketing.  To understand why, think about what Big Data means in a consumer-focused environment.

Big Data is a loosely defined term; it comprises four related elements:  the volume of data, the breadth of data, the speed of data processing, and the depth of insights generated.  How do these elements affect most businesses?

The volume of data is rarely a limiting factor.  For most companies, the limit to their number of customer records is dictated by their ability to attract customers, not the limitations of hard drives and processors.  They may benefit from the scale of search engine advertising or number of prospects profiled by digital display partners, but this is partly offset by their competitors’ having access to the same resources.

The breadth of data comprises the many data fields used to profile consumers.  To a marketer, this means better targeting and conversion.  Of course, most important consumer attributes have been available to direct marketers for a long time; Big Data just changed the ease of access.  And many companies see as good an ROI, or better, from buying cheap run-of-network CPM inventory for their digital display advertising as from paying a higher rate for more targeted banners.  Big Data hasn’t suspended the laws of supply and demand that drive advertising prices.

So what about the speed of data?  Not many companies deal with the volume that a Google or an Amazon does; heavy-duty processing speed isn’t usually an issue.  And competitors generally have access to the same technology, so no real competitive advantage accrues.

That leaves the ability to generate insights.  Does segment 1 prefer product A while segment 2 prefers product B?  Does offer X produce a higher conversion rate than offer Y?  Data often provides important insights into the business, and marketers who find such insights have an advantage.

But when looking at differences between, for example, two consumer segments, the margin of error for a percentage (at the usual 95% confidence level) has already dropped below one percentage point by the time the data volume exceeds ten thousand records.  Most differences that are big enough to matter, and that can be trusted to remain stable, can be found with sample sizes on the order of hundreds or thousands.  Scanning for differences that score as “statistically significant” in databases with millions (or more) of records often yields effects with little practical importance.  It can also yield a long list of effects in which spurious or anomalous results can’t be separated from the real ones.
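
For anyone who wants to check the arithmetic, here is a minimal sketch in Python.  It assumes a simple random sample, the normal approximation, a 95% confidence level (z = 1.96), and the worst-case proportion of 50%; the function names and the example conversion rates are mine, made up for illustration rather than taken from any real database.

```python
import math

Z_95 = 1.96  # assumed: critical value for a 95% confidence level


def margin_of_error(n, p=0.5, z=Z_95):
    """Margin of error for a proportion, returned as a fraction (0.01 = 1 point).

    Uses the normal approximation; p=0.5 is the worst case (widest margin).
    """
    return z * math.sqrt(p * (1 - p) / n)


def two_proportion_z(p1, n1, p2, n2):
    """Pooled z statistic for the difference between two observed proportions."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / std_err


# How quickly the margin of error shrinks as the record count grows.
for n in (100, 1_000, 10_000, 1_000_000):
    print(f"n = {n:>9,}: +/- {100 * margin_of_error(n):.2f} percentage points")

# Hypothetical example: a 10.2% vs 10.0% conversion rate (0.2 points apart).
z_small = two_proportion_z(0.102, 5_000, 0.100, 5_000)
z_big = two_proportion_z(0.102, 5_000_000, 0.100, 5_000_000)
print(f"5 thousand per segment: z = {z_small:.2f}")
print(f"5 million per segment:  z = {z_big:.2f}")
```

The loop shows the margin falling from roughly ten points at 100 records to just under one point at 10,000.  The last two lines show the flip side: the same tiny 0.2-point gap is nowhere near significant with thousands of records per segment, yet clears the usual 1.96 threshold easily with millions, even though its practical importance hasn’t changed.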

I’m not saying there isn’t great value in Big Data.  I’ve done my share of data mining on huge databases.  But the value of the insights gained depends a lot more upon asking the right questions and having a good analyst than on the volume of data.  And Small Data is less expensive and easier to acquire.

Historically, Small Data has worked wonders.  The fate of world leaders is often well predicted by polls of just a few hundred likely voters.  Blockbuster medications often start out in clinical trials of just dozens or hundreds of patients in a controlled study.  P&G and DuPont have relied on data analysis since before the digital computer.  After World War II, Ford hired almost the whole “Statistical Control” team from the Army Air Forces (the Air Force wasn’t yet a separate service back then), and they revolutionized the company with pre-Big Data number crunching.  Insurance companies have been making data-driven decisions for a couple of hundred years.

Small Data packs a lot of bang for the buck.  Most of us should be taking maximum advantage of Small Data before committing to Big Data initiatives.  There’s gold in those hills too, but let’s start with the nuggets at our feet.
