We are so awash in data that we seek simple ways to find meaning in it all. But simplicity is not a virtue if it undermines your ability to derive a realistic understanding of your subject. To illustrate, let’s look at the way social media has been reported on during the presidential election.
During the primaries earlier this year, it became apparent that the media was going to tell the social story quantitatively. Hence, candidates were judged on their numbers of Twitter followers, Facebook likes, and Klout scores. More recently, coverage of the two conventions attempted to quantify enthusiasm by counting the tweets generated during each three-day infomercial, which drew a rebuke from Stephen Colbert: “These numbers are out there and it’s the media’s duty to report them without the liberal filter of meaning something.”
At Taykey, we have the technology not only to look at the volume of conversation – tweets, status updates, blog posts and more – but also to analyze the sentiment associated with each. That gives us a somewhat better basis for interpreting unstructured online conversations. Consider the following graph:
This shows the volume of conversation during Tuesday night’s debate from 9pm to 11pm. During the debate we collected between five and ten thousand data points per minute. That is a solid sample of the much higher total number of conversations, which we’ll leave the individual platforms to report. By our estimate, more than 1% of online conversation was focused on the debate. The red area shows conversations about Romney, while the blue shows conversations about Obama. Anything that mentioned both was counted for each. If you’re scoring based on volume alone, you must conclude that the debate was a win for Romney.
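For readers who want to see the counting rule concretely, here is a minimal sketch in Python of tallying volume per minute, with a post that mentions both candidates counted once for each. It is an illustration only, not Taykey's pipeline: the function name `volume_by_minute`, the plain substring match on candidate names, and the sample posts are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical candidate keywords; a real system would match far more variants.
CANDIDATES = ("obama", "romney")

def volume_by_minute(posts):
    """Tally posts per candidate per minute.

    posts: iterable of (minute_label, text) pairs.
    A post that mentions both candidates is counted once for each.
    """
    volume = {name: Counter() for name in CANDIDATES}
    for minute, text in posts:
        lowered = text.lower()
        for name in CANDIDATES:
            if name in lowered:
                volume[name][minute] += 1
    return volume

# Made-up sample posts, for illustration only.
sample = [
    ("9:22", "Romney just interrupted Obama"),
    ("9:22", "Romney is on the attack"),
    ("9:23", "Obama looks calm tonight"),
]
volume = volume_by_minute(sample)
```

Because the first sample post names both men, it adds one to each candidate's 9:22 tally, which is why per-candidate volumes can sum to more than the number of posts collected.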
But the contents of those conversations are quite material if you want to understand the effect of the debate on the people who went online to talk about it. Taykey’s natural language processing technology and machine learning algorithms analyze the user-generated conversations we collect by parsing grammar and word definitions to enable a fine-grained understanding of the sentiment carried by each sentence. We scored our data and sorted it into buckets of positive, negative and neutral. Anything with an ambiguous meaning went into the neutral bucket.
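The bucketing step can be sketched roughly as follows. This assumes each conversation has already been given a numeric sentiment score between -1 and 1, and that "ambiguous" means a score close to zero. Both the score scale and the 0.2 threshold are assumptions for illustration, and the names `bucket` and `bucket_shares` are hypothetical rather than anything from Taykey's system.

```python
from collections import Counter

LABELS = ("positive", "negative", "neutral")

def bucket(score, threshold=0.2):
    """Map a score in [-1, 1] to a bucket; near-zero (ambiguous) scores are neutral."""
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"

def bucket_shares(scores, threshold=0.2):
    """Return the fraction of scores that fall into each bucket."""
    counts = Counter(bucket(s, threshold) for s in scores)
    total = sum(counts.values())
    if total == 0:
        return {label: 0.0 for label in LABELS}
    return {label: counts[label] / total for label in LABELS}

# Made-up scores, for illustration only.
shares = bucket_shares([0.8, -0.5, 0.05, 0.1])
```

Reporting shares rather than raw counts is what lets the two candidates be compared on percentage of positive or negative conversation even when their volumes differ.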
In the following chart, we can see that for the most part, right up until the closing minutes, both candidates had nearly the same percentage of positive conversations.
But from the very beginning, the percentage of negative conversations accruing to Gov. Romney was much higher, and remained so with only brief exceptions. The Governor’s lowest sentiment score, at about 9:22pm, coincides with his highest volume of the night. (That was just after Gov. Romney’s first aggressive move toward President Obama, when he said “You’ll get your chance in a moment. I’m still speaking.”) One might surmise – our data doesn’t tell us – that much of the negative volume about Romney was from angry Obama partisans.
At the end of the debate, beginning around 10:37pm, positive conversations regarding Obama spiked and his negatives went noticeably down. This happened initially during the President’s final answer, in which he attacked Gov. Romney on the 47% comments that had become public weeks earlier. Romney’s negative sentiment score did not rise at this time (indeed, positive conversations about Romney also went up), so the content of the attack itself may not have made the difference. Words like “victim,” if quoted in the conversations, would tend to carry a negative score. Instead, we might attribute the rising positives for Obama to exuberance from his supporters that he had brought up the 47% attack, which he was criticized for having avoided in the first debate. For both men, we might attribute the rise to the fact that the debate had ended and it was possible to declare a winner, “win” being a word that tends to score positive unless modified by a negative (e.g., “didn’t win this time”).
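The point about “win” scoring positive unless a negative modifies it can be shown with a toy lexicon scorer. This is a deliberately simple sketch and not the grammar-parsing approach described earlier: the tiny `LEXICON`, the `NEGATORS` list, the two-word look-back window, and the function name `score` are all assumptions chosen for the example.

```python
import re

# Hypothetical mini-lexicon: word -> polarity.
LEXICON = {"win": 1, "wins": 1, "won": 1, "victim": -1}
# Hypothetical negators that flip the polarity of a nearby sentiment word.
NEGATORS = {"not", "never", "didn't", "doesn't", "won't", "can't"}

def score(text, window=2):
    """Sum word polarities, flipping any word preceded by a negator within `window` tokens."""
    # Normalize curly apostrophes so "didn’t" matches "didn't".
    normalized = text.lower().replace("\u2019", "'")
    tokens = re.findall(r"[a-z']+", normalized)
    total = 0
    for i, token in enumerate(tokens):
        polarity = LEXICON.get(token, 0)
        if polarity == 0:
            continue
        preceding = tokens[max(0, i - window):i]
        if any(t in NEGATORS for t in preceding):
            polarity = -polarity
        total += polarity
    return total
```

With this sketch, `score("Obama will win")` is positive while `score("Obama didn’t win this time")` is negative, which is the distinction the paragraph above relies on. A fixed look-back window is crude and will misfire on longer or more tangled sentences, which is one reason real systems parse grammar instead.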
We can see that final effect in this chart, which looks only at the conversations that unambiguously declared one or the other candidate to have won.
The volume of conversation declaring a winner was small relative to the total number of tweets. Taken on its own, it would suggest a massive win for President Obama, but it should be read with that small sample in mind. Coming back to volume, we also see that this was one of the few times in the debate when Obama’s volume exceeded Romney’s.
Simple stories are appealing to TV pundits and op-ed columnists who must fill airtime and newspapers almost instantly. But the lesson here is clear: for a marketer trying to grow and sustain a business over time, it is crucial to wade into a complex pool of data before you can be sure that you really know how to understand it.