Unanswered Questions of Big Data Christian Colen



Integrating big data into public diplomacy is easier said than done. The ability to draw meaningful conclusions relies on being able to trust the data collected and the process by which it was created. A closer look at how big data is collected reveals numerous difficulties in assessing its reliability. Here are some of the most pressing questions about using big data.


How do you ensure that the data collected represents a target audience?


Short answer? You can’t. Twitter, unfortunately, is not a science. Neither is Facebook, or any other social media platform. Nothing requires the Twittersphere to be a perfect microcosm of the general population, with everyone regularly tweeting out their opinions on current events. In reality, internet penetration and participation rates are much higher in wealthier countries, so we can’t assume that people on social media are representative.

Additionally, big data consists of tweets, posts, and other content that people choose to create. Personality traits, internet access, language barriers, and other factors may contribute to why a person chooses to participate in an online dialogue, and the confluence of those factors creates a selection bias. The only data available comes from people who have access to a social networking account and feel strongly enough about an issue to tweet or post about it. That’s problematic when it comes to gathering data.
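To make the selection bias concrete, here is a minimal sketch in Python. Every number in it is invented for illustration; the segments, account rates, posting rates, and support rates are assumptions, not measurements. It compares how much of a hypothetical population supports a policy with how much support an analyst would see by looking only at the people who actually post.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    name: str
    people: int            # size of the segment in the population
    has_account: float     # share of the segment with a social media account
    posts_on_issue: float  # share of account holders who post about the issue
    supports: float        # share of the segment that supports the policy


# Hypothetical population; all figures are made up for illustration only.
SEGMENTS = [
    Segment("urban, strong opinions", 200, 0.9, 0.50, 0.8),
    Segment("urban, mild opinions", 300, 0.9, 0.05, 0.5),
    Segment("rural, strong opinions", 100, 0.3, 0.50, 0.2),
    Segment("rural, mild opinions", 400, 0.3, 0.05, 0.4),
]


def true_support(segments):
    """Support rate across the whole population."""
    total = sum(s.people for s in segments)
    return sum(s.people * s.supports for s in segments) / total


def observed_support(segments):
    """Support rate among only those who have an account and post."""
    posters = [s.people * s.has_account * s.posts_on_issue for s in segments]
    supporters = sum(p * s.supports for p, s in zip(posters, segments))
    return supporters / sum(posters)


if __name__ == "__main__":
    print(f"Support in the whole population: {true_support(SEGMENTS):.0%}")
    print(f"Support visible on social media: {observed_support(SEGMENTS):.0%}")
```

With these made-up inputs, about 49% of the population supports the policy, but about 68% of the people posting do. Nothing about any individual post is wrong; the gap comes entirely from who has an account and who feels strongly enough to post.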

But even if there were a way to ensure that every person in the world created social media accounts and regularly tweeted out their opinions, social networking sites often fall prey to confirmation bias. Put simply, social media users more often choose to follow and like pages that they agree with. For government agencies hoping to engage with a particular target audience, this is problematic. Any data gathered on Facebook pages is likely to skew positively, since the content is less likely to be seen by someone who is inclined to disagree with it. Even with paid ads, Facebook and Twitter use algorithms that purposefully show users content they’re likely to agree with. On Twitter, if you see a promoted tweet that you dislike, you can request not to see any more tweets of that kind. In most cases (marketing, primarily), that would be a positive, but when it comes to seeking the input of people who disagree with you, Facebook and Twitter algorithms are an additional hurdle.


Is there a way to measure the demographics of the people interacting with content?


As of right now, it’s pretty difficult. And without that context for the raw data, conclusions are substantially more difficult to draw.

On Twitter, for example, if you wanted to know the ages, genders, and countries of origin of your followers, it would be nearly impossible to figure out. Given that a recent study found that approximately 15% of Twitter accounts were bots, information about the location of likes and tweets would be especially salient. Unlike Facebook profiles, which most often feature full names, previous workplaces, approximate location, and education history, Twitter users’ profiles usually reveal scant information. Even on Facebook, the process of going through and coding every user who liked a post for demographic information would be tedious and time-consuming. As of yet, there is no computer program capable of doing this.


What conclusions can be drawn from big data measurements? And what conclusions can’t?


The answer to life, the universe, and everything unfortunately cannot be found hiding somewhere in big data.

The divide between the digital world and the real world necessarily limits the conclusions we can draw from big data. Digital action doesn’t necessarily correspond to real-world change. For example, a strong condemnation of a political event on Facebook that gets 500 likes may only result in one or two phone calls to elected officials. While it’s occasionally possible to retroactively draw a link between online activity and real-world action, there is no way to guarantee that a tweet today will change the world tomorrow. Public diplomats should be careful to acknowledge the limitations of what digital data can tell them.

Big data can often reveal obvious trends, like “President Trump’s most recent tweet had 30,000 retweets,” or “The State Department post about Jerusalem had more angry reactions than other posts.” But in order for those raw numbers to have meaning, they need context. Did people retweet because they agreed or because they disagreed with the tweet? If there are a lot of angry reactions, how can we clarify what exactly people are angry about? Measures like retweets and likes are disappointingly unspecific, and as of yet, there’s no foolproof way to fully interpret what they mean.

One area where big data can be valuable is in identifying overall trends. Again, the source of the information should be taken into account, but understanding differences and similarities across countries and regions can help public diplomats craft and promote effective policies in different regions of the world. The value of big data lies not in the details and the minutiae, but in its ability to identify patterns in extremely large sets of data.