Big Data in elections

Does Big Data equal Big Brother?

Policy briefing on the use and abuse of data analytics in elections

Why you should read this briefing:

This briefing addresses two important questions currently framing the debate around the use of data analytics in election campaigns:

·         How public opinion, and thus election outcomes, can be manipulated. The briefing gives a brief overview of the most common (mal)practices.

·         Why this manipulation of the electorate is such a big deal.


In tandem with a prevailing mood of election fatigue, reports of foreign interference in the Brexit referendum and the US elections have, understandably, raised concerns over the resilience of our democratic electoral processes. Reports of large-scale Russian hacking of elections ultimately proved false. However, the uncomfortable truth underlying the sometimes hysterical debate about hacking is that it is possible to use cyberspace to influence elections.

This briefing aims to inform the reader on the subject of Big Data and its use in elections by addressing two fundamental questions: how can elections be influenced through the analysis of large data sets, and why are we upset about it? The first question concerns ways and means, while the second is clearly normative and deliberately provocative.

How elections can be influenced

Recent technological advances and ever more personal data stored online mean that the ways in which the behaviour of individual voters or voter groups can be analysed and predicted have multiplied. A technique known as ‘botswarming’ relies on bots creating numerous online accounts, which can then be directed to support specific views on politicians, policies and parties. This technique is used to create the (false) impression that certain policies are vastly more popular than they actually are.

A recent nine-country study conducted by the Oxford Internet Institute found that ‘the illusion of online support for a candidate can spur actual support through a bandwagon effect’,[1] thus lending credence to the popular Americanism ‘fake it till you make it’. ‘Astroturf campaigning’ describes a process by which paid employees are disguised as grassroots campaigners. The method aims to give lobby organisations more legitimacy. It is particularly effective when coupled with better polling data and improved voter profiling.

However, there are valid reasons to believe that ‘botswarming’ and ‘astroturf campaigning’ are ultimately self-defeating. Both aim to alter our perceptions and views on certain issues without the subject of the manipulation being aware of it. Yet once these methods are employed successfully, they are used more widely, thereby raising public awareness. Like corruption, manipulation thrives in the dark. Once undecided voters, the primary targets of these ploys, become aware of them, their effectiveness is likely to be greatly diminished. This is the sociological reason for the limited application and, ultimately, limited impact of such methods. The technological reason is that the same kind of technology that makes bot networks possible can also be their undoing. Interference through ‘botswarming’ leaves distinct traces and patterns. It is, for instance, highly suspicious if a tweet by a party or politician is retweeted instantaneously several thousand times. Data analytics can detect and thus eventually prevent such patterns.
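The detection idea in the paragraph above can be sketched in a few lines of Python. The function name, thresholds and sample timestamps are all invented for illustration: the sketch simply flags a tweet whose retweets cluster implausibly tightly in time, which is the tell-tale pattern described above.

```python
from datetime import datetime, timedelta

def flag_botswarm(retweet_times, window_seconds=5, threshold=100):
    """Flag a tweet as suspicious if `threshold` or more retweets
    arrive within any window of `window_seconds` seconds."""
    times = sorted(retweet_times)
    for i, start in enumerate(times):
        # Count how many retweets land within the window opened by this one.
        in_window = sum(
            1 for t in times[i:]
            if (t - start).total_seconds() <= window_seconds
        )
        if in_window >= threshold:
            return True
    return False

base = datetime(2017, 6, 19, 12, 0)
# Organic spread: one retweet every 30 seconds, so no tight cluster.
organic = [base + timedelta(seconds=30 * i) for i in range(200)]
# Bot swarm: 200 retweets within two seconds.
burst = [base + timedelta(milliseconds=10 * i) for i in range(200)]
```

Real platforms combine many such signals (account age, posting cadence, network structure), but even this toy sliding-window test captures why several thousand near-instantaneous retweets stand out.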


One technique is by far the most dangerous to democratic processes precisely because it seems so benign: psychometrics. The method harvests personal data available in the public and semi-public domain. Our activity on social media platforms such as Twitter, Facebook and Instagram creates a seemingly endless number of data points, indicating our opinion on almost any subject known to man. Crucially, the conclusions that can be drawn from any one opinion, ‘like’ or comment are rather weak and superficial. Most of us share our views on these trivial subjects carelessly, because we assume they reveal very little about ourselves. Taken in isolation, this is absolutely true.

However, as early as 2012, the psychometrics researcher Michal Kosinski demonstrated that an average of only 68 ‘likes’ on Facebook is sufficient to predict a user’s skin colour with 95% accuracy, their sexual orientation with 88% accuracy and their binary political affiliation with 85% accuracy.[2] The accuracy of these predictions increases steeply with each additional data point or ‘like’. More data points not only make the above predictions more accurate; they also permit entirely new ones, such as religious affiliation and whether or not you are married. The underlying model for this type of behavioural profiling is the ‘Ocean’ or ‘Big Five’ model, which assesses five behavioural components.[3]
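As a rough illustration of how individual ‘likes’ aggregate into a confident prediction, the sketch below uses a hand-written logistic score. The page names and weights are entirely invented; a real model of the kind Kosinski describes would fit weights to millions of labelled profiles rather than a handful of made-up numbers.

```python
import math

# Hypothetical per-like weights: positive values push the score towards
# one political affiliation, negative values towards the other.
LIKE_WEIGHTS = {
    "page:hunting_club": 1.2,
    "page:country_music": 0.8,
    "page:climate_action": -1.1,
    "page:indie_cinema": -0.6,
}

def affiliation_probability(likes):
    """Sum the weights of a user's likes and squash the total through a
    logistic function: more informative likes -> a more confident score."""
    score = sum(LIKE_WEIGHTS.get(like, 0.0) for like in likes)
    return 1.0 / (1.0 + math.exp(-score))

# With no likes the model is maximally uncertain (probability 0.5);
# each additional like pushes the probability towards 0 or 1.
p = affiliation_probability(["page:hunting_club", "page:country_music"])
```

The point of the sketch is the mechanism, not the numbers: each ‘like’ is individually weak evidence, but dozens of them combined move the score far from 0.5, which is exactly why 68 likes suffice for high accuracy.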

For the purpose of this briefing, the reverse relationship between psychological profiles and data is the more important one. While the fact that your activity on online platforms permits the creation of a psychological profile should matter greatly to the individual, the use of Big Data to filter for specific features is of systemic importance. This ability is crucial as it allows any company with access to these data sets and the corresponding psychometric algorithms to target individuals based on their psychological profiles, thus allowing searches for ‘practising Christians’ or, crucially, ‘undecided voters’.

This new ability to create voter profiles based on multiple data points will transform the way elections are fought. Previously, electoral messages were based on one key demographic data point: all voters belonging to an ethnic minority, or all women would receive the same message. In a sense, this is a rather crude approach to engaging with voters. The merger of psychological profiling and Big Data enables a much more sophisticated message, tailored to the preferences and convictions of individual voters. 

So, to answer the question posed above of how elections can be influenced: while there are several avenues for influencing the electorate, psychometrics is the most efficient method, not least because it is currently legal (with certain caveats) and does not suffer from diminishing returns as some of the other methods do. Its methodology is based on existing insights of behavioural psychology, relying on the so-called ‘Big Five’ psychological assessment test. Big Data permits the large-scale gathering of social media data in the public domain or through purchasing data sets from data brokers. Algorithms can then process the available data, using a theoretically unlimited number of data points to create ever more sophisticated profiles.

The output produced through this approach is a highly specific, targeted message to individual voters or potential voters. Psychometrics would allow the operator of the algorithm to target voters in marginal seats who are undecided as to their voting intent, employed, in their 30s, concerned about the NHS, and married. This profile allows political campaigners to devise and tailor messages precisely, amplifying their resonance with the intended audience manifold. These crafted messages are then delivered through personalised ads, or by boosting grassroots campaigning through highlighting households receptive to the campaign message.
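The kind of audience filter described above can be illustrated with a few invented voter records. The field names and values are hypothetical, not the schema of any real campaign database; the point is how trivially a rich profile reduces targeting to a simple query.

```python
# Illustrative voter records with inferred attributes (all invented).
voters = [
    {"id": 1, "seat": "marginal", "intent": "undecided", "age": 34,
     "employed": True, "married": True, "top_issue": "NHS"},
    {"id": 2, "seat": "safe", "intent": "decided", "age": 58,
     "employed": False, "married": True, "top_issue": "economy"},
    {"id": 3, "seat": "marginal", "intent": "undecided", "age": 37,
     "employed": True, "married": True, "top_issue": "NHS"},
]

def target_audience(voters):
    """Select the profile from the text: undecided voters in marginal
    seats, employed, in their 30s, married, concerned about the NHS."""
    return [
        v for v in voters
        if v["seat"] == "marginal"
        and v["intent"] == "undecided"
        and 30 <= v["age"] < 40
        and v["employed"] and v["married"]
        and v["top_issue"] == "NHS"
    ]
```

Once the psychometric attributes exist, targeting is just list filtering; the hard (and contested) part is the profiling that fills in those attributes in the first place.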

This boost to the efficiency of grassroots campaigning became apparent during the Brexit referendum, when Vote Leave managed to deploy its grassroots campaigners with enormous efficiency and specificity.[4] The results also became visible during Donald Trump’s electoral campaign in 2016, when so-called ‘dark posts’ appeared on Facebook as sponsored newsfeed content, visible only to African Americans, for instance, or only to people in small, geographically defined areas.[5] The Trump campaign also highlighted how under-regulation of the use of personal data can have serious negative consequences, perhaps even more lasting than the Trump presidency itself.

Why we should be upset about digital manipulation

The second question raised at the beginning of this briefing, ‘why are we upset about it?’, is more controversial. This question is less heretical than it might appear at first glance. After all, we live in a world where Google, Amazon and co. keep precise track of our shopping patterns, purchases and payment discipline. We find it convenient when companies pool financial information about us in one location and then suggest financial products based on what they have gathered. Many of us also accept that mobile apps harvest our data in return for making our lives easier. Why, then, would we object to political campaign ads that take our views and convictions into account?

After all, if you are childless and planning to remain so, the quality of schools wouldn’t factor into your choice of residence, and information about it would be useless at best, a nuisance at worst. It can also be argued that political campaigning already targets voters according to whether they live in a marginal seat, are on the electoral register and whether they have expressed a party political preference in previous elections. The use of psychometrics in political campaigns can hence be framed as the refinement of previous, accepted practice. Even if one takes issue with the influence wealthy individuals or groups can exercise over electoral outcomes, one might argue that this, too, goes back as far as democratic politics itself, however undesirable we may find it.

To assess the damage caused by such practices and to estimate the likely future impact, two principal features need closer examination: the power structure of citizen versus state, and social fragmentation. The claim that digital interference in elections will lead to social fragmentation might seem counter-intuitive to some, particularly as Samuel Woolley, Director of Research at the Oxford Internet Institute, labels the various ways in which elections were manipulated across their nine-country study as ‘manufacturing consensus’.[6]

This might be true in the short term, when political views are promoted by giving the illusion of broad-based consensus in support of them. In the medium to long term, enhanced and refined data analytics capabilities mean that political campaigns parrot voters’ views back at them to gain their support. This in turn feeds the much-referenced ‘echo chambers’ for opinions, whereby people tend to engage online only with people who share their views. The parallel emergence of strongly partisan politics in the USA and the use of data analytics in political campaigns suggest an almost symbiotic relationship between the two.

The second concern about the use of data analytics in elections concerns the shifting balance of power between the state and its citizens. Historically, the ability of a state to control its citizens has grown exponentially. Even allegedly absolute monarchs such as Henry VIII had very limited actual and direct control over their subjects unless they dwelled in close physical proximity. Through data analytics, states obtain the ability, in theory, to track and influence each citizen’s political activity in minute detail.

It is worth pointing out that so far, we have only considered comparatively benign Western democracies, where one political party seeks to gain the upper hand over a rival party in open and free elections. This is in itself deeply problematic. Even the best intentions, implemented through digital nudging (friendly pointers towards reducing one’s carbon footprint for instance) by a paternalistic state can have harmful unintended consequences. Scientific American cites two cases where well-meaning paternalistic nudging led to such unintended consequences, one relating to health bracelets, the other to swine flu vaccination recommendations.[7]

Now imagine for a moment a nefarious government, such as the Nazi regime or the Soviet Union under Stalin with this technological ability to influence its population. Even in democracies with appropriate checks and balances, this type of electoral influence is bound to distort outcomes and thus undermine trust in democratic institutions and processes in the long run.   

One thing the use of data analytics in the 2016 US elections and the British Brexit referendum has demonstrated is that light-touch data protection regimes are much more susceptible to this type of undesired influence than countries with more stringent regulation. Data derives from two principal sources: social media platforms, or purchases from large data holders. Consequently, regulatory regimes need to be two-pronged as well. Reliance on social media platforms’ efforts to regulate themselves is unpromising, and the track record of industries regulating themselves is not encouraging either, as the financial crisis highlighted.

Comprehensive legislation should force social media platforms to fundamentally rethink their policies on data sharing. With regard to companies passing on large data sets, this should be permissible only with the user’s explicit consent, and not be allowed by default unless the user actively opts out, which is the current modus operandi. This would not eliminate the problem, but by making data harder and costlier to obtain it might at least impede such illicit practices. A broader legislative rethink should be anchored around the idea of regarding personal data as a commodity whose ultimate owner is the user that produced it.

[1] Oxford Internet Institute, nine-country study:

[2] ‘The data that turned the world upside down’, Motherboard, 28 January 2017

[3] For a two-minute introduction to the ‘Big Five’ or ‘Ocean’ model:

[4] Tim Shipman, All Out War: The Story of How Brexit Sank Britain’s Political Class, p. 407

[5] ‘Facebook and Twitter are being used to manipulate public opinion – report’, Alex Hern, Guardian, 19 June 2017


[6] Samuel Woolley, Director of Research at the Oxford Internet Institute

[7] Scientific American: