Industry-led standards for web data collection

As a global leader in collecting, structuring, and analysing publicly available web data, Bright Data has a huge stake in the development of Artificial Intelligence (AI) over the coming years. AI has transformative power, but its models are only as effective as the data they are trained on. They rely on large-scale, clean, and accurate data to operate. Making sure that the data used to train AI meets high quality standards, and that the ways in which it is gathered meet high ethical standards, is a key regulatory challenge.

This is why Bright Data is supporting the ongoing inquiry by Policy Connect and the All-Party Parliamentary Group on Data Analytics which looks to form a standards system for the ethical use of data and AI.

Ethical questions and risks

While the benefits of data, in particular web data, and AI are huge, so are the risks. The tools now available can revolutionise the way companies operate, but they also raise a lot of ethical questions.

Concerns that personal data is being collected without the knowledge of the individuals involved need to be addressed. We also need to think about the quality of data - both in terms of the scope of public web datasets and the ways in which they have been processed - to ensure that unintended biases are avoided. This can be particularly challenging in cases where algorithms are used to make decisions about things like credit scores, job applications, or insurance rates.

There are also questions about the ‘ownership’ of publicly available web data - particularly as social media and other tech giants seek to retain control over, and use of, the data that has been made public on their platforms. Information transparency helps drive market competition, advance research, and assist life-saving organisations. The question before us is whether public data belongs in the hands of the public or in the grip of private enterprises. We believe that taking away freedom of information shakes the foundation of free societies.

With public web data, decision makers gain marketplace insights that help them respond better to public demand. For example, retailers analyse data in the public domain to determine the most competitive prices. If public web data is hidden, it will be virtually impossible for brands to hold each other accountable. Ultimately, the public will pay the price.

Further, public social media data can reveal indications of human and sex trafficking across the web, as well as the abuse and exploitation of young people. Accessing and analysing public data can also reveal signs of hate and political deception that can escalate into violence. If this data is taken away from the public and hidden, it will be much easier for wrongdoers to use the web for nefarious purposes.

Public data should serve the public good and not be sealed off by big tech companies.

Need for effective regulation

Regulation and clear standards are therefore vital when it comes to establishing trust. The General Data Protection Regulation (GDPR) has been positive in this regard, and the legislative processes around the UK’s Data Reform Bill and the EU’s Data Act are further steps in the right direction. Legislating to give statutory powers to the recently established Digital Markets Unit of the Competition and Markets Authority (CMA) would go further still.

Tying all of this together in the UK, the National Data Strategy (NDS) sets a clear path towards ensuring that the potential of data and AI can be realised in an environment of trust.

However, as important as these measures have been, the pace of technological change and the cross-jurisdictional nature of the data industry have made it hard for regulation to keep up. The business of collecting data from the web and applying it at scale remains in many ways unregulated. Poor practice continues to be all too common, damaging the trust that people have in data collection tools and techniques. This has an impact not only on the overall ability to use data in the ways that we would all recognise as positive, but also on the reputation - and ultimately commercial success - of responsible businesses. It also hampers the ability of the non-profit sector to fully realise the socially transformative power of data.

This is why Bright Data is taking steps to self-regulate and, hopefully, set ethical standards that point the way to effective regulation.

Industry-led ethical standards

Bright Data has built a set of stringent compliance measures around the principles of transparency and consent. They ensure not only that the ways we collect public data are ethically sound, but also that the ways in which our customers use the public web data we provide stand up to scrutiny. This includes verifying use cases, monitoring and reviewing usage, and prohibiting reselling. In line with international regulation, we have also implemented automated processes to ensure that these guidelines for collecting public web data are comprehensively followed.

Putting these mechanisms in place requires a lot of time, effort, and resource, but it is essential to operating with trust and transparency.

Regulating for a competitive market

While legislators are more used to calls to lower regulatory barriers, in our case effective and well-enforced regulation would do a lot to tame the ‘wild west’ environment we often find ourselves operating in. Businesses should hold themselves to high ethical standards and put them into practice through tight compliance measures. Having high regulatory standards applied across the industry would support a much healthier market and build the public trust needed to get the most out of data and AI in the coming years.

Or Lenchner

Or is the CEO of Bright Data, the industry-leading web data platform used to access and retrieve crucial public web data in the most efficient, reliable, and effective way. In this role, Or has also established the Bright Initiative, an award-winning program which focuses on using public web data and data expertise for good. Or is also a member of the Forbes Technology Council, the National Data Strategy (NDS) Forum, and the Data Skills Taskforce.