
Artificial intelligence can benefit us in many ways. But does it have to be at the cost of privacy? Is there a middle ground?
From the Hellenistic period up until the First World War, land was the most valuable asset in Europe. The powers that be fought each other to capture more and more land; more land meant more power.
In the Industrial Age, after the First World War, machinery became a source of power. By the Second World War, machinery and the oil that drives it had become the most valuable assets. They say that data is the new oil: since data drives the industries of the information age, it has become the new asset subject to a land grab.
The new economy is driven by providing value in the digital space. The five most valuable companies in the world today are tech companies. Companies like Facebook and Google have lots of data on you, whether you use their services or not.
They need this data to serve their personalised advertisements, and these companies control a majority of the internet today. Naturally, questions arise about the misuse of data, both in terms of privacy and in terms of manipulation.
A recent example is the dystopian display of manipulation pulled off by Cambridge Analytica, which used Facebook data on individual users to sway two important votes (the American presidency and the Brexit referendum) by pushing propaganda at a micro level. This misuse of data is what we need to worry about at the moment.
But for a moment, let's forgive Facebook's careless handling of its users' data, let's forget the immediate danger posed by political manipulation, and peek into the future. The next big problem is corporate manipulation.
The democratisation of AI has been promised by many organisations. Google and Microsoft are at the forefront of democratising AI, but they mean different things by it. Both companies democratise AI by infusing it into their services to make them better and more personalised for each user. Google takes it a step further by publishing most of its AI research online; it also introduced TensorFlow, a deep learning library. Keep in mind that TensorFlow was made efficient largely by the open-source community.
However, this is not true or complete democratisation. They publish their AI research, but the data that powers that research, the data needed to make something useful out of it, remains Google's property. So the training data is not democratised; it's accessible only to the company. The same goes for computing power: only these large organisations have the compute to experiment with AI at a scale that makes it strong and useful for people. And most of the AI talent these companies acquire is focused on solving tasks specific to the company's interests.
The openness policies on AI research that give these companies good brand value, both in the media and in the research community, do not actually pose a risk to them, because AI research today suffers from a reproducibility crisis: a majority of AI research is not reproducible.
If an independent researcher were to follow the guidelines and methods described in a paper, they would usually not be able to reproduce the reported results; what they get is generally poorer, even when the reproduction is a collaborative effort, as is often seen on GitHub. Smart, independent people coming together to reproduce a research paper on an open-source platform usually fail to get impressive results. There are multiple factors behind this, but it is why it's safe for these companies to put their research out in the open: it is hard for anyone outside their powerful labs to turn published research into efficiently working products.
So the idea of democratising AI as put forth by these giants is not true democratisation. It doesn't give an individual in any part of the world the ability to use AI and design solutions with it. It also ensures that most of the AI being actively developed is used in the products and services of these giants, to keep their users engaged. AI is too powerful a technology to be used only to make digital services more efficient.
Coming back to the question of user privacy: how can we ensure complete privacy of our data? The current internet is not designed to keep data private. The efforts in the direction of Web 3.0 vouch for true ownership of personal data. In such a paradigm, data is stored not on the cloud but on the owner's device. No central authority, no corporation, and no Cambridge Analytica can pull this data together and do whatever they intend with it. The smartphones and tablets we use are getting more powerful every day, so the ideas of Web 3.0 revolve around decentralisation: applications and their computations run on the devices themselves, with no central authority pushing or pulling the data. User-generated data never leaves the device, and that ensures complete privacy of the user's data.
But if we keep the data private, how will research organisations around the world access it?
The solution lies in decentralising the AI computation. AI solutions today are built by first collecting the data in a silo owned by a central authority. These data points are then fed into an AI model in batches; the model learns and becomes more intelligent as more data flows through it, with a data scientist overseeing the process. These computations are expensive to perform on cloud services like Google Cloud or Amazon AWS, and buying the hardware to perform them is even more expensive. And even if the computation were made affordable, privacy is compromised by default because the data storage is centralised. So how do you decentralise AI?
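To make the pattern concrete, here is a minimal sketch of that centralised pipeline in Python with NumPy. The silo, the toy linear model, and the `train_batch` step are all hypothetical stand-ins, chosen only to show where the data sits and who controls the loop:

```python
import numpy as np

# Centralised setup: every user's data is pooled into one silo
# controlled by a single authority.
rng = np.random.default_rng(0)
silo_features = rng.standard_normal((10_000, 20))  # all users' data, in one place
silo_labels = (silo_features.sum(axis=1) > 0).astype(float)

def train_batch(w, X, y, lr=0.1):
    """One gradient step of logistic regression on a batch from the silo."""
    preds = 1.0 / (1.0 + np.exp(-X @ w))
    return w - lr * (X.T @ (preds - y)) / len(y)

# A data scientist oversees this loop; the data never leaves the silo.
weights = np.zeros(20)
for start in range(0, len(silo_labels), 256):
    batch_X = silo_features[start:start + 256]
    batch_y = silo_labels[start:start + 256]
    weights = train_batch(weights, batch_X, batch_y)
```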
Google introduced the idea of Federated Learning: instead of storing all the data on a central server and performing the computations there, the machine learning model is sent to individual users' devices, where the locally stored data is used to train it with the device's own compute. The resulting updates to the model are sent back to a central server, where updates from multiple devices are combined into a new version of the global model. This newly gained intelligence is then broadcast to more users, who train it on their own devices with their own data. The cycle continues until the model reaches the desired level of intelligence.
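Below is a minimal sketch of that cycle, again in Python with NumPy. It illustrates the federated averaging idea in outline, with hypothetical devices and the same toy `train_batch` step as above, not Google's actual implementation:

```python
import numpy as np

def train_batch(w, X, y, lr=0.1):
    """One gradient step of logistic regression on locally stored data."""
    preds = 1.0 / (1.0 + np.exp(-X @ w))
    return w - lr * (X.T @ (preds - y)) / len(y)

# Hypothetical devices: each keeps its own data, which never leaves it.
rng = np.random.default_rng(1)
devices = []
for _ in range(5):
    X = rng.standard_normal((200, 20))
    y = (X.sum(axis=1) > 0).astype(float)
    devices.append((X, y))

global_weights = np.zeros(20)
for round_num in range(10):
    local_models = []
    for X, y in devices:
        # 1. The server broadcasts the current global model to the device.
        w = global_weights.copy()
        # 2. The device trains locally, on its own data, with its own compute.
        for _ in range(5):
            w = train_batch(w, X, y)
        # 3. Only the updated weights are sent back, never the raw data.
        local_models.append(w)
    # 4. The server averages the device models into the new global model.
    global_weights = np.mean(local_models, axis=0)
```

Note that only the weight vectors in `local_models` ever travel over the network; each device's `(X, y)` data stays on the device.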
Each individual contributes their computation capacity and their data to a central intelligence. There are many benefits to this.