Press "Enter" to skip to content

Using data for good

Even if the Internet trend is being mined for facial recognition data, it’s beneficial to further machine learning

2019’s first Internet trend has probably been the #10YearChallenge, which encourages people to post contrasting images of themselves today compared to what they looked like 10 years ago. As with everything, activists who like to visualise an uncomfortable future questioned its motivation. They suggested this challenge was created by online social networking organisations to enable them to gather more data for facial recognition.

This idea is usually championed by the beneficiaries of social media who want to use the services for free with an unreasonable expectation that their data lies idle on a bunch of servers for years with tons of security. Any attempts to use the data for good purposes are met with fierce resistance.

I have reservations regarding the promotion of facial recognition as an evil, which remarkably resembles how GM foods and vaccinations are viewed today. Scientists are battling huge perception problems regarding topics like these, which only highlights the importance of scientific communication to the masses.

In machine learning, pattern recognition is an important problem. Unlike human brains which have billions of neurons and trillions of connections, machine learning models have far fewer of both. To compensate for structural deficiencies, one needs more instances of data to train machine learning models. Upon feeding enough images, the algorithm achieves better classification accuracy wherein it is able to recognise that object with fewer errors. The classification accuracy increases with an increase in the number of data instances.

In addition to the volume of data, diversity of data also matters. With diverse datasets, a machine learning model becomes good at identifying multiple objects as opposed to a single object. In order to ensure generalisability of the model, the datasets are usually balanced where equal instances of each object are provided. For example, if a dataset contains 100 images of apples, it is balanced by adding 100 images of all other fruits so that the model becomes better in classifying a wide range of objects.

Hence in a supervised learning environment, the accuracy of the learning model increases with an increase in data. The ability to detect multiple objects effectively also increases with an increase in a diverse set of data.
Social network organisations are a wonderful platform to gather image-based data. The genius move was to introduce the “tag” option. Tagging a person, object or animal basically provides images with labels and in supervised learning, images without labels are useless. This feature helps scientists use a vast number of images to improve machine learning models.

Platforms like Facebook, which started with image processing, improved algorithms and introduced novel computational methods to accurately process a vast number of images. These learnings were then passed on to other domains to address very important problems. In healthcare, machine learning boosts diagnostic services where advanced deep learning models are able to detect different abnormalities in scanned images. Current machine learning models are capable of accurately processing thousands of medical images in a day, surpassing the human ability to perform the same feat. In public health and social science, satellite images taken during the day and night are fed through a convolutional neural network which enables the prediction of socioeconomic and health indicators like GDP, household income, literacy, nutrition and mortality.

Although not fully developed to do so, machine learning models in the future will develop to partially replace surveys amounting to millions of dollars in savings. In transportation, autonomous vehicles have drastically improved our ability to perform machine learning inferences in real time. There are many more examples of these applications which illustrate its tremendous potential to address current-day challenges. And a lot of this was made possible by your selfie.

One can hold the view that the #10YearChallenge is to advance data requirements for facial recognition. But we already have enough forensic knowledge to predict how a person will look in future even without machine learning. More importantly, we already have state-of-the-art facial recognition algorithms and voluminous image datasets with high racial and age diversity. I am in perfect agreement with Professor Cesar Hidalgo’s views on the #10YearChallenge.

Activists who oppose such challenges in the name of facial recognition must analyse deeper to understand how a novel technology like machine learning can help alleviate modern-day problems. They should exercise caution when promoting the idea that machine learning is evil and should refrain from signature campaigns that ban free Internet.

So the next time you face unlock your Apple iPhone X and rant on social media about the ill-effects of providing the same companies with data, just remember—this facial recognition helps protect your privacy.