The backstory of Alexa’s Indian makeover: desi, agnostic, politically independent and… work in progress

Sunny Sen November 22, 2017 11 min

She is modern, speaks fluent English, helps you book a cab, finds recipes for cooking, plays your favourite music, and gets charmed by Shah Rukh Khan, her favourite actor. She is the quintessential Indian lady, who probably will have the answer to all your questions. Her favourite actresses are Emily Stone and Rachel Weisz, and her favourite ice cream flavour is mint choco-chip. She is Alexa, born into a family of American origin, Amazon, but speaks English with an Indian accent.

She was trained to do so. Alexa is Amazon’s voice assistant, and India was the fourth country where it was launched. The ecommerce company has ambitions to put Alexa into everything. Into your phones. Into your cars. Throughout your home. And even into hotels and offices.

It has had some early successes in the US, UK, and Germany. There are dozens of companies all around the world building Alexa into their products. At the CES, Ford, Volkswagen and Audi showcased cars with Alexa embedded in their dashboards.

The Amazon Echo Plus and the Echo Dot. At FactorDaily, we bought the mid-range Echo.

When connected to the internet, she tries to answer your questions and follow your commands. But India will be Alexa’s ultimate test. “Voice assistants work well in countries with single or dual languages. India is different,” says Mohan Ram, managing director of Lattice Bridge Infotech, who has been working in the field of speech recognition technology since 2001.

When Ram started in 2001, he told his investors that his company would solve the language and dialect problem in Karnataka in five years. But 17 years on, he admits that he has been able to solve only 80% of it. “Every 100 km there is language change and in every 30 km the dialect changes,” he says.

But he agrees that artificial intelligence, machine learning, and deep learning will solve a lot of these problems. Amazon is betting on that to solve the Indian complexities.

Desi-vibe in videshi package

More than a year before Amazon launched Alexa in India, it had started training the assistant to cater to local needs. “Alexa understands the colloquial terms and its context. In India, unlike other single language countries, we will end up using words that will have proper nouns – it could be the name of a person, place, a Bollywood album or a lyricist or a movie,” says Puneesh Kumar, country manager, Alexa experience and devices.

Kumar has been with Amazon since May 2010. He started as an intern and then worked as a senior project manager at Amazon China, where he was part of the launch of the marketplace. His longest stint, two and a half years, was as the general manager of the Amazon Global Selling programme, before he took charge of the Alexa practice in India.

“We had to think outside the box of just understanding English. We had to train Alexa to understand the proper noun in Tamil, Hindi, Telugu, Punjabi, Malayalam, among others,” says Kumar, now based in Bengaluru.

These problems are unique to India, as even individual states have multiple dialects. For example, people in Belgaum, a city on the Karnataka-Maharashtra border, draw their language from a mix of Konkani, Marathi, and Kannada. In Udupi, also in Karnataka and 386 km from Belgaum, people speak a mix of Tulu, Malayalam, and Kannada.

“Given the vast population of India, it may make sense for rollouts by state,” says Ray Wang, principal analyst and chairman of Silicon Valley-based research firm Constellation Research. “It’s still behind Google in capability but catching up fast. What it requires to succeed are a lot of users to test and learn.”

Puneesh Kumar, country manager, Alexa experience and devices, Amazon

Alexa sits in the cloud and is constantly learning, as it is built on a framework of artificial intelligence and machine learning. Amazon combined two pioneering technologies, cloud computing and AI, and wrapped them in the simplicity and ease of voice as the user interface.

“It is the machine learning in the background where every utterance is helping it learn,” says Kumar. Alexa is based on natural language understanding (NLU), which essentially means that it understands sentences and contexts, and converts them from text to speech. The context varies from country to country. In the US and UK, when people talk about marks, they are talking about scratches. But in India, marks usually refer to grades and scores.

It also understands that India follows a numbering system in lakhs and crores, not millions and billions. Alexa had to be made aware of that, says Kumar. It has picked up abbreviations such as UP (for Uttar Pradesh), MP (for Madhya Pradesh), and CM (for chief minister). It can also identify different pin codes. It has also picked up Hindi words like haldi, jeera, and dhania, which are not English words but are in common use in India.
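To make the lakh-and-crore point concrete, here is a minimal Python sketch of Indian digit grouping (the last three digits, then pairs) and of naming a figure in lakhs (100,000) or crores (10,000,000). The function names are illustrative assumptions and say nothing about how Alexa handles this internally.

```python
LAKH = 100_000        # 1 lakh  = 1,00,000
CRORE = 10_000_000    # 1 crore = 1,00,00,000


def format_indian(n: int) -> str:
    """Group digits the Indian way: last three digits, then pairs."""
    sign = "-" if n < 0 else ""
    digits = str(abs(n))
    if len(digits) <= 3:
        return sign + digits
    head, tail = digits[:-3], digits[-3:]
    groups = []
    # Split the leading digits into groups of two, working from the right.
    while len(head) > 2:
        groups.insert(0, head[-2:])
        head = head[:-2]
    groups.insert(0, head)
    return sign + ",".join(groups + [tail])


def in_lakhs_and_crores(n: int) -> str:
    """Name a non-negative number in crores or lakhs where that applies."""
    if n >= CRORE:
        return f"{n / CRORE:g} crore"
    if n >= LAKH:
        return f"{n / LAKH:g} lakh"
    return format_indian(n)


print(format_indian(1_000_000))         # one million, written the Indian way
print(in_lakhs_and_crores(1_000_000))   # the same figure in lakhs
print(in_lakhs_and_crores(25_000_000))  # a figure in crores
```

One million comes out as 10,00,000, or 10 lakh, which is the form an Indian listener would expect to hear.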

Training it for India

To be sure, customer experience on Echo devices (Echo instantly connects to Alexa to play music, get information like news and weather, and control your smart home using voice) is still a work in progress. A “Who is Anand Murali” query to the Echo at the FactorDaily Bengaluru office turned up an incorrect answer while, ideally, it should have had the context for the query and identified our resident geek. A Google voice query throws up Anand’s LinkedIn profile on top of the search results.

It is far from perfect, Shonali Muthalaly writes in The Hindu. “Alexa is still figuring out India, so when I ask for restaurant recommendations, traffic predictions and routes, she answers with a volley of sorrys.” Presumably, as more Indians get on board the platform, results will get better.

Training Alexa wasn’t easy, admits Kumar. Amazon started with a finite set of words – perhaps about 10,000. Kumar doesn’t remember the exact number. This is called the training data. Then there is something called the test data, which is infinite and is a mix of how humans interact and the world wide web.

Kumar acknowledges that Alexa isn’t yet perfect, which is why it is available only by invitation to a few people. As more people talk into the Echo devices, Alexa will learn more. “That’s when machine learning kicks in and starts identifying new places that were not there in the training set… as more people talk to the devices, the utterances expand. Things are not as they should be, it will get better with time,” he says.

There is something called DWC (demand weighted coverage). A list of the most popular and frequently uttered words is made. While recognising these words, Alexa looks for patterns, sounds, phonemes and context, then puts them together to work out which word you may have said. If it does not match, each mismatch is tracked to improve the experience. Even the mismatches form a data set, which the machine learning processes.
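The article does not spell out how DWC is computed. One plausible reading, sketched below in Python, is the share of all utterances (weighted by how often each word is spoken) that fall on words the system already recognises, with the most frequent misses queued up for the next round of training. The definition, function names and sample data are assumptions for illustration only.

```python
from collections import Counter


def demand_weighted_coverage(utterances, recognised):
    """Share of total utterance volume covered by recognised words.

    Assumed definition: frequent words count for more than rare ones.
    """
    counts = Counter(utterances)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    covered = sum(c for word, c in counts.items() if word in recognised)
    return covered / total


def top_mismatches(utterances, recognised, k=3):
    """Most frequently spoken words that were not recognised."""
    misses = Counter(w for w in utterances if w not in recognised)
    return misses.most_common(k)


# Hypothetical log of spoken words and the set Alexa already handles.
log = ["haldi"] * 5 + ["jeera"] * 3 + ["dhania"] * 2
known = {"haldi", "dhania"}

print(demand_weighted_coverage(log, known))  # 7 of 10 utterances are covered
print(top_mismatches(log, known))            # "jeera" is the miss to fix first
```

Weighting by demand means that fixing one heavily used word improves the score more than fixing several rare ones, which fits the focus on the most frequently uttered words.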

Early on, Alexa knew that Amitabh Bachchan is an actor and could even fetch his songs, but pronounced Bachchan as Bakkan (without the “chch” sound). It learned over time. Kumar says that a large part of the experience is for Alexa to read out the word back in the right dialect and pronunciations – not just for English but also the popular words in Hindi and Telugu.

Kumar says that the holy grail is Alexa understanding the person on the other side. “No matter what you say, we wanted Alexa to understand the intent behind it. We look at major utterances by the intent,” he says. For example, “play a song”, “play a song from a movie”, “play a song by a lyricist”, or requests that do not use the word play at all, such as “sing something for me” or “help me lighten up my mood”, might all end up requiring Alexa to fetch the same result. “We try and draw correlations,” says Kumar.
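The idea of many phrasings collapsing into one intent can be shown with a deliberately crude Python sketch. It uses hand-written cue phrases and substring matching, whereas Alexa, by Kumar's account, learns these correlations from data; the intent names and cue lists here are hypothetical.

```python
# Hypothetical cue phrases; a production system would learn these from data.
INTENT_CUES = {
    "PlayMusic": ("play", "sing", "lighten up my mood"),
    "BookCab": ("book a cab", "get me a ride"),
}


def guess_intent(utterance: str) -> str:
    """Return the first intent whose cue phrase appears in the utterance."""
    text = utterance.lower()
    for intent, cues in INTENT_CUES.items():
        # Substring matching is crude ("display" contains "play"),
        # but it is enough to show several phrasings sharing one intent.
        if any(cue in text for cue in cues):
            return intent
    return "Unknown"


for phrase in (
    "Play a song from a movie",
    "Sing something for me",
    "Help me lighten up my mood",
):
    print(phrase, "->", guess_intent(phrase))
```

All three phrases resolve to the same hypothetical PlayMusic intent, even though only one of them contains the word play.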

If the user says “no” or changes the query within a few seconds of the results being fetched, the machine learning algorithm understands that Alexa did not fulfil the intent. Even this input is used for training.
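A minimal Python sketch of that feedback signal follows. It assumes a five-second window (the article only says “a few seconds”), and the Interaction record and its field names are hypothetical; it labels each exchange as fulfilled or not so that the label could later feed training.

```python
from dataclasses import dataclass
from typing import Optional

RETRY_WINDOW_S = 5.0  # assumed value; the article only says "a few seconds"


@dataclass
class Interaction:
    query: str
    follow_up: Optional[str] = None            # what the user said next, if anything
    follow_up_delay_s: Optional[float] = None  # seconds after the result was fetched


def intent_fulfilled(event: Interaction) -> bool:
    """Label an exchange using the user's quick reaction as implicit feedback."""
    if event.follow_up is None or event.follow_up_delay_s is None:
        return True  # no reaction at all: treat as success
    if event.follow_up_delay_s > RETRY_WINDOW_S:
        return True  # a later request is treated as unrelated
    reply = event.follow_up.strip().lower()
    said_no = reply == "no"
    changed_query = reply != event.query.strip().lower()
    # A quick "no" or a quick change of query counts as a failure signal.
    return not (said_no or changed_query)


print(intent_fulfilled(Interaction("play amitabh bachchan songs")))
print(intent_fulfilled(Interaction("play amitabh bachchan songs", "no", 2.0)))
```

The first call is labelled fulfilled because nothing followed it; the second is labelled a miss because a “no” arrived inside the window.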

Amazon uses a mix of machine learning and manual intervention in training Alexa, especially where the same word has multiple pronunciations. “We get high confidence responses back, and then we do put an audit mechanism with someone who is very familiar with that word to make sure that we get those phonemes right,” says Kumar.

But it is not possible to do this with every word. Amazon picks the top 20 or 30 words and puts them through the process, and repeats it whenever another word sees a higher volume of utterances.

The lady, however, will have one voice, Kumar says. It won’t change from south to north India, even though Amazon hopes that Alexa will understand the different cultures, accents and mindsets of people who are speaking into the mic. “We are looking at comprehensive end voice. The end voice is a modulation of machine, phonemes, lexicons and all these are pre-recorded with a very Indian voice,” says Kumar.

Since it is not possible to record every word, a set of words was recorded to create what can be called the basic structure. “We take a human voice and then couple it with all the machine learning, phonetics and lexicons. We look at it as a combination,” he says.

Also, it is not possible for one lady to know all accents and languages, so Amazon chose multiple people to record for India. However, with the help of machine learning, it has managed to make Alexa sound the same wherever you use it in India. “The Alexa that is talking in India, will have one personality,” says Kumar.

The Star Trek inspiration

On Day 1, even before Amazon has done a full commercial roll-out of its voice assistant in India, Alexa comes with 10,800 skills, to be precise. When Alexa was launched in the US, it started with only 13 skills. Skills are voice-based applications, like mobile apps, which allow users to operate an app using voice commands.

Steve Rabuchin, head of Alexa voice services and skills at Amazon, told Wired that the company was inspired by the Star Trek computer and wanted to create Alexa in a way that lets a user control everything around them through simple voice commands.

Amazon has already stitched up partnerships with developers in India to integrate popular apps with the voice assistant. For travel, there are Ola, Goibibo, ixigo and Jet Airways. For food, there are Faasos, Zomato, Freshmenu, Sanjeev Kapoor and Tarla Dalal. For music, there are Saavn and Bollywood Hungama. For sports, it has ESPNCricinfo. For news and education, there are Times of India, NDTV, ABP Live, AajTak and Byju’s. For smart-home solutions, it has Syska and Silvan, and for handyman services, UrbanClap and Housejoy.

Kumar says that it is very easy to integrate Alexa. “We have had kids as young as 10 years old build a skill and people who are old without any tech knowledge to build a skill,” he says.

For Aloke Bajpai, CEO and co-founder of ixigo, Alexa is a good distribution platform. “Amazon is very aggressive and we expect that they would reach to up to a couple of million devices in a very short span of time,” he says.

But Bajpai is not limiting ixigo to Alexa – he is building his own voice assistant called Tara. “Alexa’s use is limited… you ask something and it responds. It needs to be proactive, it should be able to recommend things to you if a long weekend is coming up,” says Bajpai, but he agrees that with Google and Amazon putting their weight behind voice assistants, things are changing fast. “The voice synthesis has improved… At the base we are using their APIs,” he says.

While Alexa is learning furiously, there are human interventions in moulding its personality, too. “We personify Alexa by giving her characteristic attributes and personal preferences,” Kumar said in a follow-up email through a spokesperson. “She is also agnostic to religion, politically independent and a strong supporter of science, technology, innovation, diversity and social progressiveness.”

Can Alexa be a girlfriend or a loyal companion? “We want Alexa to be the voice service, be a source of engagement and someone you can talk to in whichever situations you are going through,” says Kumar. “If you say ‘I am stressed out’, she will ask if you want to listen to some meditation music.”

For now, it doesn’t do much for FactorDaily’s mildly miffed geek, though.

This story was updated at 5.45 am on November 23, 2017 for typos and a link.

Disclosure: FactorDaily is owned by SourceCode Media, which counts Accel Partners, Blume Ventures and Vijay Shekhar Sharma among its investors. Accel Partners is an early investor in Flipkart. Vijay Shekhar Sharma is the founder of Paytm. None of FactorDaily’s investors have any influence on its reporting about India’s technology and startup ecosystem.