A longer version of this article originally appeared in the Massachusetts Institute of Technology’s (MIT’s) Emerging World Blog.
Of the seven billion inhabitants on Earth, approximately four billion [1] do not have proper addresses that allow their houses, properties or businesses to be located on a map with reasonable precision. To illustrate, consider the two addresses of one of our authors on two different continents:
In the US: (Name), 7116 Via Correto Dr, Austin, TX 78749. The location for this address is readily available, and any navigation system can take you to the doorstep by day or night, in good or bad weather. This is an example of a structured address.
In India: (Name), College Tilla, PO Agartala College, Agartala 799004: Google Maps resolves this address to an area of roughly 76 sq. km. in the city of Agartala. This address is unstructured.
Adding a landmark to the above address, “College Tilla near College Tilla Lake, ..” narrows the answer to an area of approximately 3 sq. km. Upon reaching the place, and depending on how much is known about the occupant, the house or the location, one would need an additional 15 to 45 minutes to find the house, provided it is not late at night or raining. Moreover, landmark-based addressing is infrequent, incomplete and inconsistent.
These are not merely one-off experiences that cause inconvenience. The inability to provide an accurate location for each and every address affects the livelihood of residents in many ways. It inhibits the growth of local trades such as salons, bakeries or food stalls. It reduces access to amenities such as bank accounts and the delivery of goods and services (e.g., e-commerce), and it delays emergency services such as fire brigades and ambulances.
Case Study 1: Logistics and Transportation
The inability to locate an address with reasonable accuracy, as demonstrated in the previous example, hampers a transporter’s ability to deliver shipments on time. Consider the e-commerce industry, or any industry where goods are delivered to an address. In the absence of proper geocodes, most companies in India “sort” the packages or goods by pincode (PIN stands for “Postal Index Number”), since a pincode is the only numeric location identifier that appears in most written addresses in India.
While sorting the deliveries and facilities (that store shipments for delivery and return) by pincodes sounds like a logically viable solution, it has two major practical problems:
– About 30% to 40% [2] of the pincodes in India are written incorrectly, leading to shipments being misrouted and requiring manual intervention for re-routing to the correct pincode.
– The average area a pincode covers is about 179 sq. km., with about 135,000 households and over 100,000 businesses, educational institutions, government buildings etc. Sorting deliverables for roughly a quarter of a million addresses, where only 30% of those addresses are structured, poses many challenges.
We will pick an area within the Logistics and Transportation industry to illustrate these challenges in detail. Consider e-commerce in India, which is growing at a 30% CAGR [3], driven by rising income, consumption and digitisation [4], and is expected to continue to do so for the foreseeable future. Indian consumers expect products to be delivered to their doorsteps for free, which places a unique burden on e-commerce companies not seen in most parts of the world.
When a product is ordered online from an e-commerce site, the merchandise is picked up from a seller or a warehouse and brought to a processing centre, where it is sorted for the destination city. This is known as the “first mile operation”. It is then transported between the origin and destination cities in a “line haul” that involves long-distance transportation. In the “last mile operation”, the merchandise goes to the delivery centre, from where it is delivered to the shopper’s house. Figure 3 illustrates this process.
In western countries, structured addresses lead to relatively accurate geocoding and, consequently, the last mile cost is about 10% to 12% of the total cost [5]. In India, the same cost is ~30% of the total cost of delivery, notwithstanding India’s low cost of labour. The extra cost comes from the longer time that a driver takes to locate an address: stopping multiple times to call the recipient or ask someone on the road for directions, and driving additional distance in search of the correct location.
Figure 4 illustrates the challenges of delivering to addresses that cannot be disambiguated at the house level. In the absence of a lat-long for the desired address, addresses are sorted based on pincodes, and all the packages for one pincode are sorted, stored and delivered from one (or, sometimes, two or more) delivery centre(s). Typical pincode-based sorting centres, located at the orange pin, can have a delivery “throw” (radius) of 4 km to 20 km. In the absence of an understanding of load distribution based on addresses, the locations of such centres often tend to be imbalanced. For example, in this case, while one delivery biker drives about 22 km, the other drives about 104 km, almost five times the distance covered by the first. Such problems significantly hamper initiatives to improve productivity and reduce costs at these centres.
If the addresses, on the other hand, can be disambiguated down to a household level, then each individual locality can have a small delivery centre. In such cases, the “throw” of the delivery centre goes down to an average of 1 km to 3 km. In Figure 4, the locations for such small delivery centres are represented by red pins and areas they cover are shown in yellow. This results in a significant improvement in productivity at the smaller centres that can do address-based sorting.
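As a rough illustration of address-based sorting, the sketch below assigns each geocoded package to its nearest delivery centre using the haversine (great-circle) distance. It is a minimal sketch, not a description of any company’s system: the function names, the centre and package identifiers, and all coordinates are hypothetical, and a production system would also account for road networks and centre capacity.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat-long points."""
    earth_radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def assign_to_nearest_centre(packages, centres):
    """Map each package id to the id of the closest delivery centre."""
    assignment = {}
    for pkg_id, (lat, lon) in packages.items():
        assignment[pkg_id] = min(
            centres,
            key=lambda c: haversine_km(lat, lon, centres[c][0], centres[c][1]),
        )
    return assignment


# Hypothetical coordinates, for illustration only.
centres = {"centre_north": (28.70, 77.10), "centre_south": (28.50, 77.20)}
packages = {
    "pkg_1": (28.69, 77.12),
    "pkg_2": (28.52, 77.19),
    "pkg_3": (28.61, 77.15),
}

assignment = assign_to_nearest_centre(packages, centres)
for pkg_id, centre_id in assignment.items():
    print(pkg_id, "->", centre_id)
```

This kind of assignment is only possible once every address resolves to a lat-long; with pincode-level resolution, all three packages would simply fall into one large bucket.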
Moreover, finer granularity in the address geocodes also allows us to perform route optimisation and provide system-driven routes for the delivery riders. We present the case of Delhivery, one of India’s leading logistics providers for e-commerce companies.
At Delhivery, a switch from pincode-based to locality-based sorting has improved the productivity of the last mile operation by 40% to 60%, depending on the type and complexity of the addresses and the size and shape of the locality.
This high last-mile cost disproportionately affects a company’s bottom line. In a simplistic analysis in Table 2, we demonstrate that better geocoding that reduces the last-mile cost by 40%, well within the reach of current technology, can improve the profitability of an e-commerce company.
Even for a small industry in India, such as e-commerce delivery, which is estimated to be a Rs 5,000 crore (~$775 million) business annually (as of 2017), the annual cost savings from a better addressing scheme are about Rs 650 crore (~$100 million).
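The order of magnitude of this saving can be checked with a back-of-the-envelope calculation from the figures quoted above: a last-mile share of ~30% and a 40% reduction in last-mile cost. The sketch below is only a sanity check under a loud assumption, namely that the Rs 5,000 crore market size can be treated as the total delivery cost; the variable names are ours, and the result differs somewhat from the article’s Rs 650 crore, which comes from the more detailed inputs in Table 2.

```python
# Figures quoted in the article.
market_size_crore = 5000    # annual e-commerce delivery business, Rs crore (2017)
last_mile_share = 0.30      # last mile as a share of total delivery cost in India
last_mile_reduction = 0.40  # reduction in last-mile cost from better geocoding

# Assumption: the market size is used as a proxy for total delivery cost.
last_mile_cost_crore = market_size_crore * last_mile_share
savings_crore = last_mile_cost_crore * last_mile_reduction

print(f"Last-mile cost: Rs {last_mile_cost_crore:.0f} crore")
print(f"Estimated savings: Rs {savings_crore:.0f} crore")
```

Under these assumptions the saving works out to Rs 600 crore, or 12% of the total, which is in the same range as the article’s estimate.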
For the Logistics and Transportation industry, the same framework can be used for different types of goods and services transportation. Even in the case of inter-city transport, it is both the first mile (pickup from a client or a distribution centre, for example) and the last mile (delivery to a house or business) that are significantly affected by the inability to resolve an address.
Case Study 2: Loan and Financial Services
India is a credit-deprived country where 642 million people, a staggering 53%, are excluded from formal financial products such as loans, insurance and other forms of credit and financial services. The impact on the economy is significant. McKinsey estimates that the payoff from digital financial services in India by 2025 could be $700 billion and that it could create an additional 21 million jobs [6].
The reasons for the paucity of credit are many: lack of verifiable identity (akin to social security number in the US), absence of proof of formal income in a largely cash-driven economy, and complexity of disambiguating one’s location, be it home, or place of business.
Consequently, in the past two years, funded by large venture capital investments in financial technology (“fintech”) companies, over 100 startups have started providing services for connecting borrowers and lenders.
Figure 5 shows a typical process promised by one such service. The process is reasonably straightforward. Once a user applies online or through the app and selects a product, they are typically asked for 5 to 8 sets of documents, which include proof of identity, address, age and income, educational qualifications, an employment verification document, and bank statements.
A courier picks up these documents from the borrower. The documents (mostly paper) are then scanned, converted to text using Optical Character Recognition (OCR) and saved in a database, where they are compared against the information provided by the borrower on the loan application.
This process works well for about 60% to 70% of borrowers, especially in large cities. From our survey of leading firms, we estimate that about 70% of the documents are considered a “match” and go to the next step of loan processing, e.g., loan eligibility analysis and loan approval, albeit with only a certain percentage of applicants being eligible for a loan.
However, for the remaining ~30% of applications, the address provided by the applicant in the application does not match the documents. To understand why, consider the following addresses for the same house, in Figure 6.
For the same address:
a) The house number is written as “TH-146B”, “146” and “Unit 146”.
b) The community is described as “Purva Parkridge” and “Purva Park Ridge”, and also abbreviated as “PPR”.
c) The road name is spelled in two ways, as “Goshala Road” and “Ghosala Road”, and omitted completely in one.
d) The locality is described as “Garuda Char Palya” and “Garuda Charpalya”.
In other words, just four different sources of official address verification documents can produce over 50 combinations. It is, therefore, not surprising that the addresses provided in documents often do not match. Since this information is processed in a centralised facility, the staff there would have no way of knowing that “Purva Park Ridge” and “PPR” are the same community.
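The count of combinations, and the reason exact matching fails, can be sketched in a few lines. The variants below are the ones listed above; the `normalise` helper and the use of Python’s standard `difflib` for a similarity score are our own illustrative choices, not the method used by any of the lenders surveyed.

```python
import re
from difflib import SequenceMatcher
from itertools import product

# Variants of each address component, as listed in the article.
house = ["TH-146B", "146", "Unit 146"]
community = ["Purva Parkridge", "Purva Park Ridge", "PPR"]
road = ["Goshala Road", "Ghosala Road", ""]  # "" means the road is omitted
locality = ["Garuda Char Palya", "Garuda Charpalya"]

# Every way of writing the same address from these variants.
variants = [
    ", ".join(part for part in parts if part)
    for parts in product(house, community, road, locality)
]
print(len(variants))  # 3 * 3 * 3 * 2 = 54


def normalise(text):
    """Lower-case and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


# Normalisation fixes spacing differences...
same_community = normalise("Purva Parkridge") == normalise("Purva Park Ridge")
same_locality = normalise("Garuda Char Palya") == normalise("Garuda Charpalya")

# ...but not spelling variants or abbreviations.
same_road = normalise("Goshala Road") == normalise("Ghosala Road")
same_abbrev = normalise("PPR") == normalise("Purva Park Ridge")

# A similarity score can flag near-identical spellings for review.
road_similarity = SequenceMatcher(
    None, normalise("Goshala Road"), normalise("Ghosala Road")
).ratio()

print(same_community, same_locality, same_road, same_abbrev)
print(round(road_similarity, 2))
```

Simple normalisation resolves the spacing differences, and a similarity score catches the transposed letters in the road name, but an abbreviation such as “PPR” can only be resolved with local knowledge or an alias table, which is exactly what a centralised facility lacks.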
For the 30% of documents that do not match, the following process kicks in, as depicted in Figure 7.
This delays the approval of loans by up to 5 days in the best cases, and sometimes by weeks or even months. This affects both the borrower, who could be in urgent need of money, and the lender, who has to bear the cost of additional verifications and the loss of the interest they could have earned until the loan is finally processed.
Moreover, because the place of dwelling or business is a key factor in the risk assessment of a loan, the inability to disambiguate it shows up in the risk models, raising the interest rate and, hence, the overall cost of the loan.
We conducted a similar analysis for the top three industries: Logistics, Manufacturing (including consumer goods), and Emergency Services, to derive a cost estimate for India. Using this approach, our estimates indicate that poor addresses cost India $10 billion to $14 billion annually, or ~0.5% of its GDP; see Table 4.
Further note that the numbers presented in Table 4 capture the cost of bad addresses, but do not include additional benefits of having better addresses like rising productivity and income gains, which lead to further growth of businesses and GDP etc.
Easily discoverable addresses are important for rapidly growing economies like India. Rather than just being a convenience, addresses are vital for driving self-reinforcing economic cycles and, therefore, improving livelihoods and incomes for the next billion Indians. Consumers independently identify and adopt addresses for their own convenience, while businesses use technology or third-party services to resolve these addresses into geocodes and deliver products and services at reduced cost.
Our case-study analyses indicate that the lack of a good addressing system costs India at least $10 billion to $14 billion a year, or about 0.5% of its annual gross domestic product. As the Indian economy continues to grow in both economic output and the variety of new businesses and services, the costs due to the lack of a proper addressing system will increase significantly. India, therefore, needs to consider a dramatically new approach to modernising its addressing system to bring in efficiency.
1. Four billion people lack an address. Machine learning could change that
2. Sample from Delhivery’s database of 10 million addresses
3. Morgan Stanley Bets On Digital To Forecast $6 Trillion Economy, Sensex At 130,000
4. Morgan Stanley Report “India’s Digital Future”
5. Private conversation with multiple stakeholders at Amazon, FedEx and Staples e-commerce
6. How digital finance could boost growth in emerging economies (September 2016)
About the authors:
Santanu Bhattacharya is the Chief Data Scientist at Airtel, one of the world’s largest telcos with 450 million+ subscribers. A serial entrepreneur who has led Emerging Market Phones at Facebook, he is a former physicist from NASA Goddard Space Flight Center and a collaborator at MIT Media Lab.
Sai Sri Sathya is a researcher collaborating with REDX and Camera Culture Group at MIT Media Lab and formerly at the Connectivity Lab at Facebook, focused on Emerging World Innovations. He is the founder of a stealth mode start-up working on democratising AI on the edge.
Kabir Rustogi leads the Data Science team at Delhivery, India’s largest e-commerce logistics company. A published author, he was previously a Senior Lecturer of Operations Research at The University of Greenwich, UK.
Prof. Ramesh Raskar is Associate Professor at MIT Media Lab and leads the Emerging Worlds Initiative at MIT which aims to use global digital platforms to solve major social problems.