90% of the world’s data was created in the last two years.
What does that mean for the un-banked in emerging markets?
Across the emerging world, innovative financial institutions are realizing the potential of alternative data to transform lending. Data from online social networks, mobile phone records, and psychometrics are helping to illuminate the potential of borrowers where traditional credit information is scarce, enabling new lending and greater control over risk. But as these alternative sources of data find traction, it is important to recognize that not all data sources are created equal. Rather, they possess important strengths and weaknesses with major implications for financial institutions and the clients they serve.
Why isn’t Traditional Credit Scoring Enough for Emerging Markets?
In the developed world, large credit scoring firms like Equifax, Experian and TransUnion provide lenders with credit scores based primarily on loan applicants’ past repayment data. While these traditional scores are highly predictive, they rely heavily on robust, centralized credit bureaus that are able to gather, store, and share accurate repayment history. This reliance on credit infrastructure presents an imposing challenge in emerging markets, where bureau data is often inaccurate or incomplete, if not altogether unavailable. According to the World Bank, fewer than 1 in 10 people in low- and middle-income countries around the world are on file in public credit registries[i].
What is Alternative Data?
Alternative data can mean anything and everything beyond the repayment data gathered by banks and credit bureaus. However, three data sources have garnered particular attention from emerging market lenders recently: online, mobile and psychometrics. At EFL, we’ve worked extensively with all three data sources, using large databases of loan performance and industry-leading modelling techniques to understand their relationship to credit risk.[ii] So what do these data sources include? How can they be used by lenders? And who is currently using them?
What makes an alternative data source more or less valuable to lenders?
There are two metrics by which an alternative credit scoring data source should be considered: availability and predictive power. We’ve evaluated online, mobile, and psychometric data by both of these metrics, enabling the first comparable, quantitative and objective analysis of all three data sources[iii].
Availability: for how many people is this source of data able to be captured, and at what price?
The percentage of people using the internet around the world has more than tripled in the last ten years, and as access grows so do individuals’ digital footprints, capable of providing previously inaccessible risk insights. Furthermore, because online data is publicly available or obtainable through simple user authentication and permission, it is inexpensive to collect.
However, 60% of the world remains offline, and that 60% is heavily concentrated in developing economies. In South Asia, for example, fewer than 1 in 7 people are online, and even fewer are on social networks and e-commerce sites. Furthermore, digital footprints are richer among the young, educated, and tech-savvy, meaning that in many markets online data will only apply to a small and skewed portion of the population.
Online Data is growing quickly and is inexpensive to gather, but still scarce in emerging markets and skewed towards the young and educated. In our analysis of borrowers from eight Sub-Saharan African countries, 10% had verifiable online social data.
In the past decade, mobile phones have become nearly ubiquitous around the world. More than 90% of people have a mobile phone, and there are more cellular subscriptions in developing countries than in developed ones. As mobile phones become the essential mode of communication in emerging markets, the data that can be collected and analyzed from them becomes richer and more descriptive.
Unlike online data, however, mobile data requires significant up-front investment. Both Call Detail Records (CDR) and Transaction Detail Records (TDR) are owned by Mobile Network Operators (MNOs), which are rightfully protective of their users’ data and privacy. Furthermore, some MNOs are becoming lenders themselves, making them less willing to share user data with lenders that may be competing for the same clients. Finally, in many countries, mobile users hold pre-paid subscriptions with multiple MNOs, making it necessary to amalgamate multiple data sources to build a comprehensive picture of an individual’s mobile behavior. To gather complete mobile data for just 80% of Indonesia’s population, for example, one would need to obtain agreement from five separate MNOs.
Mobile Data is widely available, but privacy laws and fragmented markets imply large up-front costs for data collection and utilization. In our analysis, 72% of borrowers in an emerging Caribbean market had available mobile data.
Unlike online and mobile data, which already exist, psychometric data is actively captured at the time of application. Psychometric scoring does not rely on retrospective information and therefore is not limited to small sub-sets of the population or dependent on third-party information providers. Rather, psychometric data is collected through questions in a survey, and therefore can be made available for anyone, anywhere.
However, active data collection also means higher data collection costs. Lenders using psychometric data for loan decision-making often choose to administer psychometric credit applications in person, rather than remotely online, which requires time and energy on the part of both loan officers and loan applicants.
Psychometric Data is universally available and can be implemented easily, but it is actively captured and thus incurs higher marginal costs than the other data sources.
Power: how meaningful and practical are the sources in measuring repayment risk?
The predictive power of online data depends on the size and maturity of an individual’s digital footprint. More extensive data sets provide more features for modelling and enable a more complete snapshot of one’s online behavior. In our modelling efforts we found that simple things like the frequent use of slang and contractions in Facebook posts can correlate strongly with default risk.
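To make the slang-and-contractions idea concrete, here is a minimal Python sketch of how such text features might be computed from a set of posts. It is an illustration only, not EFL's feature pipeline: the function name `text_style_features`, the tiny `SLANG` word list, and the tokenization rule are all assumptions made for this example.

```python
import re

# Hypothetical slang list for illustration only; a real list would be
# language- and market-specific.
SLANG = {"gonna", "wanna", "lol", "bro"}

# A word, optionally followed by an apostrophe (straight or curly) and more letters.
TOKEN = re.compile(r"[A-Za-z]+(?:['’][A-Za-z]+)?")


def text_style_features(posts):
    """Return the share of tokens that are contractions or slang across posts.

    Any token containing an apostrophe is counted as a contraction, so
    possessives such as "Ana's" are counted too (a known simplification).
    """
    tokens = [t.lower() for post in posts for t in TOKEN.findall(post)]
    if not tokens:
        return {"contraction_rate": 0.0, "slang_rate": 0.0}
    contractions = sum(1 for t in tokens if "'" in t or "’" in t)
    slang = sum(1 for t in tokens if t in SLANG)
    return {
        "contraction_rate": contractions / len(tokens),
        "slang_rate": slang / len(tokens),
    }


features = text_style_features(["I can't go, gonna stay home", "We won't be late"])
```

Rates rather than raw counts are used so that applicants with long and short posting histories remain comparable.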
If implemented carelessly, however, online data can be misleading, as it is relatively easy to “game” over short periods of time. Users who know the attributes that lenders are evaluating can adjust their online behavior, for example using less slang leading up to their loan application. For this reason, it is all the more important to work with large, mature digital footprints, preferably across multiple platforms.
In our analysis of borrowers across Sub-Saharan Africa, online data provided a relatively small, 14% gini boost to predictive power. However, with deepening digital footprints this power is likely to grow over time.
CDR data sets provide intricate detail on a range of attributes including who you communicate with, how often and for how long, as well as account payment history. In our analysis we found that with simple features like average days between calls, continuity of account service, balance inquiry frequency, and call durations we were able to create a relatively powerful model.
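As a rough illustration of the simple features named above, the following Python sketch derives average days between calls, average call duration, and balance inquiry frequency for one subscriber. The record layout (a list of date and duration pairs), the function name `cdr_features`, and the 30-day normalization are assumptions for this example; real CDR schemas vary by operator, and continuity of account service is left out for brevity.

```python
from datetime import date


def cdr_features(calls, balance_inquiries, observation_days):
    """Summarize one subscriber's call records.

    calls: list of (call_date, duration_seconds) tuples (hypothetical layout).
    balance_inquiries: number of balance checks in the observation window.
    observation_days: length of the observation window in days.
    """
    if observation_days <= 0:
        raise ValueError("observation_days must be positive")
    dates = sorted(d for d, _ in calls)
    # Gap in days between each pair of consecutive calls.
    gaps = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
    return {
        "avg_days_between_calls": sum(gaps) / len(gaps) if gaps else None,
        "avg_call_duration_s": sum(s for _, s in calls) / len(calls) if calls else None,
        "balance_inquiries_per_30d": 30.0 * balance_inquiries / observation_days,
    }


example = cdr_features(
    calls=[(date(2015, 1, 1), 60), (date(2015, 1, 3), 120), (date(2015, 1, 7), 180)],
    balance_inquiries=3,
    observation_days=30,
)
```

Features that cannot be computed, such as the average gap for a subscriber with a single call, are returned as `None` so that a downstream model can treat them as missing rather than as zero.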
Mobile phone data also has some practical advantages over online data, namely that it is easier to match to individuals because telephone numbers are unique. As with online data, lenders must be careful to limit their analysis to large, mature data sets in order to mitigate the risk of user manipulation.
Mobile Data offers rich behavioral information when captured in sufficiently large and mature data sets. In our analysis, mobile data achieved a gini of 26%.
Psychometrics offer a broad variety of features for modeling, enabling a holistic view of an individual’s character and willingness to pay. The ability of psychometrics to measure risk, however, is highly dependent on the quality of the questions asked. Factors like language, culture, age, and industry can influence one’s survey responses, so care must be taken in crafting questions that are impartial and universally applicable. Furthermore, particular attention must be paid to tracking and preventing user manipulation, as psychometric data is self-reported, rather than observed.
When implemented carefully, psychometrics offer robust predictive power. This is particularly true when the application is administered electronically, rather than on pen and paper, because it allows one to observe not just what an individual answered, but how they interacted with the application: how long they spent on each question, whether they changed responses, and so on. This meta-data provides additional features that are very valuable for modeling directly, as well as for detecting gaming and fraud on the part of either applicants or loan officers.
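The kind of interaction meta-data described here can be turned into model features quite simply. The Python sketch below assumes a hypothetical event log with one entry per question (question id, seconds spent, number of answer changes); the function name `interaction_features` and the two-second "fast answer" threshold are illustrative choices, not EFL's actual definitions.

```python
from statistics import median


def interaction_features(events, fast_threshold_s=2.0):
    """Summarize how an applicant interacted with an electronic application.

    events: list of (question_id, seconds_spent, answer_changes) tuples
    (hypothetical layout).
    """
    if not events:
        raise ValueError("no interaction events recorded")
    seconds = [s for _, s, _ in events]
    n = len(events)
    return {
        "median_seconds_per_question": median(seconds),
        # Share of questions where the answer was changed at least once.
        "share_changed": sum(1 for _, _, c in events if c > 0) / n,
        # Share of questions answered faster than the threshold, a possible
        # sign of careless or scripted responses.
        "share_fast": sum(1 for s in seconds if s < fast_threshold_s) / n,
    }


example = interaction_features(
    [("q1", 8.0, 0), ("q2", 1.5, 0), ("q3", 12.0, 2), ("q4", 6.0, 1)]
)
```

The same summaries could be computed per loan officer rather than per applicant to look for unusual patterns across the applications one officer administers.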
Psychometric Data offers strong predictive power when implemented with attention to question quality and user manipulation. EFL’s assessment obtains ginis of 24-33% depending on the country.
Taking Next Steps with Alternative Data
Alternative data has the potential to fundamentally change lending in emerging markets. Financial institutions looking to better understand their customers, grow portfolios and control risk should look to alternative data as a source of opportunity, but also be careful to consider the distinct advantages and disadvantages inherent to each data source. As the figure below illustrates, the availability and predictive power of alternative data sources vary widely, and this may suit different needs for different lenders in different markets. Furthermore, financial institutions should recognize that credit scoring, based on alternative data or otherwise, is only one component of the lending process and therefore that a good credit score cannot guarantee strong portfolio performance. Finally, lenders should consider that in some cases these sources of data may be used as complements rather than substitutes, layered to provide a more nuanced understanding of credit risk and potential.
I hope these experiences are useful in your efforts. Harnessing these and other new data sources can make a massive difference in the challenge of achieving financial inclusion.
Technical Notes: All ginis were constructed with an automated forward stepwise logistic regression fit on a randomly selected 80% build sample and applied to a 20% hold-out test sample; the target is 90 days or more in arrears in all cases. Mobile data come from a developing country in the Caribbean. Social data come from eight countries in Africa. Psychometric figures are an average across six emerging markets in Latin America, South Asia, and East Asia. Traditional bureau predictive power is taken from the national bureau of a mid-sized Latin American nation; the bureau hit rate is the World Bank average across Latin America, Africa, South Asia, and East Asia and the Pacific.
[ii] In the interest of full disclosure, EFL primarily uses psychometrics for credit decision-making, as this was the first technology we developed and deployed, and we have found it a very effective contribution to enhanced emerging market lending.
[iii] The analysis uses a simple automated algorithm: a forward stepwise logistic regression is fit on a randomly selected 80% build sample of the data and then tested on the remaining 20% out-of-sample hold-out.
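The text does not define "gini". In credit scoring the term usually refers to the Gini coefficient of a score, equal to 2 × AUC − 1, where AUC is the probability that a randomly chosen defaulter receives a higher risk score than a randomly chosen non-defaulter. Assuming that definition, the Python sketch below shows the 80/20 split and the hold-out Gini calculation. It does not implement the forward stepwise logistic regression itself; the function names and the fixed random seed are illustrative.

```python
import random


def auc(scores, defaulted):
    """Share of (defaulter, non-defaulter) pairs ranked correctly; ties count half."""
    bad = [s for s, d in zip(scores, defaulted) if d]
    good = [s for s, d in zip(scores, defaulted) if not d]
    if not bad or not good:
        raise ValueError("need both defaulters and non-defaulters")
    wins = 0.0
    for b in bad:
        for g in good:
            if b > g:
                wins += 1.0
            elif b == g:
                wins += 0.5
    return wins / (len(bad) * len(good))


def gini(scores, defaulted):
    """Gini coefficient of a risk score, assumed here to be 2 * AUC - 1."""
    return 2.0 * auc(scores, defaulted) - 1.0


def split_build_holdout(rows, build_share=0.8, seed=0):
    """Randomly split rows into a build sample and a hold-out sample."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(build_share * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# Higher score means higher predicted risk; 1 marks 90+ days in arrears.
holdout_gini = gini([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0])
build, holdout = split_build_holdout(range(10))
```

Under this definition a gini of 0% corresponds to a score that ranks borrowers no better than chance, and 100% to a score that ranks every defaulter above every non-defaulter.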