Big Data refers to data sets that are much larger, collected at higher frequency, and often more personalized than traditional data. Examples include data collected by smart sensors in homes or aggregations of tweets on Twitter.
In small data sets, traditional econometric methods tend to outperform more complex techniques. In large data sets, however, machine learning methods shine. New analytic approaches are needed to make the most of Big Data in economics. Researchers and policymakers should thus pay close attention to recent developments in machine learning techniques if they want to take full advantage of new sources of Big Data.
While data was traditionally collected only for a specific purpose, often by a national statistical agency, the world is becoming increasingly quantified: even the smallest company now collects and records detailed and sometimes individualized data.
Data is collected through a vast ecosystem of software (apps) and hardware (sensors) embedded in “smart” technology, including phones, Wi-Fi-connected appliances, cars, and satellites. This data avalanche has increased both the variety and the velocity of data flows.
New opportunities abound for creating novel data sets from previously unstructured information, such as text and satellite images. This development has opened new areas of economic inquiry.
In an interconnected world, data is everywhere
How does this data help in economics?
These new data are affecting economic research along several dimensions. Many fields have shifted from a reliance on relatively small-sample government surveys to administrative data with universal or near-universal population coverage.
This shift is transformative, as it allows researchers to rigorously examine variation in wages, health, productivity, education, and other measures across different subpopulations; construct consistent long-run statistical indices; generate new quasi-experimental research designs; and track diverse outcomes from natural and controlled experiments.
Perhaps even more notable is the expansion of private-sector data on economic activity. These data, sometimes available from public sources but other times obtained through data-sharing agreements with private firms, can help create more granular, real-time measurement of aggregate economic statistics.
The data also offer researchers a look inside the “black box” of firms and markets by providing meaningful statistics on economic behavior such as search and information gathering, communication, decision-making, and micro-level transactions.
Collaborations with data-oriented firms also create new opportunities to conduct and evaluate randomized experiments.
The Role of Economic Theory:
Economic theory plays an important role in the analysis of large data sets with complex structures. It can be difficult to organize and study this type of data (or even to decide which variables to construct) without a simplifying conceptual framework, which is where economic models become useful. Better data also allow for sharper tests of existing models and tests of theories that had previously been difficult to assess.
The effective birth of economics as a separate discipline may be traced to the year 1776 when the Scottish philosopher Adam Smith published An Inquiry into the Nature and Causes of the Wealth of Nations.
There was, of course, economics before Smith: the Greeks made significant contributions, as did the medieval scholastics, and from the 15th to the 18th century, an enormous amount of pamphlet literature discussed and developed the implications of economic nationalism (a body of thought now known as mercantilism).
We have come a long way since then. Questions that took months or years to answer a short while ago can now be answered in real time. Economists have thus moved from forecasting to nowcasting. For instance, it is now possible to use real-time Google searches to predict changes in unemployment, or Yelp data to predict local business patterns.
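To make the idea of nowcasting concrete, here is a minimal sketch, assuming a single real-time indicator (a hypothetical search-intensity index for job-loss queries) that is linearly related to the official unemployment rate. The function names, the toy numbers, and the one-variable least-squares model are illustrative assumptions, not the method of any particular study; practical nowcasting models usually combine many indicators.

```python
# Minimal nowcasting sketch (illustrative assumption, not a published method):
# fit the official statistic on a real-time indicator over past months, then
# predict the current month, whose official figure is not yet published.
# All data below are made-up toy values.

def fit_ols(x, y):
    """Return (intercept, slope) of a one-variable least-squares fit."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var = sum((xi - mean_x) ** 2 for xi in x)
    slope = cov / var
    return mean_y - slope * mean_x, slope


def nowcast(search_history, official_history, search_now):
    """Predict the unpublished official figure from this month's indicator."""
    intercept, slope = fit_ols(search_history, official_history)
    return intercept + slope * search_now


# Hypothetical search-intensity index for past months
search_index = [50.0, 55.0, 60.0, 70.0, 65.0]
# Hypothetical official unemployment rate (%) for the same months
unemployment = [5.0, 5.2, 5.4, 5.8, 5.6]

if __name__ == "__main__":
    # The search index for the current month is available immediately,
    # long before the official survey-based figure is released.
    print(round(nowcast(search_index, unemployment, 75.0), 2))  # 6.0
```

The design point is timing: the regression is estimated on months where both series exist, and only the indicator is needed to produce the current-month estimate.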
Why is economics important?
Economic science has evolved over several decades toward a greater emphasis on empirical work. The data revolution of the past decade is likely to have a further, profound effect on economic research. Increasingly, economists use newly available large-scale administrative data or private-sector data, often obtained through collaborations with private firms, which gives rise to new opportunities and challenges.
A recent article by CNN reporter Lydia DePillis says that Amazon has hired more than 150 Ph.D. economists in the past few years. The online retailer is perhaps the biggest employer of economists in the US after the Federal Reserve. The key point is that these economists are at the center of the action rather than detached advisers to senior management.
“Amazon’s economists game out real estate decisions, set the lowest prices that will deliver a profit, precisely determine what customers care about and whether advertisements are working—all using machine learning algorithms that automate decision making on a massive scale.”
- Lydia DePillis.
The growing role of economists in technology companies can be explained by the ubiquity of data as well as the power of computing. But that is not all: besides trawling through an ocean of customer data to find patterns, economists have played a central role in the design of many online markets.
Just consider the way we use Uber compared to the way we use Airbnb. Uber has a centralized matching system: its algorithm matches customers with cab drivers because we care more about getting a ride than about the specific details of the taxi. The choice of architecture is necessarily different when it comes to booking apartments for a vacation, where the specific details of the house matter. So Airbnb has a decentralized system of choice in which the customer, rather than the algorithm, picks the house.
The two platforms also have something in common. Both Uber and Airbnb have built rating systems to create trust. Is the taxi driver known for rash driving? Is the stranger seeking to rent my place for a week notorious for leaving behind a trail of damage?
Ratings help overcome such information asymmetries and create the trust that is essential to functioning markets. A lot of economic design has gone into these online markets.
Some of this has now begun to spill over into the realm of public policy. For example, the Billion Prices Project at the Massachusetts Institute of Technology (MIT) builds daily price indexes based on online prices. Its home page currently gives pride of place to a project to calculate the real inflation rate in Venezuela, where an economic collapse makes fudging of official statistics very likely. Economists working on the project spotted early on that Argentina was under-reporting inflation in its official statistics, a problem that persisted until a new price index was introduced in May 2016.
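One common way to turn daily scraped online prices into an index is to chain day-to-day Jevons (geometric-mean) price relatives, computed over the products observed on both consecutive days. The sketch below assumes that approach and uses hypothetical product names and prices; it is not a description of the Billion Prices Project's actual methodology, and a production index would also need to handle issues such as category weights and temporary sales.

```python
import math

# Sketch of a chained daily Jevons price index (an assumed, generic approach).
# Each day is a dict mapping a product id to its scraped online price.


def jevons_relative(prices_prev, prices_today):
    """Geometric mean of price relatives for products seen on both days."""
    common = prices_prev.keys() & prices_today.keys()
    if not common:
        raise ValueError("no products observed on both days")
    log_sum = sum(math.log(prices_today[p] / prices_prev[p]) for p in common)
    return math.exp(log_sum / len(common))


def chained_index(daily_prices, base=100.0):
    """Chain day-to-day Jevons relatives into a daily index starting at base."""
    index = [base]
    for prev, today in zip(daily_prices, daily_prices[1:]):
        index.append(index[-1] * jevons_relative(prev, today))
    return index


# Hypothetical scraped prices for three consecutive days
days = [
    {"milk": 1.00, "bread": 2.00, "rice": 4.00},
    {"milk": 1.10, "bread": 2.00, "rice": 4.00},
    {"milk": 1.10, "bread": 2.20, "eggs": 3.00},  # rice delisted, eggs new
]

if __name__ == "__main__":
    print([round(v, 2) for v in chained_index(days)])
```

Matching products only across consecutive days means that items that disappear (rice) or appear (eggs) drop out of that day's comparison instead of creating artificial jumps in the index.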
How has India reacted to this shift?
Indian public policy is also getting into the act. The Reserve Bank of India is setting up a data sciences lab. Economists at the finance ministry have already used big data analytics to chart our patterns of internal migration (using data from the railway's computerized booking system) and interstate trade (using preliminary data from the Goods and Services Tax Network).
The Economic Survey released in August 2017 cited work by the IDFC Institute that used satellite images to show how the built-up area in the Kozhikode Metropolitan Area spread, and how its density changed, between 1975 and 2014.
In a speech given at the RBI in August 2018, Prof. Roberto Rigobon of the MIT Sloan School of Management distinguished between designed data and organic data. The former comes from surveys and administrative sources. The latter “is generated by individuals without them noticing they are being surveyed. It is the data in the GPS of your phone, your searches on the web, the friends in your network, the things you purchase.”
Organic data will begin to challenge the monopoly of designed data, though it cannot replace it. Rigobon said that the main advantage of organic data is that it is truthful. It is based on actual behavior rather than recall. Also, it is categorized based on behavior rather than the traditional ordering by geography or socioeconomic conditions. The downsides are that organic data is often not representative and it can involve privacy violations.
Cut to October 2015. The Indian central bank was fighting a battle against high inflation. The committee of economists advising the RBI governor on monetary policy said in a statement released that month: “Moreover, there has been a comfort on the inflation front—wholesale prices are contracting, GDP (gross domestic product) consumption deflator has been low at around 3%, and with vendors engaged in e-commerce offering low prices, retail inflation may be lower than what the headline number suggests.”
Let us now compare and contrast the pros and cons of Big Data Economics.
What are the pros?
Complex data are now available, characterized by large volume, fast velocity, and diverse varieties, and by the ability to link many data sets together.
Powerful new analytic techniques derived from machine learning are increasingly part of the mainstream econometric toolbox.
Big Data allows for better prediction of economic phenomena and improves causal inference.
Machine learning techniques facilitate the creation of simple models that describe large and complex data sets.
Machine learning methods and Big Data also allow complex relationships to be modeled in ways that predict well out of sample.
What are the cons?
Predictions based on Big Data may raise privacy concerns.
Machine learning methods are computationally intensive, may not have unique solutions, and may require a high degree of fine-tuning for optimal performance.
Big Data is costly to collect and store, and analyzing it requires investments in technology and human skill.
Big Data may suffer from selection bias depending on how and by whom data are being generated.
Access to these data may involve partnering with firms that limit researcher freedom.
The growing use of digital transactions—by consumers, investors, taxpayers—as well as the rise of newer forms of data collection has the potential to revolutionize Indian public policy. It is unlikely that these newer forms of data will completely replace the more traditional numbers derived from surveys, national accounts, and administrative data.
They will more likely complement each other. Government agencies will increase their dependence on big data analytics in the coming years—though the risks to individual privacy should not be underestimated.