Liberty Street Economics


January 4, 2012

Forecasting with Internet Search Data

Rebecca Hellerstein and Menno Middeldorp

Most economic data are released with a lag, sometimes quite a substantial one. Since the advent of regularly scheduled releases of economic data in the 1930s, a key challenge for economists has been to identify indicators that provide timely information about the release before it comes out—effectively, that “now-cast” its content. Recent academic research suggests that counts of Internet searches for certain words or phrases can predict some macroeconomic data releases. In this post, we show that Internet search counts can also predict some financial market data releases, as well as future price movements in some financial markets.

    Economic data are generally released days to weeks after the activity they measure, which can stymie economists’ efforts to assess the state of the economy in real time. This data lag is why the formulation of economic policy is often likened to driving a car by looking out the rear window. Thus, being able to forecast the content of economic releases could improve policymaking.

    Recent academic research finds that Internet search data can forecast the content of some economic data releases. The underlying intuition is compelling. Almost 80 percent of Americans use the Internet (CIA World Factbook) and 16.4 billion Internet searches were conducted in the United States in December 2010 (comScore). If we assume some connection between what people search for and their subsequent behavior, these data have the potential to track or lead all kinds of phenomena. An obvious example: more people searching for information about automobiles may precede an increase in car sales. To give just a couple of examples of recent academic work, Wu and Brynjolfsson (2009) find that Internet search counts predict U.S. housing sales and prices, while Askitas and Zimmermann (2009) identify a relationship to German unemployment. (In the appendix, we discuss this literature a bit more.)

    To date, little research has been done on using Internet search data to forecast financial market developments (apart from some stock market measures). Because most financial market data are released with a very short lag, search data add value only if they have leading properties. In this post, we show that Internet search data are a useful addition to the economist’s toolkit for forecasting financial market data. We show, first, that counts of search queries effectively now-cast the official release of mortgage refinancing applications. We then show that search counts predict movements in non-deliverable forwards on Chinese currency.

    A number of search engines are available on the Internet; our data come from Google Insights for Search. Our understanding is that academic research using data from other search engines, such as Microsoft’s Bing, is also under way. We analyze search counts for certain words or phrases relative to total search counts in a geographical region (for example, worldwide, country, state, or metro region). So if the counts for one of our search phrases are rising more slowly than total search counts, for example, then this relative measure will decline.
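The relative measure described above can be illustrated with a minimal sketch. The function name, the weekly counts, and the rescaling of the peak week to 100 are assumptions made for illustration, not details taken from the data provider.

```python
def relative_search_index(phrase_counts, total_counts):
    """Share of all searches accounted for by the phrase, rescaled so the peak is 100.

    The rescaling to a peak of 100 is an illustrative assumption.
    """
    shares = [p / t for p, t in zip(phrase_counts, total_counts)]
    peak = max(shares)
    return [100.0 * (s / peak) for s in shares]


# Hypothetical weekly counts: searches for the phrase grow 5 percent,
# while total searches in the region grow 10 percent.
phrase = [1000, 1050]
total = [1_000_000, 1_100_000]

index = relative_search_index(phrase, total)
# The raw count rose, but the relative measure falls from the first week to the second.
```

In this hypothetical case the phrase is searched more often in the second week, yet its index declines because total search volume grew faster.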

Anticipating the MBA Mortgage Refinancing Index

The Mortgage Bankers Association (MBA) releases several indexes of mortgage applications. One of these looks specifically at refinancing. A weekly index, it is released on the Wednesday following the week in which the data are gathered. It is plausible that borrowers applying for refinancing seek relevant information on the Internet. Because the index is released with a lag, any contemporaneous data that influence applications would be useful in forecasting the release as well as in tracking applications in real time. We consider a simple model using (1) search counts for the term “mortgage refinance” in the United States, within the subcategory “home financing,” (2) a lag of the applications index, and (3) the ten-year Treasury yield and its one-period lag. The model now-casts refinancing applications relatively well (the R², a measure of the share of variation in refinancing applications accounted for by the model, is 34 percent, which is reasonable for weekly changes).

    Additional explanatory variables in the model account for other publicly available information that may also now-cast refinancing activity. Simply identifying a relationship between search data and some economically or financially relevant variable does not show that the search data are useful if other publicly available data can do a better job. In this case, the search index increases the R² by about 10 percentage points and is highly statistically significant, which suggests that the search data contain information not captured by the model’s other variables.
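The idea of measuring what the search variable adds to the R² can be sketched by fitting a regression with and without it. The example below is a simplified illustration under stated assumptions: the data are synthetic draws (not the MBA, Google, or Treasury series), the lagged terms are left out, and the helper names `solve` and `ols_r2` are hypothetical.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ols_r2(y, columns):
    """Regress y on a constant plus the given regressor columns; return R-squared."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Synthetic weekly changes (assumed for illustration only).
rng = random.Random(0)
n = 200
search = [rng.gauss(0, 1) for _ in range(n)]
yield_chg = [rng.gauss(0, 1) for _ in range(n)]
refi = [0.5 * s - 0.7 * y + rng.gauss(0, 1) for s, y in zip(search, yield_chg)]

r2_without = ols_r2(refi, [yield_chg])          # restricted model: no search data
r2_with = ols_r2(refi, [yield_chg, search])     # full model: adds the search variable
gain = r2_with - r2_without                     # incremental R-squared from search data
```

Adding a regressor never lowers the in-sample R², so a gain alone is not evidence; the post's argument also rests on the search coefficient being statistically significant.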

    The search data do not lead actual applications (a one-week lag is not statistically significant). However, because the official data are released after a week, the search data still provide a useful now-cast.

Model of Mortgage Refinance Applications
Dependent variable: MBA refinancing index

    Variable                               Coefficient
    Search term “mortgage refinance”        0.0077***
    MBA refinancing index, t-1             -0.2558***
    Treasury yield, ten-year               -0.7123***
    Treasury yield, ten-year, t-1          -0.3753*
    Constant                                0.0135**
    R²                                      34 percent

Sources: Mortgage Bankers Association; Google Insight for Search; Bloomberg L.P.; authors’ calculations.

Notes: All variables are weekly changes. Standard errors and covariance are Newey-West HAC.

*** p<1 percent, ** p<5 percent, * p<10 percent
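
Written out as an equation, the estimates in the table above correspond to the following. The symbols are notation introduced here for illustration: R is the MBA refinancing index, S is the search index for “mortgage refinance,” y is the ten-year Treasury yield, Δ denotes a weekly change, and ε is the residual.

```latex
\Delta R_t = 0.0135 + 0.0077\,\Delta S_t - 0.2558\,\Delta R_{t-1}
             - 0.7123\,\Delta y_t - 0.3753\,\Delta y_{t-1} + \varepsilon_t
```

The search term enters contemporaneously (at time t), which is consistent with its role as a now-cast rather than a leading indicator.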


Predicting Chinese Renminbi Non-deliverable Forwards

Using Internet search data to predict financial market movements is a more fraught exercise than now-casting. We could not forecast gold prices, European sovereign spreads, interbank rates, or equity market implied volatility with models using search data. However, in markets that, for some reason, do not efficiently aggregate information, search data do provide useful information.

    For example, language barriers and institutional impediments may make it difficult for offshore financial markets to access or interpret information from mainland China. As a result, relevant information about a potential renminbi revaluation may reach the offshore market with a delay. Interestingly, search counts for the term “人民币,” which means the people’s (人民, rénmín) currency (币, bì), or renminbi, tend to lead expectations of a revaluation. The search counts lead changes in the non-deliverable forward on the dollar-renminbi exchange rate by a week, even after controlling for momentum in the forward rate, and are highly statistically significant. Although the R² is low at 5 percent, it is interesting to find evidence of search counts leading market moves.
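The one-week lead described above amounts to pairing each week's change in the forward rate with the previous week's change in search counts. The sketch below shows one way such rows might be assembled. It assumes that "momentum" is captured by the lagged change in the forward rate, which the post does not specify, and all names and numbers are hypothetical.

```python
def weekly_changes(levels):
    """First differences of a weekly series."""
    return [later - earlier for earlier, later in zip(levels, levels[1:])]


def lead_lag_rows(search_levels, forward_levels):
    """Pair each week's forward-rate change with the prior week's search and forward changes.

    Assumes both series cover the same weeks and have equal length.
    """
    ds = weekly_changes(search_levels)
    df = weekly_changes(forward_levels)
    return [
        {
            "forward_change": df[t],           # dependent variable, week t
            "search_change_lag1": ds[t - 1],   # search counts, week t-1
            "forward_change_lag1": df[t - 1],  # momentum control (assumed form)
        }
        for t in range(1, len(df))
    ]


# Hypothetical weekly levels of the search index and the forward rate.
search = [40, 44, 43, 50, 48]
forward = [6.60, 6.59, 6.57, 6.58, 6.54]

rows = lead_lag_rows(search, forward)
```

Each row could then serve as one observation in a regression of the forward-rate change on the two lagged terms.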


Economists are always looking for ways to improve their forecasts, to make their crystal ball a bit less cloudy. We find that Internet search counts contain useful information, not available in other variables, for now-casting or forecasting the trajectory of some financial market data. This predictive power is by no means universal: as we observe above, for a number of markets, Internet search data do not provide explanatory power beyond that of more traditional forecasting methods. Still, the basic message is that search data are a useful addition to the economist’s toolkit.

Appendix: Related Reading on Uses of Internet Search Data

Several research studies use Internet search data. Hyunyoung Choi and Hal Varian (2009), both of Google, have established the usefulness of search data in predicting upcoming economic data releases for U.S. retail sales, auto sales, home sales, and initial jobless claims, as well as visitor statistics for Hong Kong. Chamberlin (2010) of the U.K. Office for National Statistics examines search data’s correlation with British retail sales, property transactions, car registrations, and foreign trips.

    Several papers have looked at the housing market. Wu and Brynjolfsson (2009) find that search data foreshadow U.S. housing sales and prices. Webb (2009) finds a strong correlation between the keyword “foreclosure” and actual foreclosures in the United States. McLaren and Shanbhogue (2011) of the Bank of England look at several markets, but find the strongest contribution of search data in a model forecasting U.K. house prices.

    Regarding unemployment, Askitas and Zimmermann (2009) show strong correlations between search data and German unemployment. D’Amuri (2009) of the Bank of Italy finds that an Internet-search-based measure is superior to other leading indicators in predicting Italian unemployment. D’Amuri and Marcucci (2009) find that augmenting models of the U.S. unemployment rate with an Internet job-search indicator outperforms traditional forecasting methods and the Survey of Professional Forecasters. Suhoy (2009) of the Bank of Israel finds search data to be a good predictor of labor market conditions in that country.

    Several papers examine the usefulness of search data in the area of U.S. consumer confidence and spending. Della Penna and Huang (2009) develop a query-based consumer confidence measure that leads those of the University of Michigan and the Conference Board. Schmidt and Vosen (2010) find that search data outperform these two consumer confidence indexes in forecasting private consumption. Similarly, Kholodilin, Podstawski, and Siliverstovs (2010) show that an Internet-search-based forecasting model outperforms several benchmark models of private consumption.

    While most of the academic work to date has focused on economic data, search data have also been used in stock market analysis, although only one group of researchers finds that the data can predict prices. Andrade, Bian, and Burch (2010) use search data to identify peak interest in stock investing in a study of the sharp run-up in Chinese stock prices in 2007. Preis, Reith, and Stanley (2010) find that searches for specific company names correlate with transaction volumes for those companies’ shares. Vlastakis and Markellos (2010) use search data as an indicator for information demand on specific stocks and find that this leads not only trading volume but also volatility. Da, Engelberg, and Gao (2010a) construct an “investor attention” index using search data and find that it predicts higher stock prices over a two-week horizon, followed by a reversal over a one-year time frame. They also conclude (2010b) that searches for a firm’s most popular products are better than analyst forecasts at predicting earnings surprises and the subsequent market reaction.

The views expressed in this post are those of the author(s) and do not necessarily reflect the position of the Federal Reserve Bank of New York or the Federal Reserve System. Any errors or omissions are the responsibility of the author(s).



Many thanks to those who took the time to give comments. It is interesting to read that more precise search engine data can provide more “resolution” in order to forecast financial markets. Here’s a link to the Mao, Counts and Bollen (2011) paper mentioned, which finds predictive value in search data for stock markets: Regarding the request for the underlying data presented in the analysis, please see the following links: Google Insights for Search: US Interest rate data and a wealth of other economic and financial data can be downloaded from the Federal Reserve Bank of St. Louis: The MBA charges a subscription fee for access to its Weekly Application Survey:

The close correlation between Google search data and actual mortgage applications is fascinating. Great article and thank you for the related reading list. In our research we also found that aggregated Google Trends data does not have sufficient “resolution” to predict equity, commodity or forex markets. However, real-time “raw” search engine data that includes geo information, precise timestamps and several other data elements can contain predictive signals for those markets. In December of 2011 Mao / Counts / Bollen published a recommended paper on this topic.

