Uncertainty over Uncertainty: Investigating the Assumptions,
Annotations, and Text Measurements of Economic Policy Uncertainty
Katherine A. Keith∗, University of Massachusetts Amherst, kkeith@cs.umass.edu
Christoph Teichmann, Bloomberg, cteichmann1@bloomberg.net
Brendan O’Connor, University of Massachusetts Amherst, brenocon@cs.umass.edu
Edgar Meij, Bloomberg, emeij@bloomberg.net
Abstract

Methods and applications are inextricably linked in science, and in particular in the domain of text-as-data. In this paper, we examine one such text-as-data application, an established economic index that measures economic policy uncertainty from keyword occurrences in news. This index, which is shown to correlate with firm investment, employment, and excess market returns, has had substantive impact in both the private sector and academia. Yet, as we revisit and extend the original authors’ annotations and text measurements we find interesting text-as-data methodological research questions: (1) Are annotator disagreements a reflection of ambiguity in language? (2) Do alternative text measurements correlate with one another and with measures of external predictive validity? We find for this application (1) some annotator disagreements of economic policy uncertainty can be attributed to ambiguity in language, and (2) switching measurements from keyword-matching to supervised machine learning classifiers results in low correlation, a concerning implication for the validity of the index.

1 Introduction

The relatively novel research domain of text-as-data, which uses computational methods to automatically analyze large collections of text, is a rapidly growing subfield of computational social science with applications in political science (Grimmer and Stewart, 2013), sociology (Evans and Aceves, 2016), and economics (Gentzkow et al., 2019). In economics, textual data such as news editorials (Tetlock, 2007), central bank communications (Lucca and Trebbi, 2009), financial earnings calls (Keith and Stent, 2019), company disclosures (Hoberg and Phillips, 2016), and newspapers (Thorsrud, 2020) have recently been used as new, alternative data sources.

In one such economic text-as-data application, Baker et al. (2016) aim to construct an economic policy uncertainty (EPU) index whereby they quantify the aggregate level that policy is influencing economic uncertainty (see Table 1 for examples). They operationalize this as the proportion of newspaper articles that match keywords related to the economy, policy, and uncertainty.

The index has had impact both on the private sector and academia.¹ In the private sector, financial companies such as Bloomberg, Haver, FRED, and Reuters carry the index and sell financial professionals access to it. Academics show economic policy uncertainty has strong relationships with other economic indicators: Gulen and Ion (2016) find a negative relationship between the index and firm-level capital investment, and Brogaard and Detzel (2015) find that the index can positively forecast excess market returns.

The EPU index of Baker et al. has substantive impact and is a real-world demonstration of finding economic signal in textual data. Yet, as the subfield of text-as-data grows, so too does the need for rigorous methodological analysis of how well the chosen natural language processing methods operationalize the social science construct at hand. Thus, in this paper we seek to re-examine Baker et al.’s linguistic, annotation, and measurement assumptions. Regarding measurement, although keyword look-ups yield high-precision results and are interpretable, they can also be brittle and may suffer from low recall. Baker et al. did not explore alternative text measurements based on, for example, word embeddings or supervised machine learning classifiers.

¹ As of October 7, 2020, Google Scholar reports Baker et al. (2016) to have over 4400 citations.
∗ This work was done during an internship at Bloomberg.
Proceedings of the Fourth Workshop on Natural Language Processing and Computational Social Science, pages 116–131, Online, November 20, 2020. © 2020 Association for Computational Linguistics
https://doi.org/10.18653/v1/P17
No. Example
1 Demand for new clothing is uncertain because several states may implement large hikes in their sales tax rates.
2 The outlook for the H1B visa program remains highly uncertain. As a result, some high-tech firms fear that shortages of qualified workers will cramp their expansion plans.
3 The looming political fight over whether to extend the Bush-era tax cuts makes it extremely difficult to forecast federal income tax collections in 2011.
4 Uncertainty about prospects for war in Iraq has encouraged a build-up of petroleum inventories and pushed oil prices higher.
5 Some economists claim that uncertainties due to government industrial policy in the 1930s prolonged and deepened the Great Depression.
6 It remains unclear whether the government will implement new incentives for small business hiring.
Table 1: Positive examples of policy-related economic uncertainty. We label spans of text as indicating policy,
economy, uncertainty, or a causal relationship. Examples were selected from hand-labeled positive examples and
the coding guide provided by Baker et al. (2016).
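
As a rough illustration of the keyword rule described above, the following Python sketch flags a text as positive when it contains at least one term from each of the three KeyOrg categories listed in Table 2, and then reports the share of positive texts. Whole-word, case-insensitive matching, the function names `is_epu` and `epu_proportion`, and the sample positive sentence are our assumptions for illustration; they are not taken from Baker et al.'s implementation, and the sketch pools all documents rather than normalizing per news outlet.

```python
import re

# KeyOrg keyword lists as reported in Table 2.
KEY_ORG = {
    "economy": ["economic", "economy"],
    "uncertainty": ["uncertain", "uncertainty"],
    "policy": [
        "regulation", "deficit", "legislation", "congress", "white house",
        "federal reserve", "the fed", "regulations", "regulatory", "deficits",
        "congressional", "legislative", "legislature",
    ],
}


def _compile(terms):
    # Assumption: whole-word, case-insensitive matching. Longer terms come
    # first so multi-word phrases are tried before their shorter variants.
    alternation = "|".join(
        re.escape(term) for term in sorted(terms, key=len, reverse=True)
    )
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


PATTERNS = {category: _compile(terms) for category, terms in KEY_ORG.items()}


def is_epu(text, patterns=PATTERNS):
    """True if the text has at least one keyword from every category."""
    return all(pattern.search(text) is not None for pattern in patterns.values())


def epu_proportion(documents):
    """Share of documents flagged as EPU (single pooled outlet, for simplicity)."""
    documents = list(documents)
    if not documents:
        return float("nan")
    return sum(is_epu(doc) for doc in documents) / len(documents)


# Hypothetical positive sentence, plus Examples 1 and 6 from Table 1.
hypothetical = "Economic uncertainty rose as Congress debated new legislation."
example_1 = ("Demand for new clothing is uncertain because several states may "
             "implement large hikes in their sales tax rates.")
example_6 = ("It remains unclear whether the government will implement new "
             "incentives for small business hiring.")

for text in (hypothetical, example_1, example_6):
    print(is_epu(text), "|", text)
print("share flagged:", epu_proportion([hypothetical, example_1, example_6]))
```

Under these assumptions, Examples 1 and 6 of Table 1 are both scored as negative when treated as standalone texts, because neither contains an economy or policy keyword. This is the kind of brittleness that motivates examining recall.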
In exploring Baker et al.’s construction of EPU, we identify and disentangle multiple sources of uncertainty. First, there is the real underlying uncertainty about economic outcomes due to government policy that the index attempts to measure. Second, there is semantic uncertainty that can be expressed in the language of newspaper articles. Third, there is annotator uncertainty about whether a document should be labeled as EPU or not. Finally, there is modeling uncertainty in which text classifiers are uncertain about the decision boundary between positive and negative classes.

In this paper, we revisit and extend Baker et al.’s human annotation process (§3) and computational pipeline that obtains EPU measurement from text (§4). In doing so, we draw on concepts from quantitative social science’s measurement modeling, mapping observable data to theoretical constructs, which emphasizes the importance of validity (is it right?) and reliability (can it be repeated?) (Loevinger, 1957; Messick, 1987; Quinn et al., 2010; Jacobs and Wallach, 2019).

Overall, this paper contributes the following:
• We examine the assumptions Baker et al. use to operationalize economic policy uncertainty via keyword-matching of newspaper articles. We demonstrate that using keywords collapses some rich linguistic phenomena such as semantic uncertainty (§2.1).
• We also examine the causal assumptions of Baker et al. through the lens of structural causal models (Pearl, 2009) and argue that readers’ perceptions of economic policy uncertainty may be important to capture (§2.2).
• We conduct an annotation experiment by re-annotating documents from Baker et al. We find preliminary evidence that disagreements in annotation could be attributed to inherent ambiguity in the language that expresses EPU (§3).
• Finally, we replicate and extend Baker et al.’s data pipeline with numerous measurement sensitivity extensions: filtering to US-only news, keyword-matching versus supervised document classifiers, and prevalence estimation approaches. We demonstrate that a measure of external predictive validity, i.e., correlations with a stock-market volatility index (VIX), is particularly sensitive to these decisions (§4).

2 Assumptions of Measuring Economic Policy Uncertainty from News

The goal of Baker et al. (2016) is to measure the theoretical construct of policy-related economic uncertainty (EPU) for particular times and geographic regions. Baker et al. assume they can use information from newspaper articles as a proxy for EPU, an assumption we explore in great detail in Section 2.2, and they define EPU very broadly in their coding guidelines: “Is the article about policy-related aspects of economic uncertainty, even if only to a limited extent?”² For an article to be annotated as positive, there must be a stated causal link between policy and economic consequences and either the former or the latter must be uncertain.³ Grounds for labeling a document as a positive include “uncertainty regarding the economic effects of policy actions” (or inactions), and “uncertainty

² http://policyuncertainty.com/media/Coding_Guide.pdf
³ “If the article discusses economic uncertainty in one part and policy in another part but never discusses policy in connection to economic uncertainty, then do not code it as about economic policy uncertainty.”
Category | KeyOrg | KeyExp (additions to KeyOrg)
Economy | economic, economy | + growth, economies, financial, recession, slowdown
Uncertainty | uncertain, uncertainty | + unclear, unsure, uncertainties, turmoil, confusion, worries
Policy | regulation, deficit, legislation, congress, white house, federal reserve, the fed, regulations, regulatory, deficits, congressional, legislative, legislature | (same list as KeyOrg)
Table 2: Original keywords used in Baker et al.’s monthly United States index (KeyOrg). Expanded keywords include all words from KeyOrg plus the five nearest neighbors from pre-trained GloVe embeddings for the economy and uncertainty categories (KeyExp).
over who makes or will make policy decisions that have economic consequences.” In Table 1, we provide examples of text spans that successfully encode EPU given these guidelines. For instance, the first example indicates that a government policy (increase in state sales tax) is causing uncertainty in the economy (demand for new clothing).

Baker et al. operationalize this theoretical construct of EPU as keyword-matching of newspaper documents: for each document, if the document has at least one word in each of the economy, uncertainty, and policy keyword categories (see Table 2 in the Appendix) then it is considered a positive document. Counts of positive documents are summed and then normalized by the total number of documents published by each news outlet.

2.1 Semantic Uncertainty

While the keywords Baker et al. (2016) select (“uncertain” or “uncertainty”) are the most overt ways to express uncertainty via language, they do not capture the full extent of how humans express uncertainty. For instance, Example No. 6 in Table 1 would be counted as a negative by Baker et al. despite indicating semantic uncertainty via the phrase “it remains unclear.” These keyword assumptions are a threat to content validity, “the extent to which a measurement model captures everything we might want it to” (Jacobs and Wallach, 2019).

We look to definitions from linguistics to potentially expand the operationalization of uncertainty; we refer the reader to Szarvas et al. (2012) for all subsequent definitions and quotes. In particular, uncertainty is defined as a phenomenon that represents a lack of information. With respect to truth-conditional semantics, semantic uncertainty refers to propositions “for which no truth value can be attributed given the speaker’s mental state.” Discourse-level uncertainty indicates “the speaker intentionally omits some information from the statement, making it vague, ambiguous, or misleading” and in the context of Baker et al. could result from journalists’ linguistic choices to express ambiguity in economic policy uncertainty. For instance, in the first example in Table 3, the lexical cues “suggest” and “might” indicate to the reader that the journalist writing the article is unclear about the intention of Alan Greenspan. In contrast, epistemic modality “encodes how much certainty or evidence a speaker has for the proposition expressed by his utterance,” (e.g., “Congresswoman X: ‘We may delay passing the tariff bill.’”) and doxastic modality refers to the beliefs of the speaker (“I believe that Congress will ...”). In the second example in Table 3, the entity “he” seems to be uncertain about the fate of the economy because he “shakes his head in bewilderment,” which demonstrates that uncertainty can also be conveyed through world knowledge and inference.

Collapsing all these types of semantic uncertainty to the keywords “uncertainty” and “uncertain” has major implications: (a) the relationship between the uncertainty journalists express and what readers infer impacts the causal assumptions (§2.2) and annotation decisions (§3) of this task, and (b) Baker et al.’s keywords are most likely low-recall which could affect empirical measurement results (§4). We see fruitful future work in improving content validity and recall via automatic uncertainty and modality analysis from natural language processing, e.g. McShane et al. (2004); Ganter and Strube (2009); Saurí and Pustejovsky (2009); Farkas et al. (2010); Szarvas et al. (2012).

2.2 Causal Assumptions

Using the paradigm of structural causal models (Pearl, 2009), we re-examine the causal assumptions of Baker et al. In Figure 1, for a single time-step,⁴ $U^*$ represents the real, aggregate level of

⁴ Baker et al. (2016) aggregate by day, month, quarter, or year.
Example | Docid
The stock market had soared on Mr. Greenspan’s suggestion that global financial problems posed as great a threat to the United States as inflation did, suggesting that a rate cut to stimulate the economy might be on the horizon | 1047100
But ask him whether the Mexican stock market will rise or plunge tomorrow and he shakes his head in bewilderment. | 1043578
Table 3: Selected examples extracted from the New York Times Annotated Corpus (NYT-AC) that convey semantic uncertainty about the economy. Bolding is our own. Docids are from the NYT-AC metadata.

Figure 1: Structural causal model of the economic policy uncertainty measurements in which variables are nodes and directed edges denote causal dependence. Unlike Baker et al. (2016) who claim to measure U, we posit that measuring H is important. Shaded nodes are observed variables and unshaded nodes are latent.

economic policy uncertainty in the world which is unobserved. If one could obtain a measurement of $U^*$, then one could analyze the causal relationship between $U^*$ and other macroeconomic variables, $M$. Presumably, newspaper reporting, $X$, is affected by $U^*$ and $x = f_X(u^*)$ where $f_X$ is a non-parametric function that represents a causal process. In our setting, $f_X$ represents the process of media production: for example, the ability of journalists to collect information from sources; or editorial decisions on what topics will be published. The major assumption of Baker et al. is that they can obtain a measure of $U^*$ via a proxy measure from newspaper text, $U$, where $u = f_U(x)$. By simple composition, $u = f_U(f_X(u^*))$. Yet, aside from examining the political bias of media, Baker et al. largely ignore $f_X$ and how the media production process could influence EPU measurements.

However, an alternative causal path from $U^*$ to $M$ goes through $H^*$, the macro-level human perception of real EPU. In this case, $U^*$ is irrelevant: as long as people perceive policy-related economic uncertainty to be changing, they could potentially make real economic decisions (e.g. hiring or purchases) that could affect the greater macroeconomy, $M$.

It is unclear how to design a causal intervention in which one manipulates the real EPU, $do(U^*)$, in order to estimate its effect on $X$ and $M$. However, one could design an ideal causal experiment to intervene on newspaper text, $do(X)$; one could artificially change the level of EPU coverage in synthetic articles, show these to participants, and measure the resulting difference in participants’ economic decisions. If $H^*$ to $M$ is the causal path of interest, then it is extremely important⁵ to measure and model human perception of EPU, an assumption we explore in terms of annotation decisions in Section 3.

3 Annotator Uncertainty

Reliable human annotation is essential for both building supervised classifiers and assessing the internal validity of text-as-data methods. In order to validate their EPU index, Baker et al. sample documents from each month, obtain binary labels on the documents from annotators, and then construct a “human-generated” index which they report has a 0.86 correlation with their keyword-based index (aggregated quarterly). Yet, in our analysis of Baker et al.’s annotations (denoted below as BBD), we find only 16% of documents have more than one annotator and of these, the agreement rates are moderate: 0.80 pairwise agreement and 0.60 Krippendorff’s α chance-adjusted agreement (Artstein and Poesio, 2008). See Line 2 of Table 4 for additional descriptive statistics of these annotations. The original authors did not address whether this disagreement is a result of annotator bias, error in annotations, or true ambiguity in the text.

In contrast to the popular paradigm that one should aim for high inter-annotator agreement rates (Krippendorff, 2018), recent research has shown “disagreement between annotators provides a useful signal for phenomena such as ambiguity in the text” (Dumitrache et al., 2018). Additionally, recent research in natural language processing

⁵ There is some evidence from the original authors that human perception is important: In the EPU index released to the public, one of three underlying components is a disagreement of economic forecasters as a proxy for uncertainty. See http://policyuncertainty.com/methodology.html.
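
The two agreement statistics reported in Section 3, pairwise agreement and Krippendorff's α, can be computed along the following lines. This is a minimal Python sketch for nominal (here binary) labels, assuming each document is represented as a list of the labels it received; the function names and the toy labels are hypothetical and are not the BBD annotations.

```python
from collections import Counter
from itertools import combinations, permutations


def pairwise_agreement(units):
    """Fraction of annotator pairs, within each document, that agree."""
    agree = 0
    total = 0
    for labels in units:
        for a, b in combinations(labels, 2):
            agree += int(a == b)
            total += 1
    return agree / total if total else float("nan")


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels.

    Documents with fewer than two labels carry no agreement information
    and are skipped.
    """
    # Coincidence matrix: each ordered pair of labels from different
    # annotators of the same document contributes 1 / (m - 1).
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    # Marginal totals per label value.
    totals = Counter()
    for (a, _), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    if n <= 1:
        return float("nan")

    observed = sum(w for (a, b), w in coincidences.items() if a != b) / n
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n * (n - 1))
    if expected == 0:
        return float("nan")  # only one label value was ever used
    return 1.0 - observed / expected


# Toy labels (hypothetical): four doubly-annotated documents, one singly-annotated.
toy_units = [[1, 1], [0, 0], [1, 0], [1, 1], [1]]
print("pairwise agreement:", pairwise_agreement(toy_units))
print("Krippendorff's alpha:", krippendorff_alpha_nominal(toy_units))
```

Because α discounts the agreement expected by chance from the label marginals, it is lower than raw pairwise agreement whenever one label dominates, which is consistent with the gap between the 0.80 and 0.60 figures reported above.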