
First Policy Analytics Symposium
The following series of working papers showcases the development of policy analytics through novel datasets and methodologies. Future endeavors from the Policy Analytics Society will seek to build on the type of innovative, data-driven policy research presented in this symposium. The papers are grouped by category below, each accompanied by its abstract.
Concept and Methodology:
Towards a Formalization of Policy Analytics – Dustin Chambers
A relatively new field within economics, policy analytics has grown rapidly and yielded insights across many subfields of the profession. Nonetheless, a theoretical framework from which to conceptualize this nascent field has yet to emerge in the literature. This paper introduces a few concepts that should be useful in establishing a formal policy analytics framework.
How to Improve Data Validation in Five Steps – Danilo Freire
Social scientists are awash with new data sources. Though data preprocessing and storage methods have developed considerably over the past few years, there is little agreement on what constitutes the best set of data validation practices in the social sciences. In this paper I provide five simple steps that can help students and practitioners improve their data validation processes. I discuss how to create testable validation functions, how to increase construct validity, and how to incorporate qualitative knowledge into statistical measurements. I present the concepts according to their level of abstraction, and I provide practical examples of how scholars can add my suggestions to their work.
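As a rough illustration of the testable validation functions the paper advocates, the Python sketch below checks a toy survey table for missing identifiers and implausible values. The dataset, column names, and thresholds are hypothetical; the snippet is not drawn from the paper itself.

    # Illustrative sketch only, not code from the paper: a testable
    # validation function for a hypothetical survey table.
    import pandas as pd

    def validate_survey(df: pd.DataFrame) -> list:
        """Return human-readable validation failures (empty list = passed)."""
        errors = []
        if df["country"].isna().any():
            errors.append("missing country codes")
        if not df["year"].between(1945, 2025).all():
            errors.append("year outside plausible range")
        if (df["gdp_per_capita"] <= 0).any():
            errors.append("non-positive GDP per capita")
        return errors

    # Because the checks are plain functions, they can live in a test suite:
    def test_validation_catches_bad_year():
        bad = pd.DataFrame({"country": ["BR"], "year": [3000],
                            "gdp_per_capita": [9000.0]})
        assert "year outside plausible range" in validate_survey(bad)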
Democratizing Policy Analytics with AutoML – Danilo Freire
Machine learning methods have made significant inroads in the social sciences. Computer algorithms now help scholars design cost-effective public policies, predict rare social events, and improve the allocation of funds. However, building and evaluating machine learning algorithms remain labor-intensive, error-prone tasks. Thus, areas that could benefit from modern computer algorithms are often held back owing to implementation challenges or lack of technical expertise. In this paper, I show how scholars can use automated machine learning (AutoML) tools to preprocess their data and create powerful estimation methods with minimal human input. I demonstrate the functionalities of three open-source, easy-to-use AutoML algorithms, and I replicate a well-designed forecasting model to highlight how researchers can achieve similar results with only a few lines of code.
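To give a flavor of the "few lines of code" the paper refers to, the sketch below fits an AutoML pipeline with TPOT, a widely used open-source tool. TPOT may or may not be among the three algorithms the paper demonstrates, and the dataset here is a stand-in.

    # Illustrative sketch only: automated model search with TPOT on a
    # placeholder dataset; not necessarily one of the tools the paper covers.
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from tpot import TPOTClassifier

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    # TPOT searches over preprocessing steps and estimators automatically.
    automl = TPOTClassifier(generations=5, population_size=20,
                            random_state=42, verbosity=2)
    automl.fit(X_train, y_train)
    print("held-out accuracy:", automl.score(X_test, y_test))
    automl.export("best_pipeline.py")  # writes the winning scikit-learn pipeline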
Specific Applications:
Using Machine Learning to Capture Heterogeneity in Trade Agreements – Scott Baier and Narendra Regmi
In this paper, we employ machine learning techniques to capture heterogeneity in free trade agreements. The tools of machine learning allow us to quantify several features of trade agreements, including volume, comprehensiveness, and legal enforceability. Combining machine learning results with gravity analysis of trade, we find that more comprehensive agreements result in larger estimates of the impact of trade agreements. In addition, we identify the policy provisions that have the most substantial effect on trade creation. In particular, legally binding provisions on antidumping, capital mobility, competition, customs harmonization, dispute settlement mechanism, e-commerce, environment, export and import restrictions, freedom of transit, investment, investor-state dispute settlement, labor, public procurement, sanitary and phytosanitary measures, services, technical barriers to trade, telecommunications, and transparency tend to have the largest trade-creation effects.
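A minimal sketch of the gravity step, under assumed variable names (trade, depth, dist, exporter, importer, year): a Poisson pseudo-maximum-likelihood regression of bilateral trade on a machine-learned measure of agreement depth, with exporter, importer, and year fixed effects. This illustrates the general approach rather than the authors' exact specification.

    # Illustrative sketch only: PPML gravity regression with a hypothetical
    # "agreement depth" score produced by a machine learning step.
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    df = pd.read_csv("gravity_panel.csv")  # hypothetical bilateral panel

    model = smf.glm(
        "trade ~ depth + np.log(dist) + C(exporter) + C(importer) + C(year)",
        data=df,
        family=sm.families.Poisson(),
    )
    result = model.fit(cov_type="HC1")  # heteroskedasticity-robust errors
    print(result.params["depth"])       # estimated effect of deeper agreements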
Validating Readability and Complexity Metrics: A New Dataset of Before-and-After Laws – Wolfgang Alschner
If algorithms are to be the policy analysts of the future, the policy metrics they produce will require careful validation. This paper introduces a new dataset that assists in the creation and validation of automated policy metrics. It presents a corpus of laws that have been redrafted to improve readability without changing content. The dataset has a number of use cases. First, it provides a benchmark of how expert legislative drafters render texts more readable. It thereby helps test whether off-the-shelf readability metrics such as Flesch-Kincaid pick up readability improvements in legal texts. It can also spur the development of new readability metrics tailored to the legal domain. Second, the dataset helps train policy metrics that can distinguish policy form from policy substance. A policy text can be complex because it is poorly drafted or because it deals with complicated subject matter. Separating form and substance creates more reliable algorithmic descriptors of both.
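As an example of the kind of off-the-shelf scoring the dataset is meant to benchmark, the sketch below computes Flesch-Kincaid scores for an invented before-and-after pair using the textstat package; the sentences are not taken from the corpus.

    # Illustrative sketch only: scoring an invented "before" and "after"
    # provision with off-the-shelf readability metrics from textstat.
    import textstat

    before = ("No person shall, without the prior written authorisation of "
              "the Minister, engage in any activity referred to in "
              "subsection (1).")
    after = ("You must get the Minister's written permission before doing "
             "any activity listed in subsection (1).")

    for label, text in [("before", before), ("after", after)]:
        print(label,
              "grade level:", textstat.flesch_kincaid_grade(text),
              "reading ease:", textstat.flesch_reading_ease(text))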
Measuring a Contract's Breadth: A Text Analysis – Joshua C. Hall, Bryan McCannon, and Yang Zhou
We use a computational linguistic algorithm to measure the topics covered in the text of school teacher contracts in Ohio. We then use the topic-model output to calculate the concentration of topics covered, which allows us to assess how expansive each contract is. As a proof of concept, we evaluate the relationship between our topic diversity measurement and the prevalence of support staff. This test is done on a subsample of the contracts in the state. If more specialized services are provided, then contracts should presumably be broader, since they cover more employment relationships. We confirm a strong, statistically significant relationship between our measurement and the prevalence of these support staff. Thus, we have a valid measurement of contract breadth.
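One way to turn topic-model output into a breadth measure is a Herfindahl-style concentration index over each contract's topic shares; whether this matches the authors' exact calculation is an assumption, and the contract texts below are placeholders.

    # Illustrative sketch only: estimate topics with scikit-learn's LDA, then
    # summarize each contract's breadth as one minus a Herfindahl-style
    # concentration index over its topic shares.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    contracts = ["salary schedule step increases longevity pay",   # placeholders
                 "grievance procedure binding arbitration steps",
                 "health insurance premiums sick leave benefits"]

    dtm = CountVectorizer(stop_words="english").fit_transform(contracts)
    theta = LatentDirichletAllocation(n_components=5,
                                      random_state=0).fit_transform(dtm)

    concentration = (theta ** 2).sum(axis=1)  # Herfindahl index per contract
    breadth = 1 - concentration               # higher = more topics covered
    print(breadth)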
Man vs. Machine: A Novel Evaluation of Data Analytics Using Occupational Licensing as a Case Study – Edward J. Timmons and Conor Norris
For researchers of state regulatory policy, the difficulty of gathering data has long presented an obstacle. This study compares two new databases of state-level occupational licensing laws. The Knee Center for the Study of Occupational Regulation (CSOR) database uses traditional manual reading to gather data, while RegData uses a machine learning algorithm. We describe both data-gathering processes, weigh their costs and benefits, and compare their outputs. The CSOR database allows researchers to find specific licensing requirements typically used in the occupational licensing literature, but its traditional methodology is time- and labor-intensive. RegData provides researchers with a better overall measure of regulatory stringency and complexity that allows for comparisons across states. However, RegData cannot reach the level of detail in the CSOR database. The variables gathered by CSOR and RegData are useful for researchers and policymakers and can serve as a model for building databases of other state-level regulations.
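For readers unfamiliar with text-based regulation measures, the toy sketch below counts binding terms ("shall", "must", "may not", "prohibited", "required") in a made-up licensing provision, the kind of restriction tally RegData popularized. It is a simplification, not RegData's actual pipeline.

    # Toy illustration of a restriction-term count, not RegData's pipeline.
    import re

    RESTRICTION_TERMS = ["shall", "must", "may not", "prohibited", "required"]

    def count_restrictions(text: str) -> int:
        text = text.lower()
        return sum(len(re.findall(r"\b" + re.escape(term) + r"\b", text))
                   for term in RESTRICTION_TERMS)

    statute = ("An applicant shall complete 1,500 hours of training and must "
               "pass an examination; unlicensed practice is prohibited.")
    print(count_restrictions(statute))  # -> 3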