MADCOW: Multi-Attribute Data Collected on Web
Project funded by the US National Science Foundation Political Science Program and Methods, Measurement and Statistics Program
- Philip Schrodt
- Jay Ulfelder
- Michael Ward
Systematic indicators of political characteristics such as regime type, internal instability and interstate conflict have been widely used in political and economic research over the past three decades, and are increasingly used in policy applications. With the exception of political event data, there have been only limited efforts to automate the production of these indicators. Instead, they are generally produced by individual research projects and become available only after delays often measured in years. This reliance on human coding persists even though most of the information required to code these indicators is now available on the Web, and a wide variety of proven methods for processing that information is available as open-source software.
This project seeks to automate the production of several widely used indicators of comparative and international behavior, and to do this in near real time. In addition, we will set up a general platform for the construction of new indicators and data subsetting, and an open collaborative system for the “crowd-sourcing” of coding that cannot be done reliably by automated methods alone. Our objective is to create a system that can maintain these indicators at very low cost and provide a platform for the development of additional data sets through decentralized collaboration.
We will initially focus on three general categories of indicators:
- Real-time event data: This will be coded from news aggregators such as Google News and European Media Monitor using the open-source TABARI system, supplemented with preprocessing using open-source natural-language processing software, geo-location techniques, and a named-entity-resolution system for the identification of new political actors.
- Regime and governance indicators: We will build a system to code probabilistic measures of comparative government and regime type similar to those commonly derived from the widely used Polity IV data set and those used in the Democracy and Dictatorship Revisited and Varieties of Democracy (V-Dem) projects. These codings will be based on annual reports from the U.S. Department of State and international human-rights monitoring organizations and will be updated in near real time on the basis of relevant event data. We will also attempt to code and automatically update indicators on other aspects of governance, including civil-military relations and state collapse.
- Militarized disputes: We will use event data, textual frames, and crowd-sourced alerts to provide rapid updating on militarized disputes as they occur. We will also try to automatically provide data comparable to that coded in some of the other databases in the Correlates of War set, notably the list of IGO memberships, formal alliances and territorial change.