Wiki Surveys for Social Science Research

Surveys and interviews are the central methods for discovering and analyzing attitudes and opinions in social science research. With the advent of the Web, online surveys have become an efficient way for researchers to collect and analyze large amounts of data. The popularity of online survey tools such as SurveyMonkey, Zoomerang, and SurveyGizmo is testament to the productivity that surveys enable. However, surveys are a rigid, top-down methodology: the survey designer must anticipate all possible answers up front, which is an impossible feat. In contrast, interviews allow unanticipated information to bubble up from the respondents. For instance, the primary data for Integrity Watch Afghanistan's (IWA) Afghan Perceptions and Experiences with Corruption: A National Survey 2010 came from interviewing 6,500 randomly selected respondents in 32 provinces on over 100 questions covering the sectors where people experienced corruption; the levels of bribes people paid to obtain services; the type of access people had to essential services; whom people trusted to combat corruption; and experiences with corruption in the judiciary, police, and land management. However, the interview methodology is expensive and time-consuming, as it requires research companies with expertise in effective research design and precise management of data collection over several months.

Is there an alternative to surveys and interviews in social science research? Prof. Salganik's team at Princeton came up with a hybrid approach, "wiki surveys", that combines the structure of a survey with the open-endedness of an interview. To date, various organizations have created more than 1,000 wiki surveys on the project Web site, All Our Ideas, generating 45,000 ideas and 2 million votes. Wiki surveys range from the New York City Mayor's Office's engagement with citizens in shaping the city’s long-term sustainability plan to Catholic Relief Services surveying its 4,000 employees to find out what makes an ideal relief worker. The figure below shows how the third question in the Tactical Conflict Assessment Planning Framework (TCAPF) would be implemented as a wiki survey:

tcapf wiki survey.jpg

Extending the kittenwar concept to ideas, the user interface asks the respondent to choose between two randomly selected alternatives, while encouraging respondents to add their own ideas to the mix of alternative responses. The additional ideas are added to the survey’s marketplace and voted up or down by the other survey-takers. Prof. Salganik says, “One of the patterns we see consistently is that ideas that are uploaded by users sometimes score better than the best ideas that started it off. Because no matter how hard you try, there are just ideas out there that you don’t know.”
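The pairwise mechanism described above can be sketched in a few lines of Python. This is only an illustrative toy, not the All Our Ideas implementation: the class name WikiSurvey, its methods, the example ideas, and the smoothed win-rate score are all assumptions made for this sketch, and the real site may select pairs and score ideas quite differently.

```python
import random
from collections import defaultdict


class WikiSurvey:
    """Toy pairwise wiki survey (hypothetical, for illustration only)."""

    def __init__(self, seed_ideas, rng=None):
        self.ideas = list(seed_ideas)          # alternatives the designer started with
        self.wins = defaultdict(int)           # times an idea was chosen
        self.appearances = defaultdict(int)    # times an idea was shown in a vote
        self.rng = rng or random.Random()

    def next_pair(self):
        # Show the respondent two distinct ideas drawn uniformly at random.
        return tuple(self.rng.sample(self.ideas, 2))

    def vote(self, winner, loser):
        # Record the respondent's choice between the two ideas shown.
        self.wins[winner] += 1
        self.appearances[winner] += 1
        self.appearances[loser] += 1

    def add_idea(self, text):
        # Respondent-uploaded ideas join the same pool as the seed ideas.
        if text not in self.ideas:
            self.ideas.append(text)

    def score(self, idea):
        # Assumed scoring rule: add-one smoothed win rate, so an idea
        # that has never been shown starts at 0.5.
        return (self.wins[idea] + 1) / (self.appearances[idea] + 2)

    def ranking(self):
        return sorted(self.ideas, key=self.score, reverse=True)


survey = WikiSurvey(["More bike lanes", "Plant more trees"], rng=random.Random(0))
survey.add_idea("Keep libraries open later")   # uploaded by a respondent
survey.vote("Keep libraries open later", "More bike lanes")
survey.vote("Keep libraries open later", "Plant more trees")
print(survey.ranking()[0])  # prints "Keep libraries open later"
```

In this toy run the uploaded idea wins both of its match-ups and rises to the top of the ranking, which mirrors the pattern Prof. Salganik describes.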

All Our Ideas has some basic visualization features to make sense of wiki survey responses. Here is the visualization of the responses to the question "What do you think the Digital Public Library of America (DPLA) should be like?":

DPLA Survey Reponse.jpg

It is worth noting that the 15 top-scoring ideas, starting with DPLA interoperability with the Government Printing Office (GPO), the Defense Technical Information Center (DTIC), and the National Archives and Records Administration (NARA), are all uploaded ideas that were not in the original set of alternatives. A powerful argument for crowdsourcing!

Admittedly, we still need boots on the ground to collect TCAPF data in Afghanistan, given the demographics of the people we want to reach. On the other hand, wiki surveys hold great potential for reaching the younger generation fueling the Arab Spring and similar movements.

Increased political violence in store for Italy and Czech Republic?

In collaboration with our academic partners Prof. Cingranelli at the Political Science Department, SUNY Binghamton University and Profs. Sam Bell and Amanda Murdie at the Department of Political Science, Kansas State University, we developed a Domestic Political Violence Model that forecasts political violence levels five years into the future. The model enables policymakers, particularly in the COCOMs, to proactively plan for instances of increased domestic political violence, with implications for resource allocation and intelligence asset assignment. Our model uses the IDEA dataset for political event coding, plus numerous indicators from the CIRI Human Rights Dataset, Polity IV Dataset, World Bank, OECD, Correlates of War project, and Fearon and Laitin datasets. Here is our model's forecast for 2010 - 2014 as a ranked list:

  1. Iran
  2. Sri Lanka
  3. Russia
  4. Georgia
  5. Israel
  6. Turkey
  7. Burundi
  8. Chad
  9. Honduras
  10. Czech Republic
  11. China
  12. Italy
  13. Colombia
  14. Ukraine
  15. Indonesia
  16. Malaysia
  17. Jordan
  18. Mexico
  19. Kenya
  20. South Africa
  21. Ireland
  22. Peru
  23. Chile
  24. Armenia
  25. Tunisia
  26. Democratic Republic of the Congo
  27. Belarus
  28. Argentina
  29. Albania
  30. Ecuador
  31. Sudan
  32. Austria
  33. Nigeria
  34. Syria
  35. Kyrgyz Republic
  36. Egypt
  37. Belgium

Our model applies regression to a large number of drivers-of-conflict variables drawn from numerous open source social science datasets, and it uses a novel Negative Residuals technique. A negative residual arises when the model predicts a higher level of violence than a country actually experienced. It indicates a nation state that is predisposed to increasing levels of violence, because environmental conditions and drivers of conflict with a demonstrated correlation with measured political violence are present. The residuals imply that these are states where we expect to observe increases in violence, although not necessarily high levels of violence. So Iran and Sri Lanka are not expected to have the same level of violence, but they are expected to have increases in violence of the same magnitude.
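To make the idea of a negative residual concrete, here is a minimal sketch in Python. It is not our actual model: it assumes a single hypothetical composite driver score per state instead of the many indicators described above, fits an ordinary least-squares line, and uses purely synthetic numbers and placeholder state names. The function names fit_line and negative_residuals are invented for this illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def negative_residuals(names, drivers, observed):
    """Return (name, residual) pairs with residual < 0, most negative first.

    residual = observed violence - predicted violence, so a negative value
    means the fitted line predicts more violence than was observed.
    """
    slope, intercept = fit_line(drivers, observed)
    flagged = []
    for name, x, y in zip(names, drivers, observed):
        residual = y - (intercept + slope * x)
        if residual < 0:
            flagged.append((name, residual))
    return sorted(flagged, key=lambda item: item[1])


# Synthetic toy data: placeholder states, NOT real measurements.
names = ["State A", "State B", "State C", "State D", "State E"]
driver_score = [1, 2, 3, 4, 5]        # hypothetical composite of conflict drivers
observed_violence = [3, 1, 4, 1, 6]   # hypothetical observed violence level

ranked = negative_residuals(names, driver_score, observed_violence)
for name, residual in ranked:
    print(f"{name}: {residual:.1f}")
# prints:
# State D: -2.6
# State B: -1.4
```

In this toy example State D is flagged first even though its observed violence is low: the fitted line predicts a level of 3.6 from its driver score, well above the observed level of 1, which is the kind of gap the technique is meant to surface.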

There are some unexpected countries on our list, such as the Czech Republic and Italy. Time will tell how accurate our model's predictions are, although recent political violence in Ecuador is an early indication that the model performs well. The model uses nuanced measures of repression and captures variables that policymakers can manipulate. Our project page has further details on the model.