OECD Life Satisfaction Index - What makes people happy?

The OECD recently published its life satisfaction index across its member countries. How's Life?: Measuring Well Being gives an overview of the OECD Better Life Index methodology, which focuses on households and individuals rather than aggregate economic conditions, and on well-being outcomes as opposed to well-being drivers. In particular, life satisfaction measures how people rate their general satisfaction with life on a scale from 0 to 10. The surveys show that Hungary, Portugal, Russia, Turkey and Greece have relatively low overall life satisfaction, while Denmark, Norway, the Netherlands and Switzerland have high overall life satisfaction.

201205242132.jpg

In addition to life satisfaction, the OECD survey also captures various indicators under the topics of housing, income, jobs, community, education, environment, civic engagement, health, safety, and work-life balance. For instance, the housing topic includes the rooms per person (average number of rooms per person in a dwelling), dwellings with basic facilities (percentage of people with indoor flushing toilets), and housing expenditure (housing expenditure as a percentage of disposable household income) indicators. In contrast, the jobs topic includes the employment rate (percentage of people currently employed in a paid job), long-term unemployment rate (percentage of unemployed people who have been actively looking for a job for over a year), personal earnings (average annual earnings for a full-time employee), and job security (share of employment with a tenure of less than 6 months) indicators.

201205242205.jpg

The chart on the left shows the indicator values for Hungary. In general, Hungarians are less satisfied with their lives than the OECD average. 65% of Hungarians have more positive experiences (feelings of rest, pride in accomplishment, enjoyment, etc.) than negative ones (pain, worry, sadness, boredom, etc.) on a daily basis, in contrast to the OECD average of 72%.

As the chart shows, Hungarians feel safe, are happy with the water and air quality, find their work-life balance acceptable, have a strong sense of community, and are happy with the general quality of education. For instance, 89% of Hungarians believe that they know someone they could rely on in time of need. Yet in life satisfaction, Hungary scores the lowest.

Referring back to the chart, Hungary's scores for housing, jobs, civic engagement, and health are lower than the OECD average, while income is considerably lower than the OECD average. In Hungary, the average person earns about $13K a year, less than the OECD average of $22K a year. It seems that the low life satisfaction score for Hungary is connected to low living standards stemming from sub-par income levels coupled with a lack of jobs. For instance, around 55% of Hungarians aged 15 to 64 have a paid job, well below the OECD employment average of 66%.

In contrast, sense of community is the lowest score for Turkey. For instance, 69% of Turks believe that they know someone they could rely on in time of need, lower than the OECD average of 91%. Is there a correlation between life satisfaction and indicators for living conditions and quality of life? If yes, does the correlation hold across the OECD countries?

201205242219.jpg

The OECD Web site has a mixer tool that lets a user select the relative rankings of the indicators and analyze the ranked list of countries based on these preferences. The customized index enables the comparison of well-being across countries based on a personal weighting of the 11 topics the OECD has identified as essential, in the areas of material living conditions and quality of life.

While the OECD mixer is a nice tool for engaging readers, as modelers we see the life satisfaction sentiment indicator as an output and the rest of the indicators (housing, income, jobs, etc.) as inputs. In other words, we believe that the input indicators drive how people feel about their life experiences. To test this hypothesis, we performed a correlation analysis between the life satisfaction index and the rest of the indicators in order to understand which factors contribute the most or the least to the life satisfaction sentiment. The correlation analysis is shown on the right.

In terms of topics, life satisfaction has a high correlation with income, jobs, housing, and health, a low correlation with education, and no correlation with safety. In terms of individual indicators, rooms per person has the highest correlation with life satisfaction, while the job security, housing expenditure, employees working long hours, educational attainment, years in education, student skills, consultation on rule making, air pollution, homicide rate, and assault rate indicators have very low correlation with life satisfaction.
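The correlation analysis described above can be sketched in a few lines. This is a minimal illustration that assumes a Pearson correlation (the post does not say which coefficient was used) and uses made-up values for five hypothetical countries, not OECD data. The indicator names and the `pearson` helper are illustrative only.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical values for five made-up countries (NOT OECD data).
life_satisfaction = [4.7, 5.5, 6.4, 7.2, 7.8]
indicators = {
    "rooms_per_person": [1.0, 1.2, 1.6, 1.9, 2.0],
    "job_security": [9.0, 12.0, 8.0, 11.0, 10.0],
}

# Correlate each input indicator with the output (life satisfaction).
correlations = {
    name: pearson(values, life_satisfaction)
    for name, values in indicators.items()
}

# Rank indicators by the strength of the correlation, strongest first.
ranked = sorted(correlations, key=lambda name: abs(correlations[name]), reverse=True)
for name in ranked:
    print(f"{name}: {correlations[name]:+.2f}")
```

With real data, each list would hold one value per OECD country, in the same country order for every indicator.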

The indicators under each topic show some interesting results. For the jobs topic, the employment rate, personal earnings, and long-term unemployment rate indicators are correlated with life satisfaction, whereas job security is not. Similarly, for the environment topic, water quality has a high correlation with life satisfaction, while air pollution does not.

It would be interesting to compare these results with a similar survey for non-OECD countries. Perhaps the values of OECD countries are dominated by the population's desire to accumulate as many material possessions as possible. The values of relatively poorer countries may not follow this correlation.

Countries with Increased Political Violence Forecast: 2011 - 2015

In our companion Domestic Political Violence Model blog, we published yesterday the list of countries predicted to have increases in political violence for 2011 to 2015. The map below shows the countries with an expected increase in political violence, grouped into Very High Risk, High Risk, and Medium Risk. Our forecast is based on four different models. In the Very High Risk category, all four models predicted an increase. In the High Risk category, three models predicted an increase. In the Medium Risk category, half of the models predicted an increase. The countries in each category are sorted by the size of the mean residual, so the states with the most pent-up demand for violence are listed at the top. The residuals imply that these are states where we expect to observe increases in violence, although not necessarily high levels of violence. So the United Kingdom and Israel are not expected to have the same level of violence but are expected to have the same magnitude of increase in political violence.
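The grouping rule above can be sketched as follows. This is a minimal sketch, assuming each of the four models casts a yes/no vote per country and that "size of the mean residual" means its absolute value. The country names, votes, and residuals are hypothetical placeholders, not forecast output.

```python
def risk_category(votes):
    """Map four per-model votes (True = increase predicted) to a risk category."""
    count = sum(votes)
    if count == 4:
        return "Very High Risk"
    if count == 3:
        return "High Risk"
    if count == 2:
        return "Medium Risk"
    return None  # fewer than half of the models predict an increase

def rank_countries(forecasts):
    """forecasts maps country -> (votes, mean_residual).

    Returns category -> list of countries, largest residual magnitude first.
    """
    groups = {"Very High Risk": [], "High Risk": [], "Medium Risk": []}
    for country, (votes, mean_residual) in forecasts.items():
        category = risk_category(votes)
        if category is not None:
            groups[category].append((country, mean_residual))
    return {
        category: [
            country
            for country, _ in sorted(members, key=lambda m: abs(m[1]), reverse=True)
        ]
        for category, members in groups.items()
    }

# Hypothetical inputs for illustration only.
forecasts = {
    "Country A": ([True, True, True, True], -1.2),
    "Country B": ([True, True, True, True], -3.5),
    "Country C": ([True, True, True, False], -0.8),
    "Country D": ([True, False, True, False], -2.0),
    "Country E": ([False, False, True, False], -4.0),
}
ranked = rank_countries(forecasts)
```

Country E has the largest residual but is left off the list because only one model predicts an increase, which mirrors the requirement that at least half of the models agree.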

201112311034.jpg

The United Kingdom, Israel, Sri Lanka, Iran, Colombia, Zimbabwe, South Africa, Haiti, Egypt, the Philippines, Guinea-Bissau, Venezuela, Chile, Syria, Chad, Belarus, Guinea, Kyrgyzstan, and Greece make up the Very High Risk list. Israel, Sri Lanka, Iran, Colombia, South Africa, Egypt, Chile, Syria, Chad, Belarus, and Kyrgyzstan are returning countries from our 2010-2014 forecast. Of our 2010-2014 forecast, Syria, Egypt, and Libya saw the most violent protests in the Arab Spring of 2011. The United Kingdom, Zimbabwe, Haiti, the Philippines, Guinea-Bissau, Venezuela, Guinea, and Greece are the new additions to our Very High Risk list. The United Kingdom tops the list, as the pent-up demand for increased violence was certainly evident in the London riots over the summer of 2011. Greece saw a substantial increase in political violence due to the measures introduced by the Greek government to address the debt crisis.

It is worth noting that our 2011-2015 forecast model is based on an event dataset that captures both the frequency and the intensity of political violence from 1990 to 2010. Similarly, our 2010-2014 forecast model is based on an event dataset that captures both the frequency and the intensity of political violence from 1990 to 2009. We publish our forecast based on our acquisition date of the event dataset. As the event dataset is available on a real-time basis, albeit at a higher cost, we can publish our forecast in real time if needed.

Using a regression model applied to a large number of drivers of conflict variables spanning numerous open source social science datasets, our model uses a novel Negative Residuals technique. Negative Residuals result from the model predicting higher levels of violence than actually experienced, indicating nation states that are pre-disposed to increasing levels of violence based on the presence of environmental conditions and drivers of conflict with demonstrated correlation with measured political violence. In our model, the magnitude of future political violence directed towards the state is heightened by coercion, often thought of as violations of physical integrity rights, and by coordination, or the tools by which groups can associate and organize against the state. Conversely, the magnitude of political violence is lessened by capacity, defined as the ability of the state to project itself throughout its territory.
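To make the Negative Residuals idea concrete, here is a minimal sketch that assumes a single explanatory variable and ordinary least squares. The actual model uses many drivers of conflict, and its exact specification is not given here. The country labels, driver index values, and observed violence levels are all hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data: one conflict-driver index and observed violence per state.
countries = ["A", "B", "C", "D", "E"]
driver_index = [1.0, 2.0, 3.0, 4.0, 5.0]
observed_violence = [2.0, 4.0, 3.0, 8.0, 10.0]

intercept, slope = fit_line(driver_index, observed_violence)

# Residual = observed - predicted. A negative residual means the model
# predicted more violence than was actually experienced.
residuals = {
    country: observed - (intercept + slope * x)
    for country, x, observed in zip(countries, driver_index, observed_violence)
}

# States with negative residuals are flagged as predisposed to an increase.
flagged = [country for country in countries if residuals[country] < 0]
```

In this toy data set only country C is flagged: its drivers suggest more violence than it has actually experienced, so the gap is read as potential for an increase rather than as a high absolute level.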

For the event dataset, we use the Integrated Data for Event Analysis (IDEA) framework. The IDEA event dataset is based on the Reuters Global News Service and is organized in a “who” did “what” to “whom” manner for each particular event. This framework allows researchers to isolate events of interest for their particular project, and it allows us to capture and isolate domestic anti-government violence. For the dependent variable, our model uses the Goldstein scores, which capture the overall level and intensity of domestic anti-government violence within a state in a given year.

Wiki Surveys for Social Science Research

Surveys and interviews form the central methodology for analyzing and discovering attitudes and opinions in social science research. With the advent of the Web, online surveys have become an efficient way for researchers to collect and analyze large amounts of data. The popularity of online survey tools like SurveyMonkey, Zoomerang, SurveyGizmo, etc. is a testament to the productivity enabled by surveys. However, surveys represent a rigid, top-down methodology that forces the survey designer to account for all possible answers up front, which is an impossible feat. In contrast, interviews allow unanticipated information to bubble up from the respondents. For instance, the primary data for the Integrity Watch Afghanistan (IWA) report, Afghan Perceptions and Experiences with Corruption: A National Survey 2010, comes from interviewing 6,500 randomly selected respondents in 32 provinces on over 100 questions that deal with sectors where people experienced corruption; levels of bribes people paid to obtain services; what type of access people had to essential services; who people trusted to combat corruption; and experiences with corruption in the judiciary, police, and land management. However, the interview methodology is expensive and time-consuming, as it requires implementation by research companies with expertise in effective research design, and precise management of data collection over several months.

Is there an alternative to surveys and interviews in social science research? Prof. Salganik's team at Princeton came up with a hybrid approach, "wiki surveys", that combines the structure of a survey with the open-endedness of an interview. To date, various organizations have created more than 1000 wiki surveys on the project Web site, All Our Ideas, generating 45,000 ideas with 2 million votes. Wiki surveys range from the New York City Mayor's Office's engagement with citizens in shaping the city’s long-term sustainability plan to Catholic Relief Services surveying its 4000 employees to find out what makes an ideal relief worker. The figure below shows how the third question in the Tactical Conflict Assessment and Planning Framework (TCAPF) would be implemented as a wiki survey:

tcapf wiki survey.jpg

Inspired by extending the kittenwar concept to ideas, the user interface guides the respondent to choose between two random alternatives, while encouraging the respondents to add their ideas into the mix of alternative responses. The additional ideas are added into the survey’s marketplace and voted up or down by the other survey-takers. Prof. Salganik says that “One of the patterns we see consistently is that ideas that are uploaded by users sometimes score better than the best ideas that started it off. Because no matter how hard you try, there are just ideas out there that you don’t know.”
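The pairwise voting mechanism described above can be sketched as follows. This is a minimal illustration that scores each idea by a smoothed win rate. It is an assumption made for simplicity and is not the scoring method All Our Ideas actually uses. The class and method names are hypothetical.

```python
import random

class PairwiseSurvey:
    """Toy wiki survey: respondents pick one of two ideas and may add their own."""

    def __init__(self, seed_ideas):
        self.wins = {idea: 0 for idea in seed_ideas}
        self.losses = {idea: 0 for idea in seed_ideas}

    def add_idea(self, idea):
        """A respondent uploads a new idea into the pool of alternatives."""
        self.wins.setdefault(idea, 0)
        self.losses.setdefault(idea, 0)

    def next_pair(self, rng=random):
        """Pick two distinct ideas at random to show to the respondent."""
        return tuple(rng.sample(list(self.wins), 2))

    def vote(self, winner, loser):
        """Record the respondent's choice between the two ideas shown."""
        self.wins[winner] += 1
        self.losses[loser] += 1

    def score(self, idea):
        """Smoothed win rate, so ideas with no votes start at 0.5."""
        wins, losses = self.wins[idea], self.losses[idea]
        return (wins + 1) / (wins + losses + 2)

    def ranking(self):
        """Ideas ordered from highest to lowest score."""
        return sorted(self.wins, key=self.score, reverse=True)

# Usage: a user-uploaded idea can outrank the seed ideas.
survey = PairwiseSurvey(["seed idea 1", "seed idea 2"])
survey.add_idea("uploaded idea")
survey.vote("uploaded idea", "seed idea 1")
survey.vote("uploaded idea", "seed idea 2")
survey.vote("seed idea 1", "seed idea 2")
```

The smoothing keeps a newly uploaded idea from ranking first or last on the strength of a single vote.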

All Our Ideas has some basic visualization features to make sense of the wiki survey responses. Here is the visualization of the responses to "What do you think the Digital Public Library of America (DPLA) should be like?":

DPLA Survey Reponse.jpg

It is worth noting that the 15 top-scoring ideas, starting with DPLA interoperability with the Government Printing Office (GPO), the Defense Technical Information Center (DTIC), and the National Archives and Records Administration (NARA), are all uploaded ideas that were not in the original set of alternatives. A powerful argument for crowdsourcing!

Admittedly, we still need boots on the ground to collect TCAPF data in Afghanistan, given the demographics of the people we want to reach. On the other hand, wiki surveys hold great potential for reaching the younger generation fueling the Arab Spring and the like.

Bin Laden Hideout vs. Al Qaeda Training Manual

In our Building Intent project, we developed a geoprofiling algorithm that predicts the location of facilities that support adversary operations in the urban environment. Geoprofiling is a technique that is widely used in serial crime investigations. In our project, we researched and developed a building intent inference system based on terrorist preferences, building characteristics, and social network behavior. Our approach learns the utility function that the adversaries are using, and classifies and predicts the potential utility of a facility to the adversaries based on the derived metadata of each facility using influence networks. For terrorist preferences, we studied Military Studies in the Jihad Against the Tyrants: The Al-Qaeda Training Manual in order to find the building use tactics in which the adversary trains its recruits, and found a significant number of building use related tactics and procedures embodied in these manuals. In collaboration with the Terrorism Research Center in Fulbright College, University of Arkansas, we then studied the international terrorism cases in the American Terrorism Study, and found empirical evidence of terrorism manual tactics being practiced in the observed data. Based on these findings, we developed a baseline set of indicators for modeling building intent, and researched the likely causal connections among these variables. We then built extractors to derive a set of metadata for these indicators, and used machine learning algorithms to find the causal connections between the incidents or events and building attributes, to learn the model parameters, and to build classifiers based on terrorist process preferences, building characteristics, and guilt-by-association data.

As shown in the figure below, our geoprofiling algorithm does a nice job of predicting the Japanese Red Army terrorist Yu Kikumura's residence in New York based on the American Terrorism Study. Here the blue markers signify police stations and the white arrows signify the egress points. As shown in the figure, Yu Kikumura's residence at 327 East 34th Street, NY is in the red hotspot area predicted by our algorithm. Avoiding police stations and ease of egress were two of the primary factors in Kikumura’s choice of housing. Not only is his apartment equidistant from the nearest police departments, all of which are over one kilometer away, but its back-alley access road to the underground Queens-Midtown Tunnel also provides a quick getaway by car. In addition, examination of the residence floor plan reveals that the apartment building had numerous staircases (one of which is private to the unit) to the basement level with a rear exit.

Japanese Red Army.png

The Al Qaeda Training Manual gives several instructions for renting a residence, as shown in the table below. For instance, it is preferable to rent apartments on the first floor for ease of egress, to avoid apartments near police stations and government buildings or in isolated or deserted locations, to rent in newly developed areas, and the like. In particular, the Al Qaeda Training Manual calls for the use of the following tactics in renting an apartment:

Al Qaeda Tactics.jpg

So how closely does the location of Bin Laden's secret hideout in Abbottabad follow the advice of the Al Qaeda Training Manual? Not that closely. Bin Laden clearly did not follow the tactic of selecting a ground floor location (he lived on the third floor), of avoiding police stations and government buildings (he selected a location near the Pakistan Military Academy), of finding an apartment in a newly developed area where people do not know each other (he chose a neighborhood with retired Army generals), or of preparing ways of vacating the premises in case of a surprise attack (he did not build exit stairs). The only tactic from the list above that Bin Laden used is avoiding an isolated location. One wonders if Bin Laden made a concerted effort to avoid his own tactical advice in order to thwart geoprofiling techniques. Perhaps another consideration that will need to be taken into account in future geoprofiling is assistance from outside forces, given the possible connection to a support network that included elements of the Pakistani military or intelligence services in the Abbottabad area.

Increased political violence in store for Italy and Czech Republic?

In collaboration with our academic partners Prof. Cingranelli at the Political Science Department, SUNY Binghamton University and Profs. Sam Bell and Amanda Murdie at the Department of Political Science, Kansas State University, we developed a Domestic Political Violence Model that forecasts political violence levels five years into the future. The model enables policymakers, particularly in the COCOMs, to proactively plan for instances of increased domestic political violence, with implications for resource allocation and intelligence asset assignment. Our model uses the IDEA dataset for political event coding, plus numerous indicators from the CIRI Human Rights Dataset, Polity IV Dataset, World Bank, OECD, Correlates of War project, and Fearon and Laitin datasets. Here is our model's forecast for 2010 - 2014 as a ranked list:

  1. Iran
  2. Sri Lanka
  3. Russia
  4. Georgia
  5. Israel
  6. Turkey
  7. Burundi
  8. Chad
  9. Honduras
  10. Czech Republic
  11. China
  12. Italy
  13. Colombia
  14. Ukraine
  15. Indonesia
  16. Malaysia
  17. Jordan
  18. Mexico
  19. Kenya
  20. South Africa
  21. Ireland
  22. Peru
  23. Chile
  24. Armenia
  25. Tunisia
  26. Democratic Republic of the Congo
  27. Belarus
  28. Argentina
  29. Albania
  30. Ecuador
  31. Sudan
  32. Austria
  33. Nigeria
  34. Syria
  35. Kyrgyz Republic
  36. Egypt
  37. Belgium

Using a regression model applied to a large number of drivers of conflict variables spanning numerous open source social science datasets, our model uses a novel Negative Residuals technique. Negative Residuals result from the model predicting higher levels of violence than actually experienced, indicating nation states that are pre-disposed to increasing levels of violence based on the presence of environmental conditions and drivers of conflict with demonstrated correlation with measured political violence. The residuals imply that these are states that we expect to observe increases in violence although not necessarily high levels of violence. So Iran and Sri Lanka are not expected to have the same level of violence but are expected to have the same magnitude increase in violence.

There are some unexpected countries on our list, like the Czech Republic and Italy. Time will tell how accurate our model's predictions are, although recent political violence in Ecuador is an early indicator of the model's effective performance. The model uses nuanced measures of repression and captures variables that can be manipulated by policy makers. Our project page has further details on the model.

Multi-Criteria Decision Modeling for Complex Operations

Next week we will be presenting a paper at the International Conference on Cross-Cultural Decision Making in Miami, Florida. I am looking forward to participating in a highly informative and interesting session, bridging modeling and simulation disciplines with socio-cultural data for military operations. In our paper entitled “Geospatial Campaign Management for Complex Operations”, we report initial findings from a research effort to understand the complexity of modern day insurgencies and the effects of counterinsurgency measures, integrating data-driven models, such as Bayesian belief networks, and goal-driven models, including multi-criteria decision analysis (MCDA), into a geospatial modeling environment in support of decision making for campaign management.

Our Decision Modeler tool instantiates MCDA, a discipline for solving complex problems that involve a set of alternatives evaluated on the basis of various metrics. MCDA breaks a problem down into a goal or set of goals, objectives that need to be met to achieve that goal, factors that affect those objectives, and the metrics used to evaluate each factor. Since the selection of metrics for specified objectives and the data for computing those metrics are the biggest hurdles in using MCDA in practice, both the metrics and the associated data are part of our tool's library for reuse. Below is an image of the MCDA structure. Our decision modeling tool also incorporates a weighting system that enables analysts to apply their preferences to the metrics that are most critical for the mission. Linking these decision models in a shared space within the tool creates a repository of knowledge about progress along lines of effort in an operation, providing a source for knowledge transfer for units rotating into and out of the theater.

The alternatives considered in the decision model are different courses of action that can be evaluated against metrics to determine the optimal action for accomplishing the commander’s goals. Of course, given the complex human system found in counterinsurgency and stability operation environments, our tool is not meant to be a ‘black box’ model that simply tells the user what to do. Rather, the decision analysis provides insight, through both qualitative and data-driven models, into which courses of action will set the conditions for a more successful outcome based on the commander’s intent.
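As a rough illustration of how courses of action can be scored against weighted metrics, here is a minimal weighted-sum sketch. It assumes min-max normalization and a simple additive model, which is one common MCDA formulation and not necessarily the one implemented in the Decision Modeler. The course-of-action names, metric names, and values are hypothetical.

```python
def normalize(values, higher_is_better=True):
    """Min-max scale values to [0, 1], flipping metrics where lower is better."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0] * len(values)
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if higher_is_better else [1.0 - s for s in scaled]

def mcda_scores(alternatives, metrics, weights):
    """Weighted-sum score per alternative.

    alternatives: name -> {metric: raw value}
    metrics:      metric -> True if higher is better, False if lower is better
    weights:      metric -> analyst preference (any positive scale)
    """
    names = list(alternatives)
    total_weight = sum(weights.values())
    scores = {name: 0.0 for name in names}
    for metric, higher_is_better in metrics.items():
        raw = [alternatives[name][metric] for name in names]
        for name, value in zip(names, normalize(raw, higher_is_better)):
            scores[name] += (weights[metric] / total_weight) * value
    return scores

# Hypothetical courses of action and metrics.
alternatives = {
    "COA 1": {"cost": 100, "need": 10},
    "COA 2": {"cost": 200, "need": 40},
    "COA 3": {"cost": 300, "need": 20},
}
metrics = {"cost": False, "need": True}

equal = mcda_scores(alternatives, metrics, {"cost": 1, "need": 1})
cost_heavy = mcda_scores(alternatives, metrics, {"cost": 3, "need": 1})
best_equal = max(equal, key=equal.get)
best_cost_heavy = max(cost_heavy, key=cost_heavy.get)
```

Changing the weights changes the preferred course of action, which is the effect the weighting system is meant to let an analyst explore.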

In evaluating our tool with users, we determined that one of the most important features involves the visualization of the tradeoffs for various courses of action in the decision model. To address this, we compute the uncertainty of data based on its distribution and propagate its effect analytically into the decision space, presenting it visually to the commander. A greater dispersion represents more uncertainty, while a clustered set of data points indicates more certainty regarding the cost and effectiveness metrics for a particular course of action. In this way, we are able to represent the high levels of uncertainty inherent in socio-cultural information without negatively impacting the ability of our tool to calculate a decision model. By incorporating a visual representation of uncertainty in the model, scenarios can then be played out to determine optimization for various courses of action based on data inputs and user preferences, translating model outputs into a form that can more readily be used by military users.
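One way to propagate data uncertainty analytically into a decision score is sketched below. It assumes the score is a weighted sum of independent metrics, so the variance of the score is the sum of the squared weights times each metric's variance. The post does not spell out the tool's actual propagation method, and the metric names and sample values here are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def propagate(samples_by_metric, weights):
    """Mean and standard deviation of a weighted-sum score.

    Assumes the metrics are independent, so
    Var(sum_i w_i * X_i) = sum_i w_i**2 * Var(X_i).
    """
    score_mean = sum(weights[m] * mean(s) for m, s in samples_by_metric.items())
    score_var = sum(weights[m] ** 2 * variance(s) for m, s in samples_by_metric.items())
    return score_mean, sqrt(score_var)

# Hypothetical normalized observations for one course of action.
samples = {
    "effectiveness": [0.4, 0.6, 0.8],  # dispersed: more uncertainty
    "coverage": [0.5, 0.5, 0.5],       # clustered: no uncertainty
}
weights = {"effectiveness": 0.5, "coverage": 0.5}

score, spread = propagate(samples, weights)
```

The spread value is what would set the size of the region drawn around a course of action's data point: a larger spread means a wider region and less certainty.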

To demonstrate how the visualization of uncertainty would work in the tool, in the image below we have analyzed two potential courses of action relating to the essential services line of effort, with the objective of supporting healthcare initiatives in an area of operations. In this case, we are deciding where to focus our efforts, comparing two districts, Arghandab and Anar Dara in Southern Afghanistan. Here we are only examining a few potential metrics: the cost of building healthcare centers proposed by local development councils; the number of basic healthcare centers already in the district; and the number of people who identified a lack of healthcare as the major problem facing their village, a question that is collected in the Tactical Conflict Assessment and Planning Framework (TCAPF) data. Our MCDA tool would compute and display the effectiveness versus cost data points from the metrics corresponding to the two proposed courses of action. We want to determine which district would optimize our goal of restoring essential services, with the objective of supporting healthcare initiatives, by leveraging the data inputs. To account for the uncertainty, we have represented the distribution as the ellipsoid around each data point. This allows a military planner to visually analyze and evaluate the potential courses of action based on cost versus effectiveness metrics, while accounting for the uncertainty of the data. In addition, the weighting system, shown as sliders on the right-hand side of the image, allows a military planner to experiment to determine how a change in metrics will affect the proposed courses of action.

One of the key benefits of our approach is that it allows for real-time knowledge generation. When the model is updated with new data, the Decision Modeler re-evaluates the outlined courses of action against the new information, allowing the user to view trends over time in the effectiveness and cost metrics for particular courses of action. In the example below, perhaps the cost estimates went up for the proposed course of action in Anar Dara, given a deterioration in the security situation that affected the ability to hire contractors to execute the project. In Arghandab, the metric could have changed according to our collection of TCAPF data, with more people responding that healthcare is the major problem facing their village, therefore increasing the effectiveness against our objective if we built a healthcare center there. Given the increased need, the villagers have offered to provide labor at decreased cost and will contribute a certain percentage of funds to the project, which accounts for the decreased costs associated with the Arghandab data points. In this way, the tool will provide course of action forecasting based on an analysis of data, for the purpose of proactively planning operations that optimize the commander’s objectives.

We will be presenting a more detailed analysis of our research results at the conference, so keep an eye out for links to our papers and presentation.

Tribal Human Terrain of Afghanistan

Under the sponsorship of the OSD Human Social Culture Behavior (HSCB) program, we are developing a semantic wiki for Complex Operations. The envisioned operational impact of our effort is to foster collaboration and sharing of knowledge for a whole-of-government approach, and to improve COIN/SSTR operations analysis and execution by focusing on the population as the center of gravity. The development of such a wiki presents several challenges, including the broad domain of knowledge that complex operations require, a large number of doctrine publications to wikify and semantify, several out-of-print key references, etc. With these challenges, we saw an opportunity to develop an open source culturepedia for the Afghan and Pakistan human terrain, as such knowledge is not aggregated and not readily available.

The Complex Operations wiki currently contains more than 1,000 articles on the various tribal dynamics and locational knowledge for the Afghanistan and Pakistan region, outlining tribal meta-knowledge such as the sub-groups, primary locations, traditional alliances, and traditional disputes of various groups to support situational awareness about the human terrain. Here is the wiki page for the covered Afghanistan Organizational Groups. We have created over 150 concept maps (an example is shown below) to capture the knowledge about 1,000 ethnic groups, tribes, sub-tribes, and clans within the Afghanistan and Pakistan region, to make this human terrain knowledge readily accessible to the complex operations practitioner.

tribal concept map.png

Our use of a semantic wiki platform enables the representation of the human terrain knowledge as facts and relationships. For instance, the wiki page for the Achakzai tribal group lists the known facts and relationships about this ethnic group in both a human-consumable form using semantic forms:

Achakzai Semantic Form.tiff

and a machine-consumable form as semantic RDF relationships:

Achakzai RDF.tiff

By inspecting the semantic form, the reader can deduce that Achakzai is a sub-tribe of Zirak, which is a sub-tribe of the Durrani super-tribe, that it is primarily located in the Chora and Khas Uruzgan districts, and that it traditionally has disputes with the Nurzai, Panjpai and Kakar tribes. The representation of this knowledge in a semantic wiki has the additional advantage of supporting faceted browsing and answer-engine queries. For instance, the semantic wiki can answer questions like "What are the tribes in Kandahar Province and their traditional disputes?" as a table that gets automatically updated every time a new tribe in this province is added to the wiki:

Tribes in Kandahar.tiff

There are also several groups in Afghanistan that do not organize around tribal kinship ties, including Uzbeks, Tajiks, and Hazaras. In addition to tribal affiliation, social organizations such as solidarity groups (a group of people that acts as a single unit and organizes on the basis of some shared identity) and patronage networks (led by a local warlord or khan) play an important role in understanding the human terrain. The Afghan and Pakistan human terrain and situational awareness knowledge base can be extended to include other populations of interest to the community, such as Yemen or Somalia.
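The kind of query a semantic wiki answers can be sketched with a tiny in-memory set of subject-predicate-object triples. This is a minimal sketch, not the wiki's actual RDF vocabulary or query engine: the predicate names (`subTribeOf`, `locatedIn`, `hasDisputeWith`) and the helper functions are hypothetical, while the Achakzai facts come from the paragraph above.

```python
# Facts stored as (subject, predicate, object) triples.
triples = [
    ("Achakzai", "subTribeOf", "Zirak"),
    ("Zirak", "subTribeOf", "Durrani"),
    ("Achakzai", "locatedIn", "Chora"),
    ("Achakzai", "locatedIn", "Khas Uruzgan"),
    ("Achakzai", "hasDisputeWith", "Nurzai"),
    ("Achakzai", "hasDisputeWith", "Panjpai"),
    ("Achakzai", "hasDisputeWith", "Kakar"),
]

def objects(subject, predicate):
    """All objects linked to a subject by a given predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]

def subjects(predicate, obj):
    """All subjects linked to an object by a given predicate."""
    return [s for s, p, o in triples if p == predicate and o == obj]

def tribes_and_disputes(location):
    """Tribes located in a place, each with its traditional disputes."""
    return {
        tribe: sorted(objects(tribe, "hasDisputeWith"))
        for tribe in subjects("locatedIn", location)
    }

def ancestors(tribe):
    """Follow subTribeOf links upward, e.g. sub-tribe -> tribe -> super-tribe."""
    chain = []
    parents = objects(tribe, "subTribeOf")
    while parents:
        chain.append(parents[0])
        parents = objects(parents[0], "subTribeOf")
    return chain

chora_table = tribes_and_disputes("Chora")
achakzai_lineage = ancestors("Achakzai")
```

Because the table is computed from the triples on every call, adding a triple for a new tribe in the same location updates the answer without any further editing, which is the behavior described for the wiki's query tables.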

Shuffling methodology for sanitizing Afghanistan TCAPF microdata: a working paper

Back in February 2010 I started a working paper titled "Shuffling_Methodology_for_Sanitizing_TCAPF_Microdata" (click to download as PDF), which outlines the methodology I used for sanitizing TCAPF data. The sanitization approach I discuss is applicable to cases where it is desirable to share unclassified data while preserving the privacy (and operational security) inherent in the data.

The data was shared with us by USAID. Although unclassified, it carried distribution restrictions because of the sensitive nature of the data, which was collected by the 24th MEU and other units in Afghanistan. We felt compelled to publish the results of a Bayesian analysis we performed on the data, and thought it best to sanitize the data first and then publish the results from the cleansed data. To do so, we had to maintain the analytical value of the data by preserving the distributional properties of the dataset, so that the results obtained remain valid. We had to balance this need to preserve analytical value against the privacy need to withhold or obfuscate data fields deemed too sensitive to disclose.
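To give a flavor of the trade-off, here is a minimal sketch of the simplest form of shuffling: randomly permuting one sensitive column across records. This is not the methodology in the working paper, only a simplified illustration. A plain permutation preserves the column's own distribution but breaks its relationship to the other fields, which is exactly the analytical-value concern raised above. The field names and records are hypothetical.

```python
import random
from collections import Counter

def shuffle_column(records, column, seed=None):
    """Return copies of the records with one column randomly permuted.

    The set of values in the column (its marginal distribution) is preserved,
    but each value is detached from the record it originally belonged to.
    """
    rng = random.Random(seed)
    values = [record[column] for record in records]
    rng.shuffle(values)
    return [{**record, column: value} for record, value in zip(records, values)]

# Hypothetical microdata records.
records = [
    {"village": "V1", "top_problem": "healthcare", "respondent_age": 34},
    {"village": "V2", "top_problem": "security", "respondent_age": 51},
    {"village": "V3", "top_problem": "healthcare", "respondent_age": 27},
    {"village": "V4", "top_problem": "water", "respondent_age": 45},
]

sanitized = shuffle_column(records, "village", seed=42)

# The distribution of the shuffled column is unchanged.
before = Counter(record["village"] for record in records)
after = Counter(record["village"] for record in sanitized)
```

Any analysis that depends on the link between the shuffled column and the other fields would no longer be valid on the sanitized copy, so the choice of which fields to shuffle, and how, has to follow from the analysis one intends to publish.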

The discussion in the paper, where I go through a thought process of what could go wrong, should at least get you thinking. I welcome your feedback and ideas in the comments below.

Socio-Cultural Modeling & Analysis at NPS

The Naval Postgraduate School in Monterey is one of our government's educational jewels. Nestled in the beautiful landscape of the Monterey Peninsula, this institution brings together a diverse group of educators, researchers and student practitioners to promote a vigorous debate of the issues facing our national defense, and the advancement of solutions addressing these issues. Last week I had the pleasure of giving a couple of talks and participating in a panel discussion at NPS. Here is a quick rundown.

The first day, I was the invited speaker for a panel discussion on Socio-Cultural Modeling & Analysis. This panel discussion explored the problem of modeling and analysis to provide insights to decision makers on complex socio-cultural issues from the perspective of both social scientists and computational modelers. The panel discussion addressed the questions:

  • How does the inherent variability within humans impact the ability to draw insights from modeling and analysis?
  • What strategies can be used to address the challenges of modeling and analysis in the human domain?

My presentation sparked some interesting questions, such as how we can convince the Commander to help with data collection when the Commander sees no immediate return on the invested overhead. I suggested that DoD can replicate what consulting companies do: assign a resource who has no execution task other than recording knowledge during project execution. The panel discussion generated a lively debate between social and computational scientists. One of the computational scientists on the panel said that everyone wants to solve the "easy to model" problems instead of the "hard to model" ones, which are what the decision maker is interested in. For instance, coloring the map of failing states using the Political Instability Task Force (PITF) or our Predictive Societal Indicators of Radicalism (PSIR) models provides hardly any new insight to the General in charge.

Another criticism was the publishing delay in social science data sets (e.g. CIRI, MAR, Uppsala, etc.). For instance, human rights data set publishers wait for the State Department and Amnesty International to publish their annual reports for the previous year in the spring of this year. Then they take a couple more months to code the reported incidents and publish. Such a delay does not match DoD operations, which focus on the present. I advocated publishing real-time social science indicators that can be adjusted later, much like the government's GDP revisions six months after the initial figures.

Social scientists on the panel stressed the importance of representing qualitative in addition to quantitative knowledge in these models. For instance, socio-cultural responses to color can be significant as the color red represents celebration in Chinese, purity in Indian and danger in Western cultures. This kind of knowledge is certainly relevant in SSTR operations. Dr. Guttieri cautioned against the public perception of manipulation using socio-cultural models citing Project Camelot.

It was nice to see the articulation of the healthy tension between the social and computational scientists in the audience. In closing, I advocated packaging of social science for tactical operations where warfighters are serving as or advising governors, town managers, mayors - jobs that they were not trained for.

The second day, I gave a brown bag seminar at the NPS Cebrowski Institute on our Semantic Wiki for Complex Operations project. This project aims to address the gaps in current solutions supporting COIN/SSTR operations:

  • Document-centric repositories make seeking answers time and effort intensive
  • Disparate knowledge “silos” make situational awareness hard for complex contingency operations requiring interagency cooperation

Semantic wikis enable community-powered structured knowledge production using semantic forms, faceted browsing of structured content, powering answer engines and linking different data sets. There was significant interest in using our semantic wiki for teaching as such an approach can significantly increase the amount of learned knowledge NPS students take to the field of practice, and provide an effective reach back capability from the field.

I visited TRAC-Monterey, which has a number of interesting projects. In particular, I found the Cultural Geography project interesting as an agent application. This project started as Urban Cultural Geography for Stability Operations. The Cultural Geography model employs issue-based segmentation of the social network of leaders and followers, using communication theory and weapons of influence concepts to predict the future based on population identity groups. The mind of the agent is a belief network that develops actions based on the beliefs, values, and interests of the associated identity group. COIN IPB and Center of Gravity (COG) analysis are the target results.

I also paid a visit to the Defense Resources Management Institute (DRMI) at NPS. Here I found the Multi Criteria Decision Making (MCDM) course of particular interest as it relates to the SSTR Campaign Planner tool we are developing in our PSIR project. DRMI teaches the MCDM course in 2-day, 2-week, 4-week and quarter-long formats to a wide audience from DoD, DHS, and Emergency Response Teams. MCDM is widely used as a decision-aid tool for ranking decision alternatives. The DRMI course emphasizes visualization of the decision space instead of ranking alternatives by scores. Such an approach enables the user to detect conflicting criteria, cluster alternatives, eliminate undesirable alternatives, and select the optimal alternative.
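To make the idea of eliminating undesirable alternatives without a single ranking score concrete, here is a minimal sketch of a dominance filter. It is not the DRMI method; it is a generic technique, and the alternatives, criteria and scores are hypothetical. An alternative is dropped only when some other alternative is at least as good on every criterion and strictly better on at least one.

```python
def dominates(a, b):
    """True if a is at least as good as b on every criterion and better on one.

    Scores are tuples where higher is better on every criterion.
    """
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def non_dominated(alternatives):
    """Return the names of alternatives not dominated by any other."""
    return [
        name
        for name, score in alternatives.items()
        if not any(
            dominates(other, score)
            for other_name, other in alternatives.items()
            if other_name != name
        )
    ]


# Hypothetical alternatives scored on (effectiveness, affordability).
options = {
    "Plan A": (8, 3),
    "Plan B": (5, 7),
    "Plan C": (4, 2),
    "Plan D": (6, 6),
}
print(non_dominated(options))
```

In this made-up data, Plan C is removed because Plan A beats it on both criteria. The remaining plans trade one criterion against the other, which is the kind of conflict a visualization of the decision space is meant to expose.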

Kneber Botnet - less fluxy but more stealthy

The recent news story about the Kneber botnet, based on the excellent work done by the NetWitness team and informative posts by Dancho Danchev and others, brought the ZeuS Trojan botnet into the limelight. In contrast to some misleading reports, the security community has been following this botnet, which infected more than 75,000 computer systems at nearly 2,500 companies, for quite a long time. We have been tracking ZeuS with our Fast Flux Monitor for some time as well. Given the recent interest in this botnet, we decided to analyze the reported ZeuS data using our Fast Flux Monitor database to provide some additional insight. Most of the domain, nameserver and IP entities associated with the attacking infrastructure reported in the NetWitness Kneber report have been in our Fast Flux Monitor database. What is interesting is that most of the reported Kneber domains and nameservers do not exhibit fast flux behavior. For instance, all of the reported Kneber domains for the Trojan installers resolve to 1 to 4 IPs, which is not enough for a fast flux evasion scheme. The number of IPs that the Kneber Trojan installer domains resolve to is shown in the table below.

ZeuS Installer.jpg
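The distinct-IP check described above can be sketched as follows. This is not the Fast Flux Monitor implementation; the function names, the threshold of five IPs and the sample observations are assumptions chosen for illustration. A real detector would typically also look at signals such as DNS TTLs and how quickly the IP set changes.

```python
from collections import defaultdict


def distinct_ip_counts(resolutions):
    """Map each domain to the number of distinct IPs it was seen resolving to."""
    ips = defaultdict(set)
    for domain, ip in resolutions:
        ips[domain].add(ip)
    return {domain: len(addresses) for domain, addresses in ips.items()}


def flag_possible_fast_flux(resolutions, min_ips=5):
    """Return domains whose distinct-IP count reaches the (assumed) threshold."""
    counts = distinct_ip_counts(resolutions)
    return sorted(d for d, n in counts.items() if n >= min_ips)


# Made-up observations using reserved documentation names and addresses.
observed = [("installer.example", "192.0.2.1"), ("installer.example", "192.0.2.2")]
observed += [("flux.example", f"198.51.100.{i}") for i in range(1, 9)]

print(distinct_ip_counts(observed))
print(flag_possible_fast_flux(observed))
```

A domain that resolves to only a handful of addresses, like the first one here, stays below the threshold and is not flagged.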

Comparing the ZeuS network graph with those of the various botnets in our database reveals that the ZeuS botnet has a different network graph than others like Avalanche, Conficker, Gumblar and Pushdo. The figure below shows the domain, nameserver and IP connectivity for the Avalanche botnet:

ffm_avalanche_network.jpg

In this graph, the blue, red, and green nodes denote IPs, domains, and nameservers, respectively. Each cluster represents a set of entities where any two nodes can be linked through domain, nameserver and IP connectivity. The Avalanche graph has one large cluster and six small clusters, making it easy to discover the various entities of this botnet. In contrast, the same graph for the ZeuS botnet shown below has one large cluster and over 200 small clusters, thus making it hard to discover the various entities of this botnet.

ffm_zeus_network jpeg.jpg

Referring to the data shown in the table above, the reported Kneber domains and nameservers belong to one of the small clusters on the right. These clusters consist of domains and nameservers that do not exhibit fast flux behavior. Whether the small clusters represent the discreet probe of networks by large criminal organizations, or small operator hosting set-ups that downloaded free phishing kits, the ZeuS botnet is stealthier than the others by relying on a large number of smaller clusters used for attack campaigns.
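The clusters discussed above are connected components of the domain, nameserver and IP graph. The sketch below shows one generic way to compute them with a depth-first traversal; it is not the code behind the figures, and the edge list uses made-up documentation names and addresses.

```python
from collections import defaultdict


def clusters(edges):
    """Group nodes into connected components of an undirected graph."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen, components = set(), []
    for start in neighbors:
        if start in seen:
            continue
        stack, component = [start], set()
        seen.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        components.append(component)
    # Largest cluster first, as in the description of the graphs.
    return sorted(components, key=len, reverse=True)


# Hypothetical links between domains, nameservers and IPs.
edges = [
    ("d1.example", "ns1.example"),
    ("d1.example", "192.0.2.1"),
    ("d2.example", "ns1.example"),
    ("d3.example", "192.0.2.9"),
]
result = clusters(edges)
print([len(c) for c in result])
```

Two domains that share a nameserver end up in the same cluster, while a domain with its own dedicated IP and no shared infrastructure forms a small cluster of its own.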

We will present our comparative analysis of the Avalanche, Conficker, Gumblar, Pushdo, and ZeuS botnets at the NATO IST-091 Symposium on "Information Assurance and Cyber Defence", where we will provide an explanation for the difference.

Concept Map vs. PowerPoint for Briefings

What has PowerPoint given the knowledge worker besides universality? PowerPoint features like automatic generation of slides from outlines, and structured knowledge constructs like tables, graphs, and charts, support knowledge organization and communication. Although most PowerPoint features have been available since the early days of Mac software such as MORE and Cricket Graph, it is the ubiquity of PowerPoint that created a backlash against the uniformity it imposes on thinking, organizing, and sharing of knowledge concepts. In his essay The Cognitive Style of PowerPoint: Pitching Out Corrupts Within, Edward Tufte analyzes the NASA briefings preceding the Columbia disaster and argues that PowerPoint templates weaken verbal and spatial reasoning, and corrupt statistical analysis.

In the government space we serve, the situation is no different. As reported in the Wall Street Journal, PowerPoint has become an ingrained part of the defense culture. For instance, PowerPoint Ranger is now a derogatory term for a military professional who excels more at slide-making than at warfighting. In fact, Margaret Hayes at the National Defense University posits that "You can't speak with the U.S. military without knowing PowerPoint." In the Armed Forces Journal essay, Dumb-dumb Bullets, T. X. Hammes goes further to argue that PowerPoint is "actively hostile to thoughtful decision-making", and has "decreased the quality of the information provided to the decision-maker".

From a cognition perspective, would you ask a first grader to build a PowerPoint presentation to see their grasp of a concept? No. Luckily for us, thanks to the pioneering work of Joseph D. Novak at Cornell and others, there is something that educators are using for such assignments in K-12 and higher education: Concept Maps. A concept map is a graphical network diagram where each node represents a concept, and the labelled links depict the relationships between concepts. Here is a concept map that describes what a concept map is.

CMap of Concept Maps.tiff

What is special about this representation? The teacher sees the limitation of the student's understanding, and multiple students can collaboratively build a concept map for shared understanding. Is that possible in PowerPoint? No. It is simply not possible to assess an author's level of understanding of the subject domain from a PowerPoint deck as lack of communication skills often masks the knowledge gaps in the underlying domain.

As articulated by Joseph D. Novak, meaningful learning involves the assimilation of new concepts and propositions into existing cognitive structures. What this means is that the viewer needs to first identify her/his known cognitive map of the presented concept and then detect the additions to this concept map for true learning. In other words, the viewer of a presentation always tries to find the answers to the following questions:

  • What do I already know in the presented topic?
  • What are the additional knowledge chunks that complement what I already know?
  • Can I trust the presented addition to my knowledge base?

On answering these questions, concept maps trump PowerPoint presentations, which explains their popularity in learning environments. Concept mapping is not a religion espoused by some education crusaders as the effectiveness of concept maps has been studied empirically. In an experiment conducted at the Naval Postgraduate School, Concept Maps were empirically demonstrated to be more effective than PowerPoint on key measures of knowledge transfer and rapidity in creation. In an anonymous survey at the American University in Cairo, a majority of students stated that doing concept maps required them to look at the assigned reading in more depth. A study conducted at a nursing school in Bangkok, Thailand showed that concept mapping is effective in assisting nursing students to summarize their own concepts and improve their nursing core competency in primary medical care.

There are other advantages of using Concept Maps in presentations. Jim Benson in his blog post makes the interesting point that concept maps create a continuous conversational flow with no breaks while noting that PowerPoint creates an unhealthy distraction of "What's coming next?". Steven Kaminski writes that most business PowerPoint presentations, with a little extra work, would be better—even much better—without it because the speaker becomes an audio aid for the PowerPoint slides instead of the presentation being a visual aid for the speaker, which is the case for concept maps.

In addition to several commercial software packages, there are several open source concept mapping tools. The Visual Understanding Environment (VUE) project at Tufts and CMap Tools at the Institute for Human and Machine Cognition (IHMC) have large, active user communities. We don't have to wait until Concept Maps become a part of the Microsoft Office suite to start using them. Do we?

Semantic Annotation for Knowledge Management

Do you remember your annotation homework from literature courses? Researching the qualifications of an author, figuring out the topic, tone, rhetorical strategy, audience, and purpose of an essay, or thinking about the connections between what you just read and other work in the field ... Imagine everyone being able to put such facts and relations into a machine-understandable form and having machines harvest those relations on our behalf. That is what semantic wikis enable.

Semantic annotation defines the domain concepts and the relations between concepts. Formally, an annotation is a tuple consisting of a subject, an object, a relation, and a context, as defined in "Annotation and Navigation in Semantic Wikis" by Eyal Oren et al. Our Semantic Wiki for Complex Operations uses Semantic MediaWiki, which allows annotations on a wiki page. For instance, the insurgency page has the following annotation:

* [[has characteristic::Popular Support]]

Here the subject of the annotation is the 'insurgency' concept represented by this wiki page, 'has characteristic' is the annotation relation, and 'popular support' is the object of the annotation. While Semantic MediaWiki only allows single level annotations of wiki pages and does not formally separate the page and the concept it represents, we think it still serves as the widely adopted standardized semantics syntax necessary for semantic wiki applications to take off.
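To show how such an annotation maps onto a subject, relation and object, here is a small sketch that pulls annotations out of page text with a regular expression. It is not how Semantic MediaWiki parses pages; the pattern covers only the simple `[[relation::object]]` form shown above, and the function name and sample text are made up.

```python
import re

# Matches [[relation::object]]; plain links such as [[PSIR]] are ignored.
ANNOTATION = re.compile(r"\[\[([^\[\]:|]+)::([^\[\]|]+)\]\]")


def extract_annotations(page_title, wikitext):
    """Return (subject, relation, object) triples found in the page text."""
    return [
        (page_title, relation.strip(), obj.strip())
        for relation, obj in ANNOTATION.findall(wikitext)
    ]


# Hypothetical page text: one annotation and one ordinary link.
text = "* [[has characteristic::Popular Support]]\nSee also [[Counterinsurgency]]."
print(extract_annotations("Insurgency", text))
```

The page title stands in for the subject, which mirrors the point that the page and the concept it represents are not formally separated.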

Annotation clearly introduces an additional burden on the knowledge worker. So unless the return on investment on semantic annotation provides value to the community that the wiki serves, it would be hard to expect widespread adoption. Semantic MediaWiki extensions provide such value. Again referring back to our Semantic Wiki for Complex Operations, the wiki page for each social science data set (e.g. Minorities at Risk Project Dataset, CIRI Human Rights Data Project, etc.) is annotated by using the built-in 'category' attribute:

Category: Dataset

That is, each social science dataset in our Complex Operations wiki is annotated to be of category dataset. Clicking on the link Dataset above gives a table that lists the currently available social science datasets in our wiki:


Table.tiff

In a traditional wiki, this table needs to be manually specified by:


wiki syntax.tiff

In contrast, semantic annotation enables us to generate this table dynamically using only one statement in Semantic MediaWiki:

{{ #ask: [[Category:Dataset]]
| ?title
| ?year }}

As this example illustrates, semantic annotation provides a significant ROI to the knowledge worker in knowledge organization. Moreover, when a researcher adds another dataset to our wiki, this table will automatically include the new dataset, thus improving knowledge maintenance.
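The effect of such a query can be imitated in a few lines to show why the table maintains itself. This is only a toy model of the idea, not Semantic MediaWiki's query engine; the `ask` function, the page structure, and the page names and years are all placeholders invented for the example.

```python
def ask(pages, category, printouts):
    """Return one row per page in the category, with the requested properties."""
    rows = []
    for name, page in sorted(pages.items()):
        if category in page.get("categories", ()):
            props = page.get("properties", {})
            rows.append([name] + [props.get(p, "") for p in printouts])
    return rows


# Hypothetical pages; titles and years are placeholders, not real wiki content.
pages = {
    "Dataset X": {"categories": ["Dataset"], "properties": {"title": "Dataset X", "year": "2001"}},
    "Insurgency": {"categories": ["Concept"], "properties": {}},
}
print(ask(pages, "Dataset", ["title", "year"]))

# Adding a newly annotated page changes the result without touching the query.
pages["Dataset Y"] = {"categories": ["Dataset"], "properties": {"title": "Dataset Y", "year": "2005"}}
print(ask(pages, "Dataset", ["title", "year"]))
```

The query stays the same between the two calls; only the set of annotated pages changes, which is what removes the manual table upkeep.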

Semantic Wikis for Communities of Practice

The term community of practice (CoP) was coined by Jean Lave, a social anthropologist. Its value in learning was popularized by Etienne Wenger, an educational theorist. CoP denotes a group of people who share a passion about a common topic, and deepen their knowledge and expertise in this domain by interacting with each other on an ongoing basis. According to Etienne Wenger, a community of practice defines itself along three dimensions and its characteristics can be captured by:

The domain. A community of practice is something more than a social network. "It has an identity defined by a shared domain of interest. Membership therefore implies a commitment to the domain, and therefore a shared competence that distinguishes members from other people".

The community. "In pursuing their interest in their domain, members engage in joint activities and discussions, help each other, and share information. They build relationships that enable them to learn from each other".

The practice. "Members of a community of practice are practitioners. They develop a shared repertoire of resources: experiences, stories, tools, ways of addressing recurring problems—in short a shared practice. This takes time and sustained interaction".

In developing and nurturing Communities of Practice, Etienne Wenger talks about the diverse and distributed internal leadership:
• The inspirational leadership provided by thought leaders and recognized experts
• The day-to-day leadership provided by those who organize activities
• The classificatory leadership provided by those who collect and organize information in order to document practices
• The interpersonal leadership provided by those who weave the community's social fabric
• The boundary leadership provided by those who connect the community to other communities
• The institutional leadership provided by those who maintain links with other organizational constituencies, in particular the official hierarchy
• The cutting-edge leadership provided by those who shepherd "out-of-the-box" initiatives.
McDermott goes further and states learning is in the relationships between people:

Learning traditionally gets measured as on the assumption that it is a possession of individuals that can be found inside their heads… Learning is in the relationships between people. Learning is in the conditions that bring people together and organize a point of contact that allows for particular pieces of information to take on a relevance; without the points of contact, without the system of relevancies, there is not learning, and there is little memory. Learning does not belong to individual persons, but to the various conversations of which they are a part.

In the book Seven Principles for Cultivating Communities of Practice, Etienne Wenger, Richard McDermott, and William M. Snyder argue that while communities of practice develop organically, a carefully crafted design can drive their evolution. Here are the seven principles:
1. Design for evolution
2. Open a dialogue between inside and outside perspectives
3. Invite different levels of participation
4. Develop both public and private community spaces
5. Focus on value
6. Combine familiarity and excitement
7. Create a rhythm for the community
There is additional research on what makes online CoPs flourish. Jennifer Preece posits that etiquette, empathy and trust in communities of practice can be developed by understanding people’s needs; representing the community’s purpose clearly; putting minimalist policies in place that can be changed as norms develop; supporting knowledge creation, exchange and storage; supporting communication and socialization online; encouraging empathy by enabling participants to recognize each other and their similarities; and supporting trust by ensuring that identity is revealed and past behavior is tracked.
In the paper Learning with Semantic Wikis, Sebastian Schaffert and his colleagues list the benefits of semantic wikis in the learning process. First, they argue that semantic annotations lead to reflection about knowledge. For instance, the student needs to reflect on the content while reorganizing the wiki material. In fact, the teacher can assess the student's progress by analyzing the change history. Second, semantic wikis enable the teacher and students to share formal models, and build a common model collaboratively. Finally, the reasoning and inference capabilities of Semantic Web technologies can lead to discovery of knowledge without active user search. In the paper Using a Semantic Wiki in Communities of Practice, Adil El Ghali and his colleagues articulate the advantages of adding semantics to wikis, such as semantic search and navigation, a more intuitive interface, intelligent awareness, tagging, folksonomy management, and linking CoP content to external resources.

The development of Communities of Practice is the charter of Army Knowledge Online. Here are a paper and a related presentation that articulate the thrust in DoD. We are in the process of putting these ideas into practice in our Semantic Wiki for Complex Operations project.

The Inheritance

Perhaps now more than at any other time in our nation's history, the United States faces a multitude of strategic threats and challenges. Rogue regimes, militant Islamist networks, and changing power balances from rising nations such as China, to failing states such as Pakistan, threaten to upend the security and stability of the United States. 

 

As a research assistant for The Inheritance: The World Obama Confronts and the Challenges to American Power, a book by David E. Sanger, Chief Washington Correspondent for The New York Times, I had the opportunity to dive deep into issues ranging from Chinese military modernization to cyber-security to the Iranian nuclear program. My research took me into the Pakistani nuclear establishment and the militant threat emanating from the tribal areas to the post-invasion environment in Afghanistan and the personalities shaping the debate on counterinsurgency in the post-9/11 world. 

 

The democratization of technology involving nuclear materials, cyber-attacks, and biological agents, has provided non-state actors access to weapons that were previously the purview of states. The multifaceted nature of these complex issues will require greater interagency cooperation and knowledge transfer, in particular in the civil-military field. Securing the homeland from the threat of radiological weapons will require a robust intelligence effort abroad to root out shadowy networks dealing in such materials, such as those of A.Q. Khan, increased focus on securing at-risk facilities in Russia and the former Soviet states through initiatives like Cooperative Threat Reduction, and increasing collaboration between the scientific community and government entities such as the Domestic Nuclear Detection Office to bring cutting edge research and technology to the detection of radioactive materials crossing our borders. 

 

In the cyber-security realm, bolstering public-private partnerships between government entities such as the military and intelligence community, and corporations, financial institutions, and public utilities, often the targets of cyber-attacks, will be important in developing detection and response capabilities and formulating comprehensive rules of engagement. In addition to the military component of COIN operations, civilian teams specializing in security-sector reform, judicial and political affairs, economic development, and infrastructure, will be operating in the battlespace to bolster host government legitimacy, the center of gravity in the campaign. Given the shared responsibilities in the civil-military field on these issues, fostering knowledge integration and cooperation between the various branches of government, military, and civilian stakeholders is of paramount importance to ensuring unity of effort. 

 

The Inheritance is a research-backed analysis of the challenges we currently face, a legacy of the opportunities missed after 9/11. While I may be biased because of my involvement with the book, I strongly recommend it to anyone interested in understanding the challenges confronting Obama and the complexities of the geopolitical environment.

Milcord extends Political Instability Task Force model to insurgency forecasting

Using the COIN and Stability Operations Field Manuals as a process model, Milcord's [[Predictive Societal Indicators of Radicalism]] (PSIR) analytical model predicts future radicalization based on current and historical societal indicators by finding the causal relationships between governance, economic, grievance, essential service indicators, and radicalization metrics. Find out more about our [[PSIR]] project.

Forecasting Traffic

Milcord extends its [[risk-based route planning]] solution to handle forecasted traffic patterns and social and cultural events. The model forecasts the future environment from current conditions and historical data and optimizes the mission utility based on the forecasted conditions, thus enabling an agile capability for [[Sense and Respond Logistics]].

Personalized Web 2.0 Service for Authoritative Content

How can you separate authoritative content from the rest on the Internet? With support from the Department of Energy (DoE), Milcord announced a Web 2.0 Service that will accelerate discovery and collaboration in the R&D community by making it easier for scientists and researchers to collaborate and find authoritative and trusted sources of scientific blogs, podcasts, videos, and documents, and by making it easier for scientific publishers to syndicate their content. Find out more about our [[Personalized Web 2.0 Service for Authoritative Content]] project.

Milcord in C4ISR - The Journal of Network-Centric Warfare

Ross Stapleton-Gray discusses "How to Reclaim Computer Networks from Botnets" with particular insight into the cyber attack on Estonia.

Alper Caglayan is the principal investigator at Milcord LLC of Waltham, Mass., which, with the University of Wisconsin and its Wisconsin Advanced Internet Laboratory, was one of the HS-ARPA’s STTR awardees. Milcord’s approach is aimed at reducing the overall bot “ecosystem,” which would reduce their availability for use in attacks such as that conducted against the Estonian Internet sites.

“Our product probably would help Estonia indirectly,” Caglayan said. “If ISPs and corporate networks were using our product to detect and mitigate infected computers, the attacks on Estonia’s government resources would be much less effective. Our goal in this project is not so much to stop systems from being infected, but to detect the infection as soon as possible, then to mitigate the infection.”

Read more about our [[Botnet Defense]] project.

Milcord at MobiSensors'07

Milcord presented a position paper titled "A Commercial Perspective: Collaborating on Application Prototypes as an Infrastructure Provider" at the NSF Workshop on Data Management for Mobile Sensor Networks (MobiSensors).

Sensor data management and fusion is a technical component in a number of our projects across a range of applications and technologies, including:

  • Monitoring [[SPE|Earth Science]] Data – NASA
  • [[GEMI|Intelligent Video Surveillance]] – Army
  • Enemy [[Course of Action Forecasting|Course of Action]] Analysis – Army
  • Quality of Service in Tactical Networks – Air Force
  • [[Botnet Defense|Botnet Detection]] and Mitigation – DHS