United Nations General Assembly Voting graph analysis

In this post, I describe approaches to showing similarity between objects with graph analysis and a similarity score, using United Nations General Assembly voting from 1992 to 2018 as an example.

Roman Kyrychenko https://www.linkedin.com/in/kirichenko17roman/
11-10-2019

We love networks and graphs, but we often don’t know how to build a useful graph analysis from the data we care about. It’s straightforward, but we first need a rule that classifies the relationship between each pair of objects in our dataset as “has a connection” or “has no connection”.

I show this using United Nations General Assembly voting data.

For graph analysis and visualization, I use the tidygraph and ggraph packages. They provide flexible, tidyverse-style graph processing and plotting.

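The United Nations General Assembly voting data comes from Harvard Dataverse (https://dataverse.harvard.edu/dataset.xhtml?persistentId=hdl:1902.1/12379).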

I also wanted to show gross domestic product (GDP) per capita for each UN member. For this, I used the World Bank GDP per capita dataset (https://data.worldbank.org/indicator/ny.gdp.pcap.cd).


suppressPackageStartupMessages({
  library(dplyr)
  library(ggplot2)
  library(tidygraph)
  library(ggraph)
})

# UN General Assembly voting data (provides the completeVotes data frame) from
# https://dataverse.harvard.edu/dataset.xhtml?persistentId=hdl:1902.1/12379
load("~/UN-73new.RData")

# GDP per capita (current US$) from https://data.worldbank.org/indicator/ny.gdp.pcap.cd
gdp <- readxl::read_excel("~/gdp.xls") %>% 
  select(`Country Name`, `Country Code`, `1960`:`2018`) %>% 
  # reshape to one row per country-year
  tidyr::gather("year", "gdp", -`Country Name`, -`Country Code`) %>% 
  filter(!is.na(gdp)) %>% 
  mutate(year = as.numeric(year)) %>% 
  group_by(`Country Name`) %>% 
  # rows are in year order, so the last row is the most recent available value
  slice(n()) %>% 
  mutate(
    # regional aggregates have no continent and are dropped below
    Region = countrycode::countrycode(`Country Code`, origin = "iso3c", destination = "continent")
  ) %>% 
  filter(!is.na(Region))
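The slice(n()) step relies on gather() stacking the year columns in order, so that within each country the last non-missing row is the most recent year. The sketch below makes that choice explicit on a small made-up table (the countries Alpha and Beta and their values are invented), using tidyr::pivot_longer() and dplyr::slice_max() as newer alternatives to gather() and slice(n()). It assumes tidyr 1.0 or later and dplyr 1.0 or later.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide table in the World Bank layout: one column per year
gdp_wide <- tibble(
  `Country Name` = c("Alpha", "Beta"),
  `Country Code` = c("AAA", "BBB"),
  `2016` = c(100, 200),
  `2017` = c(110, NA),
  `2018` = c(NA, NA)
)

latest_gdp <- gdp_wide %>%
  # long format: one row per country-year
  pivot_longer(`2016`:`2018`, names_to = "year", values_to = "gdp") %>%
  filter(!is.na(gdp)) %>%
  mutate(year = as.numeric(year)) %>%
  group_by(`Country Name`) %>%
  # keep the most recent year that has a non-missing value
  slice_max(year, n = 1) %>%
  ungroup()

latest_gdp
```

Alpha keeps its 2017 value and Beta falls back to 2016, because later years are missing for each.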

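Next, we need a little data cleaning. I removed records where a country was absent or not a UN member (vote codes 8 and 9) and kept only votes from 1992 onward. For simplicity, I recoded every vote other than “YES” (abstentions and “NO” votes) to zero.

The similarity score is the cosine similarity between two countries’ vote vectors, as computed by widyr::pairwise_similarity(); it is not simply the percentage of matching votes. Because votes are coded 1 (yes) or 0 (anything else), a score of 0 means that two nations never voted “YES” on the same resolution, and a score of 1 means that they voted “YES” on exactly the same set of resolutions.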


un <- completeVotes %>% 
  ungroup() %>% 
  select(Country, date, unres, importantvote, vote) %>% 
  filter(
    vote != 9,                # not a UN member at the time
    vote != 8,                # absent
    date >= "1992-01-01"      # keep the 1992-2018 period
  ) %>% 
  mutate(
    vote = ifelse(vote == 1, 1, 0)  # "yes" = 1, abstain/no = 0
  ) 
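To make the cleaning rule concrete, here is a sketch on a handful of invented records. It assumes the codebook of this dataset uses 1 = yes, 2 = abstain, 3 = no, 8 = absent and 9 = not a member; only codes 1, 8 and 9 actually appear in the code above, so the meaning of 2 and 3 is an assumption. The countries AAA, BBB, CCC and the resolutions R1 to R3 are hypothetical.

```r
library(dplyr)

# Invented records; vote codes assumed: 1 yes, 2 abstain, 3 no, 8 absent, 9 non-member
toy_votes <- tibble(
  Country = c("AAA", "AAA", "BBB", "BBB", "CCC"),
  date    = as.Date(c("1991-12-01", "1995-03-01", "1995-03-01", "2000-06-01", "2000-06-01")),
  unres   = c("R1", "R2", "R2", "R3", "R3"),
  vote    = c(1, 3, 2, 8, 1)
)

toy_clean <- toy_votes %>%
  filter(
    !vote %in% c(8, 9),            # drop absences and non-member records
    date >= as.Date("1992-01-01")  # keep the 1992-2018 window
  ) %>%
  mutate(vote = as.integer(vote == 1))  # yes -> 1, abstain/no -> 0

toy_clean
```

The 1991 record and the absence are dropped; the "no" and the abstention both become 0, and the remaining "yes" becomes 1.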

To find countries that vote alike, we compute a similarity score between their voting records at the General Assembly:


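cor_mat <- un %>% 
  select(unres, Country, vote) %>% 
  # one vote per country per resolution
  distinct(Country, unres, .keep_all = TRUE) %>% 
  # cosine similarity between every pair of countries' vote vectors
  widyr::pairwise_similarity(Country, unres, vote)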
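To see what the score measures, the sketch below computes cosine similarity by hand in base R on a tiny invented 0/1 vote matrix (countries AAA, BBB, CCC and five resolutions are hypothetical) and compares it with the plain share of identical votes. It is an illustration of the formula, not a re-implementation of widyr.

```r
# Invented 0/1 vote matrix: rows = countries, columns = resolutions (1 = yes)
votes <- rbind(
  AAA = c(1, 1, 0, 1, 0),
  BBB = c(1, 1, 0, 0, 0),
  CCC = c(0, 0, 1, 0, 0)
)

# Cosine similarity: divide each row by its length, then take dot products
normed <- votes / sqrt(rowSums(votes^2))
cosine <- normed %*% t(normed)

# Share of resolutions on which two countries cast the same recoded vote
agreement <- outer(
  rownames(votes), rownames(votes),
  Vectorize(function(i, j) mean(votes[i, ] == votes[j, ]))
)
dimnames(agreement) <- dimnames(cosine)

round(cosine, 3)
round(agreement, 3)
```

BBB and CCC have a cosine similarity of 0 even though they cast the same recoded vote on 2 of 5 resolutions, because shared zeros (shared non-"yes" votes) add nothing to the dot product. That is why the score is best read as overlap in "yes" votes rather than as a percentage of equal votes.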

Now the challenge is to decide which countries are connected. I offer two approaches:

  1. For each country, we find the N countries with the highest similarity scores and consider them friends.
  2. We set a threshold above which we consider two countries friends.
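The two rules can be compared on a small invented similarity table laid out like cor_mat (item1, item2, similarity, with both directions of each pair). This sketch uses N = 1 and a 0.8 threshold; it uses dplyr::slice_max() with with_ties = FALSE, which, unlike the top_n() call used below, never keeps more than N rows when scores tie.

```r
library(dplyr)

# Invented pairwise similarities in the same long format as cor_mat
sims <- tibble(
  item1      = c("AAA", "AAA", "BBB", "BBB", "CCC", "CCC"),
  item2      = c("BBB", "CCC", "AAA", "CCC", "AAA", "BBB"),
  similarity = c(0.90, 0.50, 0.90, 0.60, 0.50, 0.60)
)

# Rule 1: each country keeps its N most similar partners (here N = 1)
top_n_edges <- sims %>%
  group_by(item1) %>%
  slice_max(similarity, n = 1, with_ties = FALSE) %>%
  ungroup()

# Rule 2: keep only pairs whose similarity exceeds a fixed threshold
threshold_edges <- sims %>%
  filter(similarity > 0.8)

top_n_edges
threshold_edges
```

Under rule 1 every country gets at least one edge (CCC links to BBB), while under rule 2 CCC has no edge at all because none of its scores exceeds 0.8.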

Let’s try the first approach:


gr <- igraph::graph_from_data_frame(
  cor_mat %>% 
    group_by(item1) %>% 
    top_n(3, similarity) # top 3 countries (ties may keep more)
)

graph <- as_tbl_graph(gr) %>% 
  # attach GDP and country names to the nodes (named by ISO3 code)
  left_join(
    gdp %>% rename(country = `Country Name`), by = c("name" = "Country Code")
  ) %>% 
  mutate(
    # fall back to countrycode for countries missing from the GDP data
    country = ifelse(is.na(country), countrycode::countrycode(name, origin = "iso3c", "country.name"), country)
  ) %>% 
  filter(!is.na(country)) %>% 
  mutate(
    Region = ifelse(is.na(Region), countrycode::countrycode(name, origin = "iso3c", "continent"), Region)
  ) 
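The join-then-fill pattern above can be tried on a toy graph. In this sketch the edge list, the attribute table and the named vector fallback are all invented; fallback stands in for the countrycode::countrycode() lookup, and dplyr::coalesce() replaces the ifelse(is.na(...)) idiom with the same effect.

```r
library(dplyr)
library(tidygraph)

# Invented edge list keyed by ISO3 codes, like the one built from cor_mat
edges <- tibble(from = c("USA", "GBR"), to = c("GBR", "FRA"))
g <- as_tbl_graph(edges)

# Invented attribute table that lacks one of the nodes (FRA)
attrs <- tibble(code = c("USA", "GBR"), country = c("United States", "United Kingdom"))

# Hypothetical fallback lookup standing in for countrycode::countrycode()
fallback <- c(USA = "United States", GBR = "United Kingdom", FRA = "France")

g2 <- g %>%
  left_join(attrs, by = c("name" = "code")) %>%             # joins onto the node table
  mutate(country = coalesce(country, unname(fallback[name])))  # fill missing names

nodes <- as_tibble(g2)
nodes
```

FRA gets no match from the join, so its name comes from the fallback lookup.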

Let’s visualize this graph:


ggraph(graph, layout = 'kk', maxiter = 10000) + 
  geom_node_point(aes(size = gdp, color = Region), alpha = 0.5) + 
  geom_edge_fan(alpha = 0.5, show.legend = FALSE, width = 0.05) + 
  geom_node_text(aes(label = country), repel = TRUE, size = 1.5, show.legend = FALSE) +
  scale_size(range = c(0.01, 10), name = "Gross Domestic Product per Capita", guide = guide_legend(
    title.position = "top",
    label.position = "bottom")) +
  scale_color_manual(values = c(
    "#e41a1c",
    "#377eb8",
    "#4daf4a",
    "#984ea3",
    "#ff7f00"
  ), name = "Continent", guide = guide_legend(
    title.position = "top",
    label.position = "bottom")) +
  labs(
    title = "United Nations General Assembly Votes 1992-2018", 
    subtitle = "Each country connected with 3 most similar countries by voting", 
    caption = "Data: https://dataverse.harvard.edu/dataset.xhtml?persistentId=hdl:1902.1/12379"
  ) +
  hrbrthemes::theme_ipsum(base_family = "Lato") +
  theme(
    panel.grid = element_blank(), 
    legend.position = "bottom", 
    axis.title = element_blank(), 
    axis.text = element_blank()
  ) 

Figure 1: United Nations General Assembly Votes 1992-2018 (3 friends)

You can see that the nations form two clusters.

Will the results we get with the second approach be different? Let’s see:


gr <- igraph::graph_from_data_frame(
  cor_mat %>% 
    group_by(item1) %>% 
    filter(similarity > 0.8) # we change only this part
)

graph <- as_tbl_graph(gr) %>% 
  left_join(
    gdp %>% rename(country = `Country Name`), by = c("name" = "Country Code")
  ) %>% 
  mutate(
    country = ifelse(is.na(country), countrycode::countrycode(name, origin = "iso3c", "country.name"), country)
  ) %>% 
  filter(!is.na(country)) %>% 
  mutate(
    Region = ifelse(is.na(Region), countrycode::countrycode(name, origin = "iso3c", "continent"), Region)
  ) 

Let’s visualize this graph:


ggraph(graph, layout = 'kk', maxiter = 10000) + 
  geom_node_point(aes(size = gdp, color = Region), alpha = 0.5) + 
  geom_edge_fan(alpha = 0.5, show.legend = FALSE, width = 0.01) + 
  geom_node_text(aes(label = country), repel = TRUE, size = 1.5, show.legend = FALSE) +
  scale_size(range = c(0.01, 10), name = "Gross Domestic Product per Capita", guide = guide_legend(
    title.position = "top",
    label.position = "bottom")) +
  scale_color_manual(values = c(
    "#e41a1c",
    "#377eb8",
    "#4daf4a",
    "#984ea3",
    "#ff7f00"
  ), name = "Continent", guide = guide_legend(
    title.position = "top",
    label.position = "bottom")) +
  labs(
    title = "United Nations General Assembly Votes 1992-2018", 
    subtitle = "Similarity above 80%", 
    caption = "Data: https://dataverse.harvard.edu/dataset.xhtml?persistentId=hdl:1902.1/12379"
  ) +
  hrbrthemes::theme_ipsum(base_family = "Lato") +
  theme(
    panel.grid = element_blank(), 
    legend.position = "bottom", 
    axis.title = element_blank(), 
    axis.text = element_blank()
  )

Figure 2: United Nations General Assembly Votes 1992-2018 (with threshold)

The result looks different, but the same two clusters as in the first graph are still there. Also, some countries are missing from this network because none of their similarity scores exceeds 0.8.
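Which countries drop out, and how many survive at other thresholds, can be checked directly from the similarity table. The sketch below runs on an invented table in the cor_mat layout (item1, item2, similarity); with the real data you would pass cor_mat instead of sims. The list of thresholds is arbitrary.

```r
library(dplyr)

# Invented pairwise similarities, both directions of every pair listed
sims <- tibble(
  item1      = c("AAA", "AAA", "BBB", "BBB", "CCC", "CCC"),
  item2      = c("BBB", "CCC", "AAA", "CCC", "AAA", "BBB"),
  similarity = c(0.90, 0.50, 0.90, 0.60, 0.50, 0.60)
)

# Each country's best match; a best match of 0.8 or less means no edge
# survives the 0.8 threshold, so the country vanishes from the graph
best_match <- sims %>%
  group_by(item1) %>%
  summarise(best = max(similarity), .groups = "drop")
dropped <- filter(best_match, best <= 0.8)$item1

# Number of countries that keep at least one edge, per threshold
thresholds <- c(0.5, 0.7, 0.8, 0.9)
coverage <- sapply(thresholds, function(t) {
  kept <- filter(sims, similarity > t)
  n_distinct(c(kept$item1, kept$item2))
})

dropped
coverage
```

Here only CCC is dropped at 0.8, and coverage falls from 3 countries at a 0.5 threshold to 0 at 0.9, which shows how sensitive the threshold rule is to the chosen cut-off.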

Thus, both approaches are useful, but the first graph looks better and represents all countries.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Kyrychenko (2019, Nov. 10). Random Forest: United Nations General Assembly Voting graph analysis. Retrieved from http://randomforest.run/posts/united-nations-general-assembly-voting-graph-analysis/

BibTeX citation

@misc{kyrychenko2019united,
  author = {Kyrychenko, Roman},
  title = {Random Forest: United Nations General Assembly Voting graph analysis},
  url = {http://randomforest.run/posts/united-nations-general-assembly-voting-graph-analysis/},
  year = {2019}
}