Twitter-emojis analysis. Australian bushfires case

In this post, I describe the approach to Twitter data collection and analysis that I used to make the emoji map.

Author: Roman Kyrychenko

Published: January 15, 2020

Task definition

To create a map of the most-used emojis in tweets about the Australian bushfires, you need to work through the following tasks:

  • Data collection from Twitter;
  • Location detection for each user (we can use check-ins and the location field from the user profile);
  • Emoji extraction (we are interested only in emojis that users actually used in tweets; only these will be counted);
  • Visualization (we want to place the emojis on the map, one per country).

So, let’s start.

Twitter application

Twitter is very friendly to data collection: you can use its API. To do this, you need a Twitter account and a registered Twitter application.

If you have a Twitter app, open its Details and the “Keys and tokens” tab, where you can find the needed credentials:

I used the rtweet package to search for posts about the Australian bushfires. R has many packages that connect to the Twitter API; in particular, I previously used the twitteR package, but rtweet from the rOpenSci community is more user-friendly and powerful. The main strength of the rtweet package is its ability to handle rate limits.

twitter_credentials.R is a file that contains the CUSTOMER_KEY, CUSTOMER_SECRET, ACCESS_TOKEN, and ACCESS_secret values you receive from the Twitter application.

The rtweet package provides functions that are easy to use and powerful for extracting data from Twitter. You only need to create a token for the connection to the Twitter API and use the search_tweets function to search tweets for a defined period (the Twitter search API only returns tweets from roughly the last 6-9 days).

I created a vector of hashtags about the Australian bushfires, called terms, and searched for all tweets containing them using the code below:

Code
suppressPackageStartupMessages({
  require(rtweet)
  require(ore)
  require(dplyr)
  require(ggplot2)
  require(ggtext)
  require(rvest)
})

source("scripts/twitter_credentials.R")

token <- create_token(
  app = "twittScrap",
  consumer_key = CUSTOMER_KEY,
  consumer_secret = CUSTOMER_SECRET,
  access_token = ACCESS_TOKEN,
  access_secret = ACCESS_secret)

terms <- c(
  "prayforaustralia", "australiaonfire", "australiafires", "australia", 
  "australianbushfire", "australianfires", "australiaburning", 
  "australiaburns", "pray4australia", "australiabushfires", "prayforrain"
  )

aus <- search_tweets(q = paste(terms, collapse = " OR "), n = 10^10, 
                     include_rts = FALSE, retryonratelimit = TRUE, 
                     since = "2020-01-01", until = "2020-01-06")

readr::write_rds(aus, "data/aus_bushfires.rds")

Let’s look at the collected data:

Code
aus <- readr::read_rds("australia_new2.rds") %>%
  select(user_id, status_id, location, created_at, coords_coords, text)

skimr::skim(aus)
Data summary
Name aus
Number of rows 143998
Number of columns 6
_______________________
Column type frequency:
character 4
list 1
POSIXct 1
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
user_id 0 1 2 19 0 113529 0
status_id 0 1 19 19 0 143989 0
location 0 1 0 148 39654 36352 43
text 0 1 9 978 0 143316 0

Variable type: list

skim_variable n_missing complete_rate n_unique min_length max_length
coords_coords 0 1 275 2 2

Variable type: POSIXct

skim_variable n_missing complete_rate min max median n_unique
created_at 0 1 2020-01-07 03:12:34 2020-01-07 23:59:59 2020-01-07 14:41:49 63090

I have 1 293 284 tweets in total, but not all of them are of interest. First, I only need tweets with a known location (coordinates, city, or country). Second, I only need tweets that contain emojis.

First, let’s tackle the location problem.

Locations

I use the OpenStreetMap API to convert location names to geographical coordinates. To do this, I run geocode_OSM from the tmaptools package. Note: the OSM API has a limit of one request per second.

Code
locs <- aus %>%
  group_by(location = stringr::str_to_lower(location)) %>%
  count() %>%
  arrange(desc(n))

top_locs <- tmaptools::geocode_OSM(locs$location, as.data.frame = TRUE)

readr::write_rds(top_locs, "data/top_locs.rds")
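Because of the one-request-per-second limit, it can be safer to geocode locations one at a time with an explicit delay. A minimal sketch, assuming tmaptools and purrr are available (geocode_one is a hypothetical helper, not part of the original pipeline):

```r
library(purrr)

# Hypothetical throttled wrapper: at most one OSM request per second.
geocode_one <- slowly(
  function(loc) tmaptools::geocode_OSM(loc, as.data.frame = TRUE),
  rate = rate_delay(1)
)

# Row-bind the per-location results (commented out: requires network access).
# top_locs <- map_dfr(locs$location, geocode_one)
```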

So, once you have longitudes and latitudes for most of the tweets, you need to convert these coordinates to country names.

The function below converts coordinates to the name of the country in which they fall.

Code
coords2country <- function(points) {
  # low-resolution world map polygons
  countriesSP <- rworldmap::getMap(resolution = "low")
  # rows with missing coordinates get NA in the result
  ina <- is.na(points[[1]])
  # build spatial points in the same CRS as the world map
  pointsSP <- sp::SpatialPoints(points[!ina, ], proj4string = sp::CRS(sp::proj4string(countriesSP)))
  res <- rep(NA_character_, nrow(points))
  # point-in-polygon overlay: pick the ADMIN (country) name for each point
  res[!ina] <- as.character(sp::over(pointsSP, countriesSP)$ADMIN)
  res
}
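As a quick sanity check, coords2country can be called on a couple of hand-picked coordinates; this sketch assumes the rworldmap and sp packages are installed:

```r
# Spot-check on known cities: Sydney and Kyiv (lon first, then lat,
# matching the column order used in the pipeline below).
pts <- tibble::tibble(
  lon = c(151.21, 30.52),
  lat = c(-33.87, 50.45)
)
coords2country(pts)
# expected to yield the matching country names (e.g. "Australia", "Ukraine")
```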

So I can apply this function and get a country for each tweet. Note that some tweets already have coordinates in the coords_coords variable; for these tweets, I extract those values directly.

Code
top_locs <- readr::read_rds("top_locs.rds")

aus_det <- aus %>%
  mutate(
    location = stringr::str_remove_all(stringr::str_to_lower(location), "#"),
    country = coords2country(
      tibble(
        lon = purrr::map_dbl(coords_coords, ~ .[1]),
        lat = purrr::map_dbl(coords_coords, ~ .[2])
      )
    )
  ) %>%
  left_join(top_locs, by = c("location" = "query")) %>%
  mutate(
    country = if_else(is.na(country), coords2country(select(., lon, lat)), country)
  ) %>%
  filter(
    !is.na(country) & 
      between(
        created_at, 
        lubridate::ymd_hms("2019-12-31 00:00:00"), 
        lubridate::ymd_hms("2020-01-07 23:59:59")
      )
    )

skimr::skim(aus_det)
Data summary
Name aus_det
Number of rows 63562
Number of columns 13
_______________________
Column type frequency:
character 5
list 1
numeric 6
POSIXct 1
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
user_id 0 1 2 19 0 47273 0
status_id 0 1 19 19 0 63042 0
location 0 1 0 68 37 5370 0
text 0 1 10 958 0 62887 0
country 0 1 4 32 0 167 0

Variable type: list

skim_variable n_missing complete_rate n_unique min_length max_length
coords_coords 0 1 262 2 2

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
lat 174 1 15.51 33.28 -72.84 -24.78 30.72 41.89 77.62 ▁▆▂▇▃
lon 174 1 1.44 95.45 -158.08 -81.46 -3.28 102.27 176.36 ▅▇▆▂▇
lat_min 174 1 9.57 36.15 -85.05 -30.58 23.54 41.00 70.66 ▂▆▃▇▇
lat_max 174 1 19.16 33.90 -60.00 -9.09 33.06 44.88 83.88 ▂▃▂▇▂
lon_min 174 1 -9.09 96.78 -180.00 -84.64 -14.02 72.25 176.07 ▃▇▅▃▅
lon_max 174 1 15.99 101.27 -157.92 -76.37 0.03 138.76 180.00 ▃▇▆▂▇

Variable type: POSIXct

skim_variable n_missing complete_rate min max median n_unique
created_at 0 1 2020-01-07 03:12:34 2020-01-07 23:59:59 2020-01-07 14:34:29 41894

You can also plot these tweets on the map as follows:

Code
world <- map_data("world")

ggplot() +
  geom_polygon(data = world, aes(long, lat, group = group), 
               color = "black", fill = "lightgray", linewidth = 0.1) +
  geom_point(data = aus_det, aes(lon, lat), size = 0.1) + 
  coord_map(projection = "gilbert", ylim = c(85, -50), xlim = c(180, -180)) +
  xlab("") +
  ylab("") +
  hrbrthemes::theme_ipsum(base_family = "Lato") +
  theme(
    panel.grid = element_blank(),
    axis.text = element_blank()
  )

Next, let’s extract emojis from these tweets.

Emojis extraction and analysis

To detect emojis in text, you need an emoji dataset that contains the Unicode representation of each emoji.

Using this dataset, I created a regular expression to extract emojis from tweets.

The extract_emojis function returns a list of the emojis used in each tweet.

Code
# emoji reference table; column X2 holds the emoji characters
emoji <- readr::read_csv(
  "https://raw.githubusercontent.com/laurenancona/twimoji/gh-pages/twitterEmojiProject/emoticon_conversion_noGraphic.csv",
  col_names = FALSE
) %>% slice(-1)

# one big alternation pattern that matches any known emoji
emoji_regex <- sprintf("(%s)", paste0(emoji$X2, collapse = "|"))
compiled <- ore(emoji_regex)

extract_emojis <- function(text_vector) {
  res <- vector(mode = "list", length = length(text_vector))

  where <- which(grepl(emoji_regex, text_vector, useBytes = TRUE))
  cat("detected items with emojis\n")
  chat_emoji_lines <- text_vector[where]

  found_emoji <- ore.search(compiled, chat_emoji_lines, all = TRUE)
  res[where] <- ore::matches(found_emoji)
  cat("created list with emojis\n")
  res
}
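The same extraction idea also works in base R without ore: gregexpr finds all matches and regmatches pulls them out. A dependency-free sketch with a tiny hand-built pattern of three emojis (the real pipeline builds the pattern from the full emoji table above):

```r
# Illustrative three-emoji pattern; the post builds the real one from emoji$X2.
mini_regex <- paste(c("\U0001F525", "\U0001F64F", "\U00002764"), collapse = "|")

extract_emojis_base <- function(text_vector) {
  # gregexpr returns all match positions; regmatches extracts the matched text
  regmatches(text_vector, gregexpr(mini_regex, text_vector))
}

extract_emojis_base(c("Pray for Australia \U0001F64F\U0001F525", "no emoji here"))
# first element holds the two emojis; second is an empty character vector
```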

Let’s apply this function to our tweets:

Code
aus_emo <- aus_det %>%
  mutate(
    emoji = extract_emojis(text)
  ) %>%
  filter(!sapply(emoji, is.null)) %>%
  tidyr::unnest(emoji)

We detected 181 624 emojis!

Code
skimr::skim(aus_emo)
Data summary
Name aus_emo
Number of rows 181624
Number of columns 15
_______________________
Column type frequency:
character 6
list 1
numeric 7
POSIXct 1
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
user_id 0 1 3 19 0 61944 0
status_id 0 1 19 19 0 83766 0
location 0 1 0 52 200 5791 0
text 0 1 1 963 0 82929 0
country 0 1 4 28 0 170 0
emoji 0 1 1 2 0 701 0

Variable type: list

skim_variable n_missing complete_rate n_unique min_length max_length
coords_coords 0 1 517 2 2

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
retweet_count 0 1.00 9.33 331.02 0.00 0.00 0.00 1.00 42373.00 ▇▁▁▁▁
lat 1022 0.99 14.00 32.48 -72.84 -24.78 22.35 40.97 77.62 ▁▆▃▇▃
lon 1022 0.99 17.37 92.01 -169.86 -73.78 1.89 112.69 177.33 ▃▇▇▃▇
lat_min 1022 0.99 8.28 35.70 -85.05 -28.26 18.44 40.31 68.55 ▂▆▃▇▇
lat_max 1022 0.99 17.40 32.59 -60.00 -9.09 25.77 43.48 83.88 ▂▃▃▇▂
lon_min 1022 0.99 6.80 93.20 -180.00 -75.56 -0.24 77.05 177.33 ▂▇▆▆▆
lon_max 1022 0.99 28.53 96.17 -169.56 -66.85 6.41 124.05 180.00 ▂▇▇▃▇

Variable type: POSIXct

skim_variable n_missing complete_rate min max median n_unique
created_at 0 1 2019-12-31 00:00:08 2020-01-07 23:59:45 2020-01-05 05:47:06 77047

I want to place each country’s emoji at the center of that country, so I need the countries’ centroid coordinates. We can compute them in the following way:

Code
centroids_df <- rworldmap::getMap(resolution = "high") %>%
  sf::st_as_sf() %>%
  sf::st_centroid() %>%
  as_tibble(rownames = "country")

skimr::skim(centroids_df)
Data summary
Name centroids_df
Number of rows 253
Number of columns 53
_______________________
Column type frequency:
character 2
factor 36
numeric 15
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
country 0 1 4 40 0 253 0
geometry 0 1 35 39 0 253 0

Variable type: factor

skim_variable n_missing complete_rate ordered n_unique top_counts
ne_10m_adm 0 1.00 FALSE 253 ABW: 1, AFG: 1, AGO: 1, AIA: 1
FeatureCla 0 1.00 FALSE 1 Adm: 253
SOVEREIGNT 0 1.00 FALSE 204 Uni: 18, Fra: 9, Uni: 7, Aus: 6
SOV_A3 0 1.00 FALSE 205 GB1: 18, FR1: 9, US1: 7, AU1: 6
TYPE 0 1.00 FALSE 7 Sov: 188, Dep: 36, Cou: 20, Ind: 4
ADMIN 0 1.00 FALSE 253 Afg: 1, Akr: 1, Ala: 1, Alb: 1
ADM0_A3 0 1.00 FALSE 253 ABW: 1, AFG: 1, AGO: 1, AIA: 1
GEOUNIT 0 1.00 FALSE 253 Afg: 1, Akr: 1, Ala: 1, Alb: 1
GU_A3 0 1.00 FALSE 253 ABW: 1, AFG: 1, AGO: 1, AIA: 1
SUBUNIT 0 1.00 FALSE 253 Afg: 1, Akr: 1, Ala: 1, Alb: 1
SU_A3 0 1.00 FALSE 253 ABW: 1, AFG: 1, AGO: 1, AIA: 1
NAME 3 0.99 FALSE 250 Afg: 1, Akr: 1, Ala: 1, Alb: 1
ABBREV 3 0.99 FALSE 247 Ang: 2, S.L: 2, St.: 2, A.C: 1
POSTAL 3 0.99 FALSE 240 J: 3, AI: 2, AU: 2, CI: 2
NAME_FORMA 57 0.77 FALSE 196 Ara: 1, Arg: 1, Bai: 1, Bai: 1
TERR_ 206 0.19 FALSE 15 U.K: 14, Fr.: 7, U.S: 4, Auz: 3
NAME_SORT 0 1.00 FALSE 253 Afg: 1, Akr: 1, Ala: 1, Alb: 1
ISO_A2 0 1.00 FALSE 237 -99: 15, AU: 2, PS: 2, AD: 1
ISO_A3 0 1.00 FALSE 253 ABW: 1, AFG: 1, AGO: 1, AIA: 1
ISO3 0 1.00 FALSE 253 ABW: 1, AFG: 1, AGO: 1, AIA: 1
ISO3.1 0 1.00 FALSE 253 ABW: 1, AFG: 1, AGO: 1, AIA: 1
ADMIN.1 0 1.00 FALSE 253 Afg: 1, Akr: 1, Ala: 1, Alb: 1
REGION 3 0.99 FALSE 7 Eur: 70, Afr: 57, Asi: 46, Sou: 44
continent 3 0.99 FALSE 6 Eur: 116, Afr: 57, Sou: 44, Aus: 27
GEO3major 3 0.99 FALSE 7 Eur: 70, Asi: 62, Afr: 57, Lat: 44
GEO3 3 0.99 FALSE 24 Wes: 40, Car: 23, Sou: 22, Cen: 21
IMAGE24 3 0.99 FALSE 26 Wes: 36, Res: 30, Oce: 27, Wes: 24
GLOCAF 4 0.98 FALSE 19 Eur: 61, Sub: 49, Res: 30, Oce: 27
Stern 4 0.98 FALSE 13 Eur: 70, Aus: 27, Sou: 25, Wes: 24
SRESmajor 4 0.98 FALSE 4 ALM: 114, OEC: 55, ASI: 50, REF: 30
SRES 4 0.98 FALSE 11 Sub: 49, Lat: 42, Wes: 42, Oth: 33
GBD 4 0.98 FALSE 21 Eur: 44, Car: 26, Oce: 22, Nor: 19
AVOIDname 4 0.98 FALSE 30 Eur: 51, Sou: 24, Wes: 24, Car: 23
LDC 3 0.99 FALSE 2 oth: 201, LDC: 49
SID 3 0.99 FALSE 2 oth: 200, SID: 50
LLDC 3 0.99 FALSE 2 oth: 219, LLD: 31

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
ScaleRank 0 1.00 1.49 1.07 0.00 1.00 1.00 1.00 5.000000e+00 ▇▁▂▁▁
LabelRank 0 1.00 3.46 2.03 2.00 2.00 2.00 5.00 1.600000e+01 ▇▅▁▁▁
OID_ 0 1.00 138.25 74.16 10.00 76.00 139.00 202.00 2.650000e+02 ▇▇▇▇▇
ADM0_DIF 0 1.00 0.20 0.40 0.00 0.00 0.00 0.00 1.000000e+00 ▇▁▁▁▂
LEVEL 0 1.00 2.00 0.00 2.00 2.00 2.00 2.00 2.000000e+00 ▁▁▇▁▁
GEOU_DIF 0 1.00 0.00 0.06 0.00 0.00 0.00 0.00 1.000000e+00 ▇▁▁▁▁
SU_DIF 0 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.000000e+00 ▁▁▇▁▁
MAP_COLOR 0 1.00 6.14 3.61 0.00 3.00 6.00 9.00 1.300000e+01 ▅▇▅▅▅
POP_EST 2 0.99 27034804.24 116572130.87 0.00 151016.50 4203200.00 14939676.50 1.338613e+09 ▇▁▁▁▁
GDP_MD_EST 0 1.00 275523.75 1134318.92 -99.00 1577.00 17820.00 107700.00 1.426000e+07 ▇▁▁▁▁
FIPS_10_ 0 1.00 -5.48 22.68 -99.00 0.00 0.00 0.00 0.000000e+00 ▁▁▁▁▇
ISO_N3 0 1.00 402.81 274.90 -99.00 184.00 410.00 634.00 8.940000e+02 ▆▇▇▇▆
LON 0 1.00 14.43 74.32 -176.16 -36.68 19.39 50.54 1.792100e+02 ▁▃▇▃▂
LAT 0 1.00 17.37 26.24 -80.56 1.85 17.42 38.99 7.476000e+01 ▁▂▆▇▃
AVOIDnumeric 4 0.98 22.61 6.50 1.00 21.00 25.00 27.00 3.000000e+01 ▁▁▁▅▇

So let’s calculate the top emoji for each country:

Code
top_emo <- aus_emo %>%
  group_by(country, emoji) %>%
  dplyr::count() %>%
  group_by(country) %>%
  top_n(1, wt = n) %>%
  dplyr::arrange(desc(n)) %>%
  left_join(centroids_df, by = "country") %>%
  group_by(country) %>%
  slice(1) %>%
  ungroup()

skimr::skim(top_emo)
Data summary
Name top_emo
Number of rows 170
Number of columns 55
_______________________
Column type frequency:
character 3
factor 36
numeric 16
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
country 0 1 4 28 0 170 0
emoji 0 1 1 1 0 24 0
geometry 0 1 35 39 0 170 0

Variable type: factor

skim_variable n_missing complete_rate ordered n_unique top_counts
ne_10m_adm 0 1.00 FALSE 170 ABW: 1, AFG: 1, AGO: 1, ALB: 1
FeatureCla 0 1.00 FALSE 1 Adm: 170
SOVEREIGNT 0 1.00 FALSE 156 Uni: 6, Uni: 4, Chi: 2, Den: 2
SOV_A3 0 1.00 FALSE 156 GB1: 6, US1: 4, CH1: 2, DN1: 2
TYPE 0 1.00 FALSE 5 Sov: 145, Cou: 16, Dep: 7, Cou: 1
ADMIN 0 1.00 FALSE 170 Afg: 1, Alb: 1, Alg: 1, And: 1
ADM0_A3 0 1.00 FALSE 170 ABW: 1, AFG: 1, AGO: 1, ALB: 1
GEOUNIT 0 1.00 FALSE 170 Afg: 1, Alb: 1, Alg: 1, And: 1
GU_A3 0 1.00 FALSE 170 ABW: 1, AFG: 1, AGO: 1, ALB: 1
SUBUNIT 0 1.00 FALSE 170 Afg: 1, Alb: 1, Alg: 1, And: 1
SU_A3 0 1.00 FALSE 170 ABW: 1, AFG: 1, AGO: 1, ALB: 1
NAME 0 1.00 FALSE 170 Afg: 1, Alb: 1, Alg: 1, And: 1
ABBREV 0 1.00 FALSE 170 Afg: 1, Alb: 1, Alg: 1, And: 1
POSTAL 0 1.00 FALSE 166 J: 3, CN: 2, IS: 2, A: 1
NAME_FORMA 27 0.84 FALSE 143 Ara: 1, Arg: 1, Bai: 1, Bai: 1
TERR_ 156 0.08 FALSE 10 Cro: 3, U.K: 2, U.S: 2, Ass: 1
NAME_SORT 0 1.00 FALSE 170 Afg: 1, Alb: 1, Alg: 1, And: 1
ISO_A2 0 1.00 FALSE 168 -99: 2, PS: 2, AD: 1, AE: 1
ISO_A3 0 1.00 FALSE 170 ABW: 1, AFG: 1, AGO: 1, ALB: 1
ISO3 0 1.00 FALSE 170 ABW: 1, AFG: 1, AGO: 1, ALB: 1
ISO3.1 0 1.00 FALSE 170 ABW: 1, AFG: 1, AGO: 1, ALB: 1
ADMIN.1 0 1.00 FALSE 170 Afg: 1, Alb: 1, Alg: 1, And: 1
REGION 0 1.00 FALSE 7 Eur: 54, Asi: 39, Afr: 36, Sou: 32
continent 0 1.00 FALSE 6 Eur: 93, Afr: 36, Sou: 32, Aus: 6
GEO3major 0 1.00 FALSE 7 Eur: 54, Afr: 36, Asi: 34, Lat: 32
GEO3 0 1.00 FALSE 24 Wes: 27, Cen: 19, Sou: 13, Car: 11
IMAGE24 0 1.00 FALSE 26 Wes: 23, Cen: 19, Res: 18, Mid: 15
GLOCAF 1 0.99 FALSE 19 Eur: 46, Sub: 30, Res: 18, Mid: 12
Stern 1 0.99 FALSE 13 Eur: 54, Sou: 17, Eas: 16, Sou: 13
SRESmajor 1 0.99 FALSE 4 ALM: 81, OEC: 35, ASI: 27, REF: 26
SRES 1 0.99 FALSE 11 Lat: 30, Sub: 30, Wes: 28, Mid: 21
GBD 1 0.99 FALSE 21 Eur: 30, Nor: 18, Car: 14, Eur: 13
AVOIDname 1 0.99 FALSE 30 Eur: 44, Sou: 16, Wes: 13, Sou: 12
LDC 0 1.00 FALSE 2 oth: 141, LDC: 29
SID 0 1.00 FALSE 2 oth: 149, SID: 21
LLDC 0 1.00 FALSE 2 oth: 145, LLD: 25

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
n 0 1.00 182.07 618.11 1.00 6.25 24.50 81.00 5.524000e+03 ▇▁▁▁▁
ScaleRank 0 1.00 1.16 0.58 1.00 1.00 1.00 1.00 4.000000e+00 ▇▁▁▁▁
LabelRank 0 1.00 2.82 1.51 2.00 2.00 2.00 2.75 8.000000e+00 ▇▁▂▁▁
OID_ 0 1.00 149.66 70.81 10.00 97.25 153.50 205.75 2.650000e+02 ▃▆▇▇▇
ADM0_DIF 0 1.00 0.09 0.28 0.00 0.00 0.00 0.00 1.000000e+00 ▇▁▁▁▁
LEVEL 0 1.00 2.00 0.00 2.00 2.00 2.00 2.00 2.000000e+00 ▁▁▇▁▁
GEOU_DIF 0 1.00 0.01 0.08 0.00 0.00 0.00 0.00 1.000000e+00 ▇▁▁▁▁
SU_DIF 0 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.000000e+00 ▁▁▇▁▁
MAP_COLOR 0 1.00 6.08 3.59 0.00 3.00 6.00 9.00 1.300000e+01 ▅▇▅▆▅
POP_EST 0 1.00 38956131.21 140103771.04 1398.00 2724850.50 9047593.50 27408216.00 1.338613e+09 ▇▁▁▁▁
GDP_MD_EST 0 1.00 408635.88 1365376.27 0.00 13585.00 55205.00 246600.00 1.426000e+07 ▇▁▁▁▁
FIPS_10_ 0 1.00 -0.58 7.59 -99.00 0.00 0.00 0.00 0.000000e+00 ▁▁▁▁▇
ISO_N3 0 1.00 418.33 263.84 -99.00 197.75 409.00 640.00 8.940000e+02 ▅▆▇▆▇
LON 0 1.00 16.95 62.31 -169.87 -8.10 20.63 46.45 1.779800e+02 ▁▃▇▃▁
LAT 0 1.00 21.44 26.33 -80.56 6.70 23.89 41.72 7.476000e+01 ▁▂▃▇▃
AVOIDnumeric 1 0.99 22.80 5.92 1.00 21.00 25.00 26.00 3.000000e+01 ▁▁▁▅▇

You can download images of these emojis from Emojipedia (we need the images to visualize them on the map). I did this using functions from the rvest package. The emoji_to_link function gets a URL for each emoji, which you can use to download its image. The link_to_img function converts the path of each downloaded image into a markdown image tag that I use in the plot.

Code
top_emo$x <- (top_emo$geometry %>% sf::st_coordinates())[,1]
top_emo$y <- (top_emo$geometry %>% sf::st_coordinates())[,2]
#emoji_to_link <- function(x) {
#  paste0("https://emojipedia.org/emoji/", x) %>%
#    read_html() %>%
#    html_nodes("tr td a") %>%
#    .[1] %>%
#    html_attr("href") %>%
#    paste0("https://emojipedia.org/", .) %>%
#    read_html() %>%
#    html_node('div[class="vendor-image"] img') %>%
#    html_attr("src")
#}
#
#link_to_img <- function(x, size = 25) {
#  paste0("<img src='", x, "' width='", size, "'/>")
#}

#emo <- top_emo %>%
#  distinct(emoji) %>%
#  mutate(
#    url = purrr::map_chr(emoji, purrr::slowly(~ emoji_to_link(.x), purrr::rate_delay(1))),
#    label = link_to_img(paste0("emoji/", basename(unique(url))))
#  )
#
#top_emo <- top_emo %>% left_join(emo, by = "emoji")
#
#if (!dir.exists("emoji")) dir.create("emoji")
#
#p <- purrr::map2(emo$url, paste0("emoji/", basename(emo$url)), download.file)

#skimr::skim(top_emo)
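For reference, the link_to_img helper in the commented block is plain string interpolation that produces an HTML img tag for ggtext to render; an uncommented sketch:

```r
# Wrap an image path in an <img> tag that ggtext::geom_richtext can render.
link_to_img <- function(x, size = 25) {
  paste0("<img src='", x, "' width='", size, "'/>")
}

link_to_img("emoji/fire.png")
# "<img src='emoji/fire.png' width='25'/>"
```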

Thus, we have everything we need for the visualization.

Visualization

Let’s create the map using functions from the ggplot2 package:

Code
ggplot() +
  geom_polygon(data = world, aes(long, lat, group = group), color = "black", fill = "lightgray", linewidth = 0.1) +
  geom_richtext(
    data = top_emo %>% ungroup(),
    aes(x, y, label = emoji), fill = NA, label.color = NA, label.padding = grid::unit(rep(0, 4), "pt"), family="EmojiOne"
  ) +
  coord_map(projection = "gilbert", ylim = c(85, -50), xlim = c(180, -180)) +
  xlab("") +
  ylab("") +
  labs(
    title = "<img src='https://em-content.zobj.net/thumbs/320/twitter/348/flag-australia_1f1e6-1f1fa.png' width='35'/> Australia bushfires in emojis",
    subtitle = "<br/>Emoji is basically like another language: it has its own rules, 
    it can cover anything that comes to one's mind, <br/>
you can build whole sentences using only those tiny faces and other symbols. While people were praying <br/>
for Australia on Twitter, they used plenty of emojis as well, but only a few of them became truly common.<br/>

The map below shows the emojis used most frequently by country in tweets with hashtags 
<span style='color:blue'>#prayforaustralia</span>, <br/>
<span style='color:blue'>#australiaonfire</span>, <span style='color:blue'>#australiafires</span>, 
<span style='color:blue'>#australia</span>, <span style='color:blue'>#australianbushfire</span>, 
<span style='color:blue'>#australianfires</span>, <span style='color:blue'>#australiaburning</span>, 
<span style='color:blue'>#australiaburns</span>, <br/> <span style='color:blue'>#pray4australia</span>, 
<span style='color:blue'>#australiabushfires</span>, <span style='color:blue'>#prayforrain</span><br>",
    caption = glue::glue("Data: twitter.com, {format(n_distinct(aus_emo$status_id), big.mark = ' ')} tweets with marked location")
  ) +
  hrbrthemes::theme_ipsum(base_family = "Lato") +
  theme(
    panel.grid = element_blank(),
    axis.text = element_blank(),
    plot.title = element_markdown(size = 35, face = "bold", colour = "black", vjust = -1),
    plot.subtitle = element_markdown(size = 18, vjust = -1, lineheight = 1.1)
  )

I also used the ggtext package for markdown formatting in text labels, the ggalt package, which provides the Winkel tripel map projection, and the hrbrthemes package, which provides awesome themes for ggplots.

Conclusion

Hurray, we have a map of the top emoji for each country on Twitter!

Here is a summary of the packages that I used for this analysis:

  • rtweet - for access to the Twitter API;
  • ore - for regular expressions;
  • dplyr - for data manipulation;
  • ggplot2 - to make a pretty map;
  • ggtext - for markdown labels in ggplot2;
  • rvest - for web scraping;
  • readr - to read data from files;
  • skimr - for beautiful data summaries;
  • stringr - for text transformations;
  • tmaptools - to look up coordinates by location name;
  • rworldmap - to load the world map;
  • sp - for map manipulations;
  • purrr - for functional programming in R;
  • lubridate - for date conversions;
  • ggalt - for the Winkel tripel map projection;
  • hrbrthemes - for pretty ggplot2 themes;
  • glue - for better text formatting.

All calculations were made in R version 3.5.2.

Citation

BibTeX citation:
@online{kyrychenko2020,
  author = {Kyrychenko, Roman},
  title = {Twitter-Emojis Analysis. {Australian} Bushfires Case},
  date = {2020-01-15},
  url = {https://randomforest.run/posts/twitter-emojis-analysis/twitter-emojis-analysis.html},
  langid = {en}
}
For attribution, please cite this work as:
Kyrychenko, Roman. 2020. “Twitter-Emojis Analysis. Australian Bushfires Case.” January 15, 2020. https://randomforest.run/posts/twitter-emojis-analysis/twitter-emojis-analysis.html.