Twitter-emojis analysis. Australian bushfires case

In this post, I describe the approach to Twitter data collection and analysis that I used to make an emoji map.

Author

Roman Kyrychenko

Published

January 15, 2020

Task definition

To create a map of the most used emojis in tweets about the Australian bushfires, you need to complete the following tasks:

Data collection from Twitter;
Defining the location of each user (we can use check-ins and the location field from the user profile);
Emoji extraction (we are interested only in the emojis that users put in their tweets; only these will be counted);
Visualization (we want to place the top emoji of each country on the map).

So, let’s start.

Twitter application

Twitter is very friendly to data collection: you can use its API. To do this, you need a Twitter account and an application registered in the Twitter developer portal.

Once you have a Twitter app, open its Details page and go to the “Keys and tokens” tab, where you can find the credentials you need.

I used the rtweet package to search for posts about the Australian bushfires. R has several packages that connect to the Twitter API. I used the twitteR package earlier, but rtweet from the rOpenSci community is more user-friendly and powerful. The main strength of the rtweet package is its ability to handle rate limits.

The file twitter_credentials.R contains CUSTOMER_KEY, CUSTOMER_SECRET, ACCESS_TOKEN, and ACCESS_secret, which you can get from your Twitter application.
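A minimal sketch of what twitter_credentials.R could look like is shown here. It assumes the keys are stored in environment variables rather than hard-coded. The environment variable names (TWITTER_CONSUMER_KEY and so on) are hypothetical, while the R variable names are the ones used in this post.

```r
# twitter_credentials.R (sketch)
# The R variable names match the ones used in this post.
# The environment variable names are hypothetical; pick your own.
CUSTOMER_KEY    <- Sys.getenv("TWITTER_CONSUMER_KEY")
CUSTOMER_SECRET <- Sys.getenv("TWITTER_CONSUMER_SECRET")
ACCESS_TOKEN    <- Sys.getenv("TWITTER_ACCESS_TOKEN")
ACCESS_secret   <- Sys.getenv("TWITTER_ACCESS_SECRET")

# Sys.getenv() returns "" for unset variables, so report anything missing
credentials <- c(
  CUSTOMER_KEY = CUSTOMER_KEY,
  CUSTOMER_SECRET = CUSTOMER_SECRET,
  ACCESS_TOKEN = ACCESS_TOKEN,
  ACCESS_secret = ACCESS_secret
)
missing_keys <- names(credentials)[credentials == ""]
if (length(missing_keys) > 0) {
  message("Missing credentials: ", paste(missing_keys, collapse = ", "))
}
```

Reading the values from the environment keeps the secrets out of the script itself, which matters if the project is shared or kept under version control.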

The rtweet package provides functions for extracting data from Twitter that are both easy to use and powerful. You only need to create a token for connecting to the Twitter API and then call the search_tweets function to search for tweets in a given period (the Twitter API only returns tweets from the last 6-9 days).
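To get a feeling for why rate-limit handling matters, here is a rough back-of-the-envelope estimate. It assumes the limits of the standard search endpoint as they were commonly documented at the time (100 tweets per request, 180 requests per 15-minute window); treat both numbers as assumptions and check the current documentation. The tweet count is the one reported later in this post.

```r
# Assumed limits of the standard search endpoint (not taken from this post)
tweets_per_request  <- 100
requests_per_window <- 180
window_minutes      <- 15

tweets_per_window <- tweets_per_request * requests_per_window

# Total number of tweets reported later in this post
n_tweets <- 1293284

# Number of 15-minute windows needed, and a rough duration in hours
windows_needed <- ceiling(n_tweets / tweets_per_window)
hours_needed   <- windows_needed * window_minutes / 60

cat("Tweets per window:", tweets_per_window, "\n")
cat("Windows needed:   ", windows_needed, "\n")
cat("Rough duration:   ", hours_needed, "hours\n")
```

Under these assumptions a collection of this size takes many hours, which is why an option such as retryonratelimit = TRUE, which waits for the limit to reset and continues, is convenient.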

I created a vector of hashtags about the Australian bushfires, called terms, and searched for all tweets that contain them. I used the code below:

Code

# library() fails loudly if a package is missing, unlike require()
suppressPackageStartupMessages({
  library(rtweet)
  library(ore)
  library(dplyr)
  library(ggplot2)
  library(ggtext)
  library(rvest)
})

source("scripts/twitter_credentials.R")

token <- create_token(
  app = "twittScrap",
  consumer_key = CUSTOMER_KEY,
  consumer_secret = CUSTOMER_SECRET,
  access_token = ACCESS_TOKEN,
  access_secret = ACCESS_secret)

terms <- c(
  "prayforaustralia", "australiaonfire", "australiafires", "australia", 
  "australianbushfire", "australianfires", "australiaburning", 
  "australiaburns", "pray4australia", "australiabushfires", "prayforrain"
  )

aus <- search_tweets(q = paste(terms, collapse = " OR "), n = 10^10, 
                     include_rts = FALSE, retryonratelimit = TRUE, 
                     since = "2020-01-01", until = "2020-01-06")

readr::write_rds(aus, "data/aus_bushfires.rds")
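The query passed to search_tweets above is just the terms joined with OR. A small sketch with the first three terms shows what the string looks like; the hashtag_query variant is an illustration of my own and is not used in the code above.

```r
# First three search terms from the post
terms <- c("prayforaustralia", "australiaonfire", "australiafires")

# Join the terms with the OR operator, as in the search_tweets() call
query <- paste(terms, collapse = " OR ")
print(query)
# "prayforaustralia OR australiaonfire OR australiafires"

# Hypothetical variant: prefix each term with "#" to build a hashtag-only query
hashtag_query <- paste(paste0("#", terms), collapse = " OR ")
print(hashtag_query)
# "#prayforaustralia OR #australiaonfire OR #australiafires"
```

Note that the terms in the post are written without the leading "#", so a broad term such as "australia" is searched as a plain word.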

Let’s look at the collected data:

Code

aus <- readr::read_rds("data/aus_bushfires.rds") %>%
  select(user_id, status_id, location, created_at, coords_coords, text)

skimr::skim(aus)

Data summary
Name	aus
Number of rows	143998
Number of columns	6
_______________________
Column type frequency:
character	4
list	1
POSIXct	1
________________________
Group variables	None

Variable type: character

skim_variable	complete_rate	min	max	empty	n_unique	whitespace
user_id	1	2	19	0	113529	0
status_id	1	19	19	0	143989	0
location	1	0	148	39654	36352	43
text	1	9	978	0	143316	0

Variable type: list

skim_variable	n_missing	complete_rate	n_unique	min_length	max_length
coords_coords	0	1	275	2	2

Variable type: POSIXct

skim_variable	n_missing	complete_rate	min	max	median	n_unique
created_at	0	1	2020-01-07 03:12:34	2020-01-07 23:59:59	2020-01-07 14:41:49	63090

I have 1 293 284 tweets, but not all of them are of interest. First, I only need tweets with a known location (coordinates, city, or country). Second, I only need tweets that contain emojis.

First, I tackle the location problem.

Locations

I use the OpenStreetMap API to convert location names to geographical coordinates. To do this, I run geocode_OSM from the tmaptools library. Note: the OSM API is limited to one request per second.

Code

locs <- aus %>%
  group_by(location = stringr::str_to_lower(location)) %>%
  count() %>%
  arrange(desc(n))

top_locs <- tmaptools::geocode_OSM(locs$location, as.data.frame = TRUE)

readr::write_rds(top_locs, "data/top_locs.rds")
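If you prefer to send the queries one at a time, for example to save partial results along the way, the one-request-per-second limit can be respected with a simple throttled loop. The sketch below is only an illustration: fake_geocode is a hypothetical stand-in for a real geocoder call, and geocode_throttled is a helper name of my own.

```r
# Hypothetical stand-in for a real geocoder such as tmaptools::geocode_OSM();
# it returns one row per query with empty coordinates
fake_geocode <- function(query) {
  data.frame(query = query, lon = NA_real_, lat = NA_real_,
             stringsAsFactors = FALSE)
}

# Call geocode_fun once per query and wait `delay` seconds between calls
geocode_throttled <- function(queries, geocode_fun, delay = 1) {
  results <- vector("list", length(queries))
  for (i in seq_along(queries)) {
    results[[i]] <- geocode_fun(queries[i])
    # Pause between requests, but not after the last one
    if (i < length(queries)) Sys.sleep(delay)
  }
  do.call(rbind, results)
}

# A short delay is used here only to keep the example fast
res <- geocode_throttled(c("sydney", "kyiv", "london"), fake_geocode, delay = 0.1)
print(res)
```

With a real geocoder you would keep delay = 1 and pass only unique location strings, as the locs table above already does.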

Once you have longitudes and latitudes for most tweets, you need to convert these coordinates to country names.

The function below converts coordinates to the name of the country that contains them.

Code

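# Map (lon, lat) points to the name of the country that contains them.
# `points` is a data frame with longitude in column 1 and latitude in column 2.
coords2country <- function(points) {
  countriesSP <- rworldmap::getMap(resolution = "low")
  # Rows with missing coordinates are skipped and stay NA in the result
  ina <- is.na(points[[1]])
  pointsSP <- sp::SpatialPoints(points[!ina, ], proj4string = sp::CRS(sp::proj4string(countriesSP)))
  res <- rep(NA_character_, nrow(points))
  # Find the polygon each point falls into and take its ADMIN (country) name
  res[!ina] <- as.character(sp::over(pointsSP, countriesSP)$ADMIN)
  res
}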

Now I can apply this function and get a data frame with coordinates and a country for each tweet. Note that some tweets already have coordinates in the coords_coords variable; for these tweets, I use those values directly.

Code

top_locs <- readr::read_rds("data/top_locs.rds")

aus_det <- aus %>%
  mutate(
    location = stringr::str_remove_all(stringr::str_to_lower(location), "#"),
    country = coords2country(
      tibble(
        lon = purrr::map_dbl(coords_coords, ~ .[1]),
        lat = purrr::map_dbl(coords_coords, ~ .[2])
      )
    )
  ) %>%
  left_join(top_locs, by = c("location" = "query")) %>%
  mutate(
    country = if_else(is.na(country), coords2country(select(., lon, lat)), country)
  ) %>%
  filter(
    !is.na(country) & 
      between(
        created_at, 
        lubridate::ymd_hms("2019-12-31 00:00:00"), 
        lubridate::ymd_hms("2020-01-07 23:59:59")
      )
    )

skimr::skim(aus_det)

Data summary
Name	aus_det
Number of rows	63562
Number of columns	13
_______________________
Column type frequency:
character	5
list	1
numeric	6
POSIXct	1
________________________
Group variables	None

Variable type: character

skim_variable	complete_rate	min	max	empty	n_unique
user_id	1	2	19	0	47273
status_id	1	19	19	0	63042
location	1	0	68	37	5370
text	1	10	958	0	62887
country	1	4	32	0	167

Variable type: list

skim_variable	n_missing	complete_rate	n_unique	min_length	max_length
coords_coords	0	1	262	2	2

Variable type: numeric

skim_variable	n_missing	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
lat	174	1	15.51	33.28	-72.84	-24.78	30.72	41.89	77.62	▁▆▂▇▃
lon	174	1	1.44	95.45	-158.08	-81.46	-3.28	102.27	176.36	▅▇▆▂▇
lat_min	174	1	9.57	36.15	-85.05	-30.58	23.54	41.00	70.66	▂▆▃▇▇
lat_max	174	1	19.16	33.90	-60.00	-9.09	33.06	44.88	83.88	▂▃▂▇▂
lon_min	174	1	-9.09	96.78	-180.00	-84.64	-14.02	72.25	176.07	▃▇▅▃▅
lon_max	174	1	15.99	101.27	-157.92	-76.37	0.03	138.76	180.00	▃▇▆▂▇

Variable type: POSIXct

skim_variable	n_missing	complete_rate	min	max	median	n_unique
created_at	0	1	2020-01-07 03:12:34	2020-01-07 23:59:59	2020-01-07 14:34:29	41894
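The fallback logic in the aus_det pipeline (use the tweet's own coordinates first, then the geocoded profile location, then drop tweets with no country) can be seen in isolation in the sketch below. It uses base R only, and toy_country is a hypothetical stand-in for coords2country that checks a single rough bounding box instead of doing a real point-in-polygon lookup. The geo_lon and geo_lat vectors play the role of the lon and lat columns joined from top_locs.

```r
# Toy stand-in for coords2country(): one rough bounding box,
# NOT a real point-in-polygon lookup
toy_country <- function(lon, lat) {
  out <- rep(NA_character_, length(lon))
  ok <- !is.na(lon) & !is.na(lat)
  in_box <- ok & lon >= 112 & lon <= 154 & lat >= -44 & lat <= -10
  out[ok] <- "elsewhere"
  out[in_box] <- "Australia"
  out
}

# Three toy tweets: only the first one carries its own coordinates
coords_coords <- list(c(151.2, -33.9), c(NA_real_, NA_real_), c(NA_real_, NA_real_))

# Hypothetical geocoded profile locations (NA when geocoding found nothing)
geo_lon <- c(NA, 30.5, NA)
geo_lat <- c(NA, 50.4, NA)

# Pull longitude and latitude out of the list column
tweet_lon <- vapply(coords_coords, function(x) x[1], numeric(1))
tweet_lat <- vapply(coords_coords, function(x) x[2], numeric(1))

# Country from tweet coordinates first, geocoded location as the fallback
country  <- toy_country(tweet_lon, tweet_lat)
fallback <- toy_country(geo_lon, geo_lat)
country  <- ifelse(is.na(country), fallback, country)

# Tweets without any country are dropped, as in the filter() step
keep <- !is.na(country)
print(data.frame(country = country, keep = keep))
```

The third toy tweet has neither coordinates nor a geocoded location, so it is the only one that gets filtered out.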

You can also plot these tweets on a map as follows:

Code

world <- map_data("world")

ggplot() +
  geom_polygon(data = world, aes(long, lat, group = group), 
               color = "black", fill = "lightgray", linewidth = 0.1) +
  geom_point(data = aus_det, aes(lon, lat), size = 0.1) + 
  coord_map(projection = "gilbert", ylim = c(85, -50), xlim = c(180, -180)) +
  xlab("") +
  ylab("") +
  hrbrthemes::theme_ipsum(base_family = "Lato") +
  theme(
    panel.grid = element_blank(),
    axis.text = element_blank()
  )

Next, you need to extract the emojis from these tweets.

Emojis extraction and analysis

To detect emojis in text, you need an emoji dataset that contains the Unicode representation of each emoji.

Using this dataset, I created a regular expression to extract emojis from tweets.

The function extract_emojis returns a list with the emojis used in each tweet.

Code

emoji <- readr::read_csv(
  "https://raw.githubusercontent.com/laurenancona/twimoji/gh-pages/twitterEmojiProject/emoticon_conversion_noGraphic.csv",
  col_names = FALSE
) %>% slice(-1)

emoji_regex <- sprintf("(%s)", paste0(emoji$X2, collapse = "|"))
compiled <- ore(emoji_regex)

extract_emojis <- function(text_vector) {
  res <- vector(mode = "list", length = length(text_vector))

  where <- which(grepl(emoji_regex, text_vector, useBytes = TRUE))
  cat("detected items with emojis\n")
  chat_emoji_lines <- text_vector[where]

  found_emoji <- ore.search(compiled, chat_emoji_lines, all = TRUE)
  res[where] <- ore::matches(found_emoji)
  cat("created list with emojis\n")
  res
}
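The same idea (one big alternation pattern built from the emoji table, then all matches collected per tweet) can be tried on a tiny scale with base R alone. This is a sketch under stated assumptions: the three emojis and the sample texts are made up, and base R's gregexpr and regmatches are used here instead of the ore package.

```r
# Tiny stand-in for the emoji table (the real one has many more rows)
fire <- "\U0001F525"  # fire
pray <- "\U0001F64F"  # folded hands
cry  <- "\U0001F622"  # crying face
emoji_set <- c(fire, pray, cry)

# Same construction as emoji_regex above: "(a|b|c)"
emoji_pattern <- sprintf("(%s)", paste0(emoji_set, collapse = "|"))

# Made-up sample tweets
texts <- c(
  paste0("Stay safe ", pray, fire),
  "No emojis here",
  paste0(cry, " so sad ", cry)
)

# Collect every match per text; texts without a match get NULL,
# mirroring what extract_emojis() returns
found <- regmatches(texts, gregexpr(emoji_pattern, texts, perl = TRUE))
found[lengths(found) == 0] <- list(NULL)

print(lengths(found))
```

Repeated emojis are kept as separate matches, so a tweet with two crying faces contributes two entries to the later counts.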

Let’s apply this function to the tweets:

Code

aus_emo <- aus_det %>%
  mutate(
    emoji = extract_emojis(text)
  ) %>%
  filter(!sapply(emoji, is.null)) %>%
  tidyr::unnest(emoji)

We detected 181 624 emojis!
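What the filter and unnest steps do to the list column can be reproduced with base R on a toy example. This is only a sketch: the three status ids are made up, and rep plus unlist stand in for tidyr::unnest.

```r
fire <- "\U0001F525"
pray <- "\U0001F64F"

# Toy input: one list element per tweet, NULL when no emoji was found
status_id <- c("t1", "t2", "t3")
emoji <- list(c(fire, pray), NULL, fire)

# Drop tweets without emojis (the filter() step)
keep <- !vapply(emoji, is.null, logical(1))

# Repeat each status_id once per emoji and flatten the list (the unnest() step)
long <- data.frame(
  status_id = rep(status_id[keep], times = lengths(emoji[keep])),
  emoji     = unlist(emoji[keep]),
  stringsAsFactors = FALSE
)

print(long)
```

The result has one row per emoji occurrence, which is why the number of rows can exceed the number of tweets.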

Code

skimr::skim(aus_emo)

Data summary
Name	aus_emo
Number of rows	181624
Number of columns	15
_______________________
Column type frequency:
character	6
list	1
numeric	7
POSIXct	1
________________________
Group variables	None

Variable type: character

skim_variable	complete_rate	min	max	empty	n_unique
user_id	1	3	19	0	61944
status_id	1	19	19	0	83766
location	1	0	52	200	5791
text	1	1	963	0	82929
country	1	4	28	0	170
emoji	1	1	2	0	701

Variable type: list

skim_variable	n_missing	complete_rate	n_unique	min_length	max_length
coords_coords	0	1	517	2	2

Variable type: numeric

skim_variable	n_missing	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
retweet_count	0	1.00	9.33	331.02	0.00	0.00	0.00	1.00	42373.00	▇▁▁▁▁
lat	1022	0.99	14.00	32.48	-72.84	-24.78	22.35	40.97	77.62	▁▆▃▇▃
lon	1022	0.99	17.37	92.01	-169.86	-73.78	1.89	112.69	177.33	▃▇▇▃▇
lat_min	1022	0.99	8.28	35.70	-85.05	-28.26	18.44	40.31	68.55	▂▆▃▇▇
lat_max	1022	0.99	17.40	32.59	-60.00	-9.09	25.77	43.48	83.88	▂▃▃▇▂
lon_min	1022	0.99	6.80	93.20	-180.00	-75.56	-0.24	77.05	177.33	▂▇▆▆▆
lon_max	1022	0.99	28.53	96.17	-169.56	-66.85	6.41	124.05	180.00	▂▇▇▃▇

Variable type: POSIXct

skim_variable	n_missing	complete_rate	min	max	median	n_unique
created_at	0	1	2019-12-31 00:00:08	2020-01-07 23:59:45	2020-01-05 05:47:06	77047

I want to place each emoji at the center of its country. To do this, I need the coordinates of the country centroids. We can get them in the following way:

Code

# Convert the world map to an sf object and take the centroid of each country
centroids_df <- rworldmap::getMap(resolution = "high") %>%
  sf::st_as_sf() %>%
  sf::st_centroid() %>%
  as_tibble(rownames = "country")

skimr::skim(centroids_df)

Data summary
Name	centroids_df
Number of rows	253
Number of columns	53
_______________________
Column type frequency:
character	2
factor	36
numeric	15
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	empty	n_unique	whitespace
country	0	1	4	40	0	253	0
geometry	0	1	35	39	0	253	0

Variable type: factor

skim_variable	n_missing	complete_rate	ordered	n_unique	top_counts
ne_10m_adm	0	1.00	FALSE	253	ABW: 1, AFG: 1, AGO: 1, AIA: 1
FeatureCla	0	1.00	FALSE	1	Adm: 253
SOVEREIGNT	0	1.00	FALSE	204	Uni: 18, Fra: 9, Uni: 7, Aus: 6
SOV_A3	0	1.00	FALSE	205	GB1: 18, FR1: 9, US1: 7, AU1: 6
TYPE	0	1.00	FALSE	7	Sov: 188, Dep: 36, Cou: 20, Ind: 4
ADMIN	0	1.00	FALSE	253	Afg: 1, Akr: 1, Ala: 1, Alb: 1
ADM0_A3	0	1.00	FALSE	253	ABW: 1, AFG: 1, AGO: 1, AIA: 1
GEOUNIT	0	1.00	FALSE	253	Afg: 1, Akr: 1, Ala: 1, Alb: 1
GU_A3	0	1.00	FALSE	253	ABW: 1, AFG: 1, AGO: 1, AIA: 1
SUBUNIT	0	1.00	FALSE	253	Afg: 1, Akr: 1, Ala: 1, Alb: 1
SU_A3	0	1.00	FALSE	253	ABW: 1, AFG: 1, AGO: 1, AIA: 1
NAME	3	0.99	FALSE	250	Afg: 1, Akr: 1, Ala: 1, Alb: 1
ABBREV	3	0.99	FALSE	247	Ang: 2, S.L: 2, St.: 2, A.C: 1
POSTAL	3	0.99	FALSE	240	J: 3, AI: 2, AU: 2, CI: 2
NAME_FORMA	57	0.77	FALSE	196	Ara: 1, Arg: 1, Bai: 1, Bai: 1
TERR_	206	0.19	FALSE	15	U.K: 14, Fr.: 7, U.S: 4, Auz: 3
NAME_SORT	0	1.00	FALSE	253	Afg: 1, Akr: 1, Ala: 1, Alb: 1
ISO_A2	0	1.00	FALSE	237	-99: 15, AU: 2, PS: 2, AD: 1
ISO_A3	0	1.00	FALSE	253	ABW: 1, AFG: 1, AGO: 1, AIA: 1
ISO3	0	1.00	FALSE	253	ABW: 1, AFG: 1, AGO: 1, AIA: 1
ISO3.1	0	1.00	FALSE	253	ABW: 1, AFG: 1, AGO: 1, AIA: 1
ADMIN.1	0	1.00	FALSE	253	Afg: 1, Akr: 1, Ala: 1, Alb: 1
REGION	3	0.99	FALSE	7	Eur: 70, Afr: 57, Asi: 46, Sou: 44
continent	3	0.99	FALSE	6	Eur: 116, Afr: 57, Sou: 44, Aus: 27
GEO3major	3	0.99	FALSE	7	Eur: 70, Asi: 62, Afr: 57, Lat: 44
GEO3	3	0.99	FALSE	24	Wes: 40, Car: 23, Sou: 22, Cen: 21
IMAGE24	3	0.99	FALSE	26	Wes: 36, Res: 30, Oce: 27, Wes: 24
GLOCAF	4	0.98	FALSE	19	Eur: 61, Sub: 49, Res: 30, Oce: 27
Stern	4	0.98	FALSE	13	Eur: 70, Aus: 27, Sou: 25, Wes: 24
SRESmajor	4	0.98	FALSE	4	ALM: 114, OEC: 55, ASI: 50, REF: 30
SRES	4	0.98	FALSE	11	Sub: 49, Lat: 42, Wes: 42, Oth: 33
GBD	4	0.98	FALSE	21	Eur: 44, Car: 26, Oce: 22, Nor: 19
AVOIDname	4	0.98	FALSE	30	Eur: 51, Sou: 24, Wes: 24, Car: 23
LDC	3	0.99	FALSE	2	oth: 201, LDC: 49
SID	3	0.99	FALSE	2	oth: 200, SID: 50
LLDC	3	0.99	FALSE	2	oth: 219, LLD: 31

Variable type: numeric

skim_variable	n_missing	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
ScaleRank	0	1.00	1.49	1.07	0.00	1.00	1.00	1.00	5.000000e+00	▇▁▂▁▁
LabelRank	0	1.00	3.46	2.03	2.00	2.00	2.00	5.00	1.600000e+01	▇▅▁▁▁
OID_	0	1.00	138.25	74.16	10.00	76.00	139.00	202.00	2.650000e+02	▇▇▇▇▇
ADM0_DIF	0	1.00	0.20	0.40	0.00	0.00	0.00	0.00	1.000000e+00	▇▁▁▁▂
LEVEL	0	1.00	2.00	0.00	2.00	2.00	2.00	2.00	2.000000e+00	▁▁▇▁▁
GEOU_DIF	0	1.00	0.00	0.06	0.00	0.00	0.00	0.00	1.000000e+00	▇▁▁▁▁
SU_DIF	0	1.00	0.00	0.00	0.00	0.00	0.00	0.00	0.000000e+00	▁▁▇▁▁
MAP_COLOR	0	1.00	6.14	3.61	0.00	3.00	6.00	9.00	1.300000e+01	▅▇▅▅▅
POP_EST	2	0.99	27034804.24	116572130.87	0.00	151016.50	4203200.00	14939676.50	1.338613e+09	▇▁▁▁▁
GDP_MD_EST	0	1.00	275523.75	1134318.92	-99.00	1577.00	17820.00	107700.00	1.426000e+07	▇▁▁▁▁
FIPS_10_	0	1.00	-5.48	22.68	-99.00	0.00	0.00	0.00	0.000000e+00	▁▁▁▁▇
ISO_N3	0	1.00	402.81	274.90	-99.00	184.00	410.00	634.00	8.940000e+02	▆▇▇▇▆
LON	0	1.00	14.43	74.32	-176.16	-36.68	19.39	50.54	1.792100e+02	▁▃▇▃▂
LAT	0	1.00	17.37	26.24	-80.56	1.85	17.42	38.99	7.476000e+01	▁▂▆▇▃
AVOIDnumeric	4	0.98	22.61	6.50	1.00	21.00	25.00	27.00	3.000000e+01	▁▁▁▅▇

Now let’s calculate the top emoji for each country:

Code

top_emo <- aus_emo %>%
  group_by(country, emoji) %>%
  dplyr::count() %>%
  group_by(country) %>%
  top_n(1, wt = n) %>%
  dplyr::arrange(desc(n)) %>%
  left_join(centroids_df, by = "country") %>%
  group_by(country) %>%
  slice(1) %>%
  ungroup()

skimr::skim(top_emo)

Data summary
Name	top_emo
Number of rows	170
Number of columns	55
_______________________
Column type frequency:
character	3
factor	36
numeric	16
________________________
Group variables	None

Variable type: character

skim_variable	complete_rate	min	max	n_unique
country	1	4	28	170
emoji	1	1	1	24
geometry	1	35	39	170

Variable type: factor

skim_variable	n_missing	complete_rate	ordered	n_unique	top_counts
ne_10m_adm	0	1.00	FALSE	170	ABW: 1, AFG: 1, AGO: 1, ALB: 1
FeatureCla	0	1.00	FALSE	1	Adm: 170
SOVEREIGNT	0	1.00	FALSE	156	Uni: 6, Uni: 4, Chi: 2, Den: 2
SOV_A3	0	1.00	FALSE	156	GB1: 6, US1: 4, CH1: 2, DN1: 2
TYPE	0	1.00	FALSE	5	Sov: 145, Cou: 16, Dep: 7, Cou: 1
ADMIN	0	1.00	FALSE	170	Afg: 1, Alb: 1, Alg: 1, And: 1
ADM0_A3	0	1.00	FALSE	170	ABW: 1, AFG: 1, AGO: 1, ALB: 1
GEOUNIT	0	1.00	FALSE	170	Afg: 1, Alb: 1, Alg: 1, And: 1
GU_A3	0	1.00	FALSE	170	ABW: 1, AFG: 1, AGO: 1, ALB: 1
SUBUNIT	0	1.00	FALSE	170	Afg: 1, Alb: 1, Alg: 1, And: 1
SU_A3	0	1.00	FALSE	170	ABW: 1, AFG: 1, AGO: 1, ALB: 1
NAME	0	1.00	FALSE	170	Afg: 1, Alb: 1, Alg: 1, And: 1
ABBREV	0	1.00	FALSE	170	Afg: 1, Alb: 1, Alg: 1, And: 1
POSTAL	0	1.00	FALSE	166	J: 3, CN: 2, IS: 2, A: 1
NAME_FORMA	27	0.84	FALSE	143	Ara: 1, Arg: 1, Bai: 1, Bai: 1
TERR_	156	0.08	FALSE	10	Cro: 3, U.K: 2, U.S: 2, Ass: 1
NAME_SORT	0	1.00	FALSE	170	Afg: 1, Alb: 1, Alg: 1, And: 1
ISO_A2	0	1.00	FALSE	168	-99: 2, PS: 2, AD: 1, AE: 1
ISO_A3	0	1.00	FALSE	170	ABW: 1, AFG: 1, AGO: 1, ALB: 1
ISO3	0	1.00	FALSE	170	ABW: 1, AFG: 1, AGO: 1, ALB: 1
ISO3.1	0	1.00	FALSE	170	ABW: 1, AFG: 1, AGO: 1, ALB: 1
ADMIN.1	0	1.00	FALSE	170	Afg: 1, Alb: 1, Alg: 1, And: 1
REGION	0	1.00	FALSE	7	Eur: 54, Asi: 39, Afr: 36, Sou: 32
continent	0	1.00	FALSE	6	Eur: 93, Afr: 36, Sou: 32, Aus: 6
GEO3major	0	1.00	FALSE	7	Eur: 54, Afr: 36, Asi: 34, Lat: 32
GEO3	0	1.00	FALSE	24	Wes: 27, Cen: 19, Sou: 13, Car: 11
IMAGE24	0	1.00	FALSE	26	Wes: 23, Cen: 19, Res: 18, Mid: 15
GLOCAF	1	0.99	FALSE	19	Eur: 46, Sub: 30, Res: 18, Mid: 12
Stern	1	0.99	FALSE	13	Eur: 54, Sou: 17, Eas: 16, Sou: 13
SRESmajor	1	0.99	FALSE	4	ALM: 81, OEC: 35, ASI: 27, REF: 26
SRES	1	0.99	FALSE	11	Lat: 30, Sub: 30, Wes: 28, Mid: 21
GBD	1	0.99	FALSE	21	Eur: 30, Nor: 18, Car: 14, Eur: 13
AVOIDname	1	0.99	FALSE	30	Eur: 44, Sou: 16, Wes: 13, Sou: 12
LDC	0	1.00	FALSE	2	oth: 141, LDC: 29
SID	0	1.00	FALSE	2	oth: 149, SID: 21
LLDC	0	1.00	FALSE	2	oth: 145, LLD: 25

Variable type: numeric

skim_variable	n_missing	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
n	0	1.00	182.07	618.11	1.00	6.25	24.50	81.00	5.524000e+03	▇▁▁▁▁
ScaleRank	0	1.00	1.16	0.58	1.00	1.00	1.00	1.00	4.000000e+00	▇▁▁▁▁
LabelRank	0	1.00	2.82	1.51	2.00	2.00	2.00	2.75	8.000000e+00	▇▁▂▁▁
OID_	0	1.00	149.66	70.81	10.00	97.25	153.50	205.75	2.650000e+02	▃▆▇▇▇
ADM0_DIF	0	1.00	0.09	0.28	0.00	0.00	0.00	0.00	1.000000e+00	▇▁▁▁▁
LEVEL	0	1.00	2.00	0.00	2.00	2.00	2.00	2.00	2.000000e+00	▁▁▇▁▁
GEOU_DIF	0	1.00	0.01	0.08	0.00	0.00	0.00	0.00	1.000000e+00	▇▁▁▁▁
SU_DIF	0	1.00	0.00	0.00	0.00	0.00	0.00	0.00	0.000000e+00	▁▁▇▁▁
MAP_COLOR	0	1.00	6.08	3.59	0.00	3.00	6.00	9.00	1.300000e+01	▅▇▅▆▅
POP_EST	0	1.00	38956131.21	140103771.04	1398.00	2724850.50	9047593.50	27408216.00	1.338613e+09	▇▁▁▁▁
GDP_MD_EST	0	1.00	408635.88	1365376.27	0.00	13585.00	55205.00	246600.00	1.426000e+07	▇▁▁▁▁
FIPS_10_	0	1.00	-0.58	7.59	-99.00	0.00	0.00	0.00	0.000000e+00	▁▁▁▁▇
ISO_N3	0	1.00	418.33	263.84	-99.00	197.75	409.00	640.00	8.940000e+02	▅▆▇▆▇
LON	0	1.00	16.95	62.31	-169.87	-8.10	20.63	46.45	1.779800e+02	▁▃▇▃▁
LAT	0	1.00	21.44	26.33	-80.56	6.70	23.89	41.72	7.476000e+01	▁▂▃▇▃
AVOIDnumeric	1	0.99	22.80	5.92	1.00	21.00	25.00	26.00	3.000000e+01	▁▁▁▅▇
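The "top emoji per country" step boils down to counting (country, emoji) pairs and keeping one row with the highest count per country. The sketch below reproduces that logic with base R on made-up data; the country codes and emoji names are placeholders, and the base R functions stand in for the dplyr verbs used above.

```r
# Toy data: one row per emoji occurrence (placeholder names)
toy <- data.frame(
  country = c("A", "A", "A", "B", "B"),
  emoji   = c("fire", "fire", "pray", "cry", "pray"),
  stringsAsFactors = FALSE
)

# Count occurrences per (country, emoji) pair and drop empty combinations
counts <- as.data.frame(table(country = toy$country, emoji = toy$emoji),
                        stringsAsFactors = FALSE)
counts <- counts[counts$Freq > 0, ]

# Sort by country, then by descending count; ties keep their current order
counts <- counts[order(counts$country, -counts$Freq), ]

# Keep the first row per country, the role slice(1) plays in the code above
top <- counts[!duplicated(counts$country), ]
print(top)
```

Country B has two emojis with the same count, so which one ends up on the map depends only on row order. The same is true for ties in the real data.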

You can download images of these emojis from Emojipedia (we need images to visualize them on the map). I did it using functions from the rvest package. The function emoji_to_link gets a URL for each emoji that you can use to download and visualize the emoji image. The function link_to_img turns the path of each downloaded image into an image tag that can be used as a markdown label in the plot.

Code

top_emo$x <- (top_emo$geometry %>% sf::st_coordinates())[,1]
top_emo$y <- (top_emo$geometry %>% sf::st_coordinates())[,2]
#emoji_to_link <- function(x) {
#  paste0("https://emojipedia.org/emoji/", x) %>%
#    read_html() %>%
#    html_nodes("tr td a") %>%
#    .[1] %>%
#    html_attr("href") %>%
#    paste0("https://emojipedia.org/", .) %>%
#    read_html() %>%
#    html_node('div[class="vendor-image"] img') %>%
#    html_attr("src")
#}
#
#link_to_img <- function(x, size = 25) {
#  paste0("<img src='", x, "' width='", size, "'/>")
#}

#emo <- top_emo %>%
#  distinct(emoji) %>%
#  mutate(
#    url = purrr::map_chr(emoji, purrr::slowly(~ emoji_to_link(.x), purrr::rate_delay(1))),
#    label = link_to_img(paste0("emoji/", basename(unique(url))))
#  )
#
#top_emo <- top_emo %>% left_join(emo, by = "emoji")
#
#if (!dir.exists("emoji")) dir.create("emoji")
#
#p <- purrr::map2(emo$url, paste0("emoji/", basename(emo$url)), download.file)

#skimr::skim(top_emo)
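The link_to_img helper is only string formatting, so it can be tried in isolation. The sketch below repeats its definition from the commented code above and applies it to two hypothetical local file paths.

```r
# Same definition as in the commented-out code above
link_to_img <- function(x, size = 25) {
  paste0("<img src='", x, "' width='", size, "'/>")
}

# Hypothetical paths of downloaded emoji images
paths <- c("emoji/fire.png", "emoji/pray.png")

labels <- link_to_img(paths)
print(labels)
# "<img src='emoji/fire.png' width='25'/>" "<img src='emoji/pray.png' width='25'/>"

# A different width can be requested per call
print(link_to_img("emoji/fire.png", size = 40))
```

Because paste0 is vectorized, the helper works on a whole column of paths at once, which is how it is used inside mutate.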

Now we have everything we need for the visualization.

Visualization

Let’s create the map using functions from the ggplot2 package:

Code

ggplot() +
  geom_polygon(data = world, aes(long, lat, group = group), color = "black", fill = "lightgray", linewidth = 0.1) +
  geom_richtext(
    data = top_emo %>% ungroup(),# %>%
      #mutate(label = stringr::str_replace_all(label, "'25'", paste0("'", round(log1p(top_emo$n) * 3), "'"))),
    aes(x, y, label = emoji), fill = NA, label.color = NA, label.padding = grid::unit(rep(0, 4), "pt"), family="EmojiOne"
  ) +
  coord_map(projection = "gilbert", ylim = c(85, -50), xlim = c(180, -180)) +
  xlab("") +
  ylab("") +
  labs(
    title = "<img src='https://em-content.zobj.net/thumbs/320/twitter/348/flag-australia_1f1e6-1f1fa.png' width='35'/> Australia bushfires in emojis",
    subtitle = "<br/>Emoji is basically like another language: it has its own rules, 
    it can cover anything that comes to one's mind, <br/>
you can build whole sentences using only those tiny faces and other symbols. While people were praying <br/>
for Australia on Twitter they used plenty of emojis as well, but only few of them were the most common ones.<br/>

The map below shows the emojis used most frequently by country in tweets with hashtags 
<span style='color:blue'>#prayforaustralia</span>, <br/>
<span style='color:blue'>#australiaonfire</span>, <span style='color:blue'>#australiafires</span>, 
<span style='color:blue'>#australia</span>, <span style='color:blue'>#australianbushfire</span>, 
<span style='color:blue'>#australianfires</span>, <span style='color:blue'>#australiaburning</span>, 
<span style='color:blue'>#australiaburns</span>, <br/> <span style='color:blue'>#pray4australia</span>, 
<span style='color:blue'>#australiabushfires</span>, <span style='color:blue'>#prayforrain</span><br>",
    caption = glue::glue("Data: twitter.com, {format(n_distinct(aus_emo$status_id), big.mark = ' ')} tweets with marked location")
  ) +
  hrbrthemes::theme_ipsum(base_family = "Lato") +
  theme(
    panel.grid = element_blank(),
    axis.text = element_blank(),
    plot.title = element_markdown(size = 35, face = "bold", colour = "black", vjust = -1),
    plot.subtitle = element_markdown(size = 18, vjust = -1, lineheight = 1.1)
  )
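The commented-out mutate call in the plot code rescales each image label so that more frequent emojis are drawn larger, using round(log1p(n) * 3) as the width. The sketch below shows that transformation on a few example counts; the counts and the image path are made up for illustration, and base R's sub replaces the stringr call.

```r
# Example counts, from a rare emoji to a very frequent one
n <- c(1, 25, 81, 5524)

# Log scaling keeps very frequent emojis from dwarfing the rest
size <- round(log1p(n) * 3)
print(size)

# Hypothetical labels with the default width of 25, as produced by link_to_img()
labels <- rep("<img src='emoji/fire.png' width='25'/>", length(n))

# Swap the default width for the scaled one, label by label
sized <- mapply(
  function(label, s) sub("'25'", paste0("'", s, "'"), label, fixed = TRUE),
  labels, size,
  USE.NAMES = FALSE
)
print(sized)
```

With a linear scale the largest count here would be thousands of times bigger than the smallest; on the log scale the widths differ by a factor of about thirteen.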

I also used the ggtext package for markdown formatting in text labels, the ggalt package, which provides the Winkel tripel map projection, and hrbrthemes, which provides awesome themes for ggplot2.

Conclusion

Hurray, we have a map of the top emoji for each country on Twitter!

To wrap up, I used the following libraries for this analysis:

rtweet - to access the Twitter API;
ore - to create regular expressions;
dplyr - to manipulate data;
ggplot2 - to make a pretty map;
ggtext - to make markdown labels in ggplot2;
rvest - for web scraping;
readr - to read data from files;
skimr - to make beautiful data summaries;
stringr - for text transformations;
tmaptools - to look up coordinates by location name;
rworldmap - to load the world map;
sp - for map manipulations;
purrr - for functional programming in R;
lubridate - for date conversions;
ggalt - to make the Winkel tripel map projection;
hrbrthemes - to make pretty ggplot2 themes;
glue - for better text formatting.

All calculations were made in R version 3.5.2.

Citation

BibTeX citation:

@online{kyrychenko2020,
  author = {Kyrychenko, Roman},
  title = {Twitter-Emojis Analysis. {Australian} Bushfires Case},
  date = {2020-01-15},
  url = {https://randomforest.run/posts/twitter-emojis-analysis/twitter-emojis-analysis.html},
  langid = {en}
}

For attribution, please cite this work as:

Kyrychenko, Roman. 2020. “Twitter-Emojis Analysis. Australian Bushfires Case.” January 15, 2020. https://randomforest.run/posts/twitter-emojis-analysis/twitter-emojis-analysis.html.