telegramR: Scraping Telegram Channels with R
A practical guide to telegramR — an R client for Telegram’s MTProto API that lets you download messages, reactions, members, and media from any public channel directly into tibbles.
Why Telegram?
Telegram has become one of the most significant platforms for political communication, grassroots organising, and — unfortunately — disinformation. With more than 900 million monthly active users and a liberal API policy, it is a goldmine for social scientists, computational linguists, and data journalists alike.
Until recently, R researchers had to call Python’s Telethon through reticulate, juggle environment issues, and wrestle with type conversions. telegramR fixes that: a native R package that speaks Telegram’s MTProto binary protocol directly, with no Python dependency.
```r
remotes::install_github("RomanKyrychenko/telegramR")
```

Getting API Credentials
Before writing any R code you need a pair of credentials from Telegram:
- Log in at https://my.telegram.org.
- Go to API development tools.
- Create an application — the name and platform don’t matter for research use.
- Copy `api_id` (a number) and `api_hash` (a hex string).
Keep these values secret; treat `api_hash` like a password.
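One common pattern (a general R convention, not something telegramR requires) is to keep the credentials in an `.Renviron` file that never enters version control, and read them with `Sys.getenv()`:

```r
# .Renviron (add this file to .gitignore; values below are placeholders):
#   TG_API_ID=1234567
#   TG_API_HASH=0123456789abcdef0123456789abcdef

# In your R session the values are then available as environment variables
api_id   <- Sys.getenv("TG_API_ID")
api_hash <- Sys.getenv("TG_API_HASH")

# Warn early if they are missing instead of failing mid-scrape
if (!nzchar(api_id) || !nzchar(api_hash)) {
  message("Set TG_API_ID and TG_API_HASH in .Renviron before connecting")
}
```

R reads `.Renviron` automatically at startup, which is why the connection code below can call `Sys.getenv("TG_API_ID")` directly.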
Connecting and Logging In
```r
library(telegramR)

client <- TelegramClient$new(
  session  = "research_session",  # saved to disk; reuse across runs
  api_id   = Sys.getenv("TG_API_ID"),
  api_hash = Sys.getenv("TG_API_HASH")
)

client$start()  # interactive: phone → SMS code → 2FA (if set)
```

The session file persists your authorisation key. Delete it to force a fresh login.
For non-interactive pipelines (CI, scheduled jobs) you can pass everything explicitly:
```r
client$start(
  phone         = "+15551234567",
  code_callback = function() readline("Code: "),
  password      = Sys.getenv("TG_2FA")
)
```

Downloading Channel Messages
The workhorse function is `download_channel_messages()`. Pass a username (without the `@`) or a numeric channel ID:
```r
msgs <- download_channel_messages(
  client,
  channel = "bbcnews",
  limit   = 500
)

dplyr::glimpse(msgs)
```

The returned tibble has a rich schema:
| Column | Description |
|---|---|
| `message_id` | Unique message identifier |
| `date` | Timestamp (UTC) |
| `text` | Full message text |
| `views` | View count at download time |
| `forwards` | Number of times forwarded |
| `replies` | Reply count |
| `reactions_total` | Total emoji reactions |
| `reactions_json` | Per-emoji breakdown (JSON) |
| `media_type` | photo / video / document / … |
| `is_forward` | Whether the post was forwarded |
| `forward_from_name` | Original channel name |
| `channel_title` | Display name of scraped channel |
Filtering by Date
Avoid downloading years of history when you only care about a specific window:
```r
msgs_jan <- download_channel_messages(
  client,
  channel    = "bbcnews",
  start_date = "2025-01-01",
  end_date   = "2025-01-31",
  limit      = Inf  # fetch everything in the window
)
```

Estimating Volume Before Downloading
Before pulling a large channel, check how much data you’re dealing with:
```r
estimate_channel_post_count(client, "bbcnews")
#> [1] 48301
```

This returns an upper-bound estimate without downloading any messages.
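One way to use the estimate (a sketch, not a package idiom — the `cap` budget and variable names here are made up for illustration) is to bound `limit` before committing to a large download:

```r
# Cap very large channels instead of pulling the full history
n_est <- estimate_channel_post_count(client, "bbcnews")
cap   <- 10000  # arbitrary budget for this example

msgs <- download_channel_messages(
  client,
  channel = "bbcnews",
  limit   = min(n_est, cap)  # never request more than the budget
)
```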
Channel Metadata
```r
info <- download_channel_info(client, "bbcnews")
# Returns: id, title, username, description, member_count, creation_date
```

Reactions, Replies and Members
```r
# Per-message reactions with emoji breakdown
reactions <- download_channel_reactions(client, "bbcnews", limit = 1000)

# Replies to the most recent posts
replies <- download_channel_replies(
  client,
  channel       = "bbcnews",
  message_limit = 100  # look at the last 100 posts
)

# Public subscriber list (where available)
members <- download_channel_members(client, "bbcnews", limit = 5000)
```

Downloading Media
```r
dir.create("media", showWarnings = FALSE)

media_index <- download_channel_media(
  client,
  channel     = "bbcnews",
  limit       = 200,
  media_types = c("photo", "video"),
  start_date  = "2025-01-01",
  end_date    = "2025-02-01",
  out_dir     = "media"
)

head(media_index)
```

The function returns a tibble with the local file path alongside message metadata, so you can join it back to `msgs` by `message_id`.
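The join itself is plain dplyr. A minimal sketch with stand-in tibbles (the column name `local_path` is an assumption — check `names(media_index)` for the actual name in your version):

```r
library(dplyr)

# Stand-ins for the real msgs / media_index tibbles
msgs_demo <- tibble(
  message_id = c(101, 102, 103),
  text       = c("post one", "post two", "post three")
)
media_demo <- tibble(
  message_id = c(101, 103),
  local_path = c("media/101.jpg", "media/103.mp4")  # assumed column name
)

# left_join keeps every post; posts without media get NA in local_path
joined <- left_join(msgs_demo, media_demo, by = "message_id")
joined
```

A left join (rather than inner) is the safer default here: it preserves text-only posts, so downstream counts per day are not silently biased toward media posts.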
A Mini Analysis: Reaction Trends
Here’s a small end-to-end snippet — download a week of posts, parse reactions, and plot engagement over time:
```r
library(telegramR)
library(dplyr)
library(tidyr)
library(ggplot2)
library(jsonlite)

msgs <- download_channel_messages(
  client, "bbcnews",
  start_date = "2025-03-01", end_date = "2025-03-07",
  limit = Inf
)

# Parse the reactions JSON into one row per (message, emoji).
# Keep only the columns we pivot — pivoting the full tibble would
# mix character and numeric columns and error out.
reactions_long <- msgs |>
  filter(!is.na(reactions_json)) |>
  select(message_id, date, reactions_json) |>
  mutate(emoji_data = lapply(reactions_json, fromJSON)) |>
  select(-reactions_json) |>
  unnest_wider(emoji_data) |>
  pivot_longer(
    cols = -c(message_id, date),
    names_to = "emoji", values_to = "count"
  ) |>
  filter(!is.na(count))

# Plot the top-5 reactions per day
reactions_long |>
  mutate(day = as.Date(date)) |>
  group_by(day, emoji) |>
  summarise(total = sum(count), .groups = "drop") |>
  slice_max(total, n = 5, by = day) |>
  ggplot(aes(day, total, fill = emoji)) +
  geom_col(position = "dodge") +
  labs(
    title = "Daily Telegram Reactions — BBC News",
    x = NULL,
    y = "Reaction count",
    fill = "Emoji"
  ) +
  theme_minimal()
```

Practical Tips
- Rate limits — Telegram throttles heavy scrapers. Add `Sys.sleep(1)` between calls when downloading large histories.
- Session reuse — the session file caches the authorisation key. Store it safely; don't commit it to git.
- Async returns — most low-level helpers return `future` objects. Unwrap them with `future::value()` if you call them directly.
- Debug logging — suppress verbose output with:
```r
options(
  telegramR.debug_pump    = FALSE,
  telegramR.debug_process = FALSE,
  telegramR.debug_parse   = FALSE
)
```

Summary
telegramR brings full MTProto client functionality to R without any Python bridge. Whether you’re building a disinformation monitor, studying political communication, or just curious about a niche community, the package gives you clean tibbles ready for tidyverse pipelines — from raw channel scrape to publication-ready analysis entirely in R.