Title: | Utilities to Extract and Analyse Text Data from the Emergency Nutrition Network Forum |
---|---|
Description: | The Emergency Nutrition Network forum, or en-net, is the go-to online forum for field practitioners who need prompt technical advice on operational challenges for which answers are not readily accessible in current guidelines. The questions raised within en-net and their corresponding answers can provide insight into the key topics of discussion within the nutrition sector. This package provides utility functions for the extraction, processing and analysis of text data from the online forum. |
Authors: | Ernest Guevarra [aut, cre] |
Maintainer: | Ernest Guevarra <[email protected]> |
License: | GPL-3 |
Version: | 0.3.0.9000 |
Built: | 2024-10-11 05:00:07 UTC |
Source: | https://github.com/katilingban/ennet |
Arrange topics based on number of replies
arrange_replies(
  topics,
  by_theme = TRUE,
  by_date = c("month_year", "year", "all")
)
topics | A tibble of topics by theme from en-net forum produced through a call to |
by_theme | Logical. Should topics be grouped by theme? Default is TRUE. |
by_date | Should topics be grouped by month of the year, by year, or overall? Default is to group by month of the year. |
A tibble of topics by theme and by specified date format, arranged in descending order of number of replies
Ernest Guevarra
library(magrittr)
ennet_topics %>% arrange_replies()
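As a rough illustration of the kind of ordering described above, the sketch below sorts a small made-up table of topics by number of replies within each theme, using base R only. The `toy_topics` data frame and its values are invented for illustration, and this is not the package's implementation.

```r
# Hypothetical toy data mimicking the Theme, Topic and Replies columns
toy_topics <- data.frame(
  Theme   = c("Assessment", "Assessment", "Coverage", "Coverage"),
  Topic   = c("Topic A", "Topic B", "Topic C", "Topic D"),
  Replies = c(2, 10, 7, 1),
  stringsAsFactors = FALSE
)

# Order by theme, then by descending number of replies within each theme
arranged <- toy_topics[order(toy_topics$Theme, -toy_topics$Replies), ]
rownames(arranged) <- NULL
arranged
```

Sorting on the negated count is a simple way to get descending order for a numeric column while keeping the theme ordering ascending.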
Arrange topics based on number of views
arrange_views(
  topics,
  by_theme = TRUE,
  by_date = c("month_year", "year", "all")
)
topics | A tibble of topics by theme from en-net forum produced through a call to |
by_theme | Logical. Should topics be grouped by theme? Default is TRUE. |
by_date | Should topics be grouped by month of the year, by year, or overall? Default is to group by month of the year. |
A tibble of topic views by theme and by specified date format, arranged in descending order
Ernest Guevarra
library(magrittr)
ennet_topics %>% arrange_views()
Count the number of topics by author, by theme and by date
count_authors(
  topics,
  by_theme = TRUE,
  by_date = c("month_year", "year", "all"),
  .sort = TRUE
)
topics | A tibble of topics by theme from en-net forum produced through a call to |
by_theme | Logical. If TRUE (default), count by theme. |
by_date | Should topics be counted by month of the year, by year, or in total? Default is to count by month of the year. |
.sort | Logical. Should output be sorted by count frequencies? Default is TRUE. |
A tibble of topic counts by author, by theme and by specified date format
Ernest Guevarra
library(magrittr)
ennet_topics %>% count_authors()
Count the number of topics by theme and by date
count_topics(topics, by_date = c("month_year", "year", "all"), .sort = TRUE)
topics | A tibble of topics by theme from en-net forum produced through a call to |
by_date | Should topics be grouped by month of the year, by year, or overall? Default is to group by month of the year. |
.sort | Logical. Should output be sorted by count frequencies? Default is TRUE. |
A tibble of topic counts by theme and by specified date format
Ernest Guevarra
library(magrittr)
ennet_topics %>% count_topics(by_date = "month_year")
Count number of questions/topics posted on en-net by author
count_topics_author(topics = get_themes_topics(), .sort = TRUE)
count_topics_author_time(
  topics = get_themes_topics(),
  by_time = c("day", "week", "month", "year"),
  .sort = TRUE
)
topics | A tibble of topics by theme, by author, and by posting date from en-net forum produced through a call to |
.sort | Logical. Should output be sorted by count frequencies? Default is TRUE. |
by_time | Should topics be counted by day, by week, by month or by year? Default is to count by day. |
A tibble of topic counts by specified grouping
Ernest Guevarra
## Get counts of topics by author
count_topics_author(topics = ennet_topics)
## Get counts of topics by author and by time
count_topics_author_time(topics = ennet_topics)
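To make the idea of a sorted count by author concrete, here is a minimal base R sketch on made-up data. The `toy_topics` data frame is hypothetical and the approach shown (tabulate, then sort) is only an assumption about what such a count involves, not the package's own code.

```r
# Hypothetical toy data: one row per topic, with the posting author
toy_topics <- data.frame(
  Author = c("Ana", "Ben", "Ana", "Cy", "Ana", "Ben"),
  Topic  = paste("Topic", 1:6),
  stringsAsFactors = FALSE
)

# Tabulate topics per author and convert the table to a data frame
counts <- as.data.frame(
  table(Author = toy_topics$Author),
  responseName = "n",
  stringsAsFactors = FALSE
)

# Sort by descending count, similar in spirit to .sort = TRUE
counts <- counts[order(-counts$n), ]
rownames(counts) <- NULL
counts
```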
Count number of questions/topics posted on en-net by time
count_topics_day(topics = get_themes_topics(), .sort = FALSE)
count_topics_week(topics = get_themes_topics(), .sort = FALSE)
count_topics_month(topics = get_themes_topics(), .sort = FALSE)
count_topics_year(topics = get_themes_topics(), .sort = FALSE)
topics | A tibble of topics by theme, by author, and by posting date from en-net forum produced through a call to |
.sort | Logical. Should output be sorted by count frequencies? Default is FALSE. |
A tibble of topic counts by specified time grouping
Ernest Guevarra
## Get counts of topics by day
count_topics_day(topics = ennet_topics)
## Get counts of topics by week
count_topics_week(topics = ennet_topics)
## Get counts of topics by month
count_topics_month(topics = ennet_topics)
## Get counts of topics by year
count_topics_year(topics = ennet_topics)
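Counting by time period amounts to deriving a period label from the posting date and tabulating it. The sketch below shows this for months and years on made-up data with base R; the `toy_topics` data frame is hypothetical and the package may derive its periods differently.

```r
# Hypothetical toy data: one row per topic with its posting date
toy_topics <- data.frame(
  Topic  = paste("Topic", 1:5),
  Posted = as.Date(c("2020-01-05", "2020-01-20", "2020-02-03",
                     "2021-01-10", "2021-01-11")),
  stringsAsFactors = FALSE
)

# Count topics per calendar month (labelled as year-month)
by_month <- as.data.frame(
  table(month = format(toy_topics$Posted, "%Y-%m")),
  responseName = "n",
  stringsAsFactors = FALSE
)

# Count topics per calendar year
by_year <- as.data.frame(
  table(year = format(toy_topics$Posted, "%Y")),
  responseName = "n",
  stringsAsFactors = FALSE
)

by_month
by_year
```

Because the periods come out in chronological order by default, leaving the result unsorted keeps it ready for plotting as a time series.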
Count number of questions/topics posted on en-net
count_topics_theme(topics = get_themes_topics(), .sort = TRUE)
count_topics_theme_time(
  topics = get_themes_topics(),
  by_time = c("day", "week", "month", "year"),
  .sort = TRUE
)
topics | A tibble of topics by theme, by author, and by posting date from en-net forum produced through a call to |
.sort | Logical. Should output be sorted by count frequencies? Default is TRUE. |
by_time | Should topics be counted by day, by week, by month or by year? Default is to count by day. |
A tibble of topic counts by specified grouping
Ernest Guevarra
## Get counts of topics by theme
count_topics_theme(topics = ennet_topics)
## Get counts of topics by theme and by time
count_topics_theme_time(topics = ennet_topics)
Create daily topics datasets for the ennet_db
create_db_topics_dailies(hourlies)
hourlies | A tibble of topics data usually produced by using the |
A tibble of specified topics dataset created from data in the ennet_db
Ernest Guevarra
themes <- ennet_themes$themes
x <- ennet_hourlies[ennet_hourlies$Theme == themes[3], ]
create_db_topics_dailies(hourlies = x)
Create daily topics dataset for the ennet_db
create_db_topics_daily(
  repo = "katilingban/ennet_db",
  branch = "main",
  .date = Sys.Date() - 1,
  fn = NULL
)
repo | A character value of the GitHub user and repository name combination identifying the GitHub location for ennet_db. Default is "katilingban/ennet_db". |
branch | A character value for the branch name from which to retrieve data. Default is "main". |
.date | A character value or vector of the date or dates for which to create a topics dataset for the ennet_db |
fn | A character value or vector of filenames for hourly topics datasets found in ennet_db |
A tibble of daily topics dataset created from data in the ennet_db
Ernest Guevarra
fn <- c("ennet_topics_2021-01-17_00:54:48.csv")
create_db_topics_daily(.date = "2021-01-17", fn = fn)
Create hourly topics datasets for the ennet_db
create_db_topics_hourlies(
  repo = "katilingban/ennet_db",
  branch = "main",
  .date = Sys.Date()
)
repo | A character value of the GitHub user and repository name combination identifying the GitHub location for ennet_db. Default is "katilingban/ennet_db". |
branch | A character value for the branch name from which to retrieve data. Default is "main". |
.date | A character value or vector of the date or dates for which to create a topics dataset for the ennet_db |
A tibble of specified topics dataset created from data in the ennet_db
Ernest Guevarra
create_db_topics_hourlies(.date = "2020-12-31")
Create various topics interactions datasets for the ennet_db
create_db_topics_interactions(
  dailies,
  id = c("daily", "weekly", "monthly", "yearly")
)
dailies | A tibble of topics data usually produced by using the |
id | A character value for data identifier. Possible choices are daily, weekly, monthly, or yearly. |
A tibble of specified topics dataset created from data in the ennet_db
Ernest Guevarra
themes <- ennet_themes$themes
x <- ennet_dailies[ennet_dailies$Theme == themes[3], ]
create_db_topics_interactions(dailies = x, id = "yearly")
Create monthly topics dataset for the ennet_db
create_db_topics_monthly(
  repo = "katilingban/ennet_db",
  branch = "main",
  .date = Sys.Date() - 1
)
repo | A character value of the GitHub user and repository name combination identifying the GitHub location for ennet_db. Default is "katilingban/ennet_db". |
branch | A character value for the branch name from which to retrieve data. Default is "main". |
.date | A character value or vector of the date or dates for which to create a topics dataset for the ennet_db |
A tibble of monthly topics dataset created from data in the ennet_db
Ernest Guevarra
create_db_topics_monthly(.date = "2021-01-01")
Daily extracts of topics dataset from en-net online forum
ennet_dailies
A tibble with 90 rows and 7 columns:
Variable | Description |
:--- | :--- |
Theme | Thematic areas in the en-net forum |
Topic | Short description of the topic/question being discussed/raised |
Author | Name of person who raised the topic/question |
Posted | Date topic/question was posted on en-net forum |
Link | URL of the topic/question being discussed/raised |
Interaction | Type of interaction. Either Views or Replies |
Extraction | Date and time when data was extracted |
n | Number or count |
Extraction Date | Date when data was extracted |
https://www.en-net.org
ennet_dailies
Hourly extracts of topics dataset from en-net online forum
ennet_hourlies
A tibble with 643844 rows and 8 columns:
Variable | Description |
:--- | :--- |
Theme | Thematic areas in the en-net forum |
Topic | Short description of the topic/question being discussed/raised |
Author | Name of person who raised the topic/question |
Posted | Date topic/question was posted on en-net forum |
Link | URL of the topic/question being discussed/raised |
Interaction | Type of interaction. Either Views or Replies |
Extraction | Date and time when data was extracted |
n | Number or count |
https://www.en-net.org
ennet_hourlies
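The Interaction and n columns store views and replies in long format, one row per topic and interaction type. The sketch below, on made-up data, shows one way to bring such records into a wide layout with separate Views and Replies columns using base R. The `toy_hourlies` data frame is hypothetical and much simpler than the packaged dataset.

```r
# Hypothetical toy data in long format: one row per topic and interaction type
toy_hourlies <- data.frame(
  Topic       = c("Topic A", "Topic A", "Topic B", "Topic B"),
  Interaction = c("Views", "Replies", "Views", "Replies"),
  n           = c(120, 4, 80, 0),
  stringsAsFactors = FALSE
)

# Split the long table by interaction type
views   <- toy_hourlies[toy_hourlies$Interaction == "Views", c("Topic", "n")]
replies <- toy_hourlies[toy_hourlies$Interaction == "Replies", c("Topic", "n")]

# Rename the count column after the interaction it holds
names(views)[2]   <- "Views"
names(replies)[2] <- "Replies"

# Join the two pieces back together, one row per topic
wide <- merge(views, replies, by = "Topic")
wide
```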
Themes from en-net forum retrieved on 17 January 2021
ennet_themes
A tibble with 18 rows and 2 columns:
Variable | Description |
:--- | :--- |
themes | Thematic areas in the en-net forum |
links | URL of the thematic area |
https://www.en-net.org
ennet_themes
Topics from en-net forum retrieved on 17 January 2021
ennet_topics
A tibble with 3045 rows and 7 columns:
Variable | Description |
:--- | :--- |
Theme | Thematic areas in the en-net forum |
Topic | Short description of the topic/question being discussed/raised |
Views | Number of views of the topic/question being discussed/raised |
Author | Name of person who raised the topic/question |
Posted | Date topic/question was posted on en-net forum |
Link | URL of the topic/question being discussed/raised |
Replies | Number of replies to topic/question being discussed/raised |
Please note that this dataset is made available in the package primarily as a guide for the user and as testing data for the code. Users are advised not to use this dataset for actual analysis or reporting and to call get_theme_topics() or get_themes_topics() instead. This is because some topics and some author names have been converted to NA, as they contain non-ASCII characters, which are not allowed in packaged data.
https://www.en-net.org
ennet_topics
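The note above does not say how the non-ASCII values were replaced. As an illustration only, the sketch below shows one way such a replacement could be done in base R, and how to count the affected entries afterwards. The `toy_authors` vector is made up, and the code-point check is an assumption rather than the package's method.

```r
# Hypothetical author names, one of which contains non-ASCII characters
toy_authors <- c("Ana Cruz", "Jos\u00e9 Garc\u00eda", "Ben Lee")

# A string is treated as ASCII if every code point is 127 or below
is_ascii <- vapply(
  toy_authors,
  function(s) all(utf8ToInt(s) <= 127L),
  logical(1),
  USE.NAMES = FALSE
)

# Replace names containing non-ASCII characters with NA
cleaned <- ifelse(is_ascii, toy_authors, NA_character_)

# Number of entries lost to the replacement
sum(is.na(cleaned))
```

A check like `sum(is.na(ennet_topics$Author))` on the packaged data would indicate how many entries are affected there.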
These functions still work but will be removed (defunct) in the next version of ennet.
Function | Notes |
:--- | :--- |
count_topics() | Please use count_topics_theme() instead |
count_authors() | Please use count_topics_author() instead |
Retrieve data from ennet_db GitHub repository
get_db_discussions(repo = "katilingban/ennet_db", branch = "main")
get_db_topics(
  repo = "katilingban/ennet_db",
  branch = "main",
  id = c("daily", "weekly", "monthly", "yearly")
)
repo | A character value for the GitHub user and repository name combination identifying the GitHub location for ennet_db. Default is "katilingban/ennet_db". |
branch | A character value for the branch name from which to retrieve data. Default is "main". |
id | A character value for data identifier. Possible choices are daily, weekly, monthly, or yearly. |
A tibble of the specified dataset
Ernest Guevarra
## Retrieve discussions dataset
get_db_discussions()
## Retrieve en-net topics yearly interactions dataset
get_db_topics(id = "yearly")
Get theme topics
get_theme_topics(link)
link | URL of a specific thematic area |
A tibble of all topics for the specified thematic area.
themes <- ennet_themes
get_theme_topics(link = themes$links[4])
Get list of thematic areas from en-net.org
get_themes(base = "https://www.en-net.org")
base | Base URL of the en-net site. Default is https://www.en-net.org |
A tibble containing the thematic areas from en-net forum and the corresponding URLs for each thematic area
get_themes()
Get topics from multiple themes
get_themes_topics(themes = get_themes())
themes | A tibble containing thematic areas and URLs for thematic area pages |
A tibble of all topics across all thematic areas with their respective URLs
themes <- ennet_themes
get_themes_topics(themes = themes[4, ])
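Retrieving topics across several themes can be thought of as applying a per-theme retrieval step to each theme URL and stacking the results. The sketch below illustrates that pattern offline with base R. The `fetch_theme()` helper, the `toy_themes` table and the example.org URLs are all hypothetical stand-ins, and the package's actual retrieval logic may differ.

```r
# Hypothetical stand-in for a per-theme retrieval step (no network access)
fetch_theme <- function(link) {
  data.frame(
    Theme = basename(link),
    Topic = paste("Topic from", basename(link)),
    Link  = paste0(link, "/1"),
    stringsAsFactors = FALSE
  )
}

# Hypothetical table of themes, shaped like the themes/links columns
toy_themes <- data.frame(
  themes = c("theme-a", "theme-b"),
  links  = c("https://example.org/theme-a", "https://example.org/theme-b"),
  stringsAsFactors = FALSE
)

# Apply the per-theme step to every link, then row-bind the results
all_topics <- do.call(rbind, lapply(toy_themes$links, fetch_theme))
all_topics
```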
Get the discussion and other details for a particular topic
get_topic_discussions(link)
link | URL for topic discussion |
A tibble containing the topic question
links <- get_theme_topics(link = (ennet_themes$links)[4])
get_topic_discussions(link = links$Link[1])
Get the discussion and details of discussion for a set of topics
get_topics_discussions(links)
links | A tibble of topics containing URL of topic discussion. This is provided using a call to |
A tibble containing the topic discussions for selected topic/s
links <- get_theme_topics(link = (ennet_themes$links)[4])
get_topics_discussions(links = links[1:3, ])
Update en-net topics
update_topics(freq = 0)
freq | A numeric value for time in seconds for frequency of retrieval of updated en-net topics. Defaults to 0 for a single retrieval (no repeats) |
A tibble containing the topics across all thematic areas in en-net forum
if (interactive()) update_topics()
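How repeated retrieval is implemented is not described here. As a sketch of the general idea only, the base R code below repeats a retrieval step every `freq` seconds and treats a frequency of 0 as a single retrieval. The `poll()` helper, its `n_max` cap and the `fake_fetch()` stand-in are hypothetical names introduced for this example.

```r
# Hypothetical polling helper: call fetch() once, or repeatedly every
# freq seconds up to n_max times (n_max exists only to end the example)
poll <- function(fetch, freq = 0, n_max = 3) {
  results <- list()
  repeat {
    results[[length(results) + 1]] <- fetch()
    # A frequency of 0 means a single retrieval with no repeats
    if (freq <= 0 || length(results) >= n_max) break
    Sys.sleep(freq)
  }
  results
}

# Stand-in retrieval step that just counts how often it was called
counter <- 0
fake_fetch <- function() {
  counter <<- counter + 1
  counter
}

once   <- poll(fake_fetch, freq = 0)                 # single retrieval
thrice <- poll(fake_fetch, freq = 0.01, n_max = 3)   # three retrievals
length(once)
length(thrice)
```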