not_news {rdomains}    R Documentation
Classify News and Non-News Based on Keywords in the URL
Description
Classifies URLs as news or non-news using a slightly amended version of the regular expression from “Exposure to ideologically diverse news and opinion on Facebook” by Bakshy, Messing, and Adamic (Science, 2015).
Usage
not_news(url_list = NULL)
Arguments
url_list    vector of URLs
Details
Amendment to the original regular expression: the keyword "sport" is used rather than "sports", so URLs containing either form match.
A URL containing any of the following keywords is classified as soft news: "sport|entertainment|arts|fashion|style|lifestyle|leisure|celeb|movie|music|gossip|food|travel|horoscope|weather|gadget"
A URL containing any of the following keywords is classified as hard news: "politi|usnews|world|national|state|elect|vote|govern|campaign|war|polic|econ|unemploy|racis|energy|abortion|educa|healthcare|immigration"
Note that the keyword lists are based on URL patterns observed in a small set of domains. See the paper for details.
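The keyword matching described above can be sketched in base R. This is a minimal illustration, not the package source: classify_url_sketch is a hypothetical name, the match is assumed to be case-sensitive, and the soft-news pattern is assumed to feed the not_news column while the hard-news pattern feeds news.

```r
# Hypothetical sketch, not the rdomains implementation.
# The two patterns are copied from the keyword lists above.
soft_pattern <- "sport|entertainment|arts|fashion|style|lifestyle|leisure|celeb|movie|music|gossip|food|travel|horoscope|weather|gadget"
hard_pattern <- "politi|usnews|world|national|state|elect|vote|govern|campaign|war|polic|econ|unemploy|racis|energy|abortion|educa|healthcare|immigration"

classify_url_sketch <- function(url_list) {
  data.frame(
    url      = url_list,
    # Assumption: 1 if any soft-news keyword occurs anywhere in the URL
    not_news = as.integer(grepl(soft_pattern, url_list)),
    # Assumption: 1 if any hard-news keyword occurs anywhere in the URL
    news     = as.integer(grepl(hard_pattern, url_list)),
    stringsAsFactors = FALSE
  )
}

urls <- c("http://www.bbc.com/sport",
          "http://www.bbc.com/sports",
          "http://www.washingtontimes.com/news/politics/")
res <- classify_url_sketch(urls)
print(res)
```

Because the patterns match substrings anywhere in the URL, a URL can be flagged in both columns or in neither, and short keywords such as "war" or "arts" can match inside unrelated words.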
Value
data.frame with 3 columns: url, not_news, news
References
https://www.science.org/doi/10.1126/science.aaa1160
Examples
## Not run:
not_news("http://www.bbc.com/sport")
not_news(c("http://www.bbc.com/sport", "http://www.washingtontimes.com/news/politics/"))
## End(Not run)