LinkNormalization {Rcrawler} | R Documentation
Link Normalization
Description
Normalizes URLs and transforms them into a canonical form. Relative links are interpreted with respect to the URL of the page on which they were found.
Usage
LinkNormalization(links, current)
Arguments
links    character, one or more URLs to normalize.
current  character, the URL of the page on which the links are located.
Value
A character vector of normalized URLs.
Author(s)
Salim Khalil
Examples
# Normalize a set of links
links <- c("http://www.twitter.com/share?url=http://glofile.com/page.html",
           "/finance/banks/page-2017.html",
           "./section/subscription.php",
           "//section/",
           "www.glofile.com/home/",
           "IndexEn.aspx",
           "glofile.com/sport/foot/page.html",
           "sub.glofile.com/index.php",
           "http://glofile.com/page.html#1",
           "?tags%5B%5D=votingrights&sort=popular")
links <- LinkNormalization(links, "http://glofile.com")
links
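# A minimal base-R sketch of one common normalization step: removing the
# "#fragment" part of a URL. It only illustrates the general idea and is
# not taken from Rcrawler's implementation; strip_fragment is a
# hypothetical helper name, not a function exported by the package.
strip_fragment <- function(urls) sub("#.*$", "", urls)
strip_fragment("http://glofile.com/page.html#1")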
[Package Rcrawler version 0.1.9-1 Index]