LinkNormalization {Rcrawler}    R Documentation

Link Normalization

Description

Normalizes one or more URLs and transforms them into a canonical form, using the URL of the page on which they were found as context.

Usage

LinkNormalization(links, current)

Arguments

links

character, one or more URLs to normalize.

current

character, the URL of the current page, i.e. the page on which the links were found.

Value

A vector of normalized URLs.

Author(s)

Salim Khalil

Examples


# Normalize a set of links

links <- c("http://www.twitter.com/share?url=http://glofile.com/page.html",
           "/finance/banks/page-2017.html",
           "./section/subscription.php",
           "//section/",
           "www.glofile.com/home/",
           "IndexEn.aspx",
           "glofile.com/sport/foot/page.html",
           "sub.glofile.com/index.php",
           "http://glofile.com/page.html#1",
           "?tags%5B%5D=votingrights&amp;sort=popular")

# Normalize the links relative to the page they were found on
links <- LinkNormalization(links, "http://glofile.com")

links
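
The kind of canonicalization described above can be illustrated with a minimal base-R sketch. This is not Rcrawler's implementation and is not guaranteed to match the output of LinkNormalization; the helper name resolve_link_sketch is hypothetical. It covers only four simple cases (absolute, protocol-relative, root-relative, and plain relative links) and assumes the current page sits at the site root. Bare host names such as "www.glofile.com/home/" and query-only links are not handled.

```r
# Hypothetical helper, NOT part of Rcrawler: resolves a few simple link
# forms against the URL of the page they were found on.
resolve_link_sketch <- function(links, current) {
  # Scheme plus host of the current page, e.g. "http://glofile.com"
  root   <- sub("^(https?://[^/?#]+).*$", "\\1", current)
  # Scheme alone, e.g. "http"
  scheme <- sub("^(https?):.*$", "\\1", current)

  vapply(links, function(link) {
    link <- sub("#.*$", "", link)                # drop the fragment
    if (grepl("^https?://", link)) {
      link                                       # already absolute
    } else if (grepl("^//", link)) {
      paste0(scheme, ":", link)                  # protocol-relative
    } else if (grepl("^/", link)) {
      paste0(root, link)                         # root-relative
    } else {
      # Simplification: treat the link as relative to the site root
      paste0(root, "/", sub("^\\./", "", link))
    }
  }, character(1), USE.NAMES = FALSE)
}

resolve_link_sketch(
  c("/finance/banks/page-2017.html",
    "./section/subscription.php",
    "http://glofile.com/page.html#1"),
  "http://glofile.com"
)
```

Comparing the result of this sketch with the output of LinkNormalization on the same input is a quick way to see which additional cases the package function handles.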



[Package Rcrawler version 0.1.9-1 Index]