clean_CRAN_db {cranly}    R Documentation

Clean and organize package and author names in the output of tools::CRAN_package_db()

Description

Clean and organize package and author names in the output of tools::CRAN_package_db()

Usage

clean_CRAN_db(
  packages_db,
  clean_directives = clean_up_directives,
  clean_author = clean_up_author,
  clean_maintainer = standardize_whitespace
)

Arguments

packages_db

a data.frame with the same structure as the output of tools::CRAN_package_db() (default) or utils::available.packages().

clean_directives

a function that transforms the contents of the various directives in the package descriptions to vectors of package names. Default is clean_up_directives().

clean_author

a function that transforms the contents of Author to vectors of package authors. Default is clean_up_author().

clean_maintainer

a function that transforms the contents of Maintainer to vectors of maintainer names. Default is standardize_whitespace().

Details

clean_CRAN_db() uses clean_up_directives() and clean_up_author() to clean up the package names in the various directives (like Imports, Depends, Suggests, Enhances, LinkingTo) and the author names in the data.frame that results from tools::CRAN_package_db(), and returns an organized data.frame of class cranly_db that can be used for further analysis.

The function tries hard to identify and eliminate mistakes in the Author field of the DESCRIPTION file, and to extract a clean list of author names only. The relevant operations are coded in the clean_up_author() function. Specifically, some references to copyright holders had to go because they were contaminating the list of authors (most are not necessary anyway, but that is a different story...). The current version of clean_up_author() is far from best practice in its use of regular expressions, but it does a fair job of cleaning up messy Author fields. It will be improved in future versions.
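To give a feel for the kind of clean-up involved, here is a much-simplified sketch. It is not the actual clean_up_author() implementation: the helper name simple_author_split and its regular expressions are hypothetical, and they only cover a few common patterns (role annotations, e-mail addresses, parenthesised notes) in a single Author string.

```r
## Hypothetical helper, NOT part of cranly: a simplified illustration of
## regex-based clean-up of one Author string.
simple_author_split <- function(x) {
  x <- gsub("\\[[^]]*\\]", "", x)   # drop role annotations such as [aut, cre]
  x <- gsub("<[^>]*>", "", x)       # drop e-mail addresses in angle brackets
  x <- gsub("\\([^)]*\\)", "", x)   # drop parenthesised notes
  ## split on commas and on the word "and" surrounded by whitespace
  parts <- strsplit(x, ",|[[:space:]]+and[[:space:]]+")[[1]]
  parts <- trimws(gsub("[[:space:]]+", " ", parts))
  parts[nzchar(parts)]              # discard empty pieces
}

simple_author_split(
  "Jane Doe [aut, cre], John Smith [ctb] and A. N. Other <a@example.org>"
)
```

Real Author fields are far messier than this (institutions, copyright statements, free text), which is why the clean-up in the package is considerably more involved.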

Custom clean-up functions can also be supplied via the clean_directives, clean_author and clean_maintainer arguments.
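As a minimal sketch of a custom clean-up function, the helper below (strip_maintainer_email, a hypothetical name that is not part of cranly) standardizes whitespace and drops a trailing e-mail address from Maintainer entries. It assumes that clean_CRAN_db() applies clean_maintainer to the Maintainer column as a character vector and accepts a character vector of the same length in return; check this against the installed version before relying on it.

```r
## Hypothetical helper, NOT part of cranly: collapse runs of whitespace and
## remove a trailing "<email>" part from each Maintainer entry.
strip_maintainer_email <- function(x) {
  x <- gsub("[[:space:]]+", " ", x)
  x <- sub("[[:space:]]*<[^>]*>[[:space:]]*$", "", x)
  trimws(x)
}

## Try the helper on its own first
strip_maintainer_email(c("Jane  Doe <jane@example.org>", " John Smith "))

## Then pass it to clean_CRAN_db() (needs cranly and network access, so it
## is left commented out here):
## cran_db <- tools::CRAN_package_db()
## package_db <- clean_CRAN_db(cran_db, clean_maintainer = strip_maintainer_email)
```

Testing a custom function on a few hand-written strings before passing it on makes it easier to spot regular-expression mistakes.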

Value

A data.frame with the same variables as packages_db (but with lower case names), that also inherits from class cranly_db, and has a timestamp attribute.

Examples


## Download today's CRAN package database
cran_db <- tools::CRAN_package_db()

## Before clean up
cran_db[cran_db$Package == "weights", "Author"]

## After clean up
package_db <- clean_CRAN_db(cran_db)
package_db[package_db$package == "weights", "author"]


[Package cranly version 0.6.0 Index]