download_genesets_goatrepo {goat}    R Documentation

Download and parse geneset collections from the GOAT GitHub repository

Description

While the Bioconductor repository is extensive, contains data for many species and is part of a larger infrastructure, it may provide outdated GO data if the user is not running the latest R version. Bioconductor releases are tied to specific R versions, so on an R installation that is a few years old, the GO data obtained from Bioconductor will be equally old.

As an alternative, we store gene2go data from NCBI (for human genes only!) in the GOAT GitHub repository. This function provides a convenient way to download these data and then parse the genesets.

Alternatively, you can browse the files in the data branch of the GOAT GitHub repository, download them manually, and then load them with the GOAT R function load_genesets_go_fromfile().

To view all available data, open this URL in a browser: https://github.com/ftwkoopmans/goat/tree/data

New data are added automatically twice a year. The first available version is 2024-01-01, the next is 2024-06-01, then 2025-01-01, and so on.

Usage

download_genesets_goatrepo(
  output_dir,
  type = "GO",
  version = "2024-01-01",
  ignore_cache = FALSE
)

Arguments

output_dir

full path to the directory where the downloaded files should be stored. The directory is created if it does not exist. For example, output_dir="~/data" on Unix systems, output_dir="C:/data" on Windows, or output_dir=getwd() to write output to the current working directory

type

the type of genesets to download. Currently, only "GO" is supported (default)

version

the dataset version. This must be a date in format YYYY-MM-DD. Example: "2024-01-01" (default). View all available versions at https://github.com/ftwkoopmans/goat/tree/data

ignore_cache

boolean, set to TRUE to force re-download and ignore cached data, if any. Default: FALSE
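Because the version argument must be a date in YYYY-MM-DD format, a malformed value can be caught locally before any download is attempted. The following is a minimal sketch in base R under that assumption; is_valid_goat_version() is a hypothetical helper, not part of the goat package, and it only checks the format and that the date exists on the calendar. It does not check whether that version is actually available in the data branch of the GOAT repository.

```r
# Hypothetical helper (not part of goat): returns TRUE if `version` is a single
# string holding a real calendar date in YYYY-MM-DD format, as required by the
# `version` argument of download_genesets_goatrepo(). It does NOT verify that
# this version exists in the GOAT data branch.
is_valid_goat_version <- function(version) {
  # must be exactly one non-missing character string
  if (!is.character(version) || length(version) != 1 || is.na(version)) {
    return(FALSE)
  }
  # exact pattern: 4 digits, dash, 2 digits, dash, 2 digits (zero-padded)
  if (!grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", version)) {
    return(FALSE)
  }
  # with an explicit format, as.Date() yields NA for impossible dates such as
  # month 13, so this rejects strings that match the pattern but are not dates
  !is.na(as.Date(version, format = "%Y-%m-%d"))
}

is_valid_goat_version("2024-01-01")  # TRUE
is_valid_goat_version("2024-13-01")  # FALSE, month 13 does not exist
is_valid_goat_version("2024-1-1")    # FALSE, not zero-padded
```

A check like this could be run on a user-supplied value before passing it as version = to download_genesets_goatrepo(), so that a typo fails early with a clear message instead of surfacing as a failed download.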

Value

result of the respective geneset parser function. E.g., if parameter type is set to "GO" (default), this function returns the result of load_genesets_go_fromfile(). The data returned by this function are typically used as input for filter_genesets(); cf. the full example in the documentation of test_genesets()

Examples


# note: this example will download 2 files of approx 10MB in total

# store the downloaded files in the following directory. Here, the R session's temporary
# directory is used. Alternatively, consider storing these data in a more permanent location.
# e.g. output_dir="~/data/go" on unix systems or output_dir="C:/data/go" on Windows
output_dir = tempdir()

# download data files with GO annotations, version 2024-01-01 (default parameter)
# these are then parsed with the load_genesets_go_fromfile() function
# if the files are already available at output_dir, the download step is skipped
genesets_asis = download_genesets_goatrepo(output_dir)

### for a basic example on how to use the data obtained here,
### refer to the example included at function documentation of: test_genesets()


[Package goat version 1.0 Index]