mkSEER {SEERaBomb}    R Documentation

Make R binaries of SEER data.

Description

Converts SEER ASCII text files into large R binaries that include all cancer types and registries combined.

Usage

mkSEER(df,seerHome="~/data/SEER",outDir="mrgd",outFile="cancDef",
                  indices = list(c("sex","race"), c("histo3","seqnum"),  "ICD9"),
                  writePops=TRUE,writeRData=TRUE,writeDB=FALSE)

Arguments

df

A data frame returned by pickFields(). It determines which fields are transferred. Passing the output of getFields() directly is a common mistake and must be avoided.

seerHome

The directory that contains the SEER ‘population’ and ‘incidence’ directories. This should be writable by the user.

outDir

seerHome subdirectory to write to. Default is ‘mrgd’ for all registries merged together.

outFile

Base name of the SQLite database and cancer binary. Default = cancDef (Cancer Default).

indices

Passed to copy_to() in dplyr.

writePops

TRUE if you wish to write out the population data frame binaries. Doing so takes ~10 seconds, so setting this to FALSE saves little time.

writeRData

TRUE if you wish to write out the cancer data frame binary. Writing files takes most of the time.

writeDB

TRUE if you wish to write cancer, popga, popsa, and popsae data frames to SQLite database tables.
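
To illustrate what the indices argument means once it reaches copy_to(), here is a minimal sketch against an in-memory SQLite database. The toy data frame, its values, and the table name "canc" are made up for illustration, and the sketch assumes the dplyr, dbplyr, DBI and RSQLite packages are installed. It does not show how mkSEER() itself calls copy_to().

```r
library(dplyr)    # copy_to(); the database method also needs dbplyr installed
library(DBI)
library(RSQLite)

# In-memory database, so nothing is written to disk
con <- dbConnect(SQLite(), ":memory:")

# Toy stand-in for a cancer data frame (hypothetical values)
toy <- data.frame(sex    = c("male", "female"),
                  race   = c("white", "black"),
                  histo3 = c(9863L, 9875L),
                  seqnum = c(0L, 1L),
                  ICD9   = c(2051L, 2051L))

# Each list element becomes one index: two two-column indexes
# and one single-column index, mirroring the mkSEER() default
copy_to(con, toy, "canc", temporary = FALSE,
        indexes = list(c("sex", "race"), c("histo3", "seqnum"), "ICD9"))

# List the indexes SQLite now holds for the table
idx <- dbGetQuery(con, "PRAGMA index_list('canc')")
print(idx)

dbDisconnect(con)
```

Each element of the list names the column or columns covered by one index, so queries that filter on those columns can avoid a full table scan.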

Details

This function uses the R package LaF to access the fixed-width format data files of SEER. LaF is fast, but it requires the widths of all wanted columns as well as the widths of the unwanted stretches in between. This information is produced by getFields() and pickFields() together and is passed to mkSEER() via the argument df.
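
The need to account for unwanted stretches as well as wanted columns can be seen in a small sketch. It uses base R's read.fwf() rather than LaF, and the two-record file and its layout are made up for illustration; they are not the SEER layout.

```r
# Two hypothetical fixed-width records: id (4 chars), sex (1 char),
# 2 characters of unwanted filler, year (4 chars)
lines <- c("0001M  1975",
           "0002F  1982")
tmp <- tempfile(fileext = ".txt")
writeLines(lines, tmp)

# Every character position must be accounted for. In read.fwf() a
# negative width means "skip this many characters", so the filler
# between sex and year is declared as -2.
widths <- c(4, 1, -2, 4)

d <- read.fwf(tmp, widths = widths,
              col.names  = c("id", "sex", "year"),
              colClasses = c("character", "character", "integer"))
print(d)

unlink(tmp)
```

If the gap width were omitted, the year field would be read from the wrong character positions, which is why the widths of the skipped stretches matter as much as those of the wanted fields.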

Value

None. The function is called for its side effect of producing R binary files of the SEER data.

Note

This takes a substantial amount of RAM (it works on a Mac with 16 GB of RAM) and time (~3 minutes using default fields).

Author(s)

Tom Radivoyevitch (radivot@ccf.org)

See Also

SEERaBomb-package, getFields, pickFields

Examples

## Not run: 
library(SEERaBomb)
(df=getFields())
(df=pickFields(df))
# the following will take several minutes, but may only need 
# to be done roughly once per year, with each release.
mkSEER(df)

## End(Not run)

[Package SEERaBomb version 2019.2 Index]