mkSEER {SEERaBomb} | R Documentation |
Make R binaries of SEER data.
Description
Converts SEER ASCII text files into large R binaries that include all cancer types and registries combined.
Usage
mkSEER(df,seerHome="~/data/SEER",outDir="mrgd",outFile="cancDef",
indices = list(c("sex","race"), c("histo3","seqnum"), "ICD9"),
writePops=TRUE,writeRData=TRUE,writeDB=FALSE)
Arguments
df |
A data frame that was the output of pickFields() (see Details and Examples). |
seerHome |
The directory that contains the SEER ‘population’ and ‘incidence’ directories. This should be writable by the user. |
outDir |
seerHome subdirectory to write to. Default is ‘mrgd’ for all registries merged together. |
outFile |
Base name of the SQLite database and cancer binary. Default = cancDef (Cancer Default). |
indices |
Passed to |
writePops |
TRUE if you wish to write out the population data frame binaries. Doing so takes only ~10 seconds, so setting this to FALSE saves little time. |
writeRData |
TRUE if you wish to write out the cancer data frame binary. Writing files takes most of the time. |
writeDB |
TRUE if you wish to write cancer, popga, popsa, and popsae data frames to SQLite database tables. |
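As a rough illustration of what writeDB=TRUE makes possible, the sketch below queries a SQLite table from R. It is not taken from the package: it assumes the DBI and RSQLite packages are installed, and it builds a temporary database with a made-up toy table named cancer rather than opening a real mkSEER() output. The toy columns and the temporary file are hypothetical; the real database location and name depend on seerHome, outDir and outFile.

```r
library(DBI)

# Hypothetical stand-in for the database that mkSEER() would write.
dbFile <- tempfile(fileext = ".db")
con <- dbConnect(RSQLite::SQLite(), dbFile)

# Toy data frame; real SEER tables have many more columns.
toy <- data.frame(sex = c("male", "female", "female"),
                  age = c(61, 47, 70))
dbWriteTable(con, "cancer", toy)

# List the tables, then run an aggregate query in SQL.
tabs <- dbListTables(con)
counts <- dbGetQuery(con,
  "SELECT sex, COUNT(*) AS n FROM cancer GROUP BY sex ORDER BY sex")
print(counts)

dbDisconnect(con)
unlink(dbFile)
```

Querying in SQL lets the database do the filtering and aggregation, so only the result has to be held in R's memory.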
Details
This function uses the R package LaF to access the fixed-width format data files of SEER. LaF is fast, but it requires the widths of all wanted columns as well as the widths of the unwanted stretches in between. This information is produced by getFields() and pickFields() combined and is passed to mkSEER() via the argument df.
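To make the width requirement concrete, here is a minimal sketch of reading a fixed-width file with LaF directly. It is an illustration under assumptions, not the code mkSEER() runs: the toy file, its layout, and the column names (id, filler, grade, sex) are invented, and only laf_open_fwf() and bracket indexing from LaF are used.

```r
library(LaF)

# Toy fixed-width file: id (4 chars), unwanted filler (2), grade (1), sex (1).
fwf <- tempfile(fileext = ".txt")
writeLines(c("0001XX1M", "0002YY2F", "0003ZZ3F"), fwf)

# Every stretch needs a type and a width, including the one to be discarded.
laf <- laf_open_fwf(fwf,
                    column_types  = c("integer", "string", "integer", "string"),
                    column_widths = c(4, 2, 1, 1),
                    column_names  = c("id", "filler", "grade", "sex"))

# Read only the wanted columns (by position), skipping the filler.
d <- laf[, c(1, 3, 4)]
str(d)

unlink(fwf)
```

Declaring the filler as its own column is what lets the reader locate the later fields, which is why widths for the unwanted stretches must also be known.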
Value
None. The function is called for its side effect of writing R binary files of the SEER data (and, if writeDB=TRUE, SQLite database tables).
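Because nothing is returned, the output is picked up from disk afterwards. The sketch below builds the expected path from the default arguments shown in Usage and loads the binary if it exists. The ".RData" extension is an assumption that this page does not state, so adjust the file name if your output differs.

```r
seerHome <- "~/data/SEER"   # defaults copied from Usage
outDir   <- "mrgd"
outFile  <- "cancDef"

# Assumed file name: base name plus ".RData" (not stated on this page).
binFile <- file.path(path.expand(seerHome), outDir, paste0(outFile, ".RData"))

if (file.exists(binFile)) {
  loaded <- load(binFile)   # load() returns the names of the restored objects
  print(loaded)
} else {
  message("No binary found at ", binFile, "; run mkSEER(df) first.")
}
```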
Note
This takes a substantial amount of RAM (it works on a Mac with 16 GB of RAM) and time (~3 minutes using default fields).
Author(s)
Tom Radivoyevitch (radivot@ccf.org)
See Also
SEERaBomb-package, getFields, pickFields
Examples
## Not run:
library(SEERaBomb)
(df=getFields())
(df=pickFields(df))
# the following will take several minutes, but may only need
# to be done roughly once per year, with each release.
mkSEER(df)
## End(Not run)