estimateDataTemporalMap {EHRtemporalVariability}    R Documentation

Estimates DataTemporalMap objects from raw data

Description

Estimates a DataTemporalMap from a data.frame with individuals in rows and variables in columns, where one of the columns holds the analysis date (typically the acquisition date). Returns a DataTemporalMap object or a list of DataTemporalMap objects, depending on the number of analysis variables.

Usage

estimateDataTemporalMap(
  data = NULL,
  dateColumnName = NULL,
  period = "month",
  startDate = NULL,
  endDate = NULL,
  supports = NULL,
  numericVariablesBins = 100,
  numericSmoothing = TRUE,
  dateGapsSmoothing = FALSE,
  verbose = FALSE
)

Arguments

data

a data.frame with one row per individual and one column per analysis variable, plus a column holding each individual's acquisition date.

dateColumnName

a string indicating the name of the column in data containing the analysis date variable.

period

the period by which to batch the data for the analysis: one of "week", "month" or "year". Defaults to "month".
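
As a rough illustration of what batching by period means, the following base R sketch groups a few made-up dates into monthly and yearly batches with cut(). It is only an analogy: the dates are hypothetical, and the package's internal batching may differ in detail.

```r
# Hypothetical acquisition dates (not taken from the package data)
dates <- as.Date(c("2000-01-03", "2000-01-20", "2000-02-11", "2001-02-28"))

# Group the dates into calendar months and years with base R's cut() for Dates;
# breaks = "week" works the same way for weekly batches
byMonth <- cut(dates, breaks = "month")
byYear  <- cut(dates, breaks = "year")

# Count individuals per batch; months between the first and last date
# that contain no data appear with a count of 0
monthCounts <- table(byMonth)
yearCounts  <- table(byYear)
```

Each batch is labelled by its first day, so January 2000 appears as "2000-01-01".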

startDate

a Date object indicating the date at which to start the analysis. Defaults to the first chronological date in the date column.

endDate

a Date object indicating the date at which to end the analysis. Defaults to the last chronological date in the date column.
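
A minimal sketch of how Date objects for startDate and endDate can be built with as.Date(), and of which records would fall inside such a window. The dates are hypothetical, and treating both bounds as inclusive is an assumption made for illustration, not documented behaviour of the function.

```r
# Hypothetical acquisition dates
dates <- as.Date(c("1999-12-15", "2000-03-01", "2000-07-20", "2001-01-05"))

# Date objects of the kind expected by startDate and endDate
startDate <- as.Date("2000-01-01")
endDate   <- as.Date("2000-12-31")

# Records inside the analysis window (inclusive bounds assumed here)
inWindow <- dates >= startDate & dates <= endDate
```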

supports

a list of objects containing the support of the data distribution for each variable, of class numeric, integer, character, or factor (according to the variable type). The name of each list element must match the column name of its variable. If not provided, the supports are estimated automatically from the data.
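
A hedged sketch of what a supports list might look like. The column names "age" and "sex" and the example data are hypothetical, and reading a numeric support as the set of positions at which the distribution is evaluated is an assumption.

```r
# Hypothetical data with a date column and two analysis variables
exampleData <- data.frame(
  date = as.Date(c("2000-01-15", "2000-02-15")),
  age  = c(34, 71),
  sex  = factor(c("male", "female"))
)

# One list element per variable, named after its column
supports <- list(
  age = seq(0, 100, by = 1),          # numeric support (assumed: evaluation positions)
  sex = factor(c("female", "male"))   # factor support: the admissible categories
)

# Check that every element name matches a column in the data
allNamed <- all(names(supports) %in% names(exampleData))

# The list would then be passed as:
# estimateDataTemporalMap(data = exampleData, dateColumnName = "date",
#                         supports = supports)
```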

numericVariablesBins

the number of bins used to define the frequency/density histogram for numerical variables when their support is not provided. Defaults to 100.

numericSmoothing

a logical value indicating whether to apply Kernel Density Estimation smoothing (Gaussian kernel, default bandwidth) to numerical variables (the default) or to use a traditional histogram instead. See ?density for further details.
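
To make the two options concrete, the base R sketch below contrasts a Gaussian kernel density estimate with a plain histogram over the same number of bins. The simulated values are hypothetical, and this is not the package's internal code, only an illustration of the two techniques named above.

```r
set.seed(1)
values <- rnorm(200, mean = 50, sd = 10)   # hypothetical numeric variable
nBins  <- 100

# Smoothed estimate: Gaussian kernel, default bandwidth, evaluated at nBins points
kde <- density(values, kernel = "gaussian", n = nBins)

# Traditional histogram over nBins equal-width bins spanning the data range
breaks <- seq(min(values), max(values), length.out = nBins + 1)
h <- hist(values, breaks = breaks, plot = FALSE)

# Normalise the histogram counts to relative frequencies that sum to 1
histProbs <- h$counts / sum(h$counts)
```

With few observations per bin the histogram is jagged and has many empty bins, while the kernel estimate spreads each observation over neighbouring positions.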

dateGapsSmoothing

a logical value indicating whether linear smoothing is applied to time batches without data. By default (FALSE), gaps are filled with NAs.
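
The idea of linearly smoothing over empty time batches can be sketched with base R's approx(). The probabilities below are hypothetical, and the package's own smoothing may be implemented differently.

```r
# Hypothetical monthly probabilities for one bin; months 3 and 4 have no data
monthlyProb <- c(0.20, 0.30, NA, NA, 0.60)
months <- seq_along(monthlyProb)

# Linear interpolation across the gaps, using the observed months as anchors
observed <- !is.na(monthlyProb)
filled <- approx(x = months[observed],
                 y = monthlyProb[observed],
                 xout = months)$y
# filled is approximately 0.2, 0.3, 0.4, 0.5, 0.6
```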

verbose

a logical value indicating whether to print a log of the function's progress as it runs. Defaults to FALSE.

Value

A DataTemporalMap object, or a list of DataTemporalMap objects, depending on the number of analysis variables.

Examples

#Load the file 
dataset <- read.csv2(system.file("extdata",
                                   "nhdsSubset.csv",
                                   package="EHRtemporalVariability"), 
                     sep  = ",",
                     header = TRUE, 
                     na.strings = "", 
                     colClasses = c( "character", "numeric", "factor",
                                     "numeric" , rep( "factor", 22 ) ) )
# Format the date
datasetFormatted <- EHRtemporalVariability::formatDate( input      = dataset,
                                                        dateColumn = "date",
                                                        dateFormat = "%y/%m")

# Apply estimateDataTemporalMap
probMaps <- estimateDataTemporalMap( data           = datasetFormatted, 
                                     dateColumnName = "date", 
                                     period         = "month")
## Not run: 

# For a larger example, download the following .csv dataset and continue with the steps above:

gitHubUrl  <- 'http://github.com/'
gitHubPath <- 'hms-dbmi/EHRtemporalVariability-DataExamples/'
gitHubFile <- 'raw/master/nhdsSubset.csv'
inputFile  <-  paste0(gitHubUrl, gitHubPath, gitHubFile)

dataset <- read.csv2( inputFile, 
                     sep  = ",",
                     header = TRUE, 
                     na.strings = "", 
                     colClasses = c( "character", "numeric", "factor",
                                     "numeric" , rep( "factor", 22 ) ) ) 

## End(Not run)

[Package EHRtemporalVariability version 1.2.1 Index]