PhecapData {PheCAP} | R Documentation |
Define or Read Datasets for Phenotyping
Description
Specify the data to be used for phenotyping.
Usage
PhecapData(
data, hu_feature, label, validation,
patient_id = NULL, subject_weight = NULL,
seed = 12300L, feature_transformation = log1p)
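As a minimal sketch of a call, assuming the PheCAP package is installed: the data frame toy and its column names (hu_visits, code_a, code_b, status) are made up for illustration and are not part of the package. The call is skipped when PheCAP is unavailable.

```r
# Hypothetical toy data: 200 patients, of whom the first 80 are labeled.
set.seed(1)
n <- 200
toy <- data.frame(
  hu_visits = rpois(n, lambda = 5),   # healthcare utilization count
  code_a    = rpois(n, lambda = 2),   # illustrative count features
  code_b    = rpois(n, lambda = 1),
  status    = c(rbinom(80, 1, 0.4), rep(NA, n - 80))  # NA = unlabeled
)

# Only attempt the call when the package is actually installed.
if (requireNamespace("PheCAP", quietly = TRUE)) {
  phecap_data <- PheCAP::PhecapData(
    data = toy,
    hu_feature = "hu_visits",
    label = "status",
    validation = 0.4,                 # proportion of labeled samples
    seed = 12300L,
    feature_transformation = log1p    # the documented default, shown explicitly
  )
  print(class(phecap_data))
}
```

Here patient_id and subject_weight are left at their NULL defaults, since the toy data has no identifier or weight columns.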
Arguments
data |
A data.frame containing all the variables needed for phenotyping, a character scalar giving the path to the data, or a list whose elements are each a character scalar or a data.frame. If a list is given, patient_id cannot be NULL; all the datasets in the list are joined into a single dataset on the columns specified by patient_id. |
hu_feature |
A character scalar or vector specifying the names of one or more healthcare utilization (HU) variables. These variables are always included in the phenotyping model. |
label |
A character scalar giving the name of the column that holds the phenotype status (1 or TRUE: present, 0 or FALSE: absent). If labels are not yet available, supply a column filled with NA in data; in that case only the feature extraction step can be run. |
validation |
A character scalar, a real number strictly between 0 and 1, or an integer not less than 2. A character scalar is treated as the name of the column in the data that specifies whether each observation belongs to the validation samples (1 or TRUE: validation, 0 or FALSE: training). A real number strictly between 0 and 1 is treated as the proportion of validation samples. An integer not less than 2 is treated as the number of validation samples. In the latter two cases, the actual validation samples are drawn from all labeled samples. |
patient_id |
A character vector of the column names, if any, that uniquely identify each patient. Such variables must appear in the data. patient_id can be NULL if the data contain no such fields. |
subject_weight |
An optional numeric vector of weights for observations. |
seed |
If validation samples need to be drawn from all labeled samples, seed specifies the random seed for sampling. |
feature_transformation |
A function that will be applied to all the features. Since count data are typically right-skewed, by default log1p is applied. |
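The three accepted forms of validation can be illustrated with a short base-R sketch. The helper resolve_validation below is hypothetical and is not the package's implementation; in particular, rounding the proportion to a sample count is an assumption.

```r
# Hypothetical helper, not part of PheCAP: returns a logical vector that
# marks which rows would serve as validation samples.
resolve_validation <- function(data, label, validation, seed = 12300L) {
  if (is.character(validation)) {
    # Column name: 1/TRUE marks validation, 0/FALSE marks training.
    return(as.logical(data[[validation]]))
  }
  labeled <- which(!is.na(data[[label]]))
  if (validation > 0 && validation < 1) {
    # Proportion of the labeled samples (rounding is an assumption).
    size <- round(validation * length(labeled))
  } else if (validation >= 2) {
    # Absolute number of validation samples.
    size <- as.integer(validation)
  } else {
    stop("validation must be a column name, a proportion in (0, 1), ",
         "or an integer >= 2")
  }
  set.seed(seed)  # note: this resets the global random number stream
  is_validation <- logical(nrow(data))
  # Draw only from labeled rows; unlabeled rows are never selected.
  is_validation[labeled[sample.int(length(labeled), size)]] <- TRUE
  is_validation
}

# Toy data: 50 labeled rows followed by 50 unlabeled rows.
toy <- data.frame(status = c(rep(c(0, 1), 25), rep(NA, 50)))
by_proportion <- resolve_validation(toy, "status", 0.4)
by_size <- resolve_validation(toy, "status", 10L)
```

With 50 labeled rows, a proportion of 0.4 selects 20 of them and a size of 10 selects exactly 10; the seed only matters in these two sampled cases.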
Value
An object of class PhecapData.
See Also
See PheCAP-package for code examples.