| simu_data {OTrecod} | R Documentation |
A simulated dataset to test the functions of the OTrecod package
Description
The first 300 rows belong to the database A, while the next 400 rows belong to the database B.
Five covariates: Gender, Treatment, Dosage, Smoking and Age are
common to both databases (same encodings). Gender is the only complete covariate.
The variables Yb1 and Yb2 are the target variables of A and B respectively. They summarize the same information encoded in two different scales,
which is why Yb1 is missing in database B and Yb2 is missing in database A.
Usage
simu_data
Format
A data.frame made of 2 overlaid databases (A and B) with 700 observations on the following 8 variables.
- DB
the database identifier, a character with 2 possible classes: A or B
- Yb1
the target variable of the database A, stored as factor and encoded in 3 ordered levels: [20-40], [40-60[, [60-80] (the values related to the database B are missing)
- Yb2
the target variable of the database B, stored as integer on an unknown scale from 1 to 5 (the values related to the database A are missing)
- Gender
a factor with 2 levels (Female or Male) and no missing values
- Treatment
a covariate of 3 classes stored as a character with 2% of missing values: Placebo, Trt A, Trt B
- Dosage
a factor with 4 levels and 5% of missing values: from Dos 1 to Dos 4
- Smoking
a covariate of 2 classes stored as a character with 10% of missing values: NO for non smoker, YES otherwise
- Age
a numeric corresponding to the age of participants in years. This variable has 5% of missing values
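The overlay layout described above can be sketched on a small stand-in table. The data frame toy and the helper split_overlay below are hypothetical and are not part of OTrecod; the values are made up and only mimic the DB, Yb1 and Yb2 columns. With the package installed, simu_data could presumably be passed to the same helper.

```r
# Toy stand-in with the same overlay layout as simu_data (values are made up).
toy <- data.frame(
  DB  = c("A", "A", "A", "B", "B"),
  Yb1 = factor(c("[20-40]", "[40-60[", "[60-80]", NA, NA),
               levels = c("[20-40]", "[40-60[", "[60-80]"), ordered = TRUE),
  Yb2 = c(NA, NA, NA, 2L, 5L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split the overlay into its two source databases.
split_overlay <- function(df) {
  list(A = df[df$DB == "A", , drop = FALSE],
       B = df[df$DB == "B", , drop = FALSE])
}

parts <- split_overlay(toy)

# Each target variable should be entirely missing in the other database.
yb2_missing_in_A <- all(is.na(parts$A$Yb2))
yb1_missing_in_B <- all(is.na(parts$B$Yb1))
```

Keeping both databases in one data.frame with an identifier column means the shared covariates sit in the same columns, while each target is simply NA on the rows of the other database.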
Details
The purpose of the functions contained in this package is to predict the missing information on Yb1 in database B
and on Yb2 in database A using optimal transportation theory.
Missing values have been simulated in some covariates following a simple MCAR (missing completely at random) process.
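As an illustration of what a simple MCAR process can look like, the sketch below blanks out a fixed share of entries chosen uniformly at random. It is an assumption for illustration, not the code used to generate simu_data: the helper add_mcar, the seed and the simulated age range are all hypothetical.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

# Hypothetical helper: set a share `rate` of the entries of x to NA.
# Positions are drawn uniformly at random, so missingness does not
# depend on any observed or unobserved value (MCAR).
add_mcar <- function(x, rate) {
  n_miss <- round(rate * length(x))
  x[sample.int(length(x), n_miss)] <- NA
  x
}

# Made-up ages for 700 participants, then 5% of them set to missing.
age <- add_mcar(round(runif(700, 20, 80)), rate = 0.05)

# Observed share of missing values.
miss_rate <- mean(is.na(age))
```

The same check, mean(is.na(x)), applied column by column is one way to compare the missing-value rates of a dataset against those listed in the Format section.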
Source
randomly generated