simu_data {OTrecod} | R Documentation
A simulated dataset to test the functions of the OTrecod package
Description
The first 300 rows belong to database A, while the next 400 rows belong to database B.
Five covariates (Gender, Treatment, Dosage, Smoking and Age) are common to both databases, with the same encodings. Gender is the only complete covariate.
The variables Yb1 and Yb2 are the target variables of A and B respectively. They summarize the same information encoded on two different scales, which is why Yb1 is missing in database B and Yb2 is missing in database A.
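The overlay layout described above can be illustrated with a toy stand-in built in base R. This is only a sketch: the object names (A, B, overlay), the row counts and the sampled values are hypothetical and are not the package data; only the column names and encodings follow the description.

```r
# Toy stand-in for the overlay layout (hypothetical values, NOT the package data).
set.seed(1)
n_A <- 3  # rows of the toy database A
n_B <- 4  # rows of the toy database B
yb1_levels <- c("[20-40]", "[40-60[", "[60-80]")

# Database A: Yb1 is observed, Yb2 is entirely missing.
A <- data.frame(
  DB     = rep("A", n_A),
  Yb1    = factor(sample(yb1_levels, n_A, replace = TRUE),
                  levels = yb1_levels, ordered = TRUE),
  Yb2    = rep(NA_integer_, n_A),
  Gender = factor(sample(c("Female", "Male"), n_A, replace = TRUE),
                  levels = c("Female", "Male"))
)

# Database B: Yb2 is observed, Yb1 is entirely missing.
B <- data.frame(
  DB     = rep("B", n_B),
  Yb1    = factor(rep(NA_character_, n_B),
                  levels = yb1_levels, ordered = TRUE),
  Yb2    = sample(1:5, n_B, replace = TRUE),
  Gender = factor(sample(c("Female", "Male"), n_B, replace = TRUE),
                  levels = c("Female", "Male"))
)

# Stack A on top of B: rows of A come first, then rows of B.
overlay <- rbind(A, B)
print(overlay)
```

Stacking the two databases with rbind keeps the shared covariates in common columns, while each target variable is observed in one block of rows only.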
Usage
simu_data
Format
A data.frame made of 2 overlaid databases (A and B) with 700 observations on the following 8 variables.
- DB
the database identifier, a character with 2 possible classes: A or B
- Yb1
the target variable of database A, stored as a factor and encoded in 3 ordered levels: [20-40], [40-60[, [60-80] (the values related to database B are missing)
- Yb2
the target variable of database B, stored as an integer (an unknown scale from 1 to 5) (the values related to database A are missing)
- Gender
a factor with 2 levels (Female or Male) and no missing values
- Treatment
a covariate of 3 classes stored as a character with 2% of missing values: Placebo, Trt A, Trt B
- Dosage
a factor with 4 levels and 5% of missing values: from Dos 1 to Dos 4
- Smoking
a covariate of 2 classes stored as a character with 10% of missing values: NO for non-smokers, YES otherwise
- Age
a numeric corresponding to the age of participants in years. This variable has 5% of missing values
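The format listed above can be checked against the data once the package is available. The following is a minimal sketch that assumes OTrecod is installed; the checks are skipped otherwise. No expected output is shown, since it depends on the installed version of the data.

```r
# Assumes the OTrecod package is installed; nothing is run otherwise.
if (requireNamespace("OTrecod", quietly = TRUE)) {
  data(simu_data, package = "OTrecod")

  # Storage type of each of the 8 variables
  str(simu_data)

  # Number of rows per database identifier
  print(table(simu_data$DB))

  # Proportion of missing values per variable
  print(round(colMeans(is.na(simu_data)), 3))
}
```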
Details
The functions in this package aim to predict the missing information on Yb1 and Yb2 in databases A and B using optimal transportation theory.
Missing values were introduced in some covariates following a simple MCAR (missing completely at random) process.
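As an illustration of what an MCAR process means, the sketch below deletes values with a constant probability that does not depend on any variable. It is an assumption about the general mechanism only, not the procedure actually used to generate simu_data; the helper add_mcar and the vector age are hypothetical names.

```r
set.seed(42)
n <- 700

# Hypothetical complete covariate (ages drawn uniformly between 20 and 80)
age <- round(runif(n, min = 20, max = 80))

# MCAR: each value is deleted with the same probability `rate`,
# independently of both the observed and the unobserved data.
add_mcar <- function(x, rate) {
  x[runif(length(x)) < rate] <- NA
  x
}

age_mcar <- add_mcar(age, rate = 0.05)

# Observed proportion of missing values; it fluctuates around the target rate
mean(is.na(age_mcar))
```

Because each value is deleted independently, the realized proportion of missing values is only approximately equal to the requested rate.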
Source
randomly generated