accel.impute {accelmissing} | R Documentation |
Missing Value Imputation for Accelerometer Data
Description
This function imputes the missing count values generated by the accelerometer. The imputation is performed within a user-defined daytime window (9am-9pm by default). At each minute, the function runs multiple imputation by chained equations under the assumption of a zero-inflated Poisson log-normal distribution.
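To make the distributional assumption concrete, the sketch below simulates counts from one common reading of a zero-inflated Poisson log-normal model: a structural zero with some probability, otherwise a Poisson count whose rate is log-normal. The helper name rzipln and its parameters pi0, mu, and sigma are hypothetical and not part of accelmissing; the exact parameterization used by the package may differ.

```r
# Hypothetical helper (not part of accelmissing): draw n counts from a
# zero-inflated Poisson log-normal model.
#   pi0       : probability of a structural zero
#   mu, sigma : mean and sd of log(lambda), the log of the Poisson rate
rzipln <- function(n, pi0, mu, sigma) {
  lambda <- exp(rnorm(n, mean = mu, sd = sigma))  # log-normal Poisson rate
  counts <- rpois(n, lambda)                      # Poisson counts given the rate
  zero   <- rbinom(n, size = 1, prob = pi0) == 1  # structural-zero indicator
  counts[zero] <- 0L                              # overwrite with structural zeros
  counts
}

set.seed(1)
y <- rzipln(1000, pi0 = 0.3, mu = 5, sigma = 1)
mean(y == 0)   # close to pi0, since Poisson zeros are rare at these rates
summary(y[y > 0])
```

The long right tail of the nonzero counts and the spike at zero are the two features this kind of model is meant to capture in minute-level activity counts.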
Usage
accel.impute(PA, label, flag, demo=NA, method = "zipln", time.range = c("09:00","20:59"),
K = 3, D = 5, mark.missing = 0, thresh = 10000, graph.diagnostic = TRUE,
seed = 1234, m = 5, maxit = 6)
Arguments
PA |
an N by T matrix including activity counts, where N is the total number of daily profiles, and T is the total minutes of a day (T=1440). |
label |
an N by 2 matrix including the labels corresponding to |
flag |
an N by T matrix with values of either 1 or 0, indicating wearing or missing. This matrix can be created from |
demo |
an n by p data frame where n is the total number of subjects. The first column must include the unique person id, which is equal to |
method |
Either "zipln" or "zipln.pmm". The former conducts parametric imputation assuming the zero-inflated Poisson log-normal (zipln) distribution. The latter conducts semiparametric imputation with predictive mean matching (pmm) under the zipln assumption. |
time.range |
Define the time range for imputation. Default is 9am-9pm, coded by |
K |
The number of lag and lead variables. |
D |
The number of donors when |
mark.missing |
If |
thresh |
The upper bound of count values. |
graph.diagnostic |
If |
seed |
A seed number for the random process. |
m |
The number of imputed datasets. |
maxit |
The maximum number of iterations at a fixed time point. |
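As an illustration of the time.range and K arguments, the sketch below converts a clock time to a column index of PA and collects K lag and K lead columns around one minute. It assumes that column 1 of PA corresponds to 00:00 and that the lag and lead variables are the counts in the K minutes before and after the target minute; both are assumptions, and the helpers hhmm_to_col and lag_lead are hypothetical names, not functions from accelmissing.

```r
# Hypothetical helper: map "HH:MM" to a column of an N x 1440 matrix,
# assuming column 1 is 00:00 (an assumption, not confirmed by the package).
hhmm_to_col <- function(hhmm) {
  parts <- as.integer(strsplit(hhmm, ":", fixed = TRUE)[[1]])
  parts[1] * 60L + parts[2] + 1L
}

# Hypothetical helper: the K columns before and the K columns after minute t.
lag_lead <- function(PA, t, K = 3) {
  cols <- c((t - K):(t - 1), (t + 1):(t + K))
  stopifnot(min(cols) >= 1, max(cols) <= ncol(PA))
  out <- PA[, cols, drop = FALSE]
  colnames(out) <- c(paste0("lag", K:1), paste0("lead", 1:K))
  out
}

PA <- matrix(seq_len(2 * 1440), nrow = 2, byrow = TRUE)  # toy 2-day matrix
t0 <- hhmm_to_col("09:00")
t0                        # 541
lag_lead(PA, t0, K = 3)   # six predictor columns for minute t0
```

Under these assumptions, the default time.range of c("09:00","20:59") spans columns 541 to 1260, that is, 720 minutes per day.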
Value
listimp |
List with |
Note
seed, m, and maxit are input arguments of the mice function.
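For readers unfamiliar with these three arguments, the sketch below shows what they mean in a direct call to mice::mice(). It uses the small nhanes data set that ships with the mice package rather than accelerometer data, and it does not reproduce how accel.impute calls mice internally.

```r
# Sketch only: the meaning of seed, m, and maxit in mice::mice(),
# illustrated on mice's bundled nhanes data (not accelerometer data).
if (requireNamespace("mice", quietly = TRUE)) {
  imp <- mice::mice(mice::nhanes, m = 5, maxit = 6, seed = 1234,
                    printFlag = FALSE)
  imp$m                            # number of imputed datasets requested
  first <- mice::complete(imp, 1)  # first completed dataset
  anyNA(first)                     # check whether any missing cells remain
}
```

Fixing seed makes the imputations reproducible, m controls how many completed datasets are generated, and maxit controls how many chained-equation iterations are run.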
Author(s)
Jung Ae Lee <jungaeleeb@gmail.com>
References
[1] Lee JA, Gill J (2016). Missing value imputation for physical activity data measured by accelerometer. Statistical Methods in Medical Research.
[2] van Buuren S, Groothuis-Oudshoorn K (2011). mice: Multivariate imputations by chained equations in R. Journal of Statistical Software.
[3] Jackman S (2014). pscl: Classes and Methods for R Developed in the Political Science Computational Laboratory. Stanford University. R package version 1.4.6.
Examples
####################################################
# A full example from data filtering to imputation
####################################################
data(acceldata) # read data
ls(acceldata) # This is a list with four matrix objects, PA, label, flag, and demo
d = acceldata
## missing rate
missing.rate(label=d$label, flag=d$flag)$total # 32 percent
# create missing flag with 60 min criterion
flag60 = create.flag(PA=d$PA, window=60)
## missing rate with flag60
mr = missing.rate(label=d$label, flag=flag60)
mr$total #28.1 percent
## missing proportion by days
mean(mr$table < 0.1) # 45.8 percent
# wearing proportion over time
wear.time.plot(PA=d$PA, label=d$label, flag=flag60)
# data filtering for valid days
valid.days.out = valid.days(PA=d$PA, label=d$label, flag=flag60, wear.hr=8)
ls(valid.days.out) # list with three matrix objects
# data filtering for valid subjects
x1 = list(PA=d$PA, label=d$label, flag=flag60) # original
x2 = valid.days.out # output of valid.days()
valid.sub.out = valid.subjects(data1=x1, data2=x2, valid.days=3)
length(unique(valid.sub.out$label[,1])) # 184 persons
ls(valid.sub.out)
## missing rate with the filtered data
missing.rate(valid.sub.out$label, valid.sub.out$flag)$total
# 20.1 percent
# demographic data for the filtered data
idv= unique(valid.sub.out$label[,1])
matchid = match(idv, d$demo[,1])
demo1 = d$demo[matchid, ]
# save the data before imputation
acceldata2 = list(PA=valid.sub.out$PA, label=valid.sub.out$label, flag=valid.sub.out$flag,
demo=demo1)
save(acceldata2, file="acceldata2.RData")
################################
# prepare the imputation
library(mice); library(pscl)
data(acceldata2) # load prepared data in this package, or
# load("acceldata2.RData") # to use the data you saved in previous step.
data = acceldata2
# imputation: test only 10 minutes with the semiparametric method
# accelimp = accel.impute(PA=data$PA, label=data$label, flag=data$flag,
# demo=data$demo, time.range=c("10:51","11:00"), method="zipln.pmm", D=5)
# imputation: test only 10 minutes with the parametric method
# accelimp = accel.impute(PA=data$PA, label=data$label, flag=data$flag,
# demo=data$demo, time.range=c("10:51","11:00"), method="zipln")
# plot 7 days before imputation
accel.plot.7days(PA=data$PA[1:7, ], label=data$label[1:7, ], flag=data$flag[1:7, ],
time.range=c("09:00", "20:59"), save.plot=FALSE)
# plot 7 days after imputation
data(accelimp) # load prepared data in this package, or use the data you created above.
accel.plot.7days(PA=accelimp[[1]][1:7, ], label=data$label[1:7, ], flag=data$flag[1:7, ],
time.range=c("09:00", "20:59"), save.plot=FALSE)