R: Missing Value Imputation for Accelerometer Data

accel.impute {accelmissing}

R Documentation

Missing Value Imputation for Accelerometer Data

Description

This function imputes the missing count values in accelerometer data. Imputation is performed within a user-defined daytime window (9am-9pm by default). At each minute, the function runs multiple imputation by chained equations under the assumption of a zero-inflated Poisson log-normal distribution.
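
To make the distributional assumption concrete, the following sketch simulates counts from one common reading of a zero-inflated Poisson log-normal model: a structural zero with some probability, otherwise a Poisson count whose rate is log-normally distributed. The helper `rzipln` and its parameters `pi0`, `mu`, and `sigma` are hypothetical names chosen for illustration; they are not part of the package, and the package's fitted model additionally uses covariates.

```r
# Sketch: draw counts from a zero-inflated Poisson log-normal model.
# rzipln, pi0, mu and sigma are illustrative assumptions, not package API.
rzipln <- function(n, pi0, mu, sigma) {
  # Log-normal Poisson rate: log(lambda) ~ Normal(mu, sigma^2)
  lambda <- exp(rnorm(n, mean = mu, sd = sigma))
  # Poisson counts given the rate
  y <- rpois(n, lambda)
  # Structural zeros occur with probability pi0
  is_zero <- rbinom(n, size = 1, prob = pi0) == 1
  y[is_zero] <- 0L
  y
}

set.seed(1234)
y <- rzipln(n = 1000, pi0 = 0.3, mu = 5, sigma = 1)
mean(y == 0)        # share of zeros, close to pi0 because Poisson zeros are rare at this rate
summary(y[y > 0])   # right-skewed positive counts
```

The excess zeros mimic minutes of inactivity, while the log-normal rate allows more variability in the positive counts than a plain Poisson model would.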

Usage

accel.impute(PA, label, flag, demo=NA, method = "zipln", time.range = c("09:00","20:59"), 
K = 3, D = 5, mark.missing = 0, thresh = 10000, graph.diagnostic = TRUE, 
seed = 1234, m = 5, maxit = 6)

Arguments

`PA`	an N by T matrix including activity counts, where N is the total number of daily profiles, and T is the total minutes of a day (T=1440).
`label`	an N by 2 matrix including the labels corresponding to the `PA` matrix. The first column, `label[,1]`, includes the person id, and the second column, `label[,2]`, includes the day label of 1 to 7, indicating Sunday to Saturday.
`flag`	an N by T matrix with values of either 1 or 0, indicating wearing or missing. This matrix can be created with `create.flag()`.
`demo`	an n by p data frame, where n is the total number of subjects. The first column must include the unique person id, which equals `unique(label[,1])`. The second to p-th columns may include demographic variables of interest, for example, age, sex, body mass index, and race. These variables are used as covariates in the imputation model. Missing values in the demo matrix lead to an error message. The default is demo=NA.
`method`	Either "zipln" or "zipln.pmm". The former conducts parametric imputation assuming the zero-inflated Poisson log-normal (zipln) distribution. The latter conducts semiparametric imputation with predictive mean matching (pmm) under the zipln assumption.
`time.range`	Define the time range for imputation. Default is 9am-9pm, coded by `time.range = c("09:00", "20:59")`. Missing values outside of this range are imputed by zero, assuming extended sleep or inactivity.
`K`	The number of the lag and lead variables. `K=3` is default.
`D`	The number of donors when `method="zipln.pmm"`. `D=5` is default.
`mark.missing`	If `mark.missing = 0` (default), the nonwearing time is marked by 0 while the wearing time is marked by 1 in the flag matrix. If `mark.missing = 1`, it is the opposite.
`thresh`	The upper bound of count values. `thresh=10000` is default.
`graph.diagnostic`	If `TRUE`, a scatter plot of the observed vs. the imputed values is shown during the imputation process.
`seed`	A seed number for the random process. `seed=1234` is default.
`m`	The number of imputed datasets. `m=5` is default.
`maxit`	The maximum number of iterations at a fixed time point. `maxit=6` is default.
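
The input layout described above can be sketched with toy data in base R. Everything here is hypothetical: the counts are simulated, the helper `to_minute` is not a package function, and the sketch assumes that column 1 of `PA` corresponds to 00:00 and that the `K` lag and lead variables are simply the `K` neighboring minutes on each side of the minute being imputed.

```r
# Toy inputs with the shapes described above (hypothetical data, not from the package).
set.seed(1)
n_days <- 4      # N daily profiles
n_min  <- 1440   # T minutes per day
PA    <- matrix(rpois(n_days * n_min, lambda = 50), nrow = n_days)
label <- cbind(id = c(1, 1, 2, 2), day = c(1, 2, 1, 2))
flag  <- matrix(1L, nrow = n_days, ncol = n_min)   # 1 = wearing (mark.missing = 0)
flag[2, 600:659] <- 0L                              # one hour of non-wear on profile 2

# Convert "HH:MM" to a 1-based column index, assuming column 1 is 00:00.
to_minute <- function(hhmm) {
  parts <- as.integer(strsplit(hhmm, ":", fixed = TRUE)[[1]])
  parts[1] * 60 + parts[2] + 1
}
cols <- to_minute("09:00"):to_minute("20:59")
length(cols)                  # 720 minutes in the default window
sum(flag[, cols] == 0)        # missing minutes that fall inside the window

# Lag and lead neighbors of one minute, for K = 3 (assumed interpretation of K).
K  <- 3
t0 <- 600
lag_cols  <- (t0 - K):(t0 - 1)
lead_cols <- (t0 + 1):(t0 + K)
neighbors <- PA[, c(lag_cols, lead_cols)]
dim(neighbors)                # 4 profiles by 2 * K neighboring minutes

# With mark.missing = 1 the coding is reversed: 1 = missing, 0 = wearing.
flag_reversed <- 1L - flag
```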

Value

listimp

List with m datasets with imputations.
The dimension of each dataset, dim(listimp[[1]]), is the same as dim(PA).
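
As a rough illustration of how such a list might be summarized, the sketch below builds a stand-in list of `m` matrices from simulated counts (not real output of `accel.impute`) and averages a per-profile summary across the imputed datasets. Averaging point estimates across imputations is the usual multiple-imputation practice; the choice of daily totals as the summary is only an example.

```r
# Stand-in for the returned list: m = 5 matrices with the same dimension as PA.
# The data are simulated for illustration only.
set.seed(42)
m <- 5
listimp <- lapply(seq_len(m), function(i) {
  matrix(rpois(3 * 1440, lambda = 20), nrow = 3)
})

# Daily total counts per profile, one column per imputed dataset.
daily_totals <- sapply(listimp, rowSums)   # 3 profiles by m datasets

# Average each profile's total across the m imputations.
pooled_total <- rowMeans(daily_totals)

# Between-imputation spread, a rough indicator of imputation uncertainty.
between_sd <- apply(daily_totals, 1, sd)
```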

Note

seed, m, and maxit are passed as input arguments to the mice function.

Author(s)

Jung Ae Lee <jungaeleeb@gmail.com>

References

[1] Lee JA, Gill J (2016). Missing value imputation for physical activity data measured by accelerometer. Statistical Methods in Medical Research.
[2] van Buuren S, Groothuis-Oudshoorn K (2011). mice: Multivariate imputations by chained equations in R. Journal of Statistical Software.
[3] Jackman S (2014). pscl: Classes and Methods for R Developed in the Political Science Computational Laboratory. Stanford University. R package version 1.4.6.

Examples

####################################################
# A full example from data filtering to imputation	
####################################################
data(acceldata)    # read data
ls(acceldata)      # This is a list with four matrix objects, PA, label, flag, and demo
d = acceldata

## missing rate
missing.rate(label=d$label, flag=d$flag)$total  # 32 percent

# create missing flag with 60 min criterion
flag60 = create.flag(PA=d$PA, window=60)

## missing rate with flag60
mr = missing.rate(label=d$label, flag=flag60)
mr$total  #28.1 percent

## missing proportion by days
mean(mr$table < 0.1)   # 45.8 percent

# wearing proportion over time 
wear.time.plot(PA=d$PA, label=d$label, flag=flag60)

# data filtering for valid days
valid.days.out = valid.days(PA=d$PA, label=d$label, flag=flag60, wear.hr=8)
ls(valid.days.out)   # list with three matrix objects

# data filtering for valid subjects
x1 = list(PA=d$PA, label=d$label, flag=flag60) # original
x2 = valid.days.out   # output of valid.days()
valid.sub.out = valid.subjects(data1=x1, data2=x2, valid.days=3)
length(unique(valid.sub.out$label[,1]))   # 184 persons 
ls(valid.sub.out)

## missing rate with the filtered data
missing.rate(valid.sub.out$label, valid.sub.out$flag)$total    
# 20.1 percent 

# demographic data for the filtered data
idv= unique(valid.sub.out$label[,1])
matchid = match(idv, d$demo[,1]) 
demo1 = d$demo[matchid, ]

# save the data before imputation
acceldata2 = list(PA=valid.sub.out$PA, label=valid.sub.out$label, flag=valid.sub.out$flag, 
demo=demo1)
save(acceldata2, file="acceldata2.RData")

################################
# prepare the imputation
library(mice); library(pscl)
data(acceldata2) # load prepared data in this package, or 
# load("acceldata2.RData") # to use the data you saved in previous step.
data = acceldata2

# imputation: test only 10 minutes with the semiparametric method
# accelimp = accel.impute(PA=data$PA, label=data$label, flag=data$flag, 
# demo=data$demo, time.range=c("10:51","11:00"), method="zipln.pmm", D=5) 

# imputation: test only 10 minutes with the parametric method
# accelimp = accel.impute(PA=data$PA, label=data$label, flag=data$flag, 
# demo=data$demo, time.range=c("10:51","11:00"), method="zipln")

# plot 7 days before imputation 
accel.plot.7days(PA=data$PA[1:7, ], label=data$label[1:7, ], flag=data$flag[1:7, ],
 time.range=c("09:00", "20:59"), save.plot=FALSE)

# plot 7 days after imputation
data(accelimp) # load prepared data in this package, or use the data you created above.
accel.plot.7days(PA=accelimp[[1]][1:7, ], label=data$label[1:7, ], flag=data$flag[1:7, ], 
time.range=c("09:00", "20:59"),  save.plot=FALSE)

[Package accelmissing version 1.4 Index]