R: Shell function for analysing an accelerometer dataset.

GGIR {GGIR}

R Documentation

Shell function for analysing an accelerometer dataset.

Description

This function is designed to help users operate all steps of the analysis. It helps to generate and structure milestone data, and produces user-friendly reports. The function acts as a shell with calls to g.part1, g.part2, g.part3, g.part4 and g.part5.

Usage

GGIR(mode = 1:5,
     datadir = c(),
     outputdir = c(),
     studyname = c(),
     f0 = 1, f1 = 0,
     do.report = c(2, 4, 5, 6),
     configfile = c(),
     myfun = c(),
     verbose = TRUE, ...)

Arguments

`mode`	Numeric (default = 1:5). Specifies which of the five parts to run, e.g., mode = 1 runs only g.part1, and mode = 1:5 runs the whole GGIR pipeline from g.part1 to g.part5. Optionally, mode can also include the number 6 to tell GGIR to run g.part6, which is currently under development.
`datadir`	Character (default = c()). Directory where the accelerometer files are stored, e.g., "C:/mydata", or list of accelerometer filenames and directories, e.g. c("C:/mydata/myfile1.bin", "C:/mydata/myfile2.bin").
`outputdir`	Character (default = c()). Directory where the output needs to be stored. Note that this function will attempt to create folders in this directory and use those folders to store output.
`studyname`	Character (default = c()). If the datadir is a folder, then the study will be given the name of the data directory. If datadir is a list of filenames then the studyname as specified by this input argument will be used as name for the study.
`f0`	Numeric (default = 1). File index to start with. The index refers to the filenames sorted in alphabetical order.
`f1`	Numeric (default = 0). File index to finish with (defaults to number of files available).
`do.report`	Numeric (default = c(2, 4, 5, 6)). For which parts to generate a summary spreadsheet: 2, 4, 5, and/or 6. Default is c(2, 4, 5, 6). A report will be generated based on the available milestone data. When creating milestone data with multiple machines it is advisable to turn the report generation off when generating the milestone data, value = c(), and then to merge the milestone data and turn report generation back on while setting overwrite to FALSE.
`configfile`	Character (default = c()). Configuration file previously generated by function GGIR. See details.
`myfun`	List (default = c()). External function object to be applied to raw data. See package vignette for detailed tutorial with examples on how to use the function embedding: https://cran.r-project.org/package=GGIR/vignettes/ExternalFunction.html
`verbose`	Boolean (default = TRUE). Whether console messages should be printed. Note that warnings and errors are always printed; warnings and messages can be suppressed with suppressWarnings() or suppressMessages().
`...`	Any of the parameters used by GGIR. Given the large number of parameters used in GGIR, we have grouped them in objects that start with "params_". These are documented in the details section. You cannot provide these objects as arguments to function GGIR, but you can provide the parameters inside them as input to function GGIR.
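
As a minimal sketch of how the arguments above fit together, the call below runs only parts 1 and 2 and passes two params_general parameters (overwrite and do.parallel) through the `...` argument. The directory names are hypothetical placeholders, and the run is only attempted when GGIR is installed and both directories exist.

```r
# Hypothetical paths; replace with your own data and output directories.
ggir_args <- list(
  mode = 1:2,                  # run only g.part1 and g.part2
  datadir = "C:/mydata",
  outputdir = "C:/myresults",
  do.report = 2,               # only generate the part 2 report
  overwrite = FALSE,           # params_general parameter passed via ...
  do.parallel = FALSE          # params_general parameter passed via ...
)

# Only attempt the run when GGIR is installed and both directories exist.
can_run <- requireNamespace("GGIR", quietly = TRUE) &&
  dir.exists(ggir_args$datadir) &&
  dir.exists(ggir_args$outputdir)

if (can_run) {
  do.call(GGIR::GGIR, ggir_args)
}
```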

Details

Once you have used function GGIR, the output directory (outputdir) will be filled with milestone data and results. Function GGIR stores all the explicitly entered argument values, and the default values for the arguments that are not explicitly provided, in a csv-file named config.csv stored in the root of the output folder. The config.csv file is accepted as input to GGIR with argument configfile to replace the specification of all the arguments, except datadir and outputdir.

The practical value of this is that it eases the replication of an analysis: instead of having to share your R script, sharing your config.csv file will be sufficient. Further, the config.csv file contributes to the reproducibility of your data analysis.

Note: When combining a configuration file with explicitly provided argument values, the explicitly provided argument values will overrule the argument values in the configuration file. If a parameter is provided neither via the configuration file nor as input, then GGIR uses its default parameter values, which can be inspected with the command print(load_params()). If you are specifically interested in a certain subgroup of parameters, e.g., physical activity, then you can use print(load_params()$params_phyact). These defaults are part of the GGIR code and cannot be changed by the user.
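
The following sketch illustrates the precedence rule described above: a configuration file is supplied via configfile, while overwrite is given explicitly and therefore overrules whatever value config.csv holds. The paths, including the location of config.csv, are hypothetical, and nothing is run unless GGIR is installed and the file exists.

```r
# Hypothetical location of a config.csv written by an earlier GGIR run.
config_path <- "C:/myresults/output_mydata/config.csv"

rerun_args <- list(
  datadir = "C:/mydata",       # always needs to be specified
  outputdir = "C:/myresults",  # always needs to be specified
  configfile = config_path,
  overwrite = TRUE             # explicit value overrules the one in config.csv
)

if (requireNamespace("GGIR", quietly = TRUE) && file.exists(config_path)) {
  # Inspect the built-in defaults for the physical activity parameters.
  print(GGIR::load_params()$params_phyact)
  do.call(GGIR::GGIR, rerun_args)
}
```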

The parameters that can be used in GGIR are:

params_general

A list of parameters used across all GGIR parts that do not fall in any of the other categories.

overwrite: Boolean (default = FALSE). Do you want to overwrite analysis for which milestone data exists? If overwrite = FALSE, then milestone data from a previous analysis will be used if available and visual reports will not be created again.
dayborder: Numeric (default = 0). Hour at which days start and end (dayborder = 4 would mean 4 am).
do.parallel: Boolean (default = TRUE). Whether to use multi-core processing (only works if at least 4 CPU cores are available).
maxNcores: Numeric (default = NULL). Maximum number of cores to use when argument do.parallel is set to true. GGIR by default uses either the maximum number of available cores or the number of files to process (whichever is lower), but this argument allows you to set a lower maximum.
acc.metric: Character (default = "ENMO"). Which one of the acceleration metrics do you want to use for all acceleration magnitude analyses in GGIR part 5 and the visual report? For example: "ENMO", "LFENMO", "MAD", "NeishabouriCount_y", or "NeishabouriCount_vm". Only one acceleration metric can be specified and the selected metric needs to have been calculated in part 1 (see g.part1) via arguments such as do.enmo = TRUE or do.mad = TRUE.
part5_agg2_60seconds: Boolean (default = FALSE). Whether to aggregate epochs to 60 seconds as part of the GGIR g.part5 analysis. Aggregation is done by averaging. Note that when working with count metrics such as Neishabouri counts this means that the threshold can stay the same as in part 2, because the threshold is still expressed relative to the original epoch size, even if averaged per minute. For example, if we want to use a cut-point of 100 counts per minute with a 5-second epoch, then we specify mvpathreshold = 100 * (5/60) as well as threshold.mod = 100 * (5/60), regardless of whether we set part5_agg2_60seconds to TRUE or FALSE.
print.filename: Boolean (default = FALSE). Whether to print the filename before analysing it (in case do.parallel = FALSE). Printing the filename can be useful to investigate problems (e.g., to verify which file is being read).
desiredtz: Character (default = "", i.e., system timezone). Timezone in which device was configured and experiments took place. If experiments took place in a different timezone, then use this argument for the timezone in which the experiments took place and argument configtz to specify where the device was configured. Use the "TZ identifier" as specified at https://en.wikipedia.org/wiki/Zone.tab to set desiredtz, e.g., "Europe/London".
configtz: Character (default = "", i.e., system timezone). At the moment only functional for GENEActiv .bin, AX3 cwa, ActiGraph .gt3x, and ad-hoc csv file format. Timezone in which the accelerometer was configured. Only use this argument if the timezone of configuration and timezone in which recording took place are different. Use the "TZ identifier" as specified at https://en.wikipedia.org/wiki/Zone.tab to set configtz, e.g., "Europe/London".
sensor.location: Character (default = "wrist"). To indicate sensor location, default is wrist. If it is hip, the HDCZA algorithm for sleep detection also requires longitudinal axis of sensor to be between -45 and +45 degrees.
windowsizes: Numeric vector, three values (default = c(5, 900, 3600)). To indicate the lengths of the windows as in c(window1, window2, window3): window1 is the short epoch length in seconds, by default 5, and this is the time window over which acceleration and angle metrics are calculated; window2 is the long epoch length in seconds for which non-wear and signal clipping are defined, default 900 (expected to be a multiple of 60 seconds); window3 is the window length of data used for non-wear detection and by default 3600 seconds. So, when window3 is larger than window2 we use overlapping windows, while if window2 equals window3 non-wear periods are assessed by non-overlapping windows.
idloc: Numeric (default = 1). If idloc = 1 the code assumes that the ID number is stored in the obvious header field. Note that for ActiGraph data the ID is never stored in the file header. For values 2, 5, 6, and 7, GGIR looks at the filename and extracts the character string preceding the first occurrence of a "_" (idloc = 2), " " (space, idloc = 5), "." (dot, idloc = 6), and "-" (idloc = 7), respectively. You may have noticed that idloc 3 and 4 are skipped: they were used for one study in 2012 and are no longer actively maintained, but they have not been removed because they are legacy code.
expand_tail_max_hours: Numeric (default = NULL). This parameter has been replaced by recordingEndSleepHour.
recordingEndSleepHour: Numeric (default = NULL). Time (in hours) at which the recording should end (or later) to expand the g.part1 output with synthetic data to trigger sleep detection for last night. Using argument recordingEndSleepHour implies the assumption that the participant fell asleep at or before the end of the recording if the recording ended at or after recordingEndSleepHour hour of the last day. This assumption may not always hold true and should be used with caution. The synthetic data for metashort entails: timestamps continuing regularly, zeros for acceleration metrics other than EN, one for EN. Angle columns are created in a way that it triggers the sleep detection using the equation: round(sin((1:length_expansion) / (900/epochsize))) * 15. To keep track of the tail expansion g.part1 stores the length of the expansion in the RData files, which is then passed via g.part2, g.part3, and g.part4 to g.part5. In g.part5 the tail expansion size is included as an additional variable in the csv-reports. In the g.part4 csv-report the last night is omitted, because we know that sleep estimates from the last night will not be trustworthy. Similarly, in the g.part5 output columns related to the sleep assessment will be omitted for the last window to avoid biasing the averages. Further, the synthetic data are also ignored in the visualizations and time series output to avoid biased output.
dataFormat: Character (default = "raw"). To indicate the format of the data in datadir. Alternatives: ukbiobank_csv, actiwatch_csv, actiwatch_awd, actigraph_csv, and sensewear_xls, which correspond to epoch level data files from, respectively, UK Biobank in csv format, Actiwatch in csv format, Actiwatch in awd format, ActiGraph csv format, and Sensewear in xls format (also works with xlsx). Here, the assumed epoch size for UK Biobank csv data is 5 seconds. The epoch size for the other non-raw data formats is flexible, but make sure that you set the first value of argument windowsizes accordingly. Also, when working with non-raw data formats, specify argument extEpochData_timeformat as documented below. For ukbiobank_csv non-wear is a column in the data itself; for actiwatch_csv, actiwatch_awd, actigraph_csv, and sensewear_xls non-wear is detected as 60 minute rolling zeros. The length of this window can be modified with the third value of argument windowsizes expressed in seconds.
maxRecordingInterval: Numeric (default = NULL). To indicate the maximum gap in hours between repeated measurements with the same ID for the recordings to be appended. The assumption is that the ID can be matched, so make sure argument idloc is set correctly. If argument maxRecordingInterval is set to NULL (default) recordings are not appended. If recordings overlap then GGIR will use the data from the latest recording. If recordings are separated then the time gap between the recordings is filled with data points that resemble monitor not worn. The maximum value of maxRecordingInterval is 504 (21 days). Only recordings from the same accelerometer brand are appended. The part 2 csv report will show the number of appended recordings, the sampling rate for each, the time overlap or gap, and the filenames of the respective recordings.
extEpochData_timeformat: Character (default = "%d-%m-%Y %H:%M:%S"). To specify the time format used in the external epoch level data when argument dataFormat is set to "actiwatch_csv", "actiwatch_awd", "actigraph_csv" or "sensewear_xls". For example, "%Y-%m-%d %I:%M:%S %p" for "2023-07-11 01:24:01 PM" or "%m/%d/%Y %H:%M:%S" for "07/11/2023 13:24:01".
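
To check a candidate value for extEpochData_timeformat before running GGIR, the format string can be tried on one timestamp from the file with base R. This is only a sketch: the example timestamps are made up, and it is an assumption that GGIR interprets the format string in the same strptime-style way as as.POSIXct does here.

```r
# Default value of extEpochData_timeformat: day-month-year, 24-hour clock.
default_format <- "%d-%m-%Y %H:%M:%S"
t_default <- as.POSIXct("11-07-2023 13:24:01",
                        format = default_format, tz = "UTC")

# 12-hour clock with AM/PM indicator (parsing of %p depends on the locale).
t_ampm <- as.POSIXct("2023-07-11 01:24:01 PM",
                     format = "%Y-%m-%d %I:%M:%S %p", tz = "UTC")

# A format that does not match the text yields NA,
# which is a quick way to spot a wrong setting.
t_wrong <- as.POSIXct("2023-07-11 13:24:01",
                      format = default_format, tz = "UTC")

print(t_default)
print(is.na(t_wrong))
```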

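The synthetic angle expression given for recordingEndSleepHour can be evaluated directly to see what kind of signal it produces. The sketch below assumes a 5-second short epoch and a hypothetical expansion of six hours; only the expression itself is taken from the text above.

```r
epochsize <- 5                             # short epoch length in seconds
length_expansion <- 6 * 3600 / epochsize   # hypothetical: six hours of epochs

# Expression for the synthetic angle columns as given in the documentation.
synthetic_angle <- round(sin((1:length_expansion) / (900 / epochsize))) * 15

# The result is a slow step pattern that only takes the values -15, 0 and 15.
print(table(synthetic_angle))
```

Because the sine is rounded before scaling, the synthetic angle stays constant for long stretches and changes only in steps of 15 degrees.
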
params_rawdata

A list of parameters related to reading and pre-processing raw data, excluding parameters related to metrics as those are in the params_metrics object.

backup.cal.coef: Character (default = "retrieve"). Option to use backed-up calibration coefficients instead of deriving the calibration coefficients when analysing the same file twice. Argument backup.cal.coef has two use cases. Use case 1: If the auto-calibration fails, then the user has the option to provide back-up calibration coefficients via this argument. The value of the argument needs to be the name and directory of a csv-spreadsheet with the following column names and subsequent values: "filename" with the names of accelerometer files on which the calibration coefficients need to be applied in case auto-calibration fails; "scale.x", "scale.y", and "scale.z" with the scaling coefficients; "offset.x", "offset.y", and "offset.z" with the offset coefficients; and "temperature.offset.x", "temperature.offset.y", and "temperature.offset.z" with the temperature offset coefficients. This can be useful for analysing short-lasting laboratory experiments with insufficient sphere data to perform the auto-calibration, but for which calibration coefficients can be derived in an alternative way. It is the user's responsibility to compile the csv-spreadsheet. Use case 2: The user wants to avoid performing the auto-calibration repeatedly on the same file. If backup.cal.coef is set to "retrieve" (default), then GGIR will look for the "data_quality_report.csv" file in the output folder QC, which holds the previously generated calibration coefficients. If you do not want this to happen, then delete the data_quality_report.csv file from the QC folder or set backup.cal.coef to "redo".
minimumFileSizeMB: Numeric (default = 2). Minimum File size in MB required to enter processing. This argument can help to avoid having short uninformative files to enter the analyses. Given that a typical accelerometer collects several MBs per hour, the default setting should only skip the very tiny files.
do.cal: Boolean (default = TRUE). Whether to apply auto-calibration or not by g.calibrate. Recommended setting is TRUE.
imputeTimegaps: Boolean (default = TRUE). To indicate whether time gaps larger than 1 sample should be imputed. Currently only used for .gt3x data and ActiGraph .csv format, where time gaps can be expected as a result of ActiGraph's idle sleep mode configuration.
spherecrit: Numeric (default = 0.3). The minimum required acceleration value (in g) on both sides of 0 g for each axis. Used to judge whether the sphere is sufficiently populated.
minloadcrit: Numeric (default = 168). The minimum number of hours the code needs to read for the auto-calibration procedure to be effective (only sensitive to multiples of 12 hours; other values will be rounded up). After loading these hours, extra data is only loaded if the calibration error has not been reduced to under 0.01 g.
printsummary: Boolean (default = FALSE). If TRUE will print a summary of the calibration procedure in the console when done.
chunksize: Numeric (default = 1). Value to specify the size of chunks to be loaded as a fraction of an approximately 12 hour period for auto-calibration procedure and as fraction of 24 hour period for the metric calculation, e.g., 0.5 equals 6 and 12 hour chunks, respectively. For machines with less than 4Gb of RAM memory or with < 2GB memory per process when using do.parallel = TRUE a value below 1 is recommended. The value is constrained by GGIR to not be lower than 0.05. Please note that setting 0.05 will not produce output when 3rd value of parameter windowsizes is 3600.
dynrange: Numeric (default = NULL). Provide dynamic range of 8 gravity.
interpolationType: Integer (default = 1). To indicate type of interpolation to be used when resampling time series (mainly relevant for Axivity sensors), 1=linear, 2=nearest neighbour.
rmc.file: Character (default = NULL). Filename of file to be read if it is in the working directory, or full path to the file otherwise.
rmc.nrow: Numeric (default = NULL). Number of rows to read, same as nrow argument in read.csv and nrows in fread. The whole file is read by default (i.e., rmc.nrow = Inf).
rmc.skip: Numeric (default = 0). Number of rows to skip, same as skip argument in read.csv and in fread.
rmc.dec: Character (default = "."). Decimal used for numbers, same as dec argument in read.csv and in fread.
rmc.firstrow.acc: Numeric (default = NULL). First row (number) of the acceleration data.
rmc.firstrow.header: Numeric (default = NULL). First row (number) of the header. Leave blank if the file does not have a header.
rmc.header.length: Numeric (default = NULL). If file has header, specify header length (number of rows).
rmc.col.acc: Numeric, three values (default = c(1, 2, 3)). Vector with three column (numbers) in which the acceleration signals are stored.
rmc.col.temp: Numeric (default = NULL). Scalar with column (number) in which the temperature is stored. Leave in default setting if no temperature is available. The temperature will be used by g.calibrate.
rmc.col.time: Numeric (default = NULL). Scalar with column (number) in which the timestamps are stored. Leave in default setting if timestamps are not stored.
rmc.unit.acc: Character (default = "g"). Character with unit of acceleration values: "g", "mg", or "bit".
rmc.unit.temp: Character (default = "C"). Character with unit of temperature values: (K)elvin, (C)elsius, or (F)ahrenheit.
rmc.unit.time: Character (default = "POSIX"). Character with unit of timestamps: "POSIX", "UNIXsec" (seconds since origin, see argument rmc.origin), "character", or "ActivPAL" (exotic timestamp format only used in the ActivPAL activity monitor).
rmc.format.time: Character. Date-time format as used by strptime. Only used when rmc.unit.time is "character" or "POSIX".
rmc.bitrate: Numeric (default = NULL). If unit of acceleration is a bit then provide bit rate, e.g., 12 bit.
rmc.dynamic_range: Numeric or character (default = NULL). If unit of acceleration is a bit then provide dynamic range deviation in g from zero, e.g., +/-6g would mean this argument needs to be 6. If you give this argument a character value the code will search the file header for elements with a name equal to the character value and use the corresponding numeric value next to it as dynamic range.
rmc.unsignedbit: Boolean (default = TRUE). If rmc.unsignedbit = TRUE, bit values are only positive numbers. If rmc.unsignedbit = FALSE, bit values are both positive and negative.
rmc.origin: Character (default = "1970-01-01"). Origin of time when unit of time is UNIXsec, e.g., 1970-1-1.
rmc.desiredtz: Character (default = NULL). Timezone in which experiments took place. This argument is scheduled to be deprecated and is now used to overwrite desiredtz if not provided.
rmc.configtz: Character (default = NULL). Timezone in which device was configured. This argument is scheduled to be deprecated and is now used to overwrite configtz if not provided.
rmc.sf: Numeric (default = NULL). Sample rate in Hertz, if this is stored in the file header then that will be used instead (see argument rmc.headername.sf).
rmc.headername.sf: Character (default = NULL). If file has a header: Row name under which the sample frequency can be found.
rmc.headername.sn: Character (default = NULL). If file has a header: Row name under which the serial number can be found.
rmc.headername.recordingid: Character (default = NULL). If file has a header: Row name under which the recording ID can be found.
rmc.header.structure: Character (default = NULL). Used to split the header name from the header value, e.g., ":" or " ".
rmc.check4timegaps: Boolean (default = FALSE). To indicate whether gaps in time should be imputed with zeros. Some sensing equipment provides accelerometer data with gaps in time. The rest of GGIR is not designed for this; by setting this argument to TRUE the gaps in time will be filled with zeros.
rmc.col.wear: Numeric (default = NULL). If external wear detection outcome is stored as part of the data then this can be used by GGIR. This argument specifies the column in which the wear detection (Boolean) is stored.
rmc.doresample: Boolean (default = FALSE). To indicate whether to resample the data based on the available timestamps and extracted sample rate from the file header.
rmc.noise: Numeric (default = 13). Noise level of the acceleration signal in mg-units, used when working with ad-hoc .csv data formats using read.myacc.csv. The function read.myacc.csv does not take rmc.noise as argument, but when interacting with GGIR or g.part1 rmc.noise is used.
rmc.scalefactor.acc: Numeric (default = 1). Factor to scale the acceleration signals via multiplication. For example, if data is provided in m/s2 then by setting this to 1/9.81 we would derive gravitational units.
frequency_tol: Numeric (default = 0.1). Passed on to readAxivity from the GGIRread package. Represents the frequency tolerance as a fraction between 0 and 1. When the relative bias per data block is larger than this fraction, the data block will be imputed by lack of movement, with the gravitational orientation guessed from the most recent valid data block. Only applicable to Axivity .cwa data.
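
The note for minloadcrit that only multiples of 12 hours matter can be illustrated with a one-line helper. This is a sketch under the assumption that other values are rounded up to the next multiple of 12; the helper name round_up_to_12h is hypothetical and not part of GGIR.

```r
# Round a requested number of hours up to the next multiple of 12.
# Hypothetical helper for illustration only.
round_up_to_12h <- function(hours) {
  ceiling(hours / 12) * 12
}

requested <- c(72, 100, 168)
effective <- round_up_to_12h(requested)
print(data.frame(requested, effective))
```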

params_metrics

A list of parameters used to specify the signal metrics that need to be extracted in GGIR g.part1.

do.anglex

Boolean (default = FALSE). If TRUE, calculates the angle of the X axis relative to the horizontal:

angleX = \tan^{-1}\left(\frac{acc_{rollmedian(x)}}{\sqrt{(acc_{rollmedian(y)})^2 + (acc_{rollmedian(z)})^2}}\right) \cdot \frac{180}{\pi}

do.angley

Boolean (default = FALSE). If TRUE, calculates the angle of the Y axis relative to the horizontal:

angleY = \tan^{-1}\left(\frac{acc_{rollmedian(y)}}{\sqrt{(acc_{rollmedian(x)})^2 + (acc_{rollmedian(z)})^2}}\right) \cdot \frac{180}{\pi}

do.anglez

Boolean (default = TRUE). If TRUE, calculates the angle of the Z axis relative to the horizontal:

angleZ = \tan^{-1}\left(\frac{acc_{rollmedian(z)}}{\sqrt{(acc_{rollmedian(x)})^2 + (acc_{rollmedian(y)})^2}}\right) \cdot \frac{180}{\pi}
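
The three angle metrics can be sketched in base R as follows. The data are simulated, the sample rate and the 5-second rolling median window are assumptions of this sketch, and the denominator is taken to be the Euclidean norm of the other two axes. For the exact computation see the source code of function g.applymetrics.

```r
set.seed(42)
sf <- 10                      # hypothetical sample rate in Hz
n <- 60 * sf                  # one minute of samples

# Simulated device lying flat: gravity on the z-axis plus small noise (in g).
acc_x <- rnorm(n, mean = 0, sd = 0.02)
acc_y <- rnorm(n, mean = 0, sd = 0.02)
acc_z <- rnorm(n, mean = 1, sd = 0.02)

# Rolling median per axis; the window length is an assumption of this sketch.
k <- 5 * sf + 1               # runmed() requires an odd window length
med_x <- stats::runmed(acc_x, k)
med_y <- stats::runmed(acc_y, k)
med_z <- stats::runmed(acc_z, k)

# Angle of one axis relative to the horizontal, in degrees.
axis_angle <- function(axis, other1, other2) {
  atan(axis / sqrt(other1^2 + other2^2)) * 180 / pi
}

angle_x <- axis_angle(med_x, med_y, med_z)
angle_y <- axis_angle(med_y, med_x, med_z)
angle_z <- axis_angle(med_z, med_x, med_y)

print(summary(angle_z))
```

With gravity along the z-axis, angleZ stays close to 90 degrees while angleX and angleY stay close to 0.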

do.zcx

Boolean (default = FALSE). If TRUE, calculates metric zero-crossing count for x-axis. For computation specifics see source code of function g.applymetrics

do.zcy

Boolean (default = FALSE). If TRUE, calculates metric zero-crossing count for y-axis. For computation specifics see source code of function g.applymetrics

do.zcz

Boolean (default = FALSE). If TRUE, calculates metric zero-crossing count for z-axis. For computation specifics see source code of function g.applymetrics

do.enmo

Boolean (default = TRUE). If TRUE, calculates the metric:

ENMO = \sqrt{acc_x^2 + acc_y^2 + acc_z^2} - 1

(if ENMO < 0, then ENMO = 0).
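
As a worked example of the ENMO formula, the sketch below computes per-sample values for three made-up samples in g units. The function name enmo is chosen for illustration and is not a GGIR function.

```r
# ENMO per sample: Euclidean norm minus 1 g, with negative values set to 0.
enmo <- function(acc_x, acc_y, acc_z) {
  pmax(sqrt(acc_x^2 + acc_y^2 + acc_z^2) - 1, 0)
}

# Three made-up samples (in g): at rest, moving, and below 1 g.
acc_x <- c(0, 0.3, 0)
acc_y <- c(0, 0.4, 0)
acc_z <- c(1, 1.2, 0.5)

enmo_values <- enmo(acc_x, acc_y, acc_z)
print(enmo_values)
```

The third sample has a vector magnitude of 0.5 g, so the raw difference of -0.5 is truncated to 0.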

do.lfenmo

Boolean (default = FALSE). If TRUE, calculates the metric ENMO over the low-pass filtered accelerations (for computation specifics see source code of function g.applymetrics). The filter bound is defined by the parameter hb.

do.en

Boolean (default = FALSE). If TRUE, calculates the Euclidean Norm of the raw accelerations:

EN = \sqrt{acc_x^2 + acc_y^2 + acc_z^2}

do.mad

Boolean (default = FALSE). If TRUE, calculates the Mean Amplitude Deviation:

MAD = \frac{1}{n}\sum_{i=1}^{n}|r_i - \overline{r}|
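
A minimal sketch of the MAD formula for a single epoch is given below. It assumes that r_i is the vector magnitude of sample i, that \overline{r} is the mean magnitude within the epoch, and that n is the number of samples in the epoch. The function name mad_epoch is hypothetical, chosen to avoid confusion with stats::mad, which computes a different statistic.

```r
# Mean Amplitude Deviation for the samples of one epoch.
mad_epoch <- function(acc_x, acc_y, acc_z) {
  r <- sqrt(acc_x^2 + acc_y^2 + acc_z^2)   # vector magnitude per sample
  mean(abs(r - mean(r)))                   # mean absolute deviation from the epoch mean
}

# Made-up epoch with three samples (in g), acceleration on the z-axis only.
acc_x <- c(0, 0, 0)
acc_y <- c(0, 0, 0)
acc_z <- c(0.8, 1.0, 1.2)

mad_value <- mad_epoch(acc_x, acc_y, acc_z)
print(mad_value)
```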

do.enmoa

Boolean (default = FALSE). If TRUE, calculates the metric:

ENMOa = \sqrt{acc_x^2 + acc_y^2 + acc_z^2} - 1

(if ENMOa < 0, then ENMOa = |ENMOa|).
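
The difference between ENMO and ENMOa only shows when the vector magnitude drops below 1 g. The sketch below uses made-up magnitudes to compare the two: ENMO truncates negative values to zero, whereas ENMOa takes their absolute value.

```r
# Made-up vector magnitudes in g: above, equal to, and below 1 g.
r <- c(1.3, 1.0, 0.5)

enmo_values  <- pmax(r - 1, 0)   # negative values set to 0
enmoa_values <- abs(r - 1)       # negative values replaced by their absolute value

print(data.frame(magnitude = r, ENMO = enmo_values, ENMOa = enmoa_values))
```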

do.roll_med_acc_x