import {mpathsenser}    R Documentation
Import m-Path Sense files into a database
Description
Import JSON files from m-Path Sense into a structured database. This function is the bread and
butter of this package, as it populates the database with the data that most of the other functions
in this package use. It is recommended to first run test_jsons() and, if necessary, fix_jsons()
to repair JSON files with problematic syntax.
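A pre-import check could look like the following sketch. It assumes that test_jsons() accepts the
same path argument and returns the names of problematic files (with nothing to report when all
files are valid), and that fix_jsons() accepts that path as well; see their help pages for the
exact interface.

# Sketch of the recommended pre-import check (argument names are assumptions;
# see ?test_jsons and ?fix_jsons for the exact interface).
path <- "some/path"
broken <- test_jsons(path = path)   # names of files with problematic syntax
if (any(nzchar(broken))) {
  fix_jsons(path = path)            # attempt to repair them before importing
}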
Usage
import(
path = getwd(),
db,
sensors = NULL,
batch_size = 24,
backend = "RSQLite",
recursive = TRUE
)
Arguments
path
The path to the file directory.
db
A valid database connection, typically created by create_db().
sensors
One or multiple sensors to import. Defaults to NULL, which imports all sensors found in the files.
batch_size
The number of files to be processed in a single batch.
backend
Name of the database backend that is used. Currently, only RSQLite is supported.
recursive
Should the listing recurse into directories?
Details
import
allows you to specify which sensors to import (even though there may be more in the files) and
it also allows batching for a speedier writing process. If parallel processing is active, it is
recommended that batch_size be a scalar multiple of the number of CPU cores the parallel cluster
can use. If a single JSON file in the batch causes an error, the batch is terminated (but not the
function) and it is up to the user to fix the file. This means that if batch_size is large, many
files will not be processed. Set batch_size to 1 for sequential (one-by-one) file processing.
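As an illustration, the batch size can be tied to the size of the parallel cluster. This is only
a sketch: the factor of two is an arbitrary scalar multiple, parallelly::availableCores() comes
from the future framework that the Parallel section below relies on, and db is assumed to be a
connection created by create_db().

# Sketch: choose a batch_size that is a scalar multiple of the number of workers.
n_workers <- parallelly::availableCores()
future::plan("multisession", workers = n_workers)
import(path = "some/path", db = db, batch_size = 2 * n_workers)  # factor 2 is arbitrary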
Currently, only SQLite is supported as a backend. Due to its concurrency restriction, parallel
processing works for cleaning the raw data, but not for importing it into the database, because
SQLite does not allow multiple processes to write to the same database at the same time. This is
a limitation of SQLite and not of this package. However, while files are processed individually
(and in parallel, if specified), writing to the database happens for the entire batch specified
by batch_size at once. This means that if a single file in the batch causes an error, the entire
batch is skipped. This ensures that the database is not left in an inconsistent state.
Value
A message indicating how many files were imported. If all files were imported successfully, this function returns an empty string invisibly. Otherwise, the names of the files that were not imported are returned visibly.
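Because the return value distinguishes a clean run from one with failures, it can be captured to
find out which files still need attention. A minimal sketch, assuming path and db as in the
Examples below:

# Capture the return value to see which files were not imported.
failed <- import(path = path, db = db)
if (any(nzchar(failed))) {
  print(failed)  # inspect these files, repair them (e.g. with fix_jsons()), then re-run import()
}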
Parallel
This function supports parallel processing in the sense that it is able to distribute its
computational load among multiple workers. To make use of this functionality, run
future::plan("multisession")
before calling this function.
Progress
You can be updated on the progress of this function by using the progressr package. See
progressr's vignette on how to subscribe to these updates.
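For example, progress reporting can be enabled globally or wrapped around a single call using the
standard progressr functions handlers() and with_progress():

# Subscribe to progress updates for all calls in this session
progressr::handlers(global = TRUE)
import(path = path, db = db)
# Or limit the progress reporting to a single call
progressr::with_progress(import(path = path, db = db))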
See Also
create_db() for creating a database for import() to use; close_db() for closing this database;
index_db() to create indices on the database for faster future processing; and vacuum_db() to
shrink the database to its minimal size.
Examples
## Not run:
path <- "some/path"
# Create a database
db <- create_db(path = path, db_name = "my_db")
# Import all JSON files in this directory
import(path = path, db = db)
# Import the files sequentially (one by one)
import(path = path, db = db, batch_size = 1)
# Import only the accelerometer data
import(path = path, db = db, sensors = "accelerometer")
# Import only the accelerometer and gyroscope data
import(path = path, db = db, sensors = c("accelerometer", "gyroscope"))
# Remember to close the database
close_db(db)
## End(Not run)