load_many_tweets_json {CooRTweet} | R Documentation |
load_many_tweets_json
Description
EXPERIMENTAL. Batched version of load_tweets_json with control over which columns are retained. It is less efficient than load_tweets_json but requires less memory. Wrapper around the function fload.
Usage
load_many_tweets_json(
data_dir,
batch_size = 1000,
keep_cols = c("text", "possibly_sensitive", "public_metrics", "lang",
"edit_history_tweet_ids", "attachments", "geo"),
query = NULL,
query_error_ok = TRUE
)
Arguments
data_dir |
string giving the path to the directory containing JSON files |
batch_size |
integer specifying the number of JSON files
to load per batch. Default: 1000 |
keep_cols |
character vector with the names of columns you want to
keep. Set it to |
query |
(string) JSON Pointer query passed on to
fload (optional). Default: NULL |
query_error_ok |
(Boolean) stop if |
Details
Unlike load_tweets_json, this function loads JSON files
in batches and processes each batch before loading the next one.
You can specify which columns to keep, which reduces memory use.
For example, you can decide not to keep the "text" column, which
requires quite a lot of memory.
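The batching idea described above can be sketched in base R. This is only an illustration of the pattern, not the package's implementation: read_one and load_in_batches are hypothetical helpers, the file names are made up, and read_one fakes a parsed file instead of calling fload.

```r
# Hypothetical stand-in for a JSON reader (the real function wraps fload).
# It pretends each file yields one row with three columns.
read_one <- function(path) {
  data.frame(
    id = basename(path),
    text = "some long tweet text",
    lang = "en",
    stringsAsFactors = FALSE
  )
}

# Hypothetical helper: load files batch by batch and keep only selected columns.
load_in_batches <- function(files, batch_size = 1000, keep_cols = "lang") {
  if (length(files) == 0) {
    return(data.frame())
  }
  # Split the file paths into consecutive batches of at most batch_size.
  batches <- split(files, ceiling(seq_along(files) / batch_size))
  processed <- lapply(batches, function(batch) {
    rows <- do.call(rbind, lapply(batch, read_one))
    # Drop unwanted columns before the next batch is loaded,
    # so the discarded data never accumulates in memory.
    rows[, intersect(c("id", keep_cols), names(rows)), drop = FALSE]
  })
  result <- do.call(rbind, unname(processed))
  rownames(result) <- NULL
  result
}

files <- sprintf("tweets_%02d.json", 1:5)  # made-up file names
result <- load_in_batches(files, batch_size = 2, keep_cols = "lang")
```

Because the "text" column is dropped inside each batch, it is never held for more than batch_size files at a time, which is where the memory saving comes from.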
Value
A data.table containing all loaded tweets.
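Examples
A minimal call might look like the sketch below. The directory path is a placeholder, not a real location, and the argument values are chosen only for illustration. The call is guarded so that it is skipped when the package or the directory is not available.

```r
# Placeholder path: replace with a directory that actually contains JSON files.
data_dir <- "path/to/json_files"

if (requireNamespace("CooRTweet", quietly = TRUE) && dir.exists(data_dir)) {
  tweets <- CooRTweet::load_many_tweets_json(
    data_dir = data_dir,
    batch_size = 500,                        # load 500 files per batch
    keep_cols = c("public_metrics", "lang")  # leave out "text" to save memory
  )
}
```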