load_many_tweets_json {CooRTweet}    R Documentation

load_many_tweets_json

Description

EXPERIMENTAL. Batched version of load_tweets_json with control over which columns are retained. Less efficient than load_tweets_json, but requires less memory. Wrapper of the function fload.

Usage

load_many_tweets_json(
  data_dir,
  batch_size = 1000,
  keep_cols = c("text", "possibly_sensitive", "public_metrics", "lang",
    "edit_history_tweet_ids", "attachments", "geo"),
  query = NULL,
  query_error_ok = TRUE
)

Arguments

data_dir

string giving the path to the directory containing the JSON files

batch_size

integer specifying the number of JSON files to load per batch. Default: 1000

keep_cols

character vector with the names of columns you want to keep. Set it to NULL to only retain the required columns. Default: keep_cols = c("text", "possibly_sensitive", "public_metrics", "lang", "edit_history_tweet_ids", "attachments", "geo")

query

(string) JSON Pointer query passed on to fload (optional). Default: NULL

query_error_ok

(Boolean) whether a query that causes an error is tolerated instead of stopping execution. Passed on to fload (optional). Default: TRUE

Details

Unlike load_tweets_json, this function loads JSON files in batches and processes each batch before loading the next one. You can specify which columns to keep, which reduces memory use. For example, you can decide not to keep the "text" column, which requires quite a lot of memory.
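
The batch-then-prune idea can be illustrated with a small base R sketch. This is not the CooRTweet implementation: process_in_batches, read_one and fake_reader are hypothetical names, the toy reader stands in for a JSON parser, and the sketch does not model the columns the package always retains.

```r
# Illustrative sketch only: NOT the CooRTweet implementation.
# `process_in_batches`, `read_one` and `fake_reader` are hypothetical names.
process_in_batches <- function(files, batch_size, keep_cols, read_one) {
  # Assign each file to a batch: 1, 1, ..., 2, 2, ...
  batch_id <- ceiling(seq_along(files) / batch_size)
  batches <- split(files, batch_id)

  results <- lapply(batches, function(batch) {
    # Load every file in the current batch and stack the rows
    loaded <- do.call(rbind, lapply(batch, read_one))
    # Drop unwanted columns before the next batch is loaded
    loaded[, intersect(keep_cols, names(loaded)), drop = FALSE]
  })

  # Combine the pruned batches into one table
  out <- do.call(rbind, unname(results))
  rownames(out) <- NULL
  out
}

# Toy reader standing in for a JSON parser (assumption for this sketch)
fake_reader <- function(path) {
  data.frame(tweet_id = path, text = "some long text", lang = "en",
             stringsAsFactors = FALSE)
}

files <- sprintf("file_%02d.json", 1:5)
tweets <- process_in_batches(files, batch_size = 2,
                             keep_cols = c("tweet_id", "lang"),
                             read_one = fake_reader)
```

Because each batch is pruned before the next one is read, the full "text" column is held in memory for only one batch at a time.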

Value

a data.table with all tweets loaded
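
A minimal usage sketch, using only the arguments listed under Usage. The directory path is a placeholder, and the choice of batch_size and keep_cols is an assumption made for illustration; the call runs only if the package is installed and the directory exists.

```r
# Hypothetical directory; replace with the folder that holds your JSON files.
data_dir <- "path/to/json_files"

if (requireNamespace("CooRTweet", quietly = TRUE) && dir.exists(data_dir)) {
  tweets <- CooRTweet::load_many_tweets_json(
    data_dir = data_dir,
    batch_size = 500,                         # smaller batches, lower peak memory
    keep_cols = c("lang", "public_metrics")   # leave out "text" to save memory
  )
  print(dim(tweets))
}
```

Setting keep_cols = NULL instead would retain only the required columns.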


[Package CooRTweet version 2.0.2 Index]