load_many_tweets_json {CooRTweet}

R Documentation

load_many_tweets_json

Description

EXPERIMENTAL. Batched version of load_tweets_json with control over which columns are retained. It is not as fast as load_tweets_json but requires less memory. Wrapper around the function fload from the RcppSimdJson package.

Usage

load_many_tweets_json(
  data_dir,
  batch_size = 1000,
  keep_cols = c("text", "possibly_sensitive", "public_metrics", "lang",
    "edit_history_tweet_ids", "attachments", "geo"),
  query = NULL,
  query_error_ok = TRUE
)

Arguments

`data_dir`	string giving the path to the directory that contains the JSON files
`batch_size`	integer specifying the number of JSON files to load per batch. Default: `1000`
`keep_cols`	character vector with the names of the columns you want to keep. Set it to `NULL` to retain only the required columns. Default: `c("text", "possibly_sensitive", "public_metrics", "lang", "edit_history_tweet_ids", "attachments", "geo")`
`query`	(string) JSON Pointer query passed on to fload (optional). Default: `NULL`
`query_error_ok`	(Boolean) whether an error caused by `query` is tolerated instead of stopping execution. Passed on to fload (optional). Default: `TRUE`

Details

Unlike load_tweets_json, this function loads JSON files in batches and processes each batch before loading the next one. You can specify which columns to keep, which reduces memory use. For example, you can decide not to keep the "text" column, which requires quite a lot of memory.
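
The batching idea described above can be illustrated with a minimal base R sketch. It is not the package's implementation: `process_in_batches` and `toy_reader` are hypothetical helpers, and the toy reader builds a small data frame instead of parsing real JSON. The sketch only shows how a file list is split into batches and how dropping columns per batch keeps the combined result small.

```r
# Hypothetical helper (not part of CooRTweet): apply `read_batch` to the
# files in chunks of `batch_size` and row-bind the per-batch results.
process_in_batches <- function(files, batch_size, read_batch) {
  # Assign each file a batch index: 1, 1, ..., 2, 2, ...
  batch_id <- ceiling(seq_along(files) / batch_size)
  batches <- unname(split(files, batch_id))
  # Each batch is read and reduced before the next one is touched
  results <- lapply(batches, read_batch)
  do.call(rbind, results)
}

# Columns to retain in this toy example (assumed names)
keep_cols <- c("id", "lang")

# Toy stand-in for the JSON parsing step: builds a full table for the
# batch, then keeps only the selected columns.
toy_reader <- function(batch_files) {
  full <- data.frame(
    id = basename(batch_files),
    lang = "en",
    text = "a long tweet text",
    stringsAsFactors = FALSE
  )
  full[, keep_cols, drop = FALSE]
}

files <- sprintf("tweets_%02d.json", 1:5)
tweets <- process_in_batches(files, batch_size = 2, read_batch = toy_reader)
```

With five files and a batch size of two, the files are handled in three batches, and the memory-heavy `text` column is discarded within each batch rather than after everything has been loaded.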

Value

A data.table containing all loaded tweets.
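
Examples

A possible call, based only on the signature shown under Usage. The directory path is a placeholder, and the choice of `batch_size` and `keep_cols` is an illustrative assumption; the call is guarded so that it runs only when the package is installed and the directory exists.

```r
# Hypothetical directory; replace with the folder that holds your JSON files.
data_dir <- "path/to/json_files"

if (requireNamespace("CooRTweet", quietly = TRUE) && dir.exists(data_dir)) {
  # Smaller batches and no "text" column to reduce memory use
  tweets <- CooRTweet::load_many_tweets_json(
    data_dir = data_dir,
    batch_size = 500,
    keep_cols = c("public_metrics", "lang")
  )
}
```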

[Package CooRTweet version 2.0.2 Index]