detect_groups {CooRTweet} | R Documentation |
detect_groups
Description
Function to perform the initial stage in detecting coordinated behavior. It identifies pairs of accounts that share the same objects in a time_window. See details.
Usage
detect_groups(
x,
time_window = 10,
min_participation = 2,
remove_loops = TRUE,
...
)
Arguments
x |
a data.table with the columns: |
time_window |
the number of seconds within which shared contents are considered coordinated (defaults to 10 seconds). |
min_participation |
The minimum number of actions required for an account
to be included in subsequent analysis (default set at 2). This ensures that
only accounts with a minimum level of activity in the original dataset are
included in subsequent analysis. It is important to distinguish this from the
frequency of repeated interactions an account has with another specific account,
as represented by edge weight. The edge weight parameter is utilized in the
|
remove_loops |
Should loops (shares of the same objects made by the same account within the time window) be removed? (defaults to TRUE). |
... |
keyword arguments for backwards compatibility. |
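To illustrate the min_participation threshold described above, here is a minimal base R sketch of one way to read it: count the actions (rows) per account and keep only accounts that reach the threshold. The toy data frame `actions` and the helper variables are hypothetical, and this is not the package's internal code.

```r
# Hypothetical toy data: one row per action (share) by an account.
actions <- data.frame(
  account_id = c("A", "A", "B", "C", "C", "C"),
  content_id = paste0("c", 1:6),
  stringsAsFactors = FALSE
)

min_participation <- 2

# Count how many actions each account contributes to the dataset.
n_actions <- table(actions$account_id)

# Keep only accounts with at least `min_participation` actions.
active <- names(n_actions)[n_actions >= min_participation]
filtered <- actions[actions$account_id %in% active, ]

print(filtered)
```

Here account B has a single action and is dropped, while A and C are kept. This is a filter on overall activity, not on how often two specific accounts co-share.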
Details
This function performs the initial stage in detecting coordinated behavior by identifying accounts that share identical objects within the same temporal window. It is preliminary to the network analysis conducted with the generate_coordinated_network function.
detect_groups groups the data by object_id (which uniquely identifies content) and calculates the time differences between all content_id values (IDs of account-generated contents) within each group. It then filters out all content_id pairs whose time difference exceeds the time_window (in seconds) and returns a data.table with all IDs of coordinated contents. The object_id can be, for example, a hashtag, the ID of a tweet being retweeted, or a URL being shared. For Twitter data, it is best to use reshape_tweets.
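The grouping and filtering steps described above can be sketched in base R as follows. This is an illustrative approximation under stated assumptions, not the package's implementation: the helper `pair_within_window` is hypothetical, and the timestamp column name `timestamp_share` (in seconds) is an assumption because the input columns are not listed in this page.

```r
# Hypothetical toy input; `timestamp_share` is an assumed column name.
x <- data.frame(
  object_id = c("url1", "url1", "url1", "url2", "url2"),
  account_id = c("A", "B", "C", "A", "A"),
  content_id = c("c1", "c2", "c3", "c4", "c5"),
  timestamp_share = c(100, 105, 200, 300, 304),
  stringsAsFactors = FALSE
)

pair_within_window <- function(x, time_window = 10, remove_loops = TRUE) {
  # Copy of the input whose non-key columns carry a "_y" suffix.
  y <- x[, c("object_id", "account_id", "content_id", "timestamp_share")]
  names(y)[-1] <- paste0(names(y)[-1], "_y")

  # Self-join on object_id: pair every content with every other
  # content that shares the same object.
  pairs <- merge(x, y, by = "object_id")

  # Keep each pair once, with the older content on the left side
  # (ties on the timestamp are broken by content_id).
  older_first <- pairs$timestamp_share < pairs$timestamp_share_y |
    (pairs$timestamp_share == pairs$timestamp_share_y &
       pairs$content_id < pairs$content_id_y)
  pairs <- pairs[older_first, ]

  # Time difference in seconds; drop pairs outside the window.
  pairs$timedelta <- pairs$timestamp_share_y - pairs$timestamp_share
  pairs <- pairs[pairs$timedelta <= time_window, ]

  # Optionally drop loops: both contents come from the same account.
  if (remove_loops) {
    pairs <- pairs[pairs$account_id != pairs$account_id_y, ]
  }

  pairs[, c("object_id", "account_id", "account_id_y",
            "content_id", "content_id_y", "timedelta")]
}

res <- pair_within_window(x, time_window = 10)
print(res)
```

In this toy data, only the pair c1 (account A) and c2 (account B) on url1 survives: the other url1 pairs are more than 10 seconds apart, and the url2 pair is a loop because both shares come from account A.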
Value
a data.table with IDs of coordinated contents. Columns: object_id, account_id, account_id_y, content_id, content_id_y, timedelta. The account_id and content_id represent the "older" data points; account_id_y and content_id_y represent the "newer" data points. For example, if account A retweets from account B, then account A's content is newer (i.e., account_id_y).
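A call following the Usage signature might look like the sketch below. The toy data is hypothetical, and the `timestamp_share` column name is an assumption, since the required input columns are not spelled out in this page. The call is guarded so that the snippet still runs when CooRTweet or data.table is not installed.

```r
# Hypothetical toy data: two accounts sharing the same two URLs
# within a few seconds of each other. Column names are assumptions.
toy <- data.frame(
  object_id = c("url1", "url1", "url2", "url2"),
  account_id = c("A", "B", "A", "B"),
  content_id = c("c1", "c2", "c3", "c4"),
  timestamp_share = c(100L, 104L, 500L, 507L),
  stringsAsFactors = FALSE
)

# Only run the call when both packages are available.
if (requireNamespace("CooRTweet", quietly = TRUE) &&
    requireNamespace("data.table", quietly = TRUE)) {
  result <- CooRTweet::detect_groups(
    data.table::as.data.table(toy),
    time_window = 10,        # seconds
    min_participation = 2,   # each account has two actions here
    remove_loops = TRUE
  )
  print(result)
}
```

The returned table would then be passed on to generate_coordinated_network for the network analysis stage.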