opi_sim {opitools}    R Documentation

Simulates the opinion expectation distribution of a digital text document.

Description

This function simulates the expectation distribution of the observed opinion score (computed using the opi_score function). The resulting tidy-format dataframe can be described as the expected sentiment document (ESD) (Adepeju and Jimoh, 2021).

Usage

opi_sim(osd_data, nsim = 99, metric = 1, fun = NULL, quiet = TRUE)

Arguments

osd_data

A list (dataframe). An n x 3 observed sentiment document (OSD), where n is the number of text records that have been successfully classified as expressing a positive, negative, or neutral sentiment. Column 1 of the OSD contains the text record ID, column 2 the sentiment class (positive, negative, or neutral), and column 3 an indicator with two values, present and absent, marking records that do and do not include any of the specified theme keywords, respectively.
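To make the expected layout concrete, the sketch below builds a small toy OSD by hand and checks it against the three-column structure described above. The column names ID, sentiment and keywords, and the helper check_osd(), are assumptions for illustration only; they are not part of opitools.

```r
# Hypothetical helper (not part of opitools): checks that a dataframe
# follows the n x 3 OSD layout described for `osd_data`.
check_osd <- function(osd) {
  stopifnot(is.data.frame(osd), ncol(osd) == 3)
  # Column 2: sentiment class of each record
  ok_sentiment <- all(osd[[2]] %in% c("positive", "negative", "neutral"))
  # Column 3: theme keyword indicator of each record
  ok_keywords <- all(osd[[3]] %in% c("present", "absent"))
  ok_sentiment && ok_keywords
}

# Toy OSD with five classified records (column names are assumed)
toy_osd <- data.frame(
  ID        = 1:5,
  sentiment = c("positive", "negative", "neutral", "negative", "positive"),
  keywords  = c("present", "absent", "absent", "present", "absent"),
  stringsAsFactors = FALSE
)

check_osd(toy_osd)
# returns TRUE
```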

nsim

(an integer) Number of replicas (ESDs) to simulate. Recommended values are 99, 999, 9999, and so on. Since the run time is proportional to the number of replicas, a moderate number of simulations, such as 999, is recommended. Default: 99.
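The recommended values of nsim are one less than a power of ten. Assuming the p-value is estimated with the standard Monte Carlo rule p = (r + 1)/(nsim + 1), where r is the number of simulated scores at least as large as the observed one, these choices give round p-values, and the smallest attainable p-value is 1/(nsim + 1). The function mc_pvalue() below is a hypothetical sketch of that rule, not the opitools implementation.

```r
# Hypothetical sketch (not the opitools implementation): one-sided
# (upper-tail) Monte Carlo p-value of an observed score.
mc_pvalue <- function(observed, simulated) {
  simulated <- unlist(simulated)    # accept a list or a numeric vector
  r <- sum(simulated >= observed)   # replicas at least as large as observed
  (r + 1) / (length(simulated) + 1)
}

# Smallest attainable p-value for the recommended nsim values
nsim_values <- c(99, 999, 9999)
min_p <- 1 / (nsim_values + 1)
print(min_p)

# An observed score larger than all 99 simulated scores
set.seed(1)
sim <- rnorm(99)
mc_pvalue(observed = 10, simulated = sim)
# returns 0.01
```

With nsim = 99, no result can be more significant than p = 0.01, which is why larger values such as 999 are needed for finer resolution.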

metric

(an integer) The metric used to calculate the opinion score. Default: 1. See details in the documentation of the opi_score function. The value must match the metric passed to opi_score so that a statistical significance value (p-value) can be computed.

fun

A user-defined function, used only when metric is set to 5. See details in the documentation of the opi_score function.
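The sketch below shows what such a user-defined scoring function might look like. It assumes that the function receives the counts of positive (P), negative (N) and neutral (O) records; this signature is an assumption and should be confirmed against the opi_score help page before use. The name my_score and the formula are illustrative only.

```r
# Hypothetical user-defined opinion score for use with metric = 5.
# Assumption: the counts of positive (P), negative (N) and neutral (O)
# records are passed in; confirm against ?opi_score.
my_score <- function(P, N, O) {
  # Net positivity as a percentage of all classified records
  score <- ((P - N) / (P + N + O)) * 100
  return(score)
}

# Stand-alone check with made-up counts
my_score(P = 30, N = 50, O = 20)
# returns -20
```

If the signature matches, the same function would be passed to both calls, e.g. opi_score(..., metric = 5, fun = my_score) and opi_sim(..., metric = 5, fun = my_score), so that the observed and expected scores are computed the same way.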

quiet

(TRUE or FALSE) Whether to suppress processing messages. Default: TRUE.

Details

Employs a non-parametric randomization testing approach to generate the expectation distribution of the observed opinion score (see details in Adepeju and Jimoh, 2021).
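The sketch below illustrates the general idea of a non-parametric randomization test on an OSD-like dataframe. The keyword indicator is shuffled across records, which is valid under the null hypothesis that the presence of theme keywords is unrelated to sentiment, and a score is recomputed for each replica. This is a generic illustration, not the exact procedure used by opi_sim (see Adepeju and Jimoh, 2021, for that); the helpers polarity() and simulate_scores(), the polarity formula, and the choice to permute the keyword column are all assumptions.

```r
# Hypothetical statistic: polarity of the records that contain the
# theme keywords, (P - N) / (P + N) * 100.
polarity <- function(sentiment, keywords) {
  s <- sentiment[keywords == "present"]
  P <- sum(s == "positive")
  N <- sum(s == "negative")
  if (P + N == 0) return(0)   # avoid 0/0 when there are no polar records
  (P - N) / (P + N) * 100
}

# Generic randomization sketch (not opi_sim's internals): shuffle the
# keyword indicator across records and recompute the statistic.
simulate_scores <- function(osd, nsim = 99) {
  vapply(seq_len(nsim), function(i) {
    polarity(osd$sentiment, sample(osd$keywords))
  }, numeric(1))
}

# Fictitious OSD-like data with 200 records
set.seed(42)
toy <- data.frame(
  sentiment = sample(c("positive", "negative", "neutral"), 200, replace = TRUE),
  keywords  = sample(c("present", "absent"), 200, replace = TRUE,
                     prob = c(0.35, 0.65)),
  stringsAsFactors = FALSE
)

observed <- polarity(toy$sentiment, toy$keywords)
expected <- simulate_scores(toy, nsim = 99)
# Central 95% range of the simulated scores, to compare with `observed`
quantile(expected, c(0.025, 0.975))
```

Permuting only the indicator keeps the overall mix of sentiment classes fixed, so any difference between the observed and simulated scores reflects the association between keywords and sentiment rather than the overall sentiment balance.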

Value

Returns a list of expected opinion scores whose length equals the number of simulations (nsim) specified.

References

(1) Adepeju, M. and Jimoh, F. (2021). An Analytical Framework for Measuring Inequality in the Public Opinions on Policing – Assessing the impacts of COVID-19 Pandemic using Twitter Data. https://doi.org/10.31235/osf.io/c32qh

Examples


# Load opitools
library(opitools)

# Prepare OSD data from the output
# of the `opi_score` function.
score <- opi_score(textdoc = policing_dtd,
                   metric = 1, fun = NULL)
# Extract the OSD
OSD <- score$OSD
# Note that `OSD` has fewer rows than
# `policing_dtd`, meaning that some
# text records were not classified.

# Bind a fictitious keyword indicator column
# (about 35% "present", 65% "absent");
# set a seed so the example is reproducible
set.seed(1000)
osd_data2 <- data.frame(cbind(OSD,
             keywords = sample(c("present", "absent"), nrow(OSD),
                               replace = TRUE, prob = c(0.35, 0.65))))

# Generate the expected distribution
exp_score <- opi_sim(osd_data2, nsim = 99, metric = 1,
                     fun = NULL, quiet = TRUE)
# Preview the distribution
hist(exp_score)


[Package opitools version 1.8.0 Index]