exclude_duplicates {excluder} | R Documentation |
Exclude rows with duplicate IP addresses and/or locations
Description
The exclude_duplicates() function removes rows of data that have the same
IP address and/or same latitude and longitude. The function is written to
work with data from Qualtrics surveys.
Usage
exclude_duplicates(
x,
id_col = "ResponseId",
ip_col = "IPAddress",
location_col = c("LocationLatitude", "LocationLongitude"),
rename = TRUE,
dupl_ip = TRUE,
dupl_location = TRUE,
include_na = FALSE,
quiet = TRUE,
print = TRUE,
silent = FALSE
)
Arguments
x |
Data frame (preferably imported from Qualtrics using {qualtRics}). |
id_col |
Column name for unique row ID (e.g., participant). |
ip_col |
Column name for IP addresses. |
location_col |
Two element vector specifying columns for latitude and longitude (in that order). |
rename |
Logical indicating whether to rename columns (using |
dupl_ip |
Logical indicating whether to check IP addresses. |
dupl_location |
Logical indicating whether to check latitude and longitude. |
include_na |
Logical indicating whether to include rows with NAs for IP address and location as potentially excluded rows. |
quiet |
Logical indicating whether to suppress the check message printed to console. |
print |
Logical indicating whether to print returned tibble to console. |
silent |
Logical indicating whether to suppress the exclusion message printed to console. Note that this argument controls the exclude message, not the check message. |
Details
To record IP address and location in your Qualtrics survey, you must ensure that the Anonymize responses option is disabled.
Default column names are set based on output from qualtRics::fetch_survey().
By default, IP address and location are both checked, but they can be
checked separately with the dupl_ip and dupl_location arguments.
The function outputs to console separate messages about the number of rows with duplicate IP addresses and rows with duplicate locations. These counts are computed independently, so rows may be counted for both types of duplicates.
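The independence of the two counts can be illustrated with a minimal base R sketch. This is not the package's implementation: the toy data, the helper is_dupe(), and the assumption that every row sharing a value counts as a duplicate (including the first occurrence) are all illustrative.

```r
# Toy data using the default Qualtrics column names; all values are made up.
responses <- data.frame(
  ResponseId = c("R_1", "R_2", "R_3", "R_4", "R_5"),
  IPAddress = c("10.0.0.1", "10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"),
  LocationLatitude = c(40.1, 40.1, 41.5, 41.5, 42.9),
  LocationLongitude = c(-96.7, -96.7, -95.9, -95.9, -97.3),
  stringsAsFactors = FALSE
)

# Hypothetical helper. Assumption: a row is a duplicate if any other row
# shares its value, so the first occurrence is flagged as well.
is_dupe <- function(v) duplicated(v) | duplicated(v, fromLast = TRUE)

# Each check is computed on its own, without reference to the other.
dup_ip <- is_dupe(responses$IPAddress)
dup_loc <- is_dupe(responses[c("LocationLatitude", "LocationLongitude")])

n_dup_ip <- sum(dup_ip)             # rows with a duplicate IP address
n_dup_loc <- sum(dup_loc)           # rows with a duplicate location
n_excluded <- sum(dup_ip | dup_loc) # rows flagged by at least one check
```

In this toy data, R_1 and R_2 are flagged by both checks, so the two counts sum to more than the number of distinct rows flagged.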
Value
An object of the same type as x that excludes rows
with duplicate IP addresses and/or locations.
For a function that just checks for and returns duplicate rows,
use check_duplicates(). For a function that marks these rows,
use mark_duplicates().
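The difference between checking, marking, and excluding can be sketched in base R. This is only a conceptual illustration under stated assumptions, not how the package functions are written: the toy data and the is_duplicate column name are hypothetical, and only IP addresses are compared.

```r
# Toy data; values are made up.
responses <- data.frame(
  ResponseId = c("R_1", "R_2", "R_3"),
  IPAddress = c("10.0.0.1", "10.0.0.1", "10.0.0.2"),
  stringsAsFactors = FALSE
)

# Assumption: every row that shares an IP address with another row is a
# duplicate, including the first occurrence.
dup <- duplicated(responses$IPAddress) |
  duplicated(responses$IPAddress, fromLast = TRUE)

# "Check": return only the duplicate rows.
checked <- responses[dup, , drop = FALSE]

# "Mark": keep every row and flag duplicates in a new (hypothetical) column.
marked <- transform(responses, is_duplicate = dup)

# "Exclude": return only the rows that are not duplicates.
excluded <- responses[!dup, , drop = FALSE]
```

Marking keeps the full data set for later inspection, while excluding returns only the rows that pass the check.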
See Also
Other duplicates functions:
check_duplicates(), mark_duplicates()
Other exclude functions:
exclude_duration(), exclude_ip(), exclude_location(), exclude_preview(), exclude_progress(), exclude_resolution()
Examples
# Exclude duplicate IP addresses and locations
data(qualtrics_text)
df <- exclude_duplicates(qualtrics_text)
# Remove preview data first
df <- qualtrics_text %>%
  exclude_preview() %>%
  exclude_duplicates()
# Exclude only for duplicate IP addresses (location check turned off)
df <- qualtrics_text %>%
  exclude_preview() %>%
  exclude_duplicates(dupl_location = FALSE)