R: Remove duplicates in data

rtry_remove_dup {rtry}

R Documentation

Remove duplicates in data

Description

This function removes duplicates from the input data using the duplicate identifier OrigObsDataID provided within the TRY data. When the function is executed, the number of duplicates removed is displayed on the console for reference.

Usage

rtry_remove_dup(input, showOverview = TRUE)

Arguments

`input`	Input data frame or data table.
`showOverview`	Default `TRUE` displays the dimensions of the data after removing the duplicates.

Value

An object of the same type as the input data after removing the duplicates.

Note

This function depends on the duplicate identifier OrigObsDataID listed in the data exported from the TRY database. If the column OrigObsDataID has been removed, the function will not work. Also, if the original value of an indicated duplicate is a restricted value that was not requested from the TRY database (for example, if only public data were requested), the duplicate will still be removed, which may result in data loss.
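
One way to gauge the data-loss risk described in this note is to look for flagged duplicates whose original record is not present in the data. The sketch below uses a made-up toy data frame and base R only. It assumes that a row counts as a duplicate when OrigObsDataID is non-missing and greater than zero, and that OrigObsDataID refers to the ObsDataID of the original record; both are assumptions to check against your own TRY export.

```r
# Toy data: column names mirror a TRY export, the values are invented.
toy <- data.frame(
  ObsDataID     = c(1, 2, 3, 4),
  OrigObsDataID = c(NA, 1, NA, 99),  # 99 does not occur among the ObsDataID values
  StdValue      = c(10.5, 10.5, 7.8, 4.1)
)

# Rows flagged as duplicates (assumption: non-missing, positive OrigObsDataID).
flagged <- subset(toy, !is.na(OrigObsDataID) & OrigObsDataID > 0)

# Flagged rows whose original record is absent from this data set.
# Removing these would drop the only available copy of the value.
orphans <- subset(flagged, !(OrigObsDataID %in% toy$ObsDataID))

print(orphans)
```

In this toy data the row with ObsDataID 4 is the only such case. Whether to keep rows like it before removing duplicates is a judgment call that depends on the analysis.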

References

This function makes use of the subset function within the base package.
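
As an illustration of how `subset` can be used for this kind of filtering, here is a minimal sketch. It is not the rtry source code: `remove_dup_sketch` is a hypothetical helper, the toy data are invented, and the rule that a duplicate is a row with a non-missing, positive OrigObsDataID is an assumption.

```r
# Toy data: column names mirror a TRY export, the values are invented.
toy <- data.frame(
  ObsDataID     = c(1, 2, 3, 4, 5),
  OrigObsDataID = c(NA, NA, 1, NA, 2),
  StdValue      = c(10.5, 3.2, 10.5, 7.8, 3.2)
)

# Hypothetical helper, not the rtry implementation.
remove_dup_sketch <- function(input, showOverview = TRUE) {
  if (!"OrigObsDataID" %in% names(input)) {
    stop("Column 'OrigObsDataID' is required to identify duplicates.")
  }
  # Assumption: keep rows whose OrigObsDataID is missing or not positive.
  output <- subset(input, is.na(OrigObsDataID) | OrigObsDataID <= 0)
  message(nrow(input) - nrow(output), " duplicates removed.")
  if (showOverview) {
    message("dim:   ", paste(dim(output), collapse = " "))
  }
  output
}

toy_rm_dup <- remove_dup_sketch(toy)
```

With the toy data above, two rows (ObsDataID 3 and 5) are dropped and three rows remain. The early check on the column name mirrors the dependency on OrigObsDataID stated in the Note.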

Examples

# Remove the duplicates within the provided sample data (data_TRY_15160)
data_rm_dup <- rtry_remove_dup(data_TRY_15160)

# Expected message:
# 45 duplicates removed.
# dim:   1737 28

[Package rtry version 1.1.0 Index]