removeDups {neonOS}    R Documentation

Remove duplicates from a data table based on a provided primary key; flag duplicates that can't be removed.

Description

NEON observational data may contain duplicates; this function removes exact duplicates, attempts to resolve non-exact duplicates, and flags duplicates that can't be resolved.

Usage

removeDups(data, variables, table = NA_character_, ncores = 1)

Arguments

data

A data frame containing data from a NEON observational data table [data frame]

variables

The NEON variables file containing metadata about the data table in question [data frame]

table

The name of the table. Must match one of the table names in 'variables' [character]

ncores

The maximum number of cores to use for parallel processing. Defaults to 1. [numeric]

Details

Duplicates are identified based on exact matches in the values of the primary key. For records with identical keys, these steps are followed, in order:

(1) If records are identical except for NA or empty string values, the non-empty values are kept.
(2) If records are identical except for uid, remarks, and/or personnel (xxxxBy) fields, unique values are concatenated within each field, and the merged version is kept.
(3) For records that are identical following steps 1 and 2, one record is kept and flagged with duplicateRecordQF=1.
(4) Records that can't be resolved by steps 1-3 are flagged with duplicateRecordQF=2.

Note that in a set of three or more duplicates, some records may be resolvable and some may not; if two or more records are left after steps 1-3, all remaining records are flagged with duplicateRecordQF=2. In some limited cases, duplicates can't be unambiguously identified, and these records are flagged with duplicateRecordQF=-1.
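The resolution logic can be illustrated on a toy data frame in base R. This is a simplified sketch, not the neonOS implementation: the column names (id, value, method) and the helper resolve_group() are hypothetical, step 2 (concatenating uid, remarks, and personnel fields) is omitted, and the flag value 0 for unduplicated records is an assumption.

```r
# Toy data: 'id' stands in for the primary key (hypothetical column names).
toy <- data.frame(
  id     = c("a", "a", "b", "c", "c"),
  value  = c(1, 1, 2, 3, 4),
  method = c("x", NA, "y", "z", "z"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: resolve one group of records that share a key,
# loosely following steps 1, 3 and 4 (step 2 is not covered here).
resolve_group <- function(grp) {
  if (nrow(grp) == 1) {
    grp$duplicateRecordQF <- 0  # assumption: 0 marks records with no duplicates
    return(grp)
  }
  # Step 1: treat empty strings as missing, then collect the distinct
  # non-missing values found in each column.
  distinct_vals <- lapply(grp, function(col) {
    col[!is.na(col) & col == ""] <- NA
    unique(col[!is.na(col)])
  })
  if (all(lengths(distinct_vals) <= 1)) {
    # Records agree wherever they are populated: keep the non-empty values.
    merged <- grp[1, , drop = FALSE]
    for (nm in names(grp)) {
      if (length(distinct_vals[[nm]]) == 1) merged[[nm]] <- distinct_vals[[nm]]
    }
    merged$duplicateRecordQF <- 1  # step 3: one record kept and flagged
    return(merged)
  }
  grp$duplicateRecordQF <- 2       # step 4: conflicting values, all flagged
  grp
}

# Apply the helper to each set of records sharing a key, then recombine.
groups <- split(toy, toy$id)
resolved <- do.call(rbind, lapply(groups, resolve_group))
rownames(resolved) <- NULL
print(resolved)
```

In this toy case the two "a" records differ only by an NA and collapse to one flagged record, "b" is left untouched, and the two "c" records conflict in value, so both are kept and flagged.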

Value

A modified data frame with resolvable duplicates removed and a flag field added and populated.
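After running the function, it is usually worth checking how many records carry each flag value and pulling out those that still need manual review. The sketch below assumes the flag field is named duplicateRecordQF, as described in Details. The helpers count_dup_flags() and unresolved_dups() are hypothetical and not part of neonOS, and demo is a stand-in for the function's output so the code runs without the package installed.

```r
# Hypothetical helper: count records per duplicateRecordQF value.
count_dup_flags <- function(df) {
  table(df$duplicateRecordQF, useNA = "ifany")
}

# Hypothetical helper: extract records that still need manual review
# (2 = unresolved duplicates, -1 = duplicates not unambiguously identified).
unresolved_dups <- function(df) {
  df[df$duplicateRecordQF %in% c(2, -1), , drop = FALSE]
}

# Stand-in for removeDups() output (made-up values for illustration only).
demo <- data.frame(
  uid = c("u1", "u2", "u3", "u4"),
  duplicateRecordQF = c(0, 1, 2, 2),
  stringsAsFactors = FALSE
)

print(count_dup_flags(demo))
print(unresolved_dups(demo))
```

The same helpers could be applied to a data frame returned by removeDups(), provided it contains the duplicateRecordQF field.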

Author(s)

Claire Lunch clunch@battelleecology.org

References

License: GNU AFFERO GENERAL PUBLIC LICENSE Version 3, 19 November 2007

Examples

# Load the package, which provides removeDups() and the test datasets used below
library(neonOS)

# Resolve and flag duplicates in a test dataset of foliar lignin
lig_dup <- removeDups(data=cfc_lignin_test_dups,
                      variables=cfc_lignin_variables,
                      table="cfc_lignin")

[Package neonOS version 1.1.0 Index]