df.subset {misty}    R Documentation

Subsetting Data Frames

Description

This function returns a subset of a data frame, i.e., the selected variables and the rows that meet a logical condition.

Usage

df.subset(..., data, subset = NULL, drop = TRUE, check = TRUE)

Arguments

...

an expression indicating variables to select from the data frame specified in data. See Details for the list of operators used in this function, i.e., ., +, -, ~, :, ::, and !.

data

a data frame that contains the variables specified in the argument .... Note that if data = NULL, only the variables specified in ... are returned.

subset

character string with a logical expression indicating rows to keep, e.g., "x == 1", "x1 == 1 & x2 == 3", or "gender == 'female'". By default, all rows of the data frame specified in data are kept. Note that rows for which the logical expression evaluates to a missing value are not selected.
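The missing-value rule can be illustrated with a base R sketch. The data frame dat and its variables x and y are hypothetical, and the code only mirrors the documented behaviour; it is not the function's internal implementation.

```r
# Hypothetical data frame with a missing value in 'x'
dat <- data.frame(x = c(1, NA, 2, 1), y = c("a", "b", "c", "d"))

# The logical query evaluates to TRUE, NA, FALSE, TRUE
keep <- dat$x == 1

# which() discards NA, so the row with a missing 'x' is not selected
kept <- dat[which(keep), ]
kept
```

Plain logical indexing, dat[keep, ], would instead return an extra row filled with NA, which is why the sketch wraps the condition in which().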

drop

logical: if TRUE (default), a data frame with a single column is converted into a vector.
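The effect of drop resembles the drop argument of base R's [ operator for data frames. The following is a base R analogue under that assumption, not a call to df.subset itself.

```r
# Analogue of drop = TRUE: a single column is returned as a vector
as_vector <- iris[, "Sepal.Length", drop = TRUE]

# Analogue of drop = FALSE: the result stays a one-column data frame
as_frame <- iris[, "Sepal.Length", drop = FALSE]

is.numeric(as_vector)
is.data.frame(as_frame)
```

Going by the Usage section, the corresponding call would presumably be df.subset(Sepal.Length, data = iris, drop = FALSE) when a one-column data frame is wanted.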

check

logical: if TRUE (default), argument specification is checked.

Details

The argument ... is used to specify an expression indicating the variables to select from the data frame specified in data, e.g., df.subset(x1, x2, x3, data = dat). There are seven operators which can be used in the expression ...:

Dot (.) Operator

The dot operator is used to select all variables from the data frame specified in data. For example, df.subset(., data = dat) selects all variables in dat. Note that this operator is similar to the function everything() from the tidyselect package.

Plus (+) Operator

The plus operator is used to select variables matching a prefix from the data frame specified in data. For example, df.subset(+x, data = dat) selects all variables with the prefix x. Note that this operator is equivalent to the function starts_with() from the tidyselect package.

Minus (-) Operator

The minus operator is used to select variables matching a suffix from the data frame specified in data. For example, df.subset(-y, data = dat) selects all variables with the suffix y. Note that this operator is equivalent to the function ends_with() from the tidyselect package.

Tilde (~) Operator

The tilde operator is used to select variables containing a word from the data frame specified in data. For example, df.subset(~al, data = dat) selects all variables containing the word al. Note that this operator is equivalent to the function contains() from the tidyselect package.
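The plus, minus, and tilde operators correspond to prefix, suffix, and substring matching on the variable names. A minimal base R sketch of the same selections on the iris data follows. It is an analogue rather than the function's implementation, and it assumes case-sensitive matching, which may differ from the tidyselect helpers.

```r
vars <- names(iris)

# Prefix match, analogous to df.subset(+Petal, data = iris)
prefix_vars <- vars[startsWith(vars, "Petal")]

# Suffix match, analogous to df.subset(-Width, data = iris)
suffix_vars <- vars[endsWith(vars, "Width")]

# Substring match, analogous to df.subset(~al, data = iris)
contains_vars <- vars[grepl("al", vars, fixed = TRUE)]

head(iris[, contains_vars])
```

In iris, the substring al occurs in both Sepal and Petal, so the last selection keeps the four measurement variables and leaves out Species.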

Colon (:) Operator

The colon operator is used to select a range of consecutive variables from the data frame specified in data. For example, df.subset(x:z, data = dat) selects all variables from x to z. Note that this operator is equivalent to the : operator from the select function in the dplyr package.
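A range of consecutive variables is defined by column position. The base R sketch below reproduces the selection for the iris data; it is an analogue of df.subset(Sepal.Width:Petal.Width, data = iris), not the function's own code.

```r
vars <- names(iris)

# Positions of the first and last variable of the range
from <- match("Sepal.Width", vars)
to   <- match("Petal.Width", vars)

# All columns between the two positions, inclusive
range_vars <- vars[from:to]
head(iris[, range_vars])
```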

Double Colon (::) Operator

The double colon operator is used to select numbered variables from the data frame specified in data. For example, df.subset(x1::x3, data = dat) selects the variables x1, x2, and x3. Note that this operator is similar to the function num_range() from the tidyselect package.
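Assuming the operator behaves like num_range(), the variable names are built from a common prefix and a range of numbers, so the selection depends on names rather than column positions. A base R sketch using the anscombe data:

```r
# Build the names x1, x2, x3 and y1, y2, y3 from a prefix and a number range
x_vars <- paste0("x", 1:3)
y_vars <- paste0("y", 1:3)

# Analogous to df.subset(x1::x3, y1::y3, data = anscombe)
numbered <- anscombe[, c(x_vars, y_vars)]
names(numbered)
```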

Exclamation Point (!) Operator

The exclamation point operator is used to drop variables from the data frame specified in data or to take the complement of a set of variables. For example, df.subset(., !x, data = dat) selects all variables but x in dat, df.subset(., !~x, data = dat) selects all variables but those containing the word x, and df.subset(x:z, !x1:x3, data = dat) selects all variables from x to z but excludes all variables from x1 to x3. Note that this operator is equivalent to the ! operator from the select function in the dplyr package.
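Dropping variables amounts to a set difference on the variable names. The following base R sketch shows the idea for the iris data, as an analogue of the ! operator rather than its implementation.

```r
vars <- names(iris)

# Analogous to df.subset(., !Sepal.Width, data = iris)
without_one <- setdiff(vars, "Sepal.Width")

# Analogous to df.subset(., !Sepal.Width:Petal.Width, data = iris)
excluded <- vars[match("Sepal.Width", vars):match("Petal.Width", vars)]
without_range <- setdiff(vars, excluded)

head(iris[, without_range])
```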

Note that operators can be combined within the same function call. For example, df.subset(+x, -y, !x2:x4, z, data = dat) selects all variables with the prefix x and all variables with the suffix y, excludes the variables from x2 to x4, and additionally selects the variable z.
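Combining operators can be read as taking the union of the individual selections and then removing the excluded variables. The base R sketch below applies this reading to the anscombe data. The column order of the result is an assumption of the sketch and may differ from what df.subset returns.

```r
vars <- names(anscombe)

# Union of the prefix 'x' and suffix '3' selections
selected <- union(vars[startsWith(vars, "x")], vars[endsWith(vars, "3")])

# Variables from x2 to x3, to be excluded
excluded <- vars[match("x2", vars):match("x3", vars)]

# Analogous to df.subset(+x, -3, !x2:x3, data = anscombe)
combined <- setdiff(selected, excluded)
anscombe[, combined]
```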

Value

Returns a data frame containing the variables selected in the argument ... and the rows selected in the argument subset. If drop = TRUE and the result has a single column, it is converted into a vector.

Author(s)

Takuya Yanagida takuya.yanagida@univie.ac.at

References

Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.

See Also

df.duplicated, df.merge, df.move, df.rbind, df.rename, df.sort

Examples

## Not run: 
#-------------------------------------------------------------------------------
# Select single variables

# Example 1: Select 'Sepal.Length' and 'Petal.Width'
df.subset(Sepal.Length, Petal.Width, data = iris)

#-------------------------------------------------------------------------------
# Select all variables using the . operator

# Example 2a: Select all variables, select rows with 'Species' equal to 'setosa'
# Note that single quotation marks ('') are needed to specify 'setosa'
df.subset(., data = iris, subset = "Species == 'setosa'")

# Example 2b: Select all variables, select rows with 'Petal.Length' smaller than 1.2
df.subset(., data = iris, subset = "Petal.Length < 1.2")

#-------------------------------------------------------------------------------
# Select variables matching a prefix using the + operator

# Example 3: Select variables with prefix 'Petal'
df.subset(+Petal, data = iris)

#-------------------------------------------------------------------------------
# Select variables matching a suffix using the - operator

# Example 4: Select variables with suffix 'Width'
df.subset(-Width, data = iris)

#-------------------------------------------------------------------------------
# Select variables containing a word using the ~ operator

# Example 5: Select variables containing 'al'
df.subset(~al, data = iris)

#-------------------------------------------------------------------------------
# Select consecutive variables using the : operator

# Example 6: Select all variables from 'Sepal.Width' to 'Petal.Width'
df.subset(Sepal.Width:Petal.Width, data = iris)

#-------------------------------------------------------------------------------
# Select numbered variables using the :: operator

# Example 7: Select all variables from 'x1' to 'x3' and 'y1' to 'y3'
df.subset(x1::x3, y1::y3, data = anscombe)

#-------------------------------------------------------------------------------
# Drop variables using the ! operator

# Example 8a: Select all variables but 'Sepal.Width'
df.subset(., !Sepal.Width, data = iris)

# Example 8b: Select all variables but 'Sepal.Width' to 'Petal.Width'
df.subset(., !Sepal.Width:Petal.Width, data = iris)

#-------------------------------------------------------------------------------
# Combine +, -, !, and : operators

# Example 9: Select variables with prefix 'x' and suffix '3', but exclude
# variables from 'x2' to 'x3'
df.subset(+x, -3, !x2:x3, data = anscombe)

## End(Not run)

[Package misty version 0.6.3 Index]