faq-external-vector {tidyselect}    R Documentation
FAQ - Note: Using an external vector in selections is ambiguous
Description
Ambiguity between columns and external variables
With selecting functions like dplyr::select() or tidyr::pivot_longer(), you can refer to variables by name:
mtcars %>% select(cyl, am, vs)
#> # A tibble: 32 x 3
#>     cyl    am    vs
#>   <dbl> <dbl> <dbl>
#> 1     6     1     0
#> 2     6     1     0
#> 3     4     1     1
#> 4     6     0     1
#> # i 28 more rows

mtcars %>% select(mpg:disp)
#> # A tibble: 32 x 3
#>     mpg   cyl  disp
#>   <dbl> <dbl> <dbl>
#> 1  21       6   160
#> 2  21       6   160
#> 3  22.8     4   108
#> 4  21.4     6   258
#> # i 28 more rows
For historical reasons, it is also possible to refer to an external vector of variable names. You get the correct result, but with a warning that selecting with an external variable is ambiguous: it is not clear whether you want a data frame column or an external object.
vars <- c("cyl", "am", "vs")
result <- mtcars %>% select(vars)
#> Warning: Using an external vector in selections was deprecated in tidyselect
#> 1.1.0.
#> i Please use `all_of()` or `any_of()` instead.
#>   # Was:
#>   data %>% select(vars)
#>
#>   # Now:
#>   data %>% select(all_of(vars))
#>
#> See
#> <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this
#> warning was generated.
We have decided to deprecate this particular approach to using external vectors because it introduces ambiguity. Imagine that the data frame contains a column with the same name as your external variable.
some_df <- mtcars[1:4, ]
some_df$vars <- 1:nrow(some_df)
These are very different objects, but it isn’t a problem if the context forces you to be specific about where to find vars:
vars
#> [1] "cyl" "am"  "vs"

some_df$vars
#> [1] 1 2 3 4
In a selection context, however, the column wins:
some_df %>% select(vars)
#> # A tibble: 4 x 1
#>    vars
#>   <int>
#> 1     1
#> 2     2
#> 3     3
#> 4     4
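The same rule makes wrapper functions fragile. As a minimal sketch: pick_columns() is a hypothetical helper, not part of dplyr or tidyselect, and the first call assumes a tidyselect version in which the external-vector lookup is still a deprecation warning rather than an error. The same function returns different columns depending on whether the data happens to contain a column named like its argument:

```r
library(dplyr)

# Hypothetical helper: selects with a bare external vector (the deprecated pattern)
pick_columns <- function(df, vars) {
  df %>% select(vars)
}

some_df <- mtcars[1:4, ]
some_df$vars <- seq_len(nrow(some_df))

# mtcars has no column named `vars`, so the argument is used
# (this is the call that triggers the deprecation warning)
from_env <- suppressWarnings(names(pick_columns(mtcars, c("cyl", "am"))))

# some_df has a `vars` column, so the column wins and the argument is ignored
from_data <- names(pick_columns(some_df, c("cyl", "am")))
```

Nothing in the second call signals that the argument was ignored, which is why the ambiguity is easy to miss.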
Fixing the ambiguity
To make your selection code more robust and silence the message, use all_of() to force the external vector:
some_df %>% select(all_of(vars))
#> # A tibble: 4 x 3
#>     cyl    am    vs
#>   <dbl> <dbl> <dbl>
#> 1     6     1     0
#> 2     6     1     0
#> 3     4     1     1
#> 4     6     0     1
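The deprecation warning also mentions any_of(). As a brief sketch of how the two differ (the vector wanted and the name not_a_column are made up for illustration): all_of() is strict and fails when a name is missing from the data, whereas any_of() skips missing names.

```r
library(dplyr)

some_df <- mtcars[1:4, ]
wanted <- c("cyl", "not_a_column")  # illustrative; "not_a_column" is not in mtcars

# any_of() keeps the names that exist in the data and ignores the rest
lenient <- some_df %>% select(any_of(wanted))
names(lenient)

# all_of() treats a missing name as an error; capture it instead of stopping
strict <- tryCatch(
  some_df %>% select(all_of(wanted)),
  error = function(e) "error"
)
```

Which one to use depends on whether a missing column should be treated as a bug (all_of()) or as an expected case (any_of()).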
For more information, or if you have comments about this, please see the GitHub issue tracking the deprecation process.