Suprising behaviour of if_any() inside case_when #5782

Closed
iagogv3 opened this issue Feb 23, 2021 · 7 comments · Fixed by #5793
Labels: bug (an unexpected problem or unintended behavior)

iagogv3 commented Feb 23, 2021

I am not sure whether I am doing something wrong, but if I am, I cannot see what it is in the following reproducible example:

library(dplyr)

data <- structure(list(x = c(1, 1, -1, -1, -1, -1, 1, -1, -1, -1, 1, 
-1, -1, -1, -1, 1, -1, 1, -1, -1, -1, 1, -1, 1, -1, -1, -1, -1, 
-1, -1), y = c(-1, -1, 1, 1, 2, 1, -1, 1, 1, 1, -1, 1, 1, 1, 
1, -1, 1, -1, 1, 1, 2, -1, 1, -1, 1, 1, 1, 1, 1, 1)), row.names = c(NA, 
-30L), class = c("tbl_df", "tbl", "data.frame"))

# With plain comparisons in the second condition, "blue" is matched as expected:
data %>%
    mutate(dich = case_when(if_any(c(x, y), function(.x) {.x == 1}) ~ "green",
                             x == 2 | y == 2 ~ "blue",
                             TRUE ~ NA_character_)) %>%
    pull(dich) %>% table(useNA = "ifany")
#> blue green 
#>     2    28 

# But with if_any() in the second condition as well, "blue" is never matched:
data %>%
    mutate(dich = case_when(if_any(c(x, y), function(.x) {.x == 1}) ~ "green",
                             if_any(c(x, y), function(.x) {.x == 2}) ~ "blue",
                             TRUE ~ NA_character_)) %>%
    pull(dich) %>% table(useNA = "ifany")
#> green  <NA> 
#>    28     2 

Thank you!

hadley added the reprex (needs a minimal reproducible example) label Mar 3, 2021

hadley (Member) commented Mar 3, 2021

Can you please provide a minimal reprex (reproducible example)? The goal of a reprex is to make it as easy as possible for me to recreate your problem so that I can fix it: please help me help you!

If you've never heard of a reprex before, start by reading "What is a reprex", and follow the advice further down the page. Please make sure your reprex is created with the reprex package as it gives nicely formatted output and avoids a number of common pitfalls.

iagogv3 (Author) commented Mar 3, 2021

@hadley Here is a reprex:

library(dplyr)
iris %>% 
  head() %>%
  mutate(G.Width = case_when(if_any(ends_with("Width"), ~ . > 3.5) ~ "Big", 
                             if_any(ends_with("Width"), ~ . > 3.1) ~ "Medium",
                             TRUE ~ "Small"))
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species G.Width
#> 1          5.1         3.5          1.4         0.2  setosa   Small
#> 2          4.9         3.0          1.4         0.2  setosa   Small
#> 3          4.7         3.2          1.3         0.2  setosa   Small
#> 4          4.6         3.1          1.5         0.2  setosa   Small
#> 5          5.0         3.6          1.4         0.2  setosa     Big
#> 6          5.4         3.9          1.7         0.4  setosa     Big

Created on 2021-03-03 by the reprex package (v1.0.0)

hadley (Member) commented Mar 3, 2021

Can you please make a smaller reprex? Is it necessary to use 150 rows to illustrate the problem?

iagogv3 (Author) commented Mar 3, 2021

@hadley Reprex updated

hadley (Member) commented Mar 3, 2021

Simpler reprex:

library(dplyr, warn.conflicts = FALSE)

df <- data.frame(
  x = 1:3
)
df %>% mutate(
  y1 = case_when(
    if_any(everything(), ~ . >= 3) ~ "big",
    if_any(everything(), ~ . >= 2) ~ "medium",
    TRUE ~ "small"
  ),
  y2 = case_when(
    x >= 3 ~ "big",
    x >= 2 ~ "medium",
    TRUE ~ "small"
  )
)
#>   x    y1     y2
#> 1 1 small  small
#> 2 2 small medium
#> 3 3   big    big

Created on 2021-03-03 by the reprex package (v1.0.0)
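
For comparison, the result that y1 was expected to have can be worked out in base R without across() or if_any(). This is only a minimal sketch: any_rowwise() is a hypothetical stand-in for the row-wise "any" that if_any() computes, not dplyr's implementation.

```r
df <- data.frame(x = 1:3)

# Hypothetical helper: apply `predicate` to every column, then OR the
# resulting logical vectors together element by element (row-wise "any").
any_rowwise <- function(data, predicate) {
  Reduce(`|`, lapply(data, predicate))
}

# Nested ifelse() mirrors the order of the case_when() conditions above.
y1_expected <- ifelse(
  any_rowwise(df, function(col) col >= 3), "big",
  ifelse(any_rowwise(df, function(col) col >= 2), "medium", "small")
)

y1_expected
```

Under these assumptions the second row comes out as "medium", matching y2 rather than the y1 column printed above.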

hadley changed the title from "Only the first of multiple if_any inside case_when works" to "Suprising behaviour of if_any() inside case_when" Mar 3, 2021

hadley (Member) commented Mar 3, 2021

Even simpler reprex:

library(dplyr, warn.conflicts = FALSE)

df <- data.frame(
  x = 1:3
)

df %>% mutate(
  y3 = if_any(x, ~ . >= 3),
  y2 = if_any(x, ~ . >= 2),
  
  data.frame(
    z3 = if_any(x, ~ . >= 3),
    z2 = if_any(x, ~ . >= 2)
  )
) %>% 
  select(y2, z2)
#>      y2    z2
#> 1 FALSE FALSE
#> 2  TRUE FALSE
#> 3  TRUE  TRUE

Created on 2021-03-03 by the reprex package (v1.0.0)

romainfrancois (Member) commented

This is caused by the across() caching mechanism, which is in place so that the tidy selection is not re-evaluated each time. The trace below prints the cache key and the functions stored in the setup for each call:

library(dplyr, warn.conflicts = FALSE)

df <- data.frame(
  x = 1:3
)

trace(across, tracer = quote({ print(key); print(setup$fns) }), at = 4)
#> Tracing function "across" in package "dplyr"
#> [1] "across"
df %>% mutate(
  y1 = case_when(
    if_any(everything(), ~ . >= 3) ~ "big",
    if_any(everything(), ~ . >= 2) ~ "medium",
    TRUE ~ "small"
  )
)
#> Tracing across({ .... step 4 
#> [1] "across({"
#> $`1`
#> <lambda>
#> function (..., .x = ..1, .y = ..2, . = ..1) 
#> . >= 3
#> <environment: 0x7fbbf89bc2e8>
#> attr(,"class")
#> [1] "rlang_lambda_function" "function"             
#> 
#> Tracing across({ .... step 4 
#> [1] "across({"
#> $`1`
#> <lambda>
#> function (..., .x = ..1, .y = ..2, . = ..1) 
#> . >= 3
#> <environment: 0x7fbbf89bc2e8>
#> attr(,"class")
#> [1] "rlang_lambda_function" "function"
#>   x    y1
#> 1 1 small
#> 2 2 small
#> 3 3   big

Created on 2021-03-04 by the reprex package (v0.3.0)

if_any() is currently implemented as:

if_any <- function(.cols = everything(), .fns = NULL, ..., .names = NULL) {
  df <- across({{ .cols }}, .fns = .fns, ..., .names = .names)
  n <- nrow(df)
  df <- vec_cast_common(!!!df, .to = logical())
  .Call(dplyr_reduce_lgl_or, df, n)
}

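So the call to across() is always across({{ .cols }}, .fns = .fns, ..., .names = .names), which gives "across({" as the key. Every if_any() call within the same expression therefore shares one cache entry, and the second call reuses the setup of the first, including its function (. >= 3 in the trace above).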
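
The effect of keying a cache on the text of the call can be reproduced in base R. The following is a toy sketch only: cached_setup() and any_wrapper() are hypothetical names, and the cache is a plain environment rather than dplyr's actual mechanism.

```r
# Toy cache keyed by the text of the calling expression (an assumption made
# for illustration; dplyr's real cache is more involved).
cache <- new.env()

cached_setup <- function(fn) {
  # sys.call() is the call to cached_setup() as written at the call site.
  key <- paste(deparse(sys.call()), collapse = " ")
  if (!exists(key, envir = cache, inherits = FALSE)) {
    assign(key, fn, envir = cache)  # first call with this key: store fn
  }
  list(key = key, fn = get(key, envir = cache, inherits = FALSE))
}

# Called directly, the two call sites have different text, so different keys.
direct3 <- cached_setup(function(x) x >= 3)
direct2 <- cached_setup(function(x) x >= 2)

# Called through a wrapper, the call text is always "cached_setup(fn)",
# so the second call picks up the function cached by the first.
any_wrapper <- function(fn) cached_setup(fn)
wrapped3 <- any_wrapper(function(x) x >= 3)
wrapped2 <- any_wrapper(function(x) x >= 2)

direct2$fn(2)   # uses x >= 2
wrapped2$fn(2)  # silently uses the cached x >= 3
```

In this sketch the wrapper plays the role of if_any(): because the inner call is written only once, its text cannot distinguish between the different functions passed in.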

I'll fix if_any() and if_all() right now. However, the caching probably needs more thought, because the same problem affects any function that wraps across(), e.g.:

library(dplyr, warn.conflicts = FALSE)
times <- function(cols, multiplier = 1) {
  across({{ cols }}, function(x) {
    print(multiplier)
    x * multiplier
  })
}
data.frame(x = 1) %>% 
  summarise(
    times(x, 1)$x + times(x, 2)$x
  )
#> [1] 1
#> [1] 1   # <-- should be 2
#>   times(x, 1)$x + times(x, 2)$x
#> 1                             2

Created on 2021-03-04 by the reprex package (v0.3.0)
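
For reference, the value this summary was presumably meant to produce can be checked in base R, where no cache is involved. This is a sketch under that assumption: times_base() is a hypothetical equivalent of times() that multiplies every column directly.

```r
# Hypothetical base R counterpart of times(): multiply every column of
# `data` by `multiplier`, with no caching between calls.
times_base <- function(data, multiplier = 1) {
  as.data.frame(lapply(data, function(col) col * multiplier))
}

df <- data.frame(x = 1)

# 1 * 1 + 1 * 2
expected <- times_base(df, 1)$x + times_base(df, 2)$x
expected
```

Here each call sees its own multiplier, so the sum is 3, whereas the cached version above reuses multiplier = 1 and returns 2.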

romainfrancois added the bug (an unexpected problem or unintended behavior) label and removed the reprex (needs a minimal reproducible example) label Mar 4, 2021
romainfrancois added this to the 1.0.6 milestone Mar 4, 2021