-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a ptype
argument to between()
since we can't make it cast to the type of x
at this time
#6906
Comments
I think this is by design. Look at the documentation:
But if you want the behavior, it could be tedious, see the following: library(dplyr)
#>
#> 载入程辑包:'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
x <- data.frame(
Visit = factor(
c("Discharge", "Month 3", "Month 6", "Month 12", "Month 20"),
levels = c("Baseline", "Discharge", "Month 3", "Month 6", "Month 12", "Month 20", "Premature"),
ordered = TRUE
)
)
x |>
filter(
between(
Visit,
factor("Discharge", levels = c("Baseline", "Discharge", "Month 3", "Month 6", "Month 12", "Month 20", "Premature"), ordered = TRUE),
factor("Month 20", levels = c("Baseline", "Discharge", "Month 3", "Month 6", "Month 12", "Month 20", "Premature"), ordered = TRUE)
)
)
#> Visit
#> 1 Discharge
#> 2 Month 3
#> 3 Month 6
#> 4 Month 12
#> 5 Month 20 Created on 2023-08-10 with reprex v2.0.2 The logic behind this is just type casting used by library(dplyr)
#>
#> 载入程辑包:'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
x <- data.frame(
Visit = factor(
c("Discharge", "Month 3", "Month 6", "Month 12", "Month 20"),
levels = c("Baseline", "Discharge", "Month 3", "Month 6", "Month 12", "Month 20", "Premature"),
ordered = TRUE
)
)
x |>
filter(data.table::between(Visit, "Discharge", "Month 20"))
#> Visit
#> 1 Discharge
#> 2 Month 3
#> 3 Month 6
#> 4 Month 12
#> 5 Month 20 Created on 2023-08-10 with reprex v2.0.2 |
Thank you very much for the quick explanation, @psychelzh! But, at the end of the day, although I know the source of this behaviour, this is even more frustrating. Because when someone knows how the BETWEEN operator works in, say, SQL, where it's equivalent to the >= A AND <= B condition and expects such equivalence elsewhere (because why not?), the result I showed is confusing incredibly. If only a warning or error, like "unequal types of both arguments/operands/variables/columns" were printed (the same way the binding functions or joins already do), it could save a lot of time wasted on investigating weird results (provided someone even noticed the result is incomplete! This case was trivial, but I can imagine a definitely more complex one with multiple conditions). Please, @hadley, consider at least adding a warning, if between() cannot be adjusted for cases like this. In many (if not most) places tidyverse is very strict, to prevent confusing results, yet here, where I wouldn't expect any problem, it's far too liberal, giving unexpected results. Without a warning or error, the explanation in the manual may not suffice - I guess some of people will be surprised even after reading it, because when we read about "common types" we usually think: "OK, number & number, character & character, factor & factor", not necessarily "factor_#0493" and "factor_#0493". And in case like that, the behaviour is really surprising at first sight. |
I do agree with this! In fact, I think {vctrs} sometimes really gets too strict or restricted. More frustration also comes from tidyverse/tidyr#1483. I think loose behavior like base R, such as casting number to character, is sometimes absolutely acceptable. |
This ultimately comes down to the fact that In vctrs, the common type of factor and character is character (the more general type), so all inputs end up getting cast to character before the comparison. I tried making You could make your own wrapper around library(dplyr, warn.conflicts = FALSE)
x <- tibble(Visit = c("Discharge", "Month 3", "Month 6", "Month 12", "Month 20"))
x$Visit <- factor(x$Visit, levels = c("Baseline", "Discharge", "Month 3", "Month 6", "Month 12", "Month 20", "Premature"), ordered = TRUE)
x
#> # A tibble: 5 × 1
#> Visit
#> <ord>
#> 1 Discharge
#> 2 Month 3
#> 3 Month 6
#> 4 Month 12
#> 5 Month 20
between_cast_to_x <- function(x, left, right) {
args <- vctrs::vec_cast_common(left = left, right = right, .to = x)
left <- args$left
right <- args$right
between(x, left, right)
}
between(x$Visit, "Discharge", "Month 12")
#> [1] TRUE FALSE FALSE TRUE FALSE
between_cast_to_x(x$Visit, "Discharge", "Month 12")
#> [1] TRUE TRUE TRUE TRUE FALSE I think the only thing we can do is consider exposing a |
ptype
argument to between()
since we can't make it cast to the type of x
at this time
IMHO Since 1.1.x dplyr is becoming too strict and there are too much warnings, but I have the surprise than some users want that. The solution for the users is to verify and its codes and datas with many scenarios. The result of this goes to lack of flexibility and blackbox for the data transformation. I use lot of dplyr since 0.7 and again now, but since 1.1 when some functions are becoming too strict I bypass it back to base R or I make invisible some of warnings. |
The result of using
between()
in filtering is not equivalent to evaluating a conditional expression when working with ordered factors.Now let's make it an ordered factor with assumed levels (used later for other operations):
When filtering using the conjunction of conditions it works
When using the convenient between(), it doesn't
2 items are missing in the output.
R version 4.2.0 (2022-04-22 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)
The text was updated successfully, but these errors were encountered: