join_by(): Syntax for generic joins #2240
Would this cover wildcard joins as well?
Need to keep in mind that dbplyr now translates full joins with
We also need to review what fuzzyjoin already offers.
"Me too" comments are not very useful for us; please just 👍 the issue.
A thought on the NSE:
library(rlang)
#> Warning: package 'rlang' was built under R version 4.0.2
capture_equal_equal <- function(e1, e2) {
  list(op = "==", lhs = enexpr(e1), rhs = enexpr(e2))
}
capture_greater <- function(e1, e2) {
  list(op = ">", lhs = enexpr(e1), rhs = enexpr(e2))
}
capture_greater_equal <- function(e1, e2) {
  list(op = ">=", lhs = enexpr(e1), rhs = enexpr(e2))
}
capture_less <- function(e1, e2) {
  list(op = "<", lhs = enexpr(e1), rhs = enexpr(e2))
}
capture_less_equal <- function(e1, e2) {
  list(op = "<=", lhs = enexpr(e1), rhs = enexpr(e2))
}
bindings <- list(
  `==` = capture_equal_equal,
  `>` = capture_greater,
  `>=` = capture_greater_equal,
  `<` = capture_less,
  `<=` = capture_less_equal
)
env_join_by <- new_environment(bindings)
eval_join_by <- function(quo) {
  expr <- quo_get_expr(quo)
  # Bare symbol of `x` translates to `x == x`
  if (is_symbol(expr)) {
    expr <- expr(`==`(!!expr, !!expr))
  }
  # Evaluate expression in limited env
  out <- eval_bare(expr, env_join_by)
  # Retain quosure env for later evaluation
  env <- list(env = quo_get_env(quo))
  out <- c(out, env)
  out
}
join_by <- function(...) {
  quos <- enquos(...)
  if (any(names(quos) != "")) {
    abort("`=` used in `join_by()`, did you mean `==`?")
  }
  lapply(quos, eval_join_by)
}
process_join_by_op <- function(op, data_lhs, data_rhs) {
  # Evaluate with lhs/rhs data mask
  list(
    op = op$op,
    lhs = eval_tidy(op$lhs, data = data_lhs, env = op$env),
    rhs = eval_tidy(op$rhs, data = data_rhs, env = op$env)
  )
}
process_join_by_ops <- function(ops, data_lhs, data_rhs) {
  lapply(ops, process_join_by_op, data_lhs = data_lhs, data_rhs = data_rhs)
}
lhs <- data.frame(x = "foo", yearmonth = 2)
rhs <- data.frame(x = c("bar", "baz"), gmonth = 5:6)
vec <- 5
ops <- join_by(x, yearmonth - vec >= gmonth + 3)
str(ops)
#> List of 2
#> $ :List of 4
#> ..$ op : chr "=="
#> ..$ lhs: symbol x
#> ..$ rhs: symbol x
#> ..$ env:<environment: R_GlobalEnv>
#> $ :List of 4
#> ..$ op : chr ">="
#> ..$ lhs: language yearmonth - vec
#> ..$ rhs: language gmonth + 3
#> ..$ env:<environment: R_GlobalEnv>
ops2 <- process_join_by_ops(ops, lhs, rhs)
str(ops2)
#> List of 2
#> $ :List of 3
#> ..$ op : chr "=="
#> ..$ lhs: chr "foo"
#> ..$ rhs: chr [1:2] "bar" "baz"
#> $ :List of 3
#> ..$ op : chr ">="
#> ..$ lhs: num -3
#> ..$ rhs: num [1:2] 8 9
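One way the evaluated operations could then be turned into matching row pairs for data frames is a brute-force comparison of every lhs row against every rhs row. The sketch below is only an illustration in base R: `match_join_ops()` and its arguments are hypothetical names, not part of dplyr or rlang, and the input simply mirrors the `op`/`lhs`/`rhs` structure of `ops2` with made-up values.

```r
# Hypothetical helper: `match_join_ops()` is not part of dplyr or rlang.
# `ops` is a list of list(op = <chr>, lhs = <vector>, rhs = <vector>),
# i.e. the same shape as `ops2` in the reprex.
match_join_ops <- function(ops, n_lhs, n_rhs) {
  keep <- matrix(TRUE, nrow = n_lhs, ncol = n_rhs)
  for (op in ops) {
    # Recycle length-1 results (e.g. constants) to the full table length
    lhs <- rep_len(op$lhs, n_lhs)
    rhs <- rep_len(op$rhs, n_rhs)
    # Compare every lhs row with every rhs row using the stored operator
    keep <- keep & outer(lhs, rhs, FUN = match.fun(op$op))
  }
  # Row/column positions of the cells where all conditions hold
  idx <- which(keep, arr.ind = TRUE)
  data.frame(
    lhs_row = unname(idx[, "row"]),
    rhs_row = unname(idx[, "col"])
  )
}

# Made-up evaluated operations: an equality plus an inequality
ops <- list(
  list(op = "==", lhs = c("a", "b", "b"), rhs = c("b", "a")),
  list(op = ">=", lhs = c(10, 1, 7), rhs = c(5, 3))
)
matches <- match_join_ops(ops, n_lhs = 3, n_rhs = 2)
matches
```

This needs memory proportional to `n_lhs * n_rhs`, so it only shows the semantics; a real backend would presumably use the equality conditions to narrow the candidates first.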
I've implemented a rough POC for rolling joins here https://github.com/DavisVaughan/slidejoin, which uses slider as the backend to generate the sliding indices that you join on. Many examples are in the readme. It may serve as inspiration for the rolling backend here, but is otherwise untested and not fit for production use.
Sliding left/right/inner/full joins are implemented by first performing the corresponding mutating join only on the
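For readers unfamiliar with rolling joins, here is a minimal base R sketch of the "last observation carried forward" variant. It is not how slidejoin works (that package uses slider to generate the indices); it swaps in `findInterval()` instead, and `roll_join_left()`, `trades`, and `quotes` are hypothetical names and data used only for illustration.

```r
# Hypothetical helper, not taken from slidejoin or dplyr.
# For each row of `x`, attach the last row of `y` whose key is <= the x key.
roll_join_left <- function(x, y, key) {
  # findInterval() requires the lookup keys to be sorted
  y <- y[order(y[[key]]), , drop = FALSE]
  # Index of the largest sorted y key that is <= each x key (0 if none)
  pos <- findInterval(x[[key]], y[[key]])
  pos[pos == 0L] <- NA_integer_
  # An NA index yields a row of NAs, i.e. "no match" in a left join
  y_match <- y[pos, setdiff(names(y), key), drop = FALSE]
  rownames(y_match) <- NULL
  cbind(x, y_match)
}

trades <- data.frame(time = c(1, 5, 10), qty = c(100, 200, 300))
quotes <- data.frame(time = c(2, 4, 9), price = c(10.0, 10.5, 11.0))
rolled <- roll_join_left(trades, quotes, "time")
rolled
```

The first trade precedes every quote, so its `price` is `NA`; the other two pick up the most recent earlier quote.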
#557 (comment) and #378 (comment) propose a syntax for generic and rolling joins:
As usual, this should be powered by an SE version, join_by_(). We can pass this to the SQL engine (and perhaps to data tables) with relatively little work; the main challenge will be to implement this for data frames.
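To illustrate why the SQL side might be comparatively little work, captured conditions can be rendered into an `ON` clause almost mechanically. The sketch below is a rough base R illustration under several assumptions: `qualify_sql()` and `join_op_to_sql()` are hypothetical helpers, the `LHS`/`RHS` table aliases are chosen for the example, only binary operators and numeric constants are handled, and the real dbplyr translation is far more thorough.

```r
# Hypothetical helpers; not part of dplyr or dbplyr.
qualify_sql <- function(expr, cols, alias, env) {
  if (is.symbol(expr)) {
    name <- as.character(expr)
    # Column names get the table alias; other symbols are looked up in `env`
    if (name %in% cols) {
      return(paste0(alias, ".", name))
    }
    return(format(get(name, envir = env)))
  }
  if (is.call(expr) && length(expr) == 3L) {
    # Binary call such as `a - b`: recurse into both operands
    op <- as.character(expr[[1]])
    lhs <- qualify_sql(expr[[2]], cols, alias, env)
    rhs <- qualify_sql(expr[[3]], cols, alias, env)
    return(paste0("(", lhs, " ", op, " ", rhs, ")"))
  }
  if (is.atomic(expr)) {
    # Numeric constants are inlined as-is (strings are not quoted here)
    return(format(expr))
  }
  stop("unsupported expression: ", deparse(expr))
}

join_op_to_sql <- function(op, lhs_cols, rhs_cols, env = parent.frame()) {
  # R's `==` corresponds to SQL's `=`
  sql_op <- if (op$op == "==") "=" else op$op
  paste(
    qualify_sql(op$lhs, lhs_cols, "LHS", env),
    sql_op,
    qualify_sql(op$rhs, rhs_cols, "RHS", env)
  )
}

vec <- 5
ops <- list(
  list(op = "==", lhs = quote(x), rhs = quote(x)),
  list(op = ">=", lhs = quote(yearmonth - vec), rhs = quote(gmonth + 3))
)
on <- vapply(
  ops, join_op_to_sql, character(1),
  lhs_cols = c("x", "yearmonth"),
  rhs_cols = c("x", "gmonth"),
  env = environment()
)
on_clause <- paste(on, collapse = " AND ")
on_clause
#> [1] "LHS.x = RHS.x AND (LHS.yearmonth - 5) >= (RHS.gmonth + 3)"
```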