`rowwise()` size zero data frame edge case #6303

DavisVaughan · 2022-06-19T14:49:01Z

There is a weird edge case that occurs with size zero data frames and `rowwise()`. In this case, a list-col is just `list()`, so there aren't any elements in it to extract and use in the expression.

In practice, the `Rf_length(column) > 0` condition in this branch evaluates to `FALSE`, so the list-column gets chopped with `vec_chop()` instead, resulting in `list(list())`:

dplyr/src/chop.cpp

Lines 20 to 24 in 55dfc1c

```cpp
if (rowwise && vctrs::vec_is_list(column) && Rf_length(column) > 0) {
  SET_PRCODE(prom, column);
} else {
  SET_PRCODE(prom, Rf_lang3(dplyr::functions::vec_chop, column, rows));
}
```

That results in weird bugs, like this one:

```r
library(dplyr)

df <- tibble(x = list(1, 2, 3, 4))
df <- rowwise(df)

# Looks good
mutate(df, y = x + 1)
#> # A tibble: 4 × 2
#> # Rowwise:
#>   x             y
#>   <list>    <dbl>
#> 1 <dbl [1]>     2
#> 2 <dbl [1]>     3
#> 3 <dbl [1]>     4
#> 4 <dbl [1]>     5

# Now it is impossible to know what type is in `x`
df <- df[0,]
df
#> # A tibble: 0 × 1
#> # Rowwise:
#> # … with 1 variable: x <list>
df$x
#> list()

# In practice this ends up doing `x + 1` -> `list() + 1`, which is theoretically
# very different from the `x + 1` above
mutate(df, y = x + 1)
#> Error in `mutate()`:
#> ! Problem while computing `y = x + 1`.
#> Caused by error in `x + 1`:
#> ! non-numeric argument to binary operator
```

I think the `Rf_length(column) > 0` check was added to help with this case, but it may have made things more confusing and less consistent: instead of accessing the "elements" of the list-col, the expression ends up working on the whole list-col itself.
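A minimal base-R sketch of the current branch may help make the mechanism concrete. It does not call dplyr: `chop()` is a hypothetical stand-in for `vctrs::vec_chop()`, `current_rowwise_values()` is a hypothetical model of the `chop.cpp` condition, and evaluating zero rows as a single empty group (`rows = list(integer())`) is an assumption inferred from the `list(list())` result described above, not taken from dplyr's source.

```r
# Hypothetical stand-in for vctrs::vec_chop(): split `x` by row indices.
chop <- function(x, rows) lapply(rows, function(i) x[i])

# Hypothetical model of the branch in dplyr/src/chop.cpp: returns the
# per-group values that a rowwise expression would see for `column`.
current_rowwise_values <- function(column, rows) {
  if (is.list(column) && length(column) > 0) {
    # Non-empty list-col: each group sees the element itself, e.g. `1`.
    lapply(rows, function(i) column[[i]])
  } else {
    # Empty list-col falls through to the chop: each group sees a list.
    chop(column, rows)
  }
}

# Four rows, one row per group: each group sees a bare double.
full <- current_rowwise_values(list(1, 2, 3, 4), as.list(1:4))
str(full[[1]])

# Zero rows, assumed to be evaluated as one empty group.
empty <- current_rowwise_values(list(), list(integer()))
str(empty)
```

Under these assumptions, the non-empty case hands each group a bare double, while the empty case hands the single group `list()`, which is why `x + 1` fails with a non-numeric argument error.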


DavisVaughan · 2022-06-19T15:20:48Z

I suspect this should result in a chop of `list(NULL)` rather than `list(list())` for an empty bare list-col. With a classed vctrs `list_of`, it could use `list(<ptype>)`, which is the theoretically correct option. This is similar to what we do in a few places in tidyr: `NULL` is our fallback "best option" when there is no `ptype` attribute. Something like:

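```cpp
if (rowwise && vctrs::vec_is_list(column)) {
  if (size == 0) {
    // Zero rows: build a length-1 list holding the `ptype` attribute
    // (vctrs `list_of`), or `NULL` for a bare list-col
    SEXP ptype = PROTECT(Rf_getAttrib(column, Rf_install("ptype")));
    column = PROTECT(Rf_allocVector(VECSXP, 1));  // elements start as NULL
    if (ptype != R_NilValue) {
      SET_VECTOR_ELT(column, 0, ptype);
    }
    SET_PRCODE(prom, column);
    UNPROTECT(2);
  } else {
    // Non-empty list-col: promise the column itself, as before
    SET_PRCODE(prom, column);
  }
} else {
  SET_PRCODE(prom, Rf_lang3(dplyr::functions::vec_chop, column, rows));
}
```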

That would work in the previous example, because `NULL + 1` is `numeric()`.
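The proposed zero-row behavior can be modeled in a few lines of base R. `proposed_empty_chop()` is a hypothetical helper mirroring the C++ snippet above (allocate a length-1 list, then store the `ptype` attribute if there is one); an empty vctrs `list_of<chr>` is imitated with a plain `ptype` attribute so the sketch runs without vctrs or dplyr.

```r
# Hypothetical model of the proposed zero-row handling: an empty list-col
# becomes a length-1 list holding its `ptype` attribute, or NULL.
proposed_empty_chop <- function(column) {
  stopifnot(is.list(column), length(column) == 0L)
  out <- vector("list", 1L)        # list(NULL), like Rf_allocVector(VECSXP, 1)
  ptype <- attr(column, "ptype", exact = TRUE)
  if (!is.null(ptype)) {
    out[[1L]] <- ptype             # list(<ptype>) for a list_of-like column
  }
  out
}

# Bare list-col: the single group sees NULL, and NULL + 1 is numeric(0).
bare <- proposed_empty_chop(list())
bare[[1]] + 1

# Imitate an empty list_of<chr> with a plain `ptype` attribute.
typed <- proposed_empty_chop(structure(list(), ptype = character()))
is.character(typed[[1]])
```

With `NULL` as the fallback, type-agnostic code such as `x + 1` still produces a zero-length result, while a `ptype` lets stricter checks like `is.character(x)` pass.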

Here is what it would look like in another example, which also uses `list_of`:

```r
library(dplyr)
library(vctrs)

df <- tibble(x = list(c("abc", "de"), c("ab", "cd", "ef")))
df <- rowwise(df)

mutate(df, y = paste0(x, collapse = ","))
#> # A tibble: 2 × 2
#> # Rowwise:
#>   x         y
#>   <list>    <chr>
#> 1 <chr [2]> abc,de
#> 2 <chr [3]> ab,cd,ef

df0 <- df[0,]

# `NULL` works pretty well most of the time. `paste0(NULL) == character()`
mutate(df0, y = paste0(x, collapse = ","))
#> # A tibble: 0 × 2
#> # Rowwise:
#> # … with 2 variables: x <list>, y <chr>

# You can make it fail
mutate(df0, y = {stopifnot(is.character(x)); TRUE})
#> Error in `mutate()`:
#> ! Problem while computing `y = { ... }`.
#> Caused by error in `stopifnot()`:
#> ! is.character(x) is not TRUE

# Make it a list-of
df$x <- as_list_of(df$x)
df0 <- df[0,]
df0
#> # A tibble: 0 × 1
#> # Rowwise:
#> # … with 1 variable: x <list<chr>>

# Now it works
mutate(df0, y = {stopifnot(is.character(x)); TRUE})
#> # A tibble: 0 × 2
#> # Rowwise:
#> # … with 2 variables: x <list<chr>>, y <lgl>
```

Created on 2022-06-19 by the reprex package (v2.0.1)


hadley added the labels bug (an unexpected problem or unintended behavior) and each-row ↕️ on Jul 21, 2022

hadley added a commit that referenced this issue Jul 26, 2022

Better handling of zero-row rowwise mutates()

21de962

Fixes #6303

hadley mentioned this issue Jul 26, 2022

Better handling of zero-row rowwise mutates() #6369

Merged

hadley closed this as completed in #6369 (b41c8bb) on Aug 2, 2022
