Use child columns of a tibble column for joins #7659

vorpalvorpal · 2025-02-18T09:42:11Z

If I have a tibble containing a tibble column, it would be useful to be able to join based on the contents of a column within the tibble column.

df <- tibble(id = 1:5, tibbleCol = tibble(id = 2:6, value = 6:2), value = 3:7)
df2 <- tibble(id = 5:1, value = 8:4)
left_join(df, df2, by = c("tibbleCol$id" = "id"))
# Error in `left_join()`:
# ! Join columns in `x` must be present in the data.
# ✖ Problem with `tibbleCol$id`.

This currently fails, but there is no obvious reason it should: unlike list columns or nested columns, the columns inside a tibble column always have the same number of rows as the parent tibble.
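To make the row-alignment point concrete, here is a minimal sketch of one way to get the join today without touching `tibbleCol`: copy the child column to a temporary top-level key, join on it, and drop it. The name `.key` is made up for illustration, and this is only a workaround under those assumptions, not the feature being requested.

```r
library(dplyr)

df  <- tibble(id = 1:5, tibbleCol = tibble(id = 2:6, value = 6:2), value = 3:7)
df2 <- tibble(id = 5:1, value = 8:4)

# A tibble column has exactly one row per row of its parent,
# so its child columns line up with the top-level columns.
stopifnot(nrow(df$tibbleCol) == nrow(df))

# `.key` is a hypothetical temporary column holding tibbleCol$id.
res <- df |>
  mutate(.key = tibbleCol$id) |>
  left_join(df2, by = c(".key" = "id")) |>
  select(-.key)
```

Because both `df` and `df2` have a `value` column, the join disambiguates them with the default `.x`/`.y` suffixes.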

pstils · 2025-02-19T10:25:54Z

Does a mutate do what you want?

df |> 
    mutate(tibbleCol = left_join(tibbleCol, df2, by = c(id = "id")))

vorpalvorpal · 2025-02-20T03:38:09Z

That's not quite what I was getting at. That joins columns onto the second-level data frame; I was talking about joining onto the top-level data frame.

> df |> 
+     mutate(tibbleCol = left_join(tibbleCol, df2, by = c(id = "id")))
# A tibble: 5 × 3
     id tibbleCol$id $value.x $value.y value
  <int>        <int>    <int>    <int> <int>
1     1            2        6        5     3
2     2            3        5        6     4
3     3            4        4        7     5
4     4            5        3        8     6
5     5            6        2       NA     7

# What I'm aiming for is the equivalent of:
> df |> 
+     tidyr::unpack(cols = tibbleCol, names_sep = "_") |>
+     left_join(df2, by = c("tibbleCol_id" = "id")) |> 
+     tidyr::pack(tibbleCol = starts_with("tibbleCol"), .names_sep = "_")
# A tibble: 5 × 4
     id value.x value.y tibbleCol$id $value
  <int>   <int>   <int>        <int>  <int>
1     1       3       5            2      6
2     2       4       6            3      5
3     3       5       7            4      4
4     4       6       8            5      3
5     5       7      NA            6      2

I'm saying it would be nice to be able to treat a column embedded within a tibble column as if it were in the main data frame. In my use case I am basically using tibble columns to organise what is essentially a very, very, very wide data frame to make it more manageable and intelligible.
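Until something like this is supported directly, the unpack/join/pack round trip above could be wrapped in a small helper. This is only a sketch: `left_join_packed()` and its arguments are hypothetical names, not part of dplyr or tidyr, and it assumes that no other column in `x` or `y` already starts with the `col` + `sep` prefix.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper (not a dplyr/tidyr function).
# `col`: name of the tibble column in `x`.
# `by`:  named character vector mapping child columns of `col`
#        to columns of `y`, e.g. c(id = "id").
left_join_packed <- function(x, y, col, by, sep = "_") {
  prefix <- paste0(col, sep)
  # Rewrite the join keys to the names the child columns get after unpack().
  by_unpacked <- setNames(unname(by), paste0(prefix, names(by)))
  x |>
    unpack(cols = all_of(col), names_sep = sep) |>
    left_join(y, by = by_unpacked) |>
    # Re-pack everything carrying the prefix; assumes the prefix is unique to `col`.
    pack(!!col := starts_with(prefix), .names_sep = sep)
}

df  <- tibble(id = 1:5, tibbleCol = tibble(id = 2:6, value = 6:2), value = 3:7)
df2 <- tibble(id = 5:1, value = 8:4)

res <- left_join_packed(df, df2, col = "tibbleCol", by = c(id = "id"))
```

The helper keeps the columns from `df2` at the top level, which matches the output I'm aiming for above rather than the `mutate()` version.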
