Skip to content

Unexpected result from non-equi-join #4949

@renkun-ken

Description

@renkun-ken

Suppose we have the following data.tables:

library(data.table)

p1 <- data.table(
  symbol = c("A", "A", "A", "B", "B", "B"),
  date = c(1L, 2L, 3L, 1L, 2L, 3L)
)

p2 <- data.table(
  symbol = c("A", "A", "B"),
  start_date = c(1L, 2L, 1L),
  end_date = c(1L, 3L, 2L),
  id = 1:3
)
> p1                                                                                                                                  
   symbol  date
   <char> <int>
1:      A     1
2:      A     2
3:      A     3
4:      B     1
5:      B     2
6:      B     3

> p2                                                                                                                                  
   symbol start_date end_date    id
   <char>      <int>    <int> <int>
1:      A          1        1     1
2:      A          2        3     2
3:      B          1        2     3

In-place modification with non-equi-join works as expected:

p1[p2, id := id, on = .(symbol, date >= start_date, date <= end_date)]
> p1                                                                                                                                  
   symbol  date    id
   <char> <int> <int>
1:      A     1     1
2:      A     2     2
3:      A     3     2
4:      B     1     3
5:      B     2     3
6:      B     3    NA

However, using the non-equi-join syntax to subset p1 with p2 does not seem to work consistently:

p1[p2, .(symbol, date), on = .(symbol, date >= start_date, date <= end_date)]
   symbol  date
   <char> <int>
1:      A     1
2:      A     2
3:      A     2
4:      B     1
5:      B     1

or

p1[p2, on = .(symbol, date >= start_date, date <= end_date)]
   symbol  date    id date.1  i.id
   <char> <int> <int>  <int> <int>
1:      A     1     1      1     1
2:      A     2     2      3     2
3:      A     2     2      3     2
4:      B     1     3      2     3
5:      B     1     3      2     3

But I expect the result to be like

p2 <- p1[!is.na(id)]
p2
   symbol  date    id
   <char> <int> <int>
1:      A     1     1
2:      A     2     2
3:      A     3     2
4:      B     1     3
5:      B     2     3

I wonder if this is a known behavior? Or is there an explanation why it is not the expected result like above? Am I missing something about the definition of the non-equi-join?

Metadata

Metadata

Assignees

No one assigned

    Labels

    non-equi joinsrolling, overlapping, non-equi joins

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions