Merged
1 change: 0 additions & 1 deletion NAMESPACE
@@ -7,7 +7,6 @@ exportClasses(data.table, IDate, ITime)

export(data.table, tables, setkey, setkeyv, key, "key<-", haskey, CJ, SJ, copy)
export(setindex, setindexv, indices)
export(set2key, set2keyv, key2) # deprecated with helpful error; remove after May 2019 (see #3399)
export(as.data.table,is.data.table,test.data.table)
export(last,first,like,"%like%","%ilike%","%flike%",between,"%between%",inrange,"%inrange%")
export(timetaken)
2 changes: 2 additions & 0 deletions NEWS.md
@@ -18,6 +18,8 @@

1. `as.IDate`, `as.ITime`, `second`, `minute`, and `hour` now recognize UTC equivalents for speed: GMT, GMT-0, GMT+0, GMT0, Etc/GMT, and Etc/UTC, [#4116](https://github.com/Rdatatable/data.table/issues/4116).

2. `set2key`, `set2keyv`, and `key2` have been removed, as they have been warning since v1.9.8 (Nov 2016) and halting with a helpful message since v1.11.0 (May 2018). When they were introduced in version 1.9.4 (Oct 2014) they were marked as 'experimental' and were quickly superseded by `setindex` and `indices`.


# data.table [v1.12.8](https://github.com/Rdatatable/data.table/milestone/15?closed=1) (09 Dec 2019)

5 changes: 0 additions & 5 deletions R/setkey.R
@@ -18,11 +18,6 @@ setindexv = function(x, cols, verbose=getOption("datatable.verbose")) {
}
}

# remove these 3 after May 2019; see discussion in #3399 and notes in v1.12.2. They were marked experimental after all.
set2key = function(...) stop("set2key() is now deprecated. Please use setindex() instead.")
set2keyv = function(...) stop("set2keyv() is now deprecated. Please use setindexv() instead.")
key2 = function(...) stop("key2() is now deprecated. Please use indices() instead.")

# upgrade to error after Mar 2020. Has already been warning since 2012, and stronger warning in Mar 2019 (note in news for 1.12.2); #3399
"key<-" = function(x,value) {
warning("key(x)<-value is deprecated and not supported. Please change to use setkey() with perhaps copy(). Has been warning since 2012 and will be an error in future.")
4 changes: 1 addition & 3 deletions inst/tests/tests.Rraw
@@ -12454,9 +12454,7 @@ test(1897.2, attributes(attr(DT, 'index')),
list(`__a` = c(3L, 2L, 4L, 1L, 5L),
`__a__b` = c(3L, 4L, 2L, 1L, 5L)))

test(1898.1, set2key(DT, a), error="deprecated. Please use setindex() instead.")
test(1898.2, set2keyv(DT, "a"), error="deprecated. Please use setindexv() instead.")
test(1898.3, key2(DT), error="deprecated. Please use indices() instead.")
# tests 1898.{1,2,3} for set2key etc. deprecation were removed along with those functions

# Allow column to be used as rownames when converting to matrix #2702
DT = data.table(id = letters[1:4], X = 1:4, Y = 5:8)
13 changes: 13 additions & 0 deletions man/deprecated.Rd
@@ -0,0 +1,13 @@
\name{key<-}
\alias{key<-}
\title{ Deprecated. }
\keyword{internal}
\description{
This function is deprecated. It will be removed in future. Please use \code{\link{setkey}}.
}
\usage{
key(x) <- value # warning since 2012; DEPRECATED since Mar 2019
}
\arguments{
\item{x}{ Deprecated. }
}
22 changes: 0 additions & 22 deletions man/set2key.Rd

This file was deleted.

30 changes: 15 additions & 15 deletions vignettes/datatable-secondary-indices-and-auto-indexing.Rmd
@@ -1,7 +1,7 @@
---
title: "Secondary indices and auto indexing"
date: "`r Sys.Date()`"
output:
rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Secondary indices and auto indexing}
@@ -73,10 +73,10 @@ names(attributes(flights))
```

* `setindex()` and `setindexv()` allow adding a secondary index to the data.table.
* Originally it was `set2key` until data.table 1.9.6, then [changed to current names](https://github.com/Rdatatable/data.table/issues/1442).

* Note that `flights` is **not** physically reordered in increasing order of `origin`, as would have been the case with `setkey()`.

* Also note that the attribute `index` has been added to `flights`.

* `setindex(flights, NULL)` would remove all secondary indices.
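
The points above can be sketched on a small table. This is only an illustrative sketch: `DT` and its columns are hypothetical and stand in for `flights`, and the comments naming the removed functions follow the replacements given in their old error messages.

```r
library(data.table)

# Small illustrative table (hypothetical data, not the flights dataset).
DT = data.table(a = c(3L, 1L, 2L), b = c("x", "y", "z"))

setindex(DT, a)        # where set2key(DT, a) was used before
setindexv(DT, "b")     # where set2keyv(DT, "b") was used before
idx = indices(DT)      # where key2(DT) was used before
print(idx)

print(DT$a)                   # rows are not physically reordered
print(names(attributes(DT)))  # an "index" attribute is now present

setindex(DT, NULL)     # remove all secondary indices
print(indices(DT))
```

Unlike `setkey()`, none of these calls changes the row order of `DT`; only the `index` attribute is added or removed.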

@@ -111,23 +111,23 @@ a) computing the order vector for the column(s) provided, here, `origin`, and

b) reordering the entire data.table, by reference, based on the order vector computed.

#

Computing the order isn't the time-consuming part, since data.table uses true radix sorting on integer, character and numeric vectors. However, reordering the data.table could be time-consuming (depending on the number of rows and columns).

Unless our task involves repeated subsetting on the same column, the benefit of fast key-based subsetting could effectively be nullified by the time to reorder, depending on our data.table dimensions.

#### -- There can be only one `key` at the most

Now if we would like to repeat the same operation but on the `dest` column instead, for the value "LAX", then we have to `setkey()`, *again*.

```{r, eval = FALSE}
## not run
setkey(flights, dest)
flights["LAX"]
```

And this reorders `flights` by `dest`, *again*. What we would really like is to be able to perform the fast subsetting by eliminating the reordering step.

And this is precisely what *secondary indices* allow for!

@@ -145,11 +145,11 @@ As we will see in the next section, the `on` argument provides several advantage

* allows easy reuse of existing indices by just checking the attributes.

* allows for a cleaner syntax by having the columns on which the subset is performed as part of the syntax. This makes the code easier to follow when looking at it at a later point.

Note that the `on` argument can also be used on keyed subsets. In fact, we encourage providing the `on` argument even when subsetting using keys, for better readability.

#

## 2. Fast subsetting using `on` argument and secondary indices

@@ -161,7 +161,7 @@ As we will see in the next section, the `on` argument provides several advantage
flights["JFK", on = "origin"]

## alternatively
# flights[.("JFK"), on = "origin"] (or)
# flights[list("JFK"), on = "origin"]
```

@@ -276,9 +276,9 @@ flights[.(c("LGA", "JFK", "EWR"), "XNA"), mult = "last", on = c("origin", "dest"

## 3. Auto indexing

First we looked at how to fast subset using binary search using *keys*. Then we figured out that we could improve performance even further and have cleaner syntax by using secondary indices.

That is what *auto indexing* does. At the moment, it is only implemented for binary operators `==` and `%in%`. An index is automatically created *and* saved as an attribute. That is, unlike the `on` argument which computes the index on the fly each time (unless one already exists), a secondary index is created here.

Let's start by creating a data.table big enough to highlight the advantage.

@@ -312,15 +312,15 @@ The time to subset the first time is the time to create the index + the time to
system.time(dt[x %in% 1989:2012])
```

* Running the first time took `r sprintf("%.3f", t1["elapsed"])` seconds whereas the second time took `r sprintf("%.3f", t2["elapsed"])` seconds.

* Auto indexing can be disabled by setting the global argument `options(datatable.auto.index = FALSE)`.

* Disabling auto indexing still allows the use of indices created explicitly with `setindex` or `setindexv`. You can disable indices fully by setting the global argument `options(datatable.use.index = FALSE)`.
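
As a rough sketch of how these two options interact, assuming a small hypothetical table `DT` (not the `dt` used above) and restoring the previous option values at the end:

```r
library(data.table)

# Hypothetical data for illustration only.
DT = data.table(x = c(2L, 1L, 2L, 3L), y = 1:4)

# 1. Turn off auto indexing: subsetting no longer creates an index.
old_auto = options(datatable.auto.index = FALSE)
res1 = DT[x == 2L]
print(indices(DT))   # no index has been added by the subset

# 2. Explicitly created indices can still be set and used.
setindex(DT, x)
res2 = DT[x == 2L]

# 3. Turn off the use of indices entirely; the existing index is ignored.
old_use = options(datatable.use.index = FALSE)
res3 = DT[x == 2L]

# Restore the previous settings.
options(old_use)
options(old_auto)
```

All three subsets select the same rows; only the way those rows are located differs.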

#

In a recent version, we extended auto indexing to expressions involving more than one column (combined with the `&` operator). In the future, we plan to extend binary search to work with more binary operators like `<`, `<=`, `>` and `>=`.

We will extend the discussion of fast *subsets* using keys and secondary indices to *joins* in the next vignette, *"Joins and rolling joins"*.
