Merged
3 changes: 3 additions & 0 deletions NEWS.md
@@ -135,6 +135,9 @@

20. `uniqueN(DT, by=character())` is now equivalent to `uniqueN(DT)` rather than internal error `'by' is either not integer or is length 0`, [#4594](https://github.com/Rdatatable/data.table/issues/4594). Thanks Marco Colombo for the report, and Michael Chirico for the PR. Similarly for `unique()`, `duplicated()` and `anyDuplicated()`.

+21. `melt()` on a `data.table` with `list` columns for `measure.vars` would silently ignore `na.rm=TRUE`, [#5044](https://github.com/Rdatatable/data.table/issues/5044). Now the same logic as `is.na()` from base R is used; i.e. if a list element is a scalar NA then it is considered missing and removed. Thanks to Toby Dylan Hocking for the PRs.


## NOTES

1. New feature 29 in v1.12.4 (Oct 2019) introduced zero-copy coercion. Our thinking is that requiring you to get the type right in the case of `0` (type double) vs `0L` (type integer) is too inconvenient for you the user. So such coercions happen in `data.table` automatically without warning. Thanks to zero-copy coercion there is no speed penalty, even when calling `set()` many times in a loop, so there's no speed penalty to warn you about either. However, we believe that assigning a character value such as `"2"` into an integer column is more likely to be a user mistake that you would like to be warned about. The type difference (character vs integer) may be the only clue that you have selected the wrong column, or typed the wrong variable to be assigned to that column. For this reason we view character to numeric-like coercion differently and will warn about it. If it is correct, then the warning is intended to nudge you to wrap the RHS with `as.<type>()` so that it is clear to readers of your code that a coercion from character to that type is intended. For example :
9 changes: 8 additions & 1 deletion inst/tests/tests.Rraw
@@ -3060,6 +3060,13 @@ test(1034, as.data.table(x<-as.character(sample(letters, 5))), data.table(V1=x))
# na.rm=TRUE with list column value, PR#4737
test(1035.016, melt(data.table(a1=1, b1=list(1:2), b2=list(c('foo','bar'))), na.rm=TRUE, measure.vars=list(a="a1", b=c("b1","b2"))), data.table(variable=factor(1), a=1, b=list(1:2)))
test(1035.017, melt(data.table(a1=1, b1=1, b2=2), na.rm=TRUE, measure.vars=list(a="a1", b=c("b1","b2"))), data.table(variable=factor(1), a=1, b=1))#this worked even before the PR.
+DT.list.missing = data.table(l1=list(1,NA), l2=list(NA,2), n34=c(3,4), NA5=c(NA,5))
+test(1035.0180, melt(DT.list.missing, measure.vars=c("n34","NA5"), na.rm=TRUE)[["value"]], c(3,4,5))
+test(1035.0181, melt(DT.list.missing, measure.vars=c("l1","l2"), na.rm=TRUE)[["value"]], list(1,2))
+test(1035.0182, melt(DT.list.missing, measure.vars=c("l1","n34"), na.rm=TRUE)[["value"]], list(1,3,4), warning="are not all of the same type")
+test(1035.0183, melt(DT.list.missing, measure.vars=c("l1","NA5"), na.rm=TRUE)[["value"]], list(1,5), warning="are not all of the same type")
+test(1035.0184, melt(DT.list.missing, measure.vars=list(l=c("l1","l2"), n=c("n34","NA5")), na.rm=TRUE), data.table(variable=factor(1:2), l=list(1,2), n=c(3,5)))
+test(1035.0185, melt(data.table(l=list(c(NA,NA), NA, NA_integer_, NA_real_, NA_complex_, NA_character_, if(test_bit64)NA_integer64_)), measure.vars="l", na.rm=TRUE)[["value"]], list(c(NA,NA)))

ans1 = cbind(DT[, c(1,2,8), with=FALSE], variable=factor("l_1"))
ans1[, value := DT$l_1]
@@ -6514,7 +6521,7 @@ test(1459.12, .Call("CsubsetDT", DT, 5L, seq_along(DT)), setDT(as.data.frame(DT)

# Test for na.omit with list, raw and complex types
DT = data.table(x=c(1L,1L,NA), y=c(NA, NA, 1), z=as.raw(1:3), w=list(1,NA,2), v=c(1+5i, NA, NA))
-test(1460.1, na.omit(DT, cols="w"), DT)
+test(1460.1, na.omit(DT, cols="w"), DT[c(1,3)])
test(1460.2, na.omit(DT, cols="v"), DT[1])
test(1460.3, na.omit(DT, cols=c("v", "y")), DT[0])
test(1460.4, na.omit(DT, cols=c("z", "v")), DT[1])
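
A rough illustration of why tests 1460.2 and 1460.3 give different results, written in plain C rather than against R's API: `dt_na` in `src/frank.c` ORs a per-row "missing" flag across the selected columns, so a row is dropped as soon as any one of those columns is NA in that row. The helper name `na_mask_any` and the use of plain `double` arrays (the complex column `v` is reduced to one `double` per row) are assumptions made for this sketch, not data.table code.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical helper, not part of data.table: mask[j] becomes 1 when
   row j is NaN in at least one of the given columns, mirroring the
   `ians[j] |= ...` accumulation pattern used in dt_na. */
void na_mask_any(const double *const *cols, size_t ncol, size_t nrow, int *mask) {
    for (size_t j = 0; j < nrow; ++j) mask[j] = 0;       /* start with "not missing" */
    for (size_t i = 0; i < ncol; ++i) {
        for (size_t j = 0; j < nrow; ++j) {
            mask[j] |= (isnan(cols[i][j]) != 0);          /* OR in this column's flag */
        }
    }
}
```

With only a stand-in for `v` selected, rows 2 and 3 are flagged; adding a stand-in for `y` flags row 1 as well, which is consistent with the expected `DT[1]` and `DT[0]` in the tests.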
7 changes: 5 additions & 2 deletions man/melt.data.table.Rd
@@ -141,8 +141,11 @@ melt(DT, id.vars=1:2, measure.vars=patterns(f="^f_", d="^d_"), value.factor=TRUE
# na.rm=TRUE removes rows with NAs in any 'value' columns
melt(DT, id.vars=1:2, measure.vars=patterns("f_", "d_"), value.factor=TRUE, na.rm=TRUE)

-# return 'NA' for missing columns, 'na.rm=TRUE' ignored due to list column
-melt(DT, id.vars=1:2, measure.vars=patterns("l_", "c_"), na.rm=TRUE)
+# 'na.rm=TRUE' also works with list column, but note that is.na only
+# returns TRUE if the list element is a length=1 vector with an NA.
+is.na(list(one.NA=NA, two.NA=c(NA,NA)))
+melt(DT, id.vars=1:2, measure.vars=patterns("l_", "d_"), na.rm=FALSE)
+melt(DT, id.vars=1:2, measure.vars=patterns("l_", "d_"), na.rm=TRUE)

# measure list with missing/short entries results in output with runs of NA
DT.missing.cols <- DT[, .(d_1, d_2, c_1, f_2)]
18 changes: 15 additions & 3 deletions src/fmelt.c
@@ -520,6 +520,7 @@ SEXP getvaluecols(SEXP DT, SEXP dtnames, Rboolean valfactor, Rboolean verbose, s
           for (int k=0; k<data->nrow; ++k) SET_STRING_ELT(target, j*data->nrow + k, STRING_ELT(thiscol, k));
         }
         break;
+        //TODO complex value type: case CPLXSXP: { } break;
       case REALSXP : {
         double *dtarget = REAL(target);
         const double *dthiscol = REAL(thiscol);
@@ -729,10 +730,21 @@ SEXP getidcols(SEXP DT, SEXP dtnames, Rboolean verbose, struct processData *data
     }
       break;
     case VECSXP : {
-      for (int j=0; j<data->lmax; ++j) {
-        for (int k=0; k<data->nrow; ++k) {
-          SET_VECTOR_ELT(target, j*data->nrow + k, VECTOR_ELT(thiscol, k));
+      if (data->narm) {
+        for (int j=0; j<data->lmax; ++j) {
+          SEXP thisidx = VECTOR_ELT(data->naidx, j);
+          const int *ithisidx = INTEGER(thisidx);
+          const int thislen = length(thisidx);
+          for (int k=0; k<thislen; ++k)
+            SET_VECTOR_ELT(target, counter + k, VECTOR_ELT(thiscol, ithisidx[k]-1));
+          counter += thislen;
+        }
+      } else {
+        for (int j=0; j<data->lmax; ++j) {
+          for (int k=0; k<data->nrow; ++k) {
+            SET_VECTOR_ELT(target, j*data->nrow + k, VECTOR_ELT(thiscol, k));
+          }
         }
       }
     }
       break;
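
The `data->narm` branch added above can be sketched outside R as follows. This is a minimal model, not the data.table source: `KeptRows` stands in for one element of `data->naidx` (assumed here to hold the 1-based row numbers that survive NA removal for one measure group), an `int` array stands in for the list-typed id column, and `stack_id_narm` is a name invented for the example.

```c
#include <stddef.h>

/* Hypothetical stand-in for one element of data->naidx: the 1-based
   row numbers that are kept for one measure group. */
typedef struct {
    const int *rows;  /* 1-based indices of rows to keep */
    size_t len;       /* number of kept rows */
} KeptRows;

/* Stack an id column once per measure group, copying only the kept rows.
   A running counter gives the write offset because each group may keep a
   different number of rows. Returns the number of values written. */
size_t stack_id_narm(const int *idcol, const KeptRows *kept, size_t lmax, int *target) {
    size_t counter = 0;
    for (size_t j = 0; j < lmax; ++j) {
        for (size_t k = 0; k < kept[j].len; ++k) {
            target[counter + k] = idcol[kept[j].rows[k] - 1];  /* 1-based to 0-based */
        }
        counter += kept[j].len;
    }
    return counter;
}
```

The fixed offset `j*data->nrow + k` used when `na.rm` is off would leave gaps here, since each group contributes a variable number of rows; the running counter avoids that.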
48 changes: 43 additions & 5 deletions src/frank.c
@@ -19,7 +19,7 @@ SEXP dt_na(SEXP x, SEXP cols) {
   for (int i=0; i<n; ++i) ians[i]=0;
   for (int i=0; i<LENGTH(cols); ++i) {
     SEXP v = VECTOR_ELT(x, INTEGER(cols)[i]-1);
-    if (!length(v) || isNewList(v) || isList(v)) continue; // like stats:::na.omit.data.frame, skip list/pairlist columns
+    if (!length(v) || isList(v)) continue; // like stats:::na.omit.data.frame, skip pairlist columns
     if (n != length(v))
       error(_("Column %d of input list x is length %d, inconsistent with first column of that item which is length %d."), i+1,length(v),n);
     switch (TYPEOF(v)) {
@@ -39,12 +39,11 @@ SEXP dt_na(SEXP x, SEXP cols) {
     }
       break;
     case REALSXP: {
-      const double *dv = REAL(v);
       if (INHERITS(v, char_integer64)) {
-        for (int j=0; j<n; ++j) {
-          ians[j] |= (DtoLL(dv[j]) == NA_INT64_LL); // TODO: can be == NA_INT64_D directly
-        }
+        const int64_t *dv = (int64_t *)REAL(v);
+        for (int j=0; j<n; ++j) ians[j] |= (dv[j] == NA_INTEGER64);
       } else {
+        const double *dv = REAL(v);
         for (int j=0; j<n; ++j) ians[j] |= ISNAN(dv[j]);
       }
     }
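
The `integer64` branch above exists because such a column keeps 64-bit integers in the bytes of a `double` vector, so `ISNAN` cannot find its missing values. The sketch below illustrates this under the assumption that the NA sentinel is `INT64_MIN`; the names `NA_INT64_SKETCH`, `int64_as_bits`, `bits_as_int64` and `is_na_int64` are invented for the example. It uses `memcpy` instead of the pointer cast shown in the diff, purely to keep the standalone code free of aliasing concerns.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Assumption for this sketch: the integer64 NA sentinel is INT64_MIN. */
static const int64_t NA_INT64_SKETCH = INT64_MIN;

/* Store a 64-bit integer in the 8 bytes of a double (no numeric conversion). */
double int64_as_bits(int64_t x) {
    double d;
    memcpy(&d, &x, sizeof d);
    return d;
}

/* Read the 8 bytes of a double back as a 64-bit integer. */
int64_t bits_as_int64(double d) {
    int64_t x;
    memcpy(&x, &d, sizeof x);
    return x;
}

/* Missing-value test on the integer view, not on the floating-point view. */
int is_na_int64(double stored) {
    return bits_as_int64(stored) == NA_INT64_SKETCH;
}
```

The bit pattern of `INT64_MIN` has only the sign bit set, which an IEEE 754 `double` reads as negative zero rather than NaN, so a floating-point NaN test on the stored value reports "not missing".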
@@ -59,6 +58,45 @@ SEXP dt_na(SEXP x, SEXP cols) {
       for (int j=0; j<n; ++j) ians[j] |= (ISNAN(COMPLEX(v)[j].r) || ISNAN(COMPLEX(v)[j].i));
     }
       break;
+    case VECSXP: {
+      // is.na(some_list) returns TRUE only for elements which are
+      // scalar NA.
+      for (int j=0; j<n; ++j) {
+        SEXP list_element = VECTOR_ELT(v, j);
+        switch (TYPEOF(list_element)) {
+        case LGLSXP: {
+          ians[j] |= (length(list_element)==1 && LOGICAL(list_element)[0] == NA_LOGICAL);
+        }
+          break;
+        case INTSXP: {
+          ians[j] |= (length(list_element)==1 && INTEGER(list_element)[0] == NA_INTEGER);
+        }
+          break;
+        case STRSXP: {
+          ians[j] |= (length(list_element)==1 && STRING_ELT(list_element,0) == NA_STRING);
+        }
+          break;
+        case CPLXSXP: {
+          if (length(list_element)==1) {
+            Rcomplex first_complex = COMPLEX(list_element)[0];
+            ians[j] |= (ISNAN(first_complex.r) || ISNAN(first_complex.i));
+          }
+        }
+          break;
+        case REALSXP: {
+          if (length(list_element)==1) {
+            if (INHERITS(list_element, char_integer64)) {
+              ians[j] |= ((const int64_t *)REAL(list_element))[0] == NA_INTEGER64;
+            } else {
+              ians[j] |= ISNAN(REAL(list_element)[0]);
+            }
+          }
+        }
+          break;
+        }
+      }
+    }
+      break;
     default:
       error(_("Unsupported column type '%s'"), type2char(TYPEOF(v)));
     }
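
The rule implemented by the new `VECSXP` case (a list element counts as missing only when it has length 1 and that single value is NA) can be modelled in plain C roughly as below. `Element`, `ElKind`, `NA_INT_SKETCH` and `is_scalar_na` are hypothetical names for this sketch, and only integer and double elements are covered; the real code also handles logical, character, complex and `integer64` elements through R's API.

```c
#include <limits.h>
#include <math.h>
#include <stddef.h>

/* Stand-in for R's integer NA, which is INT_MIN. */
#define NA_INT_SKETCH INT_MIN

typedef enum { EL_INT, EL_REAL, EL_OTHER } ElKind;

/* Hypothetical model of one list element: a typed vector with a length. */
typedef struct {
    ElKind kind;
    size_t len;
    const int *ints;       /* used when kind == EL_INT  */
    const double *reals;   /* used when kind == EL_REAL */
} Element;

/* Returns 1 only for a length-1 element whose single value is NA. */
int is_scalar_na(const Element *e) {
    if (e->len != 1) return 0;            /* c(NA, NA) is not a scalar NA */
    switch (e->kind) {
    case EL_INT:  return e->ints[0] == NA_INT_SKETCH;
    case EL_REAL: return isnan(e->reals[0]) != 0;
    default:      return 0;               /* other element types: not missing */
    }
}
```

This matches the expectation in test 1035.0185 that `c(NA,NA)` survives `na.rm=TRUE` while each scalar NA is removed.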