Skip to content

Rbindlist retains attributes#5570

Draft
ben-schwen wants to merge 9 commits intomasterfrom
rbindlist_retainAttributes
Draft

Rbindlist retains attributes#5570
ben-schwen wants to merge 9 commits intomasterfrom
rbindlist_retainAttributes

Conversation

@ben-schwen
Copy link
Member

@ben-schwen ben-schwen commented Dec 21, 2022

Closes #5569

Ofc we could still discuss if we want to retain all attributes of all inputs. (won't work with keys)

Add options how to handle retainment, e.g.
retain.options:

  • all
  • drop
  • first
  • new argument: exclude=c("key", "index)

@codecov
Copy link

codecov bot commented Dec 21, 2022

Codecov Report

Attention: 1 lines in your changes are missing coverage. Please review.

Comparison is base (34e02f2) 97.46% compared to head (10d986e) 97.46%.

❗ Current head 10d986e differs from pull request most recent head 4774bd0. Consider uploading reports for the commit 4774bd0 to get more accurate results

Files Patch % Lines
src/rbindlist.c 80.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5570      +/-   ##
==========================================
- Coverage   97.46%   97.46%   -0.01%     
==========================================
  Files          80       80              
  Lines       14822    14833      +11     
==========================================
+ Hits        14447    14457      +10     
- Misses        375      376       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@jangorecki
Copy link
Member

Are data.table attributes retained or columns attributes as well?
How about binding attributes with c()? I can imagine that preserving attributes of the first DT won't always be much useful.

@ben-schwen
Copy link
Member Author

Are data.table attributes retained or columns attributes as well? How about binding attributes with c()? I can imagine that preserving attributes of the first DT won't always be much useful.

Currently, rbindlist always retains column attributes from the first appearance of a column, which is usually the first input of the list, but can also be a later one if using fill=TRUE.

I also thought about binding the attributes from all inputs with c() but this won't work since we also store keys and indices as attributes.

For example rbind(x,y) where x and y both have the same key "a" then binding would result in the key c("a","a"). Keys are even more problematic if we use use.names=FALSE because later keys or indices do not have to appear in the result data.table.

@jangorecki
Copy link
Member

jangorecki commented Dec 19, 2023

I would prefer to spent more time on design of this feature to avoid breaking changes later on.
New argument could possibly accept a function, then first() would just take column/table attributes of a first DT, while c() would combine column/table attributes from all objects. Then another switch is to control behavior for column and table separately. Then another factor is to exclude particular attributes of a column/table from being retained, and here good defaults would be key and index.
Not sure if we want to rush just to have some implementation done.

I recall an issue asking for dropping all attributes as well, another one about attributes of spatial class data frames.
Would be nice to gather all related issues and try to design interface that will cover all of them.

I also thought about binding the attributes from all inputs with c() but this won't work since we also store keys and indices as attributes.

unless user rbind sorted partitions in sorted order, then key could actually be preserved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

I had expected rbindlist() to preserve attributes, but it doesn't. It'd be a nice feature; or an important limitation to disclose in the documentation.

2 participants