Skip to content

Conversation

@Anirban166
Copy link
Member

Closes #6612

…links directing to the corresponding vignettes for datatable-intro.Rmd
…links directing to the corresponding vignettes + made small grammatical changes for datatable-secondary-indices-and-auto-indexing.Rmd
…links directing to the corresponding vignettes for datatable-joins.Rmd
…links directing to the corresponding vignettes for datatable-keys-fast-subset.Rmd
…links directing to the corresponding vignettes for datatable-reference-semantics.Rmd
…responding vignette for datatable-sd-usage.Rmd
@codecov
Copy link

codecov bot commented Nov 15, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.60%. Comparing base (03c647f) to head (daf7ba9).
Report is 2 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #6617   +/-   ##
=======================================
  Coverage   98.60%   98.60%           
=======================================
  Files          79       79           
  Lines       14518    14518           
=======================================
  Hits        14316    14316           
  Misses        202      202           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.


🚨 Try these New Features:

@Anirban166
Copy link
Member Author

Two things that I can incorporate into this PR which I think are related or I hope not off-topic: (but need feedback/opinions)

A) Removal of the mentions for the non-existent(?) 'design' vignette, e.g.:

However we could improve this functionality further by *shallow* copying instead of *deep* copying. In fact, we would very much like to [provide this functionality for `v1.9.8`](https://github.com/Rdatatable/data.table/issues/617). We will touch up on this again in the *data.table design* vignette.
We could have accomplished the same operation by doing `nrow(flights[origin == "JFK" & month == 6L])`. However, it would have to subset the entire `data.table` first corresponding to the *row indices* in `i` *and then* return the rows using `nrow()`, which is unnecessary and inefficient. We will cover this and other optimisation aspects in detail under the *`data.table` design* vignette.

B) Changing items of the form text: link to be hyperlinks, or of the format [text](link), e.g.:

## Reference
- *Understanding data.table Rolling Joins*: https://www.r-bloggers.com/2016/06/understanding-data-table-rolling-joins/
- *Semi-join with data.table*: https://stackoverflow.com/questions/18969420/perform-a-semi-join-with-data-table
- *Cross join with data.table*: https://stackoverflow.com/questions/10600060/how-to-do-cross-join-in-r
- *How does one do a full join using data.table?*: https://stackoverflow.com/questions/15170741/how-does-one-do-a-full-join-using-data-table
- *Enhanced data.frame*: https://rdatatable.gitlab.io/data.table/reference/data.table.html

## Reference

- - *Understanding data.table Rolling Joins*: https://www.r-bloggers.com/2016/06/understanding-data-table-rolling-joins/ 
+ - [*Understanding data.table Rolling Joins*](https://www.r-bloggers.com/2016/06/understanding-data-table-rolling-joins/) 
  
- - *Semi-join with data.table*: https://stackoverflow.com/questions/18969420/perform-a-semi-join-with-data-table
+ - [*Semi-join with data.table*](https://stackoverflow.com/questions/18969420/perform-a-semi-join-with-data-table) 
  
- - *Cross join with data.table*: https://stackoverflow.com/questions/10600060/how-to-do-cross-join-in-r 
+ - [*Cross join with data.table*](https://stackoverflow.com/questions/10600060/how-to-do-cross-join-in-r)

- - *How does one do a full join using data.table?*: https://stackoverflow.com/questions/15170741/how-does-one-do-a-full-join-using-data-table   
+ - [*How does one do a full join using data.table?*](https://stackoverflow.com/questions/15170741/how-does-one-do-a-full-join-using-data-table)

- - *Enhanced data.frame*: https://rdatatable.gitlab.io/data.table/reference/data.table.html   
+ - [*Enhanced data.frame*](https://rdatatable.gitlab.io/data.table/reference/data.table.html)

@Anirban166
Copy link
Member Author

Also, do we need to ping @Rdatatable/translators or the translation teams for this update? @tdhock

**Keys:** Actually `keyby` does a little more than *just ordering*. It also *sets a key* after ordering by setting an `attribute` called `sorted`.

We'll learn more about `keys` in the `vignette("datatable-keys-fast-subset", package="data.table")`; for now, all you have to know is that you can use `keyby` to automatically order the result by the columns specified in `by`.
We'll learn more about `keys` in the [*Keys and fast binary search based subset*](datatable-keys-fast-subset.html) vignette; for now, all you have to know is that you can use `keyby` to automatically order the result by the columns specified in `by`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to address off-line use, #6612 (comment)
can this be changed from

[*Keys and fast binary search based subset*](datatable-keys-fast-subset.html)

to

[`vignette("datatable-keys-fast-subset", package="data.table")`](datatable-keys-fast-subset.html)

which renders as vignette("datatable-keys-fast-subset", package="data.table")

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Certainly, but what do you suggest for rephrasing the text around it? or should it stay the same? (it reads a bit odd, but it might just be me)

For e.g., in the case you highlighted above, it currently reads as '... in the Keys and fast binary search based subset vignette ...' so should I change it to:
i) ' ... in vignette("datatable-keys-fast-subset", package="data.table") ...' (discarding the word 'vignette' for repetition)
ii) or ' ... in the vignette("datatable-keys-fast-subset", package="data.table") ...' (sounding more natural as in 'the vignette')
iii) or simply ' ... in the vignette("datatable-keys-fast-subset", package="data.table") vignette ...'

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whatever is easiest is fine with me

if you have time to make it sound natural, that is great

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I think the current version looks fine and does a good job in considering both points (hyperlinks, off-line use) so I'm cool with it (let me know if you need any other changes)

@tdhock
Copy link
Member

tdhock commented Nov 20, 2024

great thanx

@tdhock tdhock merged commit 546259d into master Nov 20, 2024
@tdhock
Copy link
Member

tdhock commented Nov 20, 2024

here is some code I used to check that the text and link are consistent,

vignette.line.vec <- c(
'We have seen this example already in the `vignette("datatable-reference-semantics", package="data.table")` and the `vignette("datatable-keys-fast-subset", package="data.table")`.',
'We have seen this example already in the vignettes [`vignette("datatable-reference-semantics", package="data.table")`](datatable-reference-semantics.html) and [`vignette("datatable-keys-fast-subset", package="data.table")`](datatable-keys-fast-subset.html).',
'We will discuss fast *subsets* using keys and secondary indices to *joins* in the next vignette, `vignette("datatable-joins", package="data.table")`.',
'We will discuss fast *subsets* using keys and secondary indices to *joins* in the [next vignette (`vignette("datatable-joins", package="data.table")`)](datatable-joins.html).')
  
Rmd.line.list <- list()
for(f in Sys.glob("*.Rmd")){
  Rmd.line.list[[f]] <- readLines(f)
}
Rmd.line.vec <- do.call(c, Rmd.line.list)
vignette.line.vec <- grep("vignette(",Rmd.line.vec,fixed=TRUE,value=TRUE)

nc::capture_all_str(
  vignette.line.vec,
  '\\[',
  text=".*?",
  '\\]\\(',
  vig_name='datatable-.*?',
  '.html'
)[
, pattern := sprintf('`vignette("%s", package="data.table")`', vig_name)
][
, .(expected=grepl(
  pattern,
  text, fixed=TRUE)
  ),
  by=.(text,pattern)
]

if links are consistent then we get TRUE in expected column.

@Anirban166
Copy link
Member Author

Thanks for sharing, neat stuff with nc!

@MichaelChirico MichaelChirico deleted the hyperlink-vignettes branch December 6, 2024 06:43
@MichaelChirico
Copy link
Member

This is nice! I may have missed something -- is there a plan to prevent regression here (i.e., any future vignettes/updates to vignettes which generate new references to other vignettes, which do not wind up linked in this way)?

@tdhock
Copy link
Member

tdhock commented Dec 6, 2024

good idea, is that something we can enforce via lint?

@MichaelChirico
Copy link
Member

Yea, we can probably do something for most cases with a simple regex (probably in the md-lint job) check for vignette(..., package="data.table") that's not part of a markdown link.

The case changed here for datatable-intro is much harder though, maybe an LLM could find it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

vignette() function is not interpreted

3 participants