Add LineCache to optimize newlines lookup #352
lhecker merged 5 commits into microsoft:main from L1quidH2O:lines-opt

Conversation
@microsoft-github-policy-service agree
I explicitly decided against including a line cache, because forgetting to invalidate caches (or invalidating them incorrectly) can be moderately dangerous, and a cache can be slow in edge cases if it's not written well. The editor was not meant to be used for very large files; specialized editors are better suited for that, IMO. We should focus on providing an excellent editor for regular files first. That said, we do eventually need a line cache for syntax highlighting, to cache the highlighter state every 1000 lines or so. So, despite everything I said, long-term this PR is very welcome. But I'm somewhat skeptical of this PR in its current state for two reasons:
So, I wonder if we should not repurpose this PR for a syntax highlighter.
src/buffer/line_cache.rs (outdated)

```rust
const CACHE_EVERY: usize = 1024 * 64;

pub struct LineCache {
    cache: Vec<(usize, usize)>, // (index, line)
}
```
Instead of storing absolute indices/lines, ideally we would store index/line deltas between items. During updates, we then need to update only the items affected by the change, not every item past it.
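To make the delta suggestion concrete, here is a minimal sketch of what a delta-based layout could look like. The `DeltaLineCache` name and methods are hypothetical, not part of the PR; it only illustrates why an insertion touches a single entry when entries are stored relative to their predecessor.

```rust
// Hypothetical sketch: each entry stores the (byte, line) distance to the
// previous entry rather than absolute values. An insertion of `len` bytes
// containing `newlines` line feeds then only widens the one delta that spans
// the edit; entries past it are unaffected because they are relative.
struct DeltaLineCache {
    cache: Vec<(usize, usize)>, // (offset_delta, line_delta) from previous entry
}

impl DeltaLineCache {
    // Record an insertion: find the entry whose absolute offset lies past
    // `offset` and widen only that one delta.
    fn insert(&mut self, offset: usize, len: usize, newlines: usize) {
        let mut abs = 0;
        for (off_delta, line_delta) in &mut self.cache {
            abs += *off_delta;
            if abs > offset {
                *off_delta += len;
                *line_delta += newlines;
                return; // later entries shift implicitly
            }
        }
    }

    // Reconstruct absolute (offset, line) pairs by prefix-summing the deltas.
    fn absolute(&self) -> Vec<(usize, usize)> {
        let (mut off, mut line) = (0, 0);
        self.cache
            .iter()
            .map(|&(od, ld)| {
                off += od;
                line += ld;
                (off, line)
            })
            .collect()
    }
}
```

The trade-off is that reading an absolute position now requires a prefix sum, which is why the follow-up comment below suggests caching the last visited position.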
src/buffer/line_cache.rs (outdated)

```rust
let (ref mut off, ref mut line) = self.cache[i];
if *off > range.start {
    if *off < range.end {
        self.cache.remove(i); // cache point is within the deleted range
```
This is O(n^2) in the worst case, e.g. if the user deletes the contents between the start of the file and the middle of it, if I'm reading this right. It would be better to first find the index of the first item past range.start, then the index of the first item past range.end, and then delete all items in that range with a single .drain() call.
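As an illustration of that suggestion, here is a hedged sketch of a single-`drain` deletion over a sorted list of absolute `(offset, line)` pairs. The free function `delete_range` is illustrative, not the PR's actual code, and the shifting of surviving entries is deliberately omitted.

```rust
// Sketch of the single-drain deletion suggested above, assuming the cache
// holds absolute (offset, line) pairs sorted by offset.
fn delete_range(cache: &mut Vec<(usize, usize)>, range: std::ops::Range<usize>) {
    // First index whose offset lies strictly past range.start...
    let start = cache.partition_point(|&(off, _)| off <= range.start);
    // ...and first index whose offset lies at or past range.end.
    let end = cache.partition_point(|&(off, _)| off < range.end);
    // Remove every cache point inside the deleted range in one O(n) pass,
    // instead of calling `remove` once per element (O(n^2) worst case).
    cache.drain(start..end);
    // Entries after the range would still need their offsets/lines shifted
    // down by the deleted amount (omitted here).
}
```

`partition_point` performs the two lookups as binary searches, so the whole deletion is O(log n + k) for k removed entries plus the final shift.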
src/buffer/line_cache.rs (outdated)

```rust
for &mut (ref mut off, ref mut line) in &mut self.cache {
    if *off > offset {
        *off += len;
        *line += newlines;
    }
}
```
This also needs to insert new items, basically the reverse of what delete does.
src/buffer/line_cache.rs (outdated)

```rust
if self.cache[i].1 >= target_count {
    let ind = if !reverse {
        if i == 0 { return None; } // No previous line exists
        i - 1
    } else {
        if i == len - 1 { return None; } // No next line exists
        i
    };
    return Some(self.cache[ind]);
}
i += 1;
```
I believe this should ideally be a binary search, right?
If we use deltas as I proposed above, we could note the last deletion/insertion/nearest_offset position and its absolute index and cache it in a member. Then we can start iterating from there. Modifications and searches should occur near each other, after all.
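For the binary-search suggestion, a minimal sketch of the forward direction is below, assuming absolute `(offset, line)` pairs sorted by line number; the `reverse` branch and the delta/last-position caching from the comment above are omitted. The free function is illustrative, not the PR's method.

```rust
// Binary-search version of the linear scan above: return the nearest cached
// (offset, line) entry whose line number is still below `target_count`.
fn nearest_offset(cache: &[(usize, usize)], target_count: usize) -> Option<(usize, usize)> {
    // Index of the first entry whose line number is >= target_count.
    let i = cache.partition_point(|&(_, line)| line < target_count);
    // The entry just before it is the closest one that has not overshot;
    // if no such entry exists, the caller must scan from the file start.
    if i == 0 { None } else { Some(cache[i - 1]) }
}
```

This turns each lookup into O(log n) over the cache, which matters once a large file accumulates many cache points.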
I forgot to mention: I had a solution to this performance issue in mind, and if you're interested in SIMD you could implement it. (I was planning to do that myself at some point.) I'm quite certain that it would solve this performance issue as well, even without a line cache: edit/src/unicode/measurement.rs, lines 532 to 538 in a82483d. Since I wrote that, I realized that we could also just always use HADD unconditionally.

Edit: I started working on this because it made me very curious. 😅
I finished implementing a first prototype: https://github.com/microsoft/edit/tree/dev/lhecker/optimized-line-seeking

FYI you can check the latency with the
Wow, the SIMD runs just as fast! That's seriously impressive work.
- Added binary search
Since we now have the SIMD performance improvements (they're now another 5x faster than what you tested!), I've updated your PR to not compile the line cache. We will still have great use for it in the future, however, for syntax highlighting. Thank you for your work!
LineCache to optimize newlines lookup
This code was originally written for speeding up line searches but was disabled since we've since optimized line seeking with SIMD. We'll still have use for this code in the future, however, to cache syntax highlighter state every N lines.
Scrolling is incredibly slow with large files because newlines aren't cached.
This PR introduces a simplified newline cache that reduces the need for repeated large scans during scrolling.
- Adds a `LineCache` struct that stores `(offset, line)` pairs every 64*1024 lines.
- Adds `nearest_offset()` to retrieve the closest cached line.

Ideally, in the future, newline tracking would be built into a rope tree. But since this project uses a gap buffer, this is a fast enough shortcut.
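To illustrate the idea described above, here is a hedged sketch of how such a cache could be populated while scanning a buffer once: record an `(offset, line)` snapshot at a fixed byte interval so later seeks start from the nearest snapshot instead of the start of the file. The `build_cache` function is illustrative, not the PR's implementation, and for simplicity it snapshots every `CACHE_EVERY` bytes.

```rust
// Record an (offset, line) pair at a fixed interval while counting newlines,
// mirroring the CACHE_EVERY constant from the PR. Illustrative only.
const CACHE_EVERY: usize = 1024 * 64;

fn build_cache(text: &[u8]) -> Vec<(usize, usize)> {
    let mut cache = Vec::new();
    let mut line = 0;
    let mut next_snapshot = CACHE_EVERY;
    for (offset, &b) in text.iter().enumerate() {
        if b == b'\n' {
            line += 1;
        }
        if offset >= next_snapshot {
            cache.push((offset, line)); // nearest_offset() can seek from here
            next_snapshot += CACHE_EVERY;
        }
    }
    cache
}
```

A gap buffer can keep such a cache valid by shifting or rebuilding the entries past each edit, which is exactly the invalidation cost discussed in the review comments above.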
Slow Demo:
slow.mp4
Fast Demo:
fast.mp4