
Add LineCache to optimize newlines lookup #352

Merged
lhecker merged 5 commits into microsoft:main from L1quidH2O:lines-opt
Jun 4, 2025

Conversation

@L1quidH2O
Contributor

Scrolling is incredibly slow with large files because newlines aren't cached.
This PR introduces a simplified newline cache that reduces the need for repeated large scans during scrolling.

  • Added a LineCache struct that stores (offset, line) pairs every 64*1024 lines.
  • Very simple: it doesn't build a cache unless a document is opened.
  • Changes to the document are tracked.
  • Implemented nearest_offset() to retrieve the closest cached line.

Ideally in the future, newline tracking would be built into a rope tree. But since this project uses a gap buffer, this is a fast enough shortcut.
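The shape described above could be sketched roughly like this (illustrative only; names and the nearest_offset signature are assumptions, not the PR's exact code):

```rust
// Illustrative sketch of the cache described in the PR summary.
const CACHE_EVERY: usize = 64 * 1024;

/// Stores (byte offset, line number) checkpoints every CACHE_EVERY lines.
pub struct LineCache {
    cache: Vec<(usize, usize)>, // (offset, line), sorted by offset
}

impl LineCache {
    pub fn new() -> Self {
        LineCache { cache: Vec::new() }
    }

    /// Record a checkpoint while scanning the document.
    pub fn push(&mut self, offset: usize, line: usize) {
        self.cache.push((offset, line));
    }

    /// Return the closest cached checkpoint at or before `target_line`,
    /// so a scan only has to cover the remaining distance.
    pub fn nearest_offset(&self, target_line: usize) -> Option<(usize, usize)> {
        self.cache
            .iter()
            .take_while(|&&(_, line)| line <= target_line)
            .last()
            .copied()
    }
}

fn main() {
    let mut c = LineCache::new();
    c.push(0, 0);
    c.push(500_000, CACHE_EVERY);
    // Seeking to line 70,000 can start from the checkpoint at line 65,536
    // instead of rescanning from the top of the file.
    assert_eq!(c.nearest_offset(70_000), Some((500_000, CACHE_EVERY)));
    assert_eq!(c.nearest_offset(10), Some((0, 0)));
}
```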

Slow Demo:

slow.mp4

Fast Demo:

fast.mp4

@L1quidH2O
Contributor Author

@microsoft-github-policy-service agree

@lhecker
Member

lhecker commented May 29, 2025

I explicitly decided against the inclusion of a line cache because forgetting to invalidate caches, or invalidating them incorrectly, can be moderately dangerous, and a cache can be slow in edge cases if it's not written well. The editor was not meant to be used for very large files, as specialized editors are better suited for this IMO. We should focus on providing an excellent editor for regular files first.

We do need a line cache eventually for syntax highlighting though, to cache the highlighter state every 1000 lines or so. So, despite everything I said, long-term this PR is very welcome.

But I'm somewhat skeptical of this PR in its current state for two reasons:

  • For syntax highlighting we need to store a snapshot every ~1000 lines or so, because the regex matching is very expensive (relatively speaking; it runs at <100MB/s).
  • Having two line caches, one for syntax highlighting and one for scrolling, may be prohibitive (binary size)...
  • ...but if we store snapshots every 1000 lines, storing snapshots for a 140M line file would result in a 140k item array.

So, I wonder if we should not repurpose this PR for a syntax highlighter.

```rust
const CACHE_EVERY: usize = 1024 * 64;

pub struct LineCache {
    cache: Vec<(usize, usize)>, // (index, line)
```

Instead of storing indices/lines, ideally, we would store index/line deltas between items. During updates, we then only need to update the items that are affected by the change, but not all items past it.
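The delta idea could look roughly like this (an illustrative sketch, not the project's code; names are made up): each entry stores its distance to the previous checkpoint, so an edit only has to touch the single delta that spans it.

```rust
// Hypothetical delta-encoded cache: each entry stores the distance
// (in bytes and lines) from the previous checkpoint instead of
// absolute values.
struct DeltaCache {
    deltas: Vec<(usize, usize)>, // (offset_delta, line_delta)
}

impl DeltaCache {
    /// Absolute (offset, line) of checkpoint `i`, recovered by summing deltas.
    fn absolute(&self, i: usize) -> (usize, usize) {
        self.deltas[..=i]
            .iter()
            .fold((0, 0), |(o, l), &(od, ld)| (o + od, l + ld))
    }

    /// An insertion of `len` bytes / `newlines` lines at `offset` only
    /// needs to grow the one delta that spans the edit; every later
    /// checkpoint shifts implicitly because it is stored relative.
    fn on_insert(&mut self, offset: usize, len: usize, newlines: usize) {
        let mut abs = 0;
        for d in &mut self.deltas {
            abs += d.0;
            if abs > offset {
                d.0 += len;
                d.1 += newlines;
                return; // nothing past this entry needs updating
            }
        }
    }
}

fn main() {
    let mut c = DeltaCache { deltas: vec![(100, 10), (100, 10), (100, 10)] };
    c.on_insert(150, 7, 1); // lands inside the second checkpoint's span
    assert_eq!(c.absolute(0), (100, 10)); // untouched
    assert_eq!(c.absolute(1), (207, 21)); // absorbed the edit
    assert_eq!(c.absolute(2), (307, 31)); // shifted for free
}
```

The trade-off is that reading an absolute position now costs a prefix sum, which is where the "cache the last position in a member" suggestion below comes in.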

```rust
let (ref mut off, ref mut line) = self.cache[i];
if *off > range.start {
    if *off < range.end {
        self.cache.remove(i); // cache point is within the deleted range
```

This is O(n^2) in the worst case if the user deletes the contents between the start of the file and the middle of it, if I'm reading this right. It would be better for us to first find the offset of the range.start item, then find the offset of the range.end item, and then delete all items in that range in a single .drain() call.
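A minimal sketch of that suggestion, assuming the cache stays sorted by offset (illustrative; the function name and the decision to leave the surviving entries' offset adjustment out are mine, not the project's code):

```rust
use std::ops::Range;

// Two binary searches plus one drain instead of repeated remove() calls.
fn delete_range(cache: &mut Vec<(usize, usize)>, range: Range<usize>) {
    // First checkpoint strictly inside the deleted range...
    let start = cache.partition_point(|&(off, _)| off <= range.start);
    // ...and first checkpoint at or past its end.
    let end = cache.partition_point(|&(off, _)| off < range.end);
    // Remove everything in between in one O(k) pass. Shifting the offsets
    // of the surviving later entries by the deleted length is omitted here.
    cache.drain(start..end);
}

fn main() {
    let mut cache = vec![(0, 0), (100, 5), (200, 10), (300, 15)];
    delete_range(&mut cache, 50..250);
    assert_eq!(cache, vec![(0, 0), (300, 15)]);
}
```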

Comment on lines +85 to +90
```rust
for &mut (ref mut off, ref mut line) in &mut self.cache {
    if *off > offset {
        *off += len;
        *line += newlines;
    }
}
```

This also needs to insert new items, basically in reverse to what delete did.
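One possible shape for that missing half (a sketch under assumed names; a tiny CACHE_EVERY is used so the demo is visible, and the caller is assumed to know the line number at the insertion point):

```rust
const CACHE_EVERY: usize = 4; // tiny granularity for demonstration only

// Handle an insertion: shift later checkpoints (what the loop above does)
// and, if the inserted text itself crosses checkpoint boundaries, add
// new entries for those boundary lines.
fn on_insert(cache: &mut Vec<(usize, usize)>, offset: usize, line_at_offset: usize, text: &[u8]) {
    let newlines = text.iter().filter(|&&b| b == b'\n').count();
    // 1) Shift every checkpoint past the edit.
    for (off, line) in cache.iter_mut() {
        if *off > offset {
            *off += text.len();
            *line += newlines;
        }
    }
    // 2) Add checkpoints for boundary lines that start inside the insertion.
    let mut line = line_at_offset;
    for (i, &b) in text.iter().enumerate() {
        if b == b'\n' {
            line += 1;
            if line % CACHE_EVERY == 0 {
                cache.push((offset + i + 1, line)); // line starts after the '\n'
            }
        }
    }
    cache.sort_unstable(); // keep checkpoints ordered by offset
}

fn main() {
    let mut cache = vec![(0, 0), (100, 4)];
    // Insert 5 lines at offset 10 (which sits on line 1).
    on_insert(&mut cache, 10, 1, b"a\nb\nc\nd\ne\n");
    // Old checkpoint shifted, and a new one appeared inside the insertion.
    assert_eq!(cache, vec![(0, 0), (16, 4), (110, 9)]);
}
```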

Comment on lines +100 to +110
```rust
if self.cache[i].1 >= target_count {
    let ind = if !reverse {
        if i == 0 { return None; } // No previous line exists
        i - 1
    } else {
        if i == len - 1 { return None; } // No next line exists
        i
    };
    return Some(self.cache[ind]);
}
i += 1;
```

I believe this should ideally be a binary search, right?
If we use deltas as I proposed above, we could note the last deletion/insertion/nearest_offset position and its absolute index and cache it in a member. Then we can start iterating from there. Modifications and searches should occur near each other, after all.
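Since the pairs are sorted by line, the linear scan can become a binary search directly via slice::partition_point (a sketch, not the PR's code):

```rust
// Binary-search variant of nearest_offset over sorted (offset, line) pairs.
fn nearest_offset(cache: &[(usize, usize)], target_line: usize) -> Option<(usize, usize)> {
    // Index of the first checkpoint whose line is past the target.
    let i = cache.partition_point(|&(_, line)| line <= target_line);
    if i == 0 {
        None // no checkpoint at or before the target line
    } else {
        Some(cache[i - 1]) // closest checkpoint not past the target
    }
}

fn main() {
    let cache = [(0, 0), (100, 10), (200, 20)];
    assert_eq!(nearest_offset(&cache, 15), Some((100, 10)));
    assert_eq!(nearest_offset(&cache, 25), Some((200, 20)));
}
```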

@lhecker
Member

lhecker commented May 29, 2025

I forgot to mention, I had a solution to this performance issue in mind and if you're interested in SIMD you could implement it. (I was planning to do that myself at some point.) I'm quite certain that it would solve this performance issue as well, even without a line cache:

```rust
// TODO: This code could be optimized by replacing memchr with manual line counting.
//
// If `line_stop` is very far away, we could accumulate newline counts horizontally
// in an AVX2 register (= 32 u8 slots). Then, every 256 bytes we compute the horizontal
// sum via `_mm256_sad_epu8` yielding us the newline count in the last block.
//
// We could also just use `_mm256_sad_epu8` on each fetch as-is.
```

Since I wrote that, I realized that we could also just always use HADD unconditionally.
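A scalar analogue of that scheme, to make the idea concrete (illustrative only, not the repo's SIMD code): keep 32 byte-wide accumulators like an AVX2 register's lanes, add 0/1 compare results into them "vertically", and only fold them together once per 256-byte block, which is the role _mm256_sad_epu8 plays in the real version.

```rust
// Scalar sketch of the accumulate-then-horizontal-add newline counter.
fn count_newlines(text: &[u8]) -> usize {
    let mut total = 0usize;
    for block in text.chunks(256) {
        // 32 u8 accumulators, mirroring an AVX2 register's lanes.
        let mut lanes = [0u8; 32];
        for chunk in block.chunks(32) {
            for (lane, &b) in lanes.iter_mut().zip(chunk) {
                // At most 8 additions per lane per block, so no u8 overflow.
                *lane += (b == b'\n') as u8;
            }
        }
        // The per-block "horizontal sum" step.
        total += lanes.iter().map(|&l| l as usize).sum::<usize>();
    }
    total
}

fn main() {
    assert_eq!(count_newlines(b"line one\nline two\nline three\n"), 3);
    assert_eq!(count_newlines(&vec![b'\n'; 1000]), 1000);
}
```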

Edit: I started working on this because it made me very curious. 😅

@lhecker
Member

lhecker commented May 29, 2025

I finished implementing a first prototype: https://github.com/microsoft/edit/tree/dev/lhecker/optimized-line-seeking
It's roughly 20x faster for me, which is neat, regardless of other design choices. 🙂 I'm quite certain it can be further optimized.

FYI you can check the latency with the debug-latency feature. E.g.:

```shell
cargo build --config .cargo/release.toml --release --features debug-latency
```

@L1quidH2O
Contributor Author

Wow, the SIMD runs just as fast! That's seriously impressive work.
Repurposing could work to keep it simple; I don't think the performance wins of other approaches would justify their added complexity and potential binary size.

@lhecker
Copy link
Member

lhecker commented Jun 4, 2025

Since we now have the SIMD performance improvements (they're now another 5x faster than what you tested!), I've updated your PR to not compile the line cache. We will still have great use for it in the future, however, for syntax highlighting. Thank you for your work!

@lhecker changed the title from "Add LineCache to optimize newlines lookup for faster scrolling" to "Add LineCache to optimize newlines lookup" on Jun 4, 2025
@lhecker merged commit 239cde5 into microsoft:main on Jun 4, 2025
3 checks passed
Lou32Verbose pushed a commit to Lou32Verbose/edit that referenced this pull request Jan 11, 2026
This code was originally written for speeding up line searches but was disabled since we've since optimized line seeking with SIMD. We'll still have use for this code in the future, however, to cache syntax highlighter state every N lines.
