
perf: add direct-mapped node cache to BTreeMap #416

Open
sasa-tomic wants to merge 55 commits into main from perf/direct-mapped-node-cache

Conversation

@sasa-tomic (Member) commented Mar 18, 2026

Summary

  • Add a 32-slot direct-mapped node cache to BTreeMap, modeled after CPU caches: O(1) lookup via (address / page_size) % 32, collision = eviction (no LRU tracking)
  • Read paths (get, contains_key, first/last_key_value) use a take+return pattern to avoid re-loading hot upper-tree nodes from stable memory
  • Write paths invalidate affected cache slots in save_node, deallocate_node, merge, and clear_new
  • Switch get() from destructive extract_entry_at (swap_remove) to non-destructive node.value() (borrows via OnceCell)
  • Remove now-unused extract_entry_at method

This subsumes all four previous caching approaches (root-only, LRU+clone, LRU+Rc, page cache) into a single design that:

  • Has ~5 instructions overhead per cache lookup (vs ~330 for the Rc LRU's linear scan)
  • Stores Node<K> directly (no Rc, no Clone, no heap allocation per cache entry)
  • Uses cache.get_mut() on write paths (zero RefCell overhead)

Expected improvement: ~15-20% for random reads, ~65% for hot-key workloads, ~0% overhead for writes.
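The slot lookup described above is small enough to sketch; `PAGE_SIZE` and the slot count here are illustrative constants, not necessarily the PR's exact values:

```rust
// Minimal sketch of the direct-mapped slot index described above.
// PAGE_SIZE and NUM_SLOTS are assumptions for illustration.
const PAGE_SIZE: u64 = 4096;
const NUM_SLOTS: u64 = 32;

/// Map a node's stable-memory address to a cache slot in O(1),
/// like an index into a direct-mapped CPU cache. Two addresses
/// NUM_SLOTS pages apart collide: collision = eviction, no LRU.
fn slot_index(address: u64) -> usize {
    ((address / PAGE_SIZE) % NUM_SLOTS) as usize
}

fn main() {
    // Adjacent pages land in adjacent slots.
    assert_eq!(slot_index(PAGE_SIZE), 1);
    // 32 pages apart: same slot, so the newcomer evicts the occupant.
    assert_eq!(slot_index(0), slot_index(32 * PAGE_SIZE));
    println!("slot of page 5: {}", slot_index(5 * PAGE_SIZE));
}
```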

Add a 32-slot direct-mapped node cache to BTreeMap that avoids
re-loading hot nodes from stable memory. Modeled after CPU caches:
O(1) lookup via (address / page_size) % 32, collision = eviction.

Read paths (get, contains_key, first/last_key_value) use a
take+return pattern to borrow nodes from the cache without
RefCell lifetime issues. Write paths (insert, remove, split,
merge) invalidate affected cache slots.

Key changes:
- Switch get() from destructive extract_entry_at to node.value()
- Remove unused extract_entry_at method
- Change traverse() closure from Fn(&mut Node) to Fn(&Node)
- Invalidate cache in save_node, deallocate_node, merge, clear_new

Expected improvement: ~15-20% for random reads, ~65% for hot-key
workloads, ~0% overhead for writes (cache.get_mut() bypasses RefCell).
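The take+return pattern from the commit message can be sketched as follows; the types and method names here are illustrative stand-ins, not the PR's exact API:

```rust
use std::collections::HashMap;

// Hypothetical sketch of the take+return read path described above:
// the reader *removes* the node from the cache, so no borrow of the
// cache is held while the node is used, then puts it back afterwards.
struct NodeCache {
    slots: HashMap<u64, String>, // String stands in for Node<K>
}

impl NodeCache {
    /// Remove and return the cached node for `address`, if present.
    fn take(&mut self, address: u64) -> Option<String> {
        self.slots.remove(&address)
    }
    /// Return a node to the cache after the read path is done with it.
    fn put(&mut self, address: u64, node: String) {
        self.slots.insert(address, node);
    }
}

fn main() {
    let mut cache = NodeCache { slots: HashMap::new() };
    cache.put(42, "hot upper-tree node".to_string());

    // take: cache no longer owns the node, so no borrow is held...
    let node = cache.take(42).expect("node was cached");
    // ...the read path uses it freely...
    assert_eq!(node, "hot upper-tree node");
    // ...and returns it when finished.
    cache.put(42, node);
    assert!(cache.take(42).is_some());
}
```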
@sasa-tomic sasa-tomic requested a review from a team as a code owner March 18, 2026 17:46
@sasa-tomic sasa-tomic marked this pull request as draft March 18, 2026 17:52
@github-actions bot commented Mar 19, 2026

canbench 🏋 (dir: ./benchmarks/btreeset) a910cb9 2026-03-31 14:48:48 UTC

./benchmarks/btreeset/canbench_results.yml is up to date
📦 canbench_results_btreeset.csv available in artifacts

---------------------------------------------------

Summary:
  instructions:
    status:   No significant changes 👍
    counts:   [total 100 | regressed 0 | improved 0 | new 0 | unchanged 100]
    change:   [max +1.06M | p75 0 | median 0 | p25 0 | min 0]
    change %: [max +0.20% | p75 0.00% | median 0.00% | p25 0.00% | min 0.00%]

  heap_increase:
    status:   No significant changes 👍
    counts:   [total 100 | regressed 0 | improved 0 | new 0 | unchanged 100]
    change:   [max 0 | p75 0 | median 0 | p25 0 | min 0]
    change %: [max 0.00% | p75 0.00% | median 0.00% | p25 0.00% | min 0.00%]

  stable_memory_increase:
    status:   No significant changes 👍
    counts:   [total 100 | regressed 0 | improved 0 | new 0 | unchanged 100]
    change:   [max 0 | p75 0 | median 0 | p25 0 | min 0]
    change %: [max 0.00% | p75 0.00% | median 0.00% | p25 0.00% | min 0.00%]

---------------------------------------------------
CSV results saved to canbench_results.csv

@github-actions bot commented Mar 19, 2026

canbench 🏋 (dir: ./benchmarks/nns) a910cb9 2026-03-31 14:48:49 UTC

./benchmarks/nns/canbench_results.yml is up to date
📦 canbench_results_nns.csv available in artifacts

---------------------------------------------------

Summary:
  instructions:
    status:   Improvements detected 🟢
    counts:   [total 16 | regressed 0 | improved 1 | new 0 | unchanged 15]
    change:   [max +35.76M | p75 +289 | median -1.50K | p25 -413.07K | min -82.85M]
    change %: [max +0.40% | p75 0.00% | median -0.05% | p25 -0.25% | min -2.75%]

  heap_increase:
    status:   No significant changes 👍
    counts:   [total 16 | regressed 0 | improved 0 | new 0 | unchanged 16]
    change:   [max 0 | p75 0 | median 0 | p25 0 | min 0]
    change %: [max 0.00% | p75 0.00% | median 0.00% | p25 0.00% | min 0.00%]

  stable_memory_increase:
    status:   No significant changes 👍
    counts:   [total 16 | regressed 0 | improved 0 | new 0 | unchanged 16]
    change:   [max 0 | p75 0 | median 0 | p25 0 | min 0]
    change %: [max 0.00% | p75 0.00% | median 0.00% | p25 0.00% | min 0.00%]

---------------------------------------------------

Only significant changes:
| status | name                              | calls |   ins |  ins Δ% | HI |  HI Δ% | SMI |  SMI Δ% |
|--------|-----------------------------------|-------|-------|---------|----|--------|-----|---------|
|   -    | vote_cascading_stable_chain_10k_5 |       | 2.92B |  -2.75% |  5 |  0.00% |   0 |   0.00% |

ins = instructions, HI = heap_increase, SMI = stable_memory_increase, Δ% = percent change

---------------------------------------------------
CSV results saved to canbench_results.csv

@github-actions bot commented Mar 19, 2026

canbench 🏋 (dir: ./benchmarks/vec) a910cb9 2026-03-31 14:48:35 UTC

./benchmarks/vec/canbench_results.yml is up to date
📦 canbench_results_vec.csv available in artifacts

---------------------------------------------------

Summary:
  instructions:
    status:   No significant changes 👍
    counts:   [total 16 | regressed 0 | improved 0 | new 0 | unchanged 16]
    change:   [max 0 | p75 0 | median 0 | p25 0 | min 0]
    change %: [max 0.00% | p75 0.00% | median 0.00% | p25 0.00% | min 0.00%]

  heap_increase:
    status:   No significant changes 👍
    counts:   [total 16 | regressed 0 | improved 0 | new 0 | unchanged 16]
    change:   [max 0 | p75 0 | median 0 | p25 0 | min 0]
    change %: [max 0.00% | p75 0.00% | median 0.00% | p25 0.00% | min 0.00%]

  stable_memory_increase:
    status:   No significant changes 👍
    counts:   [total 16 | regressed 0 | improved 0 | new 0 | unchanged 16]
    change:   [max 0 | p75 0 | median 0 | p25 0 | min 0]
    change %: [max 0.00% | p75 0.00% | median 0.00% | p25 0.00% | min 0.00%]

---------------------------------------------------
CSV results saved to canbench_results.csv

@github-actions bot commented Mar 19, 2026

canbench 🏋 (dir: ./benchmarks/memory_manager) a910cb9 2026-03-31 14:48:32 UTC

./benchmarks/memory_manager/canbench_results.yml is up to date
📦 canbench_results_memory-manager.csv available in artifacts

---------------------------------------------------

Summary:
  instructions:
    status:   No significant changes 👍
    counts:   [total 3 | regressed 0 | improved 0 | new 0 | unchanged 3]
    change:   [max 0 | p75 0 | median 0 | p25 0 | min 0]
    change %: [max 0.00% | p75 0.00% | median 0.00% | p25 0.00% | min 0.00%]

  heap_increase:
    status:   No significant changes 👍
    counts:   [total 3 | regressed 0 | improved 0 | new 0 | unchanged 3]
    change:   [max 0 | p75 0 | median 0 | p25 0 | min 0]
    change %: [max 0.00% | p75 0.00% | median 0.00% | p25 0.00% | min 0.00%]

  stable_memory_increase:
    status:   No significant changes 👍
    counts:   [total 3 | regressed 0 | improved 0 | new 0 | unchanged 3]
    change:   [max 0 | p75 0 | median 0 | p25 0 | min 0]
    change %: [max 0.00% | p75 0.00% | median 0.00% | p25 0.00% | min 0.00%]

---------------------------------------------------
CSV results saved to canbench_results.csv

@github-actions bot commented Mar 20, 2026

canbench 🏋 (dir: ./benchmarks/io_chunks) a910cb9 2026-03-31 14:49:22 UTC

./benchmarks/io_chunks/canbench_results.yml is up to date
📦 canbench_results_io_chunks.csv available in artifacts

---------------------------------------------------

Summary:
  instructions:
    status:   Improvements detected 🟢
    counts:   [total 18 | regressed 0 | improved 2 | new 0 | unchanged 16]
    change:   [max +87.27M | p75 0 | median 0 | p25 -176 | min -23.13B]
    change %: [max +0.73% | p75 0.00% | median 0.00% | p25 -0.00% | min -59.18%]

  heap_increase:
    status:   No significant changes 👍
    counts:   [total 18 | regressed 0 | improved 0 | new 0 | unchanged 18]
    change:   [max 0 | p75 0 | median 0 | p25 0 | min 0]
    change %: [max 0.00% | p75 0.00% | median 0.00% | p25 0.00% | min 0.00%]

  stable_memory_increase:
    status:   No significant changes 👍
    counts:   [total 18 | regressed 0 | improved 0 | new 0 | unchanged 18]
    change:   [max 0 | p75 0 | median 0 | p25 0 | min 0]
    change %: [max 0.00% | p75 0.00% | median 0.00% | p25 0.00% | min 0.00%]

---------------------------------------------------

Only significant changes:
| status | name                    | calls |     ins |  ins Δ% | HI |  HI Δ% | SMI |  SMI Δ% |
|--------|-------------------------|-------|---------|---------|----|--------|-----|---------|
|   -    | read_chunks_btreemap_1m |       |  17.81B | -56.50% |  0 |  0.00% |   0 |   0.00% |
|   -    | read_chunks_btreemap_1k |       | 203.38M | -59.18% |  0 |  0.00% |   0 |   0.00% |

ins = instructions, HI = heap_increase, SMI = stable_memory_increase, Δ% = percent change

---------------------------------------------------
CSV results saved to canbench_results.csv

@sasa-tomic sasa-tomic marked this pull request as ready for review March 20, 2026 11:23
@github-actions bot commented Mar 20, 2026

canbench 🏋 (dir: ./benchmarks/btreemap) a910cb9 2026-03-31 14:49:55 UTC

./benchmarks/btreemap/canbench_results.yml is up to date
📦 canbench_results_btreemap.csv available in artifacts

---------------------------------------------------

Summary:
  instructions:
    status:   Improvements detected 🟢
    counts:   [total 229 | regressed 0 | improved 100 | new 0 | unchanged 129]
    change:   [max +62.94M | p75 0 | median -1.62M | p25 -92.98M | min -1.48B]
    change %: [max +1.17% | p75 0.00% | median -0.21% | p25 -25.74% | min -91.42%]

  heap_increase:
    status:   No significant changes 👍
    counts:   [total 229 | regressed 0 | improved 0 | new 0 | unchanged 229]
    change:   [max 0 | p75 0 | median 0 | p25 0 | min 0]
    change %: [max 0.00% | p75 0.00% | median 0.00% | p25 0.00% | min 0.00%]

  stable_memory_increase:
    status:   No significant changes 👍
    counts:   [total 229 | regressed 0 | improved 0 | new 0 | unchanged 229]
    change:   [max 0 | p75 0 | median 0 | p25 0 | min 0]
    change %: [max 0.00% | p75 0.00% | median 0.00% | p25 0.00% | min 0.00%]

---------------------------------------------------

Only significant changes:
| status | name                                      | calls |     ins |  ins Δ% | HI |  HI Δ% | SMI |  SMI Δ% |
|--------|-------------------------------------------|-------|---------|---------|----|--------|-----|---------|
|   -    | btreemap_v2_pop_first_vec_32_128          |       |   1.00B |  -6.23% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreemap_v2_pop_first_vec_32_vec128       |       |   1.00B |  -6.23% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreemap_v2_pop_first_blob_256_128        |       |   2.56B |  -6.31% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreemap_v2_pop_last_vec_32_128           |       | 978.64M |  -6.40% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreemap_v2_pop_last_vec_32_vec128        |       | 978.64M |  -6.40% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreemap_v2_pop_last_blob_256_128         |       |   2.47B |  -6.47% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreemap_v2_pop_last_blob_32_1024         |       | 976.88M |  -7.08% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreemap_v2_pop_first_blob_32_1024        |       |   1.01B |  -7.10% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreemap_v2_pop_first_blob_32_128         |       | 763.85M | -10.21% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreemap_v2_pop_last_blob_32_128          |       | 732.59M | -10.27% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreemap_v2_pop_last_u64_u64              |       | 585.12M | -10.59% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreemap_v2_pop_first_blob_32_0           |       | 673.85M | -10.76% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreemap_v2_pop_first_u64_u64             |       | 604.78M | -10.84% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreemap_v2_pop_first_blob_8_128          |       | 530.70M | -11.22% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreemap_v2_pop_last_blob_32_0            |       | 644.48M | -11.35% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreemap_v2_pop_last_principal            |       | 691.07M | -11.88% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreemap_v2_pop_last_blob_8_128           |       | 520.66M | -11.99% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreemap_v2_pop_first_principal           |       | 705.52M | -12.27% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreemap_v2_contains_blob_32_0            |       | 286.88M | -12.95% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreemap_v2_contains_blob_32_1024         |       | 280.85M | -13.87% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreemap_v2_get_u64_blob8                 |       | 193.98M | -14.34% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreemap_v2_get_blob_32_0                 |       | 288.59M | -14.48% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreemap_v2_get_100k_u64_u64              |       |   2.32B | -16.10% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreemap_v2_get_blob_32_1024              |       | 292.71M | -16.49% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreemap_v2_get_vec_128_128               |       | 479.29M | -16.79% |  0 |  0.00% |   0 |   0.00% |
|  ...   | ... 50 rows omitted ...                   |       |         |         |    |        |     |         |
|   -    | btreemap_v2_get_blob_1024_128             |       |   2.97B | -33.30% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreemap_v2_get_blob_256_128              |       | 903.19M | -33.93% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreemap_v2_get_blob_4_128                |       | 168.23M | -34.00% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreemap_v2_get_zipf_10k_u64_u64          |       | 140.00M | -37.01% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreemap_v2_first_key_value_blob_32_1024  |       | 255.02M | -42.74% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreemap_v2_first_key_value_blob_256_128  |       | 352.58M | -46.79% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreemap_v2_get_vec_4_128                 |       | 208.49M | -49.16% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreemap_v2_last_key_value_blob_32_128    |       | 122.35M | -54.16% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreemap_v2_last_key_value_blob_32_1024   |       | 168.95M | -56.65% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreemap_v2_get_zipf_heavy_10k_u64_u64    |       |  87.48M | -59.26% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreemap_v2_first_key_value_blob_32_128   |       | 106.21M | -61.42% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreemap_v2_first_key_value_u64_u64       |       |  84.89M | -63.27% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreemap_v2_first_key_value_blob_32_0     |       | 100.51M | -65.11% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreemap_v2_last_key_value_vec_32_128     |       |  73.27M | -66.86% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreemap_v2_last_key_value_vec_32_vec128  |       |  73.27M | -66.86% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreemap_v2_last_key_value_u64_u64        |       |  81.20M | -66.90% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreemap_v2_last_key_value_blob_256_128   |       | 150.71M | -76.51% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreemap_v2_first_key_value_blob_8_128    |       |  55.70M | -80.92% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreemap_v2_first_key_value_vec_32_128    |       |  38.66M | -82.98% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreemap_v2_first_key_value_vec_32_vec128 |       |  38.66M | -82.98% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreemap_v2_last_key_value_blob_8_128     |       |  52.86M | -83.54% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreemap_v2_last_key_value_blob_32_0      |       |  32.54M | -85.27% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreemap_v2_last_key_value_principal      |       |  32.06M | -85.28% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreemap_v2_contains_10mib_values         |       |  18.48M | -87.00% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreemap_v2_first_key_value_principal     |       |  31.52M | -91.42% |  0 |  0.00% |   0 |   0.00% |

ins = instructions, HI = heap_increase, SMI = stable_memory_increase, Δ% = percent change

---------------------------------------------------
CSV results saved to canbench_results.csv

src/btreemap.rs Outdated
Comment on lines +169 to +172
if !self.is_enabled() {
self.misses += 1;
return None;
}

Just an idea, feel free to ignore: rather than disable the cache (and incur the cost of a branch), might it be preferable to have a cache of size 1?

(As a side note, I would actually be curious to see the difference in benchmark scores from adding this and the stats. Maybe it's all a storm in a teacup. But maybe not.)

Member Author

A single-entry cache would be strictly more computationally expensive, and is unlikely to give any benefit since we'd keep overwriting it all the time.
The size-0 execution path will be correctly predicted every time and costs ~0 cycles in practice.
A size-1 cache, on the other hand, would still allocate a Node on the heap and would collide on every operation (every node maps to slot 0), producing worse miss behaviour than even a small real cache.

So I'd recommend we go with size-0 by default, in the next version. And then we can turn it on to size-32 or larger by default once we get some feedback from production runs.

Contributor

Btw, I now see in the benchmarks with the default 0-size cache some increase in instruction cost, up to +8%.
Would it be possible to add some kind of short circuit so no extra work is done when the cache size is zero?


btw I now see in the benchmarks with default 0-size cache size some increase in instruction cost, up to +8%.

This is what I was referring to. Sure, a size-1 cache is going to be more expensive than a size-0 cache. But having the option of a size-0 cache makes all other cache sizes more expensive, and the size-0 cache is more expensive than not having this feature at all.

So I guess it's a trade-off between optimizing for no caching vs. optimizing for caching. I was going for the latter, since caching appears to make a positive change in performance, but we could also stick with no caching as the default.

Member Author

By inlining one function, we're now roughly in the negative range (marginally improved performance) even with the cache disabled.

Contributor

re: size 1 cache

In the current implementation, the direct-mapped cache does not prioritise top-level nodes over lower-level nodes, meaning that a cache of size 1 will always be overwritten on each tree traversal and will always have 0 hits, which is a pure waste of cycles.

Moreover, any cache smaller than the tree height will have 0 hits. So the rough range of cache sizes that makes the cache practically usable:

  • lower bound: at least the tree height, maybe x2 to account for collisions
  • upper bound: any cache size that reaches a 90-95% hit ratio; above that the ROI is too small

(Optional exploration idea) Based on that, one interesting idea to explore (not now, in future PRs) is to also store in the cache each node's height (or level) and prioritize nodes closer to the root over lower-level nodes. Or, if we decide to use a set-associative cache, store cache lines of different levels with prioritization by level.

Contributor

Added storing and checking depth in the node cache; it showed a good performance boost:

Only significant changes:
| status | name                                            | calls |     ins |  ins Δ% | HI |  HI Δ% | SMI |  SMI Δ% |
|--------|-------------------------------------------------|-------|---------|---------|----|--------|-----|---------|
|   -    | btreemap_v2_get_vec_32_128_cached_32entry       |       | 257.72M | -13.20% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreemap_v2_contains_vec_32_128_cached_32entry  |       | 251.19M | -13.50% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreemap_v2_get_u64_u64_cached_32entry          |       | 140.77M | -15.96% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreemap_v2_contains_u64_u64_cached_32entry     |       | 136.11M | -16.42% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreemap_v2_get_blob_32_128_cached_32entry      |       | 206.55M | -17.49% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreemap_v2_contains_blob_32_128_cached_32entry |       | 201.95M | -17.79% |  0 |  0.00% |   0 |   0.00% |

ins = instructions, HI = heap_increase, SMI = stable_memory_increase, Δ% = percent change

Member Author

Agreed — with depth-aware eviction the default now makes more sense as non-zero. Changed default to 16 slots, which covers the top 2 tree levels. The NodeCache::take/put/invalidate methods now early-return when disabled (0 slots), so there's no branch cost in the btreemap code itself.
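A minimal sketch of that self-guarding pattern, with assumed names and a plain Vec standing in for the real slot array:

```rust
// Sketch (assumed names) of the pattern described above: NodeCache
// methods early-return when the cache has 0 slots, so BTreeMap call
// sites need no is_enabled() checks of their own.
struct NodeCache {
    slots: Vec<Option<u64>>, // u64 stands in for a cached Node<K>
}

impl NodeCache {
    fn is_enabled(&self) -> bool {
        !self.slots.is_empty()
    }

    fn take(&mut self, address: u64) -> Option<u64> {
        if !self.is_enabled() {
            return None; // disabled: one predictable branch, nothing else
        }
        let idx = (address as usize) % self.slots.len();
        self.slots[idx].take()
    }

    fn put(&mut self, address: u64, node: u64) {
        if !self.is_enabled() {
            return; // disabled: drop the node instead of caching it
        }
        let idx = (address as usize) % self.slots.len();
        self.slots[idx] = Some(node);
    }
}

fn main() {
    // Disabled cache (0 slots): every call is a cheap no-op.
    let mut off = NodeCache { slots: vec![] };
    off.put(7, 700);
    assert!(off.take(7).is_none());

    // Enabled cache (16 slots, the new default): put then take hits.
    let mut on = NodeCache { slots: vec![None; 16] };
    on.put(7, 700);
    assert_eq!(on.take(7), Some(700));
}
```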

src/btreemap.rs Outdated
Comment on lines +486 to +488
pub fn cache_stats(&self) -> CacheStats {
self.cache.borrow().stats()
}

Might be useful to return the cache size in bytes. The end user might have a hard time figuring out the average node size or even just the page size.

Contributor

There have been some requests from users to add methods to stable structures that report memory usage: heap usage, stable memory allocated, and stable memory actually used. If that is implemented, it also covers any cache, so maybe the cache size should not be reported separately. Also, if the user configures the cache size, it's expected to be full.

I'm fine if this PR does not implement reporting the cache size.

Member Author

Reporting the cache size in bytes is easy and cheap, so I just added it.

Member Author

It wasn't so easy after all; I'll look for another way to do it.

Contributor

I added a MemSize trait implementation to this branch.
It calculates heap usage with (in my opinion) an acceptable tradeoff between cost and precision.
Please take a look and let me know what you think. It's OK to revert my changes if they don't work.

@maksymar (Contributor) commented Mar 29, 2026

I implemented an accurate memory calculation for heap usage and compared it against the approximate one: if value sizes were small, the difference was x2-x4, but if the values were big (e.g. a u32 key with a 900-byte blob value) it went to x40 and above.

I also removed the whole mem_size implementation to see how many extra instructions it adds; it was between +12% and +24%, which is pretty significant.

Only significant changes:
| status | name                                            | calls |     ins |  ins Δ% | HI |  HI Δ% | SMI |  SMI Δ% |
|--------|-------------------------------------------------|-------|---------|---------|----|--------|-----|---------|
|   -    | btreemap_v2_get_vec_32_128_cached_32entry       |       | 296.90M | -12.42% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreemap_v2_contains_vec_32_128_cached_32entry  |       | 290.40M | -12.67% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreemap_v2_get_blob_32_128_cached_32entry      |       | 250.32M | -13.54% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreemap_v2_contains_blob_32_128_cached_32entry |       | 245.66M | -13.76% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreemap_v2_get_u64_u64_cached_32entry          |       | 167.51M | -24.02% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreemap_v2_contains_u64_u64_cached_32entry     |       | 162.86M | -24.54% |  0 |  0.00% |   0 |   0.00% |

ins = instructions, HI = heap_increase, SMI = stable_memory_increase, Δ% = percent change

Now I've changed my mind: even though the approximate value is not exactly accurate, it is still a possible upper bound the user should take into consideration (because in some cases values can get cached).

I am fine with going with the less accurate but much simpler and faster solution. Sorry it took some time debating and coding this topic; we can always add a more accurate calculation later if we find a better way to do it, no need to get blocked on it now.


Just a question for my own illumination:

I also removed the whole mem_size implementation to see how much extra instructions it will provide and it was between +12% and +24% which is pretty significant.

Is this on common workloads (i.e. do we call mem_size() frequently) or is this only the performance of computing the exact usage vs a (potentially quite bad) approximation?

Contributor

It's mostly because computing the exact usage turned out to be expensive, especially for keys: at the btreemap level they are used in their original type, so the only way to get a key's size is to serialize it via to_bytes().len().
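A hypothetical illustration of that tradeoff; `to_bytes` here is a stand-in for the real Storable encoding, not the library's actual signature:

```rust
// Hypothetical sketch of the exact-vs-approximate sizing tradeoff
// discussed above. The exact path serializes every key just to
// measure it (one allocation per key); the approximate path does no
// serialization at all.
fn to_bytes(key: &u64) -> Vec<u8> {
    key.to_le_bytes().to_vec() // stand-in for the Storable encoding
}

/// Exact: serialize every key -- the expensive path.
fn exact_keys_size(keys: &[u64]) -> usize {
    keys.iter().map(|k| to_bytes(k).len()).sum()
}

/// Approximate: a flat per-key estimate, no serialization.
fn approx_keys_size(keys: &[u64], estimate_per_key: usize) -> usize {
    keys.len() * estimate_per_key
}

fn main() {
    let keys: Vec<u64> = (0..1000).collect();
    assert_eq!(exact_keys_size(&keys), 8000); // 1000 keys x 8 bytes
    assert_eq!(approx_keys_size(&keys, 8), 8000);
}
```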

Member Author

Acknowledged. node_cache_size_bytes_approx is still available as a rough estimate. The exact-size MemSize approach was too expensive as you found (+12-24% instructions), so the approximate method stays as an order-of-magnitude guide.

@maksymar (Contributor)

FYI: I'll resolve merge conflicts.

@sasa-tomic (Member Author) left a comment

Addressing open review comments with code changes and replies.

…ard NULL

- Change DEFAULT_NODE_CACHE_NUM_SLOTS from 0 to 16 (covers top 2 tree levels)
- Remove redundant cache_num_slots field from BTreeMap; NodeCache methods
  now self-guard via is_enabled() and callers no longer need checks
- Add NULL address guard in NodeCache::take to prevent false cache hits
@maksymar (Contributor) left a comment

Great results! 🎉 Thank you!

// compete on pure recency: a non-root node can evict any other
// non-root node. This lets hot leaves and working-set interior
// nodes stay cached while stale upper-level nodes get displaced.
if slot.node.is_none() || depth == 0 || slot.depth > 0 {
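The quoted condition can be exercised in a small standalone sketch; the Slot type and field names here are assumptions for illustration:

```rust
// Hedged sketch of the depth-aware insertion rule quoted above;
// Slot and its fields are illustrative, not the PR's exact types.
struct Slot {
    node: Option<&'static str>, // stand-in for the cached Node<K>
    depth: u8,                  // 0 = root, larger = deeper in the tree
}

/// Decide whether a node at `depth` may overwrite what `slot` holds:
/// empty slots always accept; the root (depth 0) evicts anything;
/// non-root nodes may only evict other non-root nodes, so they
/// compete on pure recency without displacing the root.
fn may_evict(slot: &Slot, depth: u8) -> bool {
    slot.node.is_none() || depth == 0 || slot.depth > 0
}

fn main() {
    let root_slot = Slot { node: Some("root"), depth: 0 };
    let leaf_slot = Slot { node: Some("leaf"), depth: 3 };

    assert!(may_evict(&leaf_slot, 2));  // interior node evicts a leaf
    assert!(!may_evict(&root_slot, 3)); // a leaf cannot displace the root
    assert!(may_evict(&root_slot, 0));  // a root can replace the root
}
```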
@maksymar (Contributor) Mar 31, 2026

FYI: I was thinking about this idea of always protecting nodes with lower depth, and realized it may not work very well for bigger caches, because it will always populate the top of the tree while ignoring hot paths.

So instead I think it would be reasonable to protect only a few top levels (0-1) and let nodes on other levels compete freely. It should also work with the default size of 16.

@maksymar (Contributor) commented Mar 31, 2026

[DONE]
I would suggest waiting to merge this PR until I merge another PR that improves the benchmarks.
Currently our benchmarks do not cover potential cache cases, so I would like to remove some excessive benchmarks and add cache-specific use cases, so that we can also see the difference on this PR.

@maksymar (Contributor)

The chart shows how increasing node cache size reduces instruction count (relative to no cache) across operation categories.

  • first/last & pop plateau almost immediately because they need at most tree-height nodes in the cache
  • get and contains keep improving steadily with the size of the cache
  • the remaining categories are not currently supported by caching, but probably will be in the future

From this data I'm happy to see that the default of 16 is already good enough while being reasonably small; at the same time, bigger caches will provide more value.

[chart: chart_by_category]
