fix: add cache to speed up `fn tipset_by_height` by hanabi1224 · Pull Request #5944 · ChainSafe/forest

hanabi1224 · 2025-08-18T02:06:15Z

Summary of changes

This is to facilitate @ADobrodey to generate historical lite snapshots with bug fixes and performance improvements

Changes introduced in this pull request:

add cache to speed up fn tipset_by_height
(Forest.StateCompute) load tipset messages from network if absent in datastore

Reference issue to close (if applicable)

Closes

Change checklist

I have performed a self-review of my own code,
I have made corresponding changes to the documentation. All new code adheres to the team's documentation standards,
I have added tests that prove my fix is effective or that my feature works (if possible),
I have made sure the CHANGELOG is up-to-date. All user-facing changes should be reflected in this document.

Summary by CodeRabbit

New Features
- One-step persistence for complete tipsets; public APIs to load/fetch full tipsets and exchange chain messages.
- RPC state computations now auto-load or fetch missing full tipsets when needed.
Refactor
- Checkpoint-based caching for tipset-by-height to speed lookups.
- Reduced data copying and more borrowing in sync paths for efficiency.
- Message-root lookup now uses richer error context.
Chores
- Renamed a metrics label for clearer observability.

coderabbitai · 2025-08-18T02:06:21Z

Walkthrough

Adds FullTipset.persist to validate and store tipsets, changes read_msg_cids to accept a block header, introduces a CHAIN_FINALITY-based LRU checkpoint cache for tipset_by_height, refactors chain follower to borrow TipsetKey and expose loaders, extends network context for full-tipset/messages exchange, and generalizes chain-exchange conversions.

Changes

Cohort / File(s)	Summary of modifications
Tipset persistence & validation `src/blocks/tipset.rs`, `src/chain_sync/validation.rs`	Added `pub fn persist(&self, db: &impl Blockstore) -> anyhow::Result<()>` on `FullTipset` that validates per-block message-root metadata via `TipsetValidator::validate_msg_root` and persists blocks. `validate_msg_root` changed to an associated function (removed `&self`) and call sites updated.
Message CID reading API `src/chain/store/chain_store.rs`, `src/libp2p/chain_exchange/provider.rs`, `src/rpc/methods/chain.rs`	`read_msg_cids` signature changed to accept `&CachingBlockHeader` instead of a messages CID/ref; callers updated to pass the block header. `read_msg_cids` derives `msg_cid` internally and improves error text with epoch and block CID.
Chain index checkpoint cache `src/chain/store/index.rs`	Added CHAIN_FINALITY-based checkpoint caching: static `CACHE: LazyLock<SizeTrackingLruCache<ChainEpoch, TipsetKey>>`, `next_checkpoint` helper using `mod_floor`, seeding `from` from cached tipsets via `load_tipset`, and caching finalized child tipsets during backward traversal. Changed `from` to `mut`.
Chain follower / API & ownership refactor `src/chain_sync/chain_follower.rs`, `src/chain_sync/mod.rs`	Call sites and functions now take `&TipsetKey` (borrowed). `get_full_tipset` and `load_full_tipset` made public and adjusted to load/persist `FullTipset` via `FullTipset::persist`. Re-exported `load_full_tipset` from `chain_sync`.
Network context: full-tipset/messages fetch `src/chain_sync/network_context.rs`, `src/rpc/methods/f3.rs`, `src/rpc/methods/state.rs`	Added `chain_exchange_messages(&self, peer_id, ts)` to request local-tipset messages and renamed/exposed `chain_exchange_fts` → `chain_exchange_full_tipset(&self, peer_id, tsk)`. RPCs updated to use new calls; state RPC ensures full tipset is loaded (store or network) and persists it. Minor metrics label rename in F3.
Chain-exchange conversion generalization `src/libp2p/chain_exchange/message.rs`	`ChainExchangeResponse::into_result` generalized to accept any `TryFrom<TipsetBundle>` where the associated `Error: Display`; conversion errors mapped to strings with `to_string()`.
Provider & RPC callers updated `src/libp2p/chain_exchange/provider.rs`, `src/rpc/methods/chain.rs`	Call sites updated to pass `block_header` to `read_msg_cids` instead of `&block_header.messages`; surrounding logic unchanged.
Shim policy module `src/shim/mod.rs`, `src/shim/policy.rs`	Added `pub mod policy;` and new `src/shim/policy.rs` that re-exports `policy_constants` from `fil_actors_shared::v16::runtime`.

Sequence Diagram(s)

sequenceDiagram
  participant Caller
  participant ChainFollower
  participant ChainStore
  participant Network
  participant Blockstore

  Caller->>ChainFollower: get_full_tipset(&TipsetKey)
  ChainFollower->>ChainStore: load_full_tipset(&TipsetKey)
  alt present in store
    ChainStore-->>ChainFollower: FullTipset
  else missing
    ChainFollower->>Network: chain_exchange_full_tipset(peer, &TipsetKey)
    Network-->>ChainFollower: FullTipset
    ChainFollower->>Blockstore: FullTipset.persist(db)
  end
  ChainFollower-->>Caller: FullTipset

sequenceDiagram
  participant RPC
  participant ChainStore
  participant Network
  participant Blockstore
  participant VM

  RPC->>ChainStore: try load_full_tipset(ts.key())
  alt present
    ChainStore-->>RPC: FullTipset
  else missing
    RPC->>Network: chain_exchange_messages(None, &ts)
    Network-->>RPC: FullTipset
    RPC->>Blockstore: FullTipset.persist(db)
  end
  RPC->>VM: compute_tipset_state(ts)
  VM-->>RPC: Result

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

feat: more lru cache metrics #5848 — related: introduces/changes SizeTrackingLruCache accessors and behavior that the new tipset_by_height CACHE relies on.

Suggested labels

RPC

Suggested reviewers

akaladarshi
LesnyRumcajs
elmattic

Tip

🔌 Remote MCP (Model Context Protocol) integration is now available!

Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats.

✨ Finishing Touches

📝 Generate Docstrings

🧪 Generate unit tests

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch hm/tipset_by_height_cache

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

Visit our Status Page to check the current availability of CodeRabbit.
Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (8)

src/chain/store/index.rs (2)
153-156: Consider caching both child and parent on boundaries

You only cache child at boundaries. Caching parent when it lands on a boundary can improve hit rates for lookups that resolve to older epochs in the same call.

Optional tweak:
 for (child, parent) in self.chain(from).tuple_windows() {
     if child.epoch() % CHECKPOINT_INTERVAL == 0 {
         CACHE.push(child.epoch(), child.clone());
     }
+    if parent.epoch() % CHECKPOINT_INTERVAL == 0 {
+        CACHE.push(parent.epoch(), parent.clone());
+    }
     ...
 }
208-308: Add tests that exercise checkpoint cache across forks

Current tests don’t hit the checkpoint path (epochs are < 1000) and don’t cover fork-selection correctness with cached checkpoints. Please add:

A scenario that populates the cache at a checkpoint epoch on branch A, then calls tipset_by_height on branch B that passes through the same checkpoint epoch, asserting the result remains on branch B.

A scenario verifying we never select a checkpoint > from.epoch().

I can draft these tests if helpful.
src/chain_sync/validation.rs (1)
72-75: Nit: fix typo in the comment (“mst root” → “msg root”)

Minor typo in the validation comment.
-        // 1. Calculating the message root using all of the messages to ensure it
-        // matches the mst root in the block header 2. Ensuring it has not
+        // 1. Calculating the message root using all of the messages to ensure it
+        // matches the msg root in the block header 2. Ensuring it has not
src/blocks/tipset.rs (1)

550-558: Persist full tipset with pre-persist msg-root validation — clarify side effects

The implementation validates msg roots before persisting each block, which is a good safety check. One concern: TipsetValidator::validate_msg_root likely computes and persists TxMeta/AMTs as a side effect. If subsequent block.persist fails, the store may end up with TxMeta referencing messages not yet present. This is acceptable if failures are practically impossible or writes are idempotent, but the side effect is non-obvious from the name.

Consider documenting validate_msg_root’s persistence side effects here, or switching to a side-effect-free check (if available), followed by an explicit persist of TxMeta.

Alternatively, rename validate_msg_root to something that reflects its effect (e.g., ensure_msg_root_persisted) to make intent clear at call sites.

Would you like me to draft doc comments or a small refactor to make this explicit?
src/rpc/methods/f3.rs (1)
584-601: Persist after full validation and use FullTipset::persist to centralize logic

Currently, blocks are persisted before full tipset validation, and the per-block loop duplicates logic now encapsulated by FullTipset::persist (which also validates msg roots). To reduce the risk of persisting invalid data and to keep logic centralized, validate first, then persist the whole tipset via FullTipset::persist.

Proposed change:

Run validator.validate first (already present).

Replace the per-block loop with fts.persist(ctx.store())? after successful validation.

Apply this diff:
-                let fts = ctx
+                let fts = ctx
                     .sync_network_context
                     .chain_exchange_full_tipset(None, &tsk)
                     .await?;
-                for block in fts.blocks() {
-                    block.persist(ctx.store())?;
-                }
                 let validator = TipsetValidator(&fts);
                 validator.validate(
                     ctx.chain_store(),
                     None,
                     &ctx.chain_store().genesis_tipset(),
                     ctx.chain_config().block_delay_secs,
                 )?;
+                // Persist the tipset after successful validation.
+                fts.persist(ctx.store())?;
                 let ts = Arc::new(Tipset::from(fts));
                 ctx.chain_store().put_tipset(&ts)?;
                 ctx.chain_store().set_heaviest_tipset(finalized_ts)?;
src/rpc/methods/state.rs (1)
1435-1449: Solid fallback: pre-loading FullTipset with network fetch when absent in store

Good addition. Ensures state computation has messages available by first attempting a local load and falling back to ChainExchange if needed. Persisting the fetched FullTipset closes the loop.

Two small improvements would help operability and diagnostics (optional):

Add a debug trace when falling back to the network to aid troubleshooting.

Add context to persist errors for quicker issue triage.
         // Attempt to load full tipset from the store
-        if crate::chain_sync::load_full_tipset(ctx.chain_store(), ts.key()).is_err() {
+        if crate::chain_sync::load_full_tipset(ctx.chain_store(), ts.key()).is_err() {
+            tracing::debug!(epoch = %ts.epoch(), tsk = %ts.key(), "FullTipset not in store; fetching via ChainExchangeMessages");
             // Load full tipset from the network
             let fts = ctx
                 .sync_network_context
                 .chain_exchange_messages(None, &ts)
                 .await
                 .map_err(|e| anyhow::anyhow!(e))?;
-            fts.persist(ctx.store())?;
+            fts.persist(ctx.store()).context("persisting fetched FullTipset into local store")?;
         }
src/chain_sync/chain_follower.rs (2)
434-452: Consistent, robust FullTipset retrieval and persistence

Early return on successful local load is optimal.

Network fallback to chain_exchange_full_tipset and persisting the entire tipset is the right behavior.

Minor nit: prefer adding error context for the network call and persist path to improve debuggability.
-    let tipset = network
-        .chain_exchange_full_tipset(peer_id, tipset_keys)
-        .await
-        .map_err(|e| anyhow::anyhow!(e))?;
-    tipset.persist(&chain_store.db)?;
+    let tipset = network
+        .chain_exchange_full_tipset(peer_id, tipset_keys)
+        .await
+        .map_err(|e| anyhow::anyhow!(e))?;
+    tipset
+        .persist(&chain_store.db)
+        .context("persisting FullTipset fetched via ChainExchange")?;
458-477: Use FullTipset::persist instead of per-block persist in batch path

The single-tipset path persists the entire tipset atomically. Here, the batch path persists blocks individually, which is slower and risks subtle inconsistencies if future persist behavior diverges between block and tipset levels. For consistency and maintainability, persist per tipset.
     let tipsets = network
         .chain_exchange_full_tipsets(peer_id, tipset_keys)
         .await
         .map_err(|e| anyhow::anyhow!(e))?;
 
-    for tipset in tipsets.iter() {
-        for block in tipset.blocks() {
-            block.persist(&chain_store.db)?;
-        }
-    }
+    for tipset in &tipsets {
+        tipset.persist(&chain_store.db)?;
+    }
 
     Ok(tipsets)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

MCP integration is disabled by default for public repositories
Jira integration is disabled by default for public repositories
Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 17676f6 and 518014f.

📒 Files selected for processing (12)

src/blocks/tipset.rs (2 hunks)
src/chain/store/chain_store.rs (2 hunks)
src/chain/store/index.rs (4 hunks)
src/chain_sync/chain_follower.rs (7 hunks)
src/chain_sync/mod.rs (1 hunks)
src/chain_sync/network_context.rs (3 hunks)
src/chain_sync/validation.rs (1 hunks)
src/libp2p/chain_exchange/message.rs (1 hunks)
src/libp2p/chain_exchange/provider.rs (1 hunks)
src/rpc/methods/chain.rs (1 hunks)
src/rpc/methods/f3.rs (2 hunks)
src/rpc/methods/state.rs (1 hunks)

🧰 Additional context used

🧠 Learnings (1)

📓 Common learnings

Learnt from: hanabi1224
PR: ChainSafe/forest#5930
File: build.rs:64-77
Timestamp: 2025-08-13T09:43:20.301Z
Learning: hanabi1224 prefers hard compile-time errors in build scripts rather than runtime safeguards or collision detection, believing it's better to fail fast and fix root causes of issues like malformed snapshot names.

🧬 Code Graph Analysis (11)

src/chain/store/chain_store.rs (4)

src/message/chain_message.rs (1)

cid (29-34)

src/blocks/header.rs (2)

cid (69-71)

cid (304-306)

src/shim/message.rs (1)

cid (195-198)

src/blocks/block.rs (1)

cid (41-43)

src/rpc/methods/state.rs (1)

src/chain_sync/chain_follower.rs (1)

load_full_tipset (479-501)

src/blocks/tipset.rs (3)

src/blocks/block.rs (1)

persist (46-50)

src/chain/store/chain_store.rs (1)

db (502-502)

src/chain_sync/validation.rs (1)

validate_msg_root (106-116)

src/libp2p/chain_exchange/message.rs (2)

src/shim/sector.rs (1)

try_from (243-245)

src/shim/crypto.rs (2)

try_from (205-217)

try_from (325-332)

src/chain_sync/mod.rs (1)

src/chain_sync/chain_follower.rs (2)

chain_follower (134-331)

load_full_tipset (479-501)

src/rpc/methods/f3.rs (1)

src/blocks/tipset.rs (1)

tsk (265-269)

src/libp2p/chain_exchange/provider.rs (1)

src/chain/store/chain_store.rs (2)

read_msg_cids (494-513)

db (502-502)

src/chain/store/index.rs (3)

src/beacon/drand.rs (1)

new (251-269)

src/blocks/tipset.rs (11)

new (242-260)

new (494-511)

epoch (306-308)

epoch (543-545)

from (105-110)

from (160-162)

from (166-168)

from (172-177)

from (181-186)

from (205-214)

from (479-484)

src/utils/cache/lru.rs (1)

new_with_default_metrics_registry (96-101)

src/rpc/methods/chain.rs (1)

src/chain/store/chain_store.rs (1)

read_msg_cids (494-513)

src/chain_sync/network_context.rs (2)

src/blocks/tipset.rs (2)

new (242-260)

new (494-511)

src/blocks/header.rs (1)

new (282-288)

src/chain_sync/chain_follower.rs (4)

src/blocks/tipset.rs (2)

key (336-339)

key (530-533)

src/rpc/mod.rs (1)

chain_store (466-468)

src/state_manager/mod.rs (1)

chain_store (296-298)

src/blocks/chain4u.rs (1)

tipset (183-185)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)

GitHub Check: cargo-publish-dry-run
GitHub Check: Build MacOS
GitHub Check: Build Ubuntu
GitHub Check: All lint checks
GitHub Check: tests-release
GitHub Check: tests
GitHub Check: Build forest binaries on Linux AMD64
GitHub Check: Analyze (rust)
GitHub Check: Analyze (go)

🔇 Additional comments (15)

src/libp2p/chain_exchange/message.rs (1)

128-130: Generic TryFrom bound + error mapping look good

Relaxing the bound to T: TryFrom<TipsetBundle, Error: Display> and mapping errors via to_string() is correct and backward-compatible with existing TryFrom<TipsetBundle, Error = String> impls. Control flow remains unchanged.

Also applies to: 137-141

src/chain/store/chain_store.rs (2)

470-476: Call-site update to pass block header is correct

Delegating to read_msg_cids(db, bh) aligns with the new signature and removes the need to plumb raw CID parameters around.

494-513: Improved diagnostics for missing TxMeta

Passing the full block header allows a much more actionable error: including the epoch and block CID is great for triage. Logic for loading AMTs remains unchanged.

src/chain_sync/validation.rs (1)

76-76: Static call to validate_msg_root is correct

Switching from self.validate_msg_root to Self::validate_msg_root matches the new static method signature and removes a needless receiver.

src/chain_sync/network_context.rs (1)

244-246: Broadened generic bounds align with chain_exchange/message changes

Updating handle_chain_exchange_request to accept T: TryFrom<TipsetBundle, Error: Display> syncs with ChainExchangeResponse::into_result::<T>(). Good cohesion across layers.

src/rpc/methods/chain.rs (1)

556-561: Correct API update: pass the whole block header to read_msg_cids

Switching to read_msg_cids(ctx.store(), &blk) aligns with the new signature and improves error context downstream. Looks good.

src/chain_sync/mod.rs (1)

16-21: Publicly re-exporting load_full_tipset is consistent with new call sites

Re-export looks correct and matches updated RPC/state call sites relying on borrowed TipsetKey. No issues.

src/libp2p/chain_exchange/provider.rs (1)

98-101: read_msg_cids call updated correctly

Passing block_header matches the revised API and enables improved error reporting (epoch + block cid). Change is correct.

src/rpc/methods/f3.rs (1)

166-171: Metric name change may break dashboards/alerts

Renaming the cache metrics key to "get_powertable" can affect existing dashboards/alerts expecting "get_powertable_cache". If this rename is intentional, ensure dashboards and alerts are updated accordingly.

Do you want a follow-up PR checklist for updating observability assets?

src/rpc/methods/state.rs (1)

1453-1453: LGTM: using the pre-loaded tipset for state computation

Reusing the locally bound ts after ensuring messages are available is correct and keeps the control flow cohesive.

src/chain_sync/chain_follower.rs (5)

179-180: LGTM: pass TipsetKey by reference

Borrowing &tipset_keys avoids an extra allocation/copy and aligns with the new API.

191-192: LGTM: consistent borrowing of TipsetKey

Using &key here keeps the call-site consistent with the updated signature and avoids unnecessary moves.

479-485: LGTM: public, borrowed-key load_full_tipset with typed message load

Exposing a borrowed &TipsetKey API and reconstructing a FullTipset via block_messages is clean and efficient.

Also applies to: 501-501

601-601: LGTM: parent readiness check with borrowed-key loader

Using load_full_tipset(&self.cs, tipset.parents()) for readiness keeps the new borrowing conventions consistent and avoids needless allocations.

875-876: LGTM: updated call site to pass borrowed key

The fetch task now correctly passes &key to get_full_tipset_batch.

src/chain_sync/network_context.rs

src/chain/store/index.rs

coderabbitai

Actionable comments posted: 0

♻️ Duplicate comments (1)

src/chain/store/index.rs (1)

125-147: Cache key is ambiguous across forks; guard adoption with a same-branch check

The global LRU is keyed only by ChainEpoch. Unconditionally replacing from with a cached tipset at that epoch can select a tipset from a different fork than the caller’s from. This breaks the contract that from disambiguates branches.

Action:

Only adopt a cached checkpoint if it’s an ancestor of the provided from (i.e., on the same branch). This preserves correctness without giving up the performance benefit when the checkpoint matches the branch.

Apply this diff within the shown hunk:

-        let mut checkpoint_from_epoch = to;
-        while checkpoint_from_epoch < from_epoch {
-            if let Some(checkpoint_from) = CACHE.get_cloned(&checkpoint_from_epoch) {
-                from = checkpoint_from;
-                break;
-            }
-            checkpoint_from_epoch = next_checkpoint(checkpoint_from_epoch);
-        }
+        // Try to seed `from` from a checkpoint on the same branch.
+        let mut checkpoint_from_epoch = to;
+        while checkpoint_from_epoch < from_epoch {
+            if let Some(cp) = CACHE.get_cloned(&checkpoint_from_epoch) {
+                // Guard adoption: only if cached cp is an ancestor of current `from`.
+                let cp_epoch = cp.epoch();
+                let same_branch = self
+                    .chain(from.clone())
+                    .take_while(|ts| ts.epoch() >= cp_epoch)
+                    .any(|ts| ts.key() == cp.key());
+                if same_branch {
+                    from = cp;
+                    break;
+                }
+                // else: ignore and keep searching older checkpoints
+            }
+            checkpoint_from_epoch = next_checkpoint(checkpoint_from_epoch);
+        }

If you prefer a helper, I can provide a small is_ancestor_on_same_branch(&self, child, ancestor) utility to encapsulate the check. Let me know.

🧹 Nitpick comments (3)

src/chain/store/index.rs (3)
125-133: Scope of static cache may cross DB/network boundaries

static CACHE: LazyLock<...> is shared across all ChainIndex instances and databases in-process. This can lead to:

Cross-network/DB contamination (e.g., testnet/mainnet, separate MemoryDBs in tests).

Attempting to walk parents of a cached tipset that do not exist in the current self.db.

Consider scoping this cache to ChainIndex (as a struct field) or including a partitioning dimension (e.g., network/chain ID) in the key to avoid cross-context pollution.

134-136: Minor: clarify checkpoint computation and boundary behavior

next_checkpoint always returns the next multiple strictly greater than epoch (even when already aligned). This is fine given the current loop starts at to and probes to first, but a quick comment here would prevent future misinterpretation.

Apply:
-        fn next_checkpoint(epoch: ChainEpoch) -> ChainEpoch {
-            epoch - epoch.mod_floor(&CHECKPOINT_INTERVAL) + CHECKPOINT_INTERVAL
-        }
+        // Returns the next multiple of CHECKPOINT_INTERVAL strictly greater than `epoch`.
+        fn next_checkpoint(epoch: ChainEpoch) -> ChainEpoch {
+            epoch - epoch.mod_floor(&CHECKPOINT_INTERVAL) + CHECKPOINT_INTERVAL
+        }
160-166: Use saturating subtraction to avoid potential overflow and reflect intent

The condition child.epoch() + CHECKPOINT_INTERVAL <= from_epoch can be expressed without addition, avoiding a potential overflow edge case and matching the “at least one interval behind” intent.

Apply:
-            if child.epoch() % CHECKPOINT_INTERVAL == 0
-                && child.epoch() + CHECKPOINT_INTERVAL <= from_epoch
+            if child.epoch() % CHECKPOINT_INTERVAL == 0
+                && child.epoch() <= from_epoch.saturating_sub(CHECKPOINT_INTERVAL)
             {
                 CACHE.push(child.epoch(), child.clone());
             }

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

MCP integration is disabled by default for public repositories
Jira integration is disabled by default for public repositories
Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 1e0a4da and b49d29b.

📒 Files selected for processing (1)

src/chain/store/index.rs (4 hunks)

🧰 Additional context used

🧠 Learnings (2)

📓 Common learnings

Learnt from: hanabi1224
PR: ChainSafe/forest#5944
File: src/chain/store/index.rs:0-0
Timestamp: 2025-08-18T03:09:47.907Z
Learning: In Forest's tipset_by_height caching implementation, hanabi1224 prefers performance-conscious solutions that leverage finality guarantees rather than expensive chain walking for fork detection. The approach of constraining cache lookups to finalized epochs (using CHECKPOINT_INTERVAL >= CHAIN_FINALITY) provides fork safety without the performance cost of ancestry verification.

Learnt from: hanabi1224
PR: ChainSafe/forest#5930
File: build.rs:64-77
Timestamp: 2025-08-13T09:43:20.301Z
Learning: hanabi1224 prefers hard compile-time errors in build scripts rather than runtime safeguards or collision detection, believing it's better to fail fast and fix root causes of issues like malformed snapshot names.

📚 Learning: 2025-08-18T03:09:47.907Z

Learnt from: hanabi1224
PR: ChainSafe/forest#5944
File: src/chain/store/index.rs:0-0
Timestamp: 2025-08-18T03:09:47.907Z
Learning: In Forest's tipset_by_height caching implementation, hanabi1224 prefers performance-conscious solutions that leverage finality guarantees rather than expensive chain walking for fork detection. The approach of constraining cache lookups to finalized epochs (using CHECKPOINT_INTERVAL >= CHAIN_FINALITY) provides fork safety without the performance cost of ancestry verification.

Applied to files:

src/chain/store/index.rs

🧬 Code Graph Analysis (1)

src/chain/store/index.rs (2)

src/beacon/drand.rs (1)

new (251-269)

src/utils/cache/lru.rs (1)

new_with_default_metrics_registry (96-101)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)

GitHub Check: cargo-publish-dry-run
GitHub Check: All lint checks
GitHub Check: Build forest binaries on Linux AMD64
GitHub Check: Build MacOS
GitHub Check: Build Ubuntu
GitHub Check: tests-release
GitHub Check: tests
GitHub Check: Analyze (rust)
GitHub Check: Analyze (go)

🔇 Additional comments (2)

src/chain/store/index.rs (2)

122-124: Allowing mut from is appropriate for cache seeding

Changing the parameter to mut is necessary to reseed the traversal starting point from a cached checkpoint. No concerns here.

4-4: std::sync::LazyLock is supported under MSRV 1.89.0
The project’s rust-toolchain.toml pins the channel to 1.89.0, which includes std::sync::LazyLock. No change is needed here.

elmattic · 2025-08-18T07:48:33Z

src/chain/store/index.rs

+        mut from: Arc<Tipset>,
        resolve: ResolveNullTipset,
    ) -> Result<Arc<Tipset>, Error> {
+        const CHECKPOINT_INTERVAL: ChainEpoch = 1000;


Could we use the chain finality constant here, or a value expressed in terms of chain finality?

elmattic · 2025-08-18T07:51:29Z

src/chain/store/index.rs

        resolve: ResolveNullTipset,
    ) -> Result<Arc<Tipset>, Error> {
+        const CHECKPOINT_INTERVAL: ChainEpoch = 1000;
+        static CACHE: LazyLock<SizeTrackingLruCache<ChainEpoch, Arc<Tipset>>> =


Why choose Tipset for the value instead of just TipsetKey? I expect a Tipset to have a much larger memory footprint.

Good point. Change to using TipsetKey

Will a chekpoint tipset value be cached in the ChainStore's TipsetCache? If not, is this possible or do we even want this?

Change to using self.load_tipset instead which is tipset_cache aware

elmattic · 2025-08-18T08:16:49Z

with bug fixes and performance improvements

It's pretty clear what the performance improvement is, but what was the bug that was fixed?

elmattic

Does this PR also resolve the performance issues we've been having with certain RPC endpoints, such as eth_getBlockByNumber, etc.?

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (3)

src/chain/store/index.rs (3)

127-133: LRU capacity: minor ergonomic tweak

Use the existing nonzero! macro (already imported) for clarity instead of try_into().expect("infallible").

 static CACHE: LazyLock<SizeTrackingLruCache<ChainEpoch, TipsetKey>> = LazyLock::new(|| {
     SizeTrackingLruCache::new_with_default_metrics_registry(
         "tipset_by_height".into(),
-        4096.try_into().expect("infallible"),
+        nonzero!(4096_usize),
     )
 });

17-17: Drop num::Integer by using rem_euclid

You can avoid the extra trait import and dependency by using the standard rem_euclid for non-negative modulo. Behavior matches mod_floor here.

- use num::Integer;
+ // use of rem_euclid below removes the need for num::Integer

-        fn next_checkpoint(epoch: ChainEpoch) -> ChainEpoch {
-            epoch - epoch.mod_floor(&CHAIN_FINALITY) + CHAIN_FINALITY
-        }
+        fn next_checkpoint(epoch: ChainEpoch) -> ChainEpoch {
+            epoch - epoch.rem_euclid(CHAIN_FINALITY) + CHAIN_FINALITY
+        }

Also applies to: 134-137

141-150: Leverage the tipset cache when seeding from the checkpoint cache

Use self.load_tipset to populate the primary tipset cache and track metrics instead of calling Tipset::load directly.

-        while checkpoint_from_epoch < from_epoch {
-            if let Some(checkpoint_from_key) = CACHE.get_cloned(&checkpoint_from_epoch)
-                && let Ok(Some(checkpoint_from)) = Tipset::load(&self.db, &checkpoint_from_key)
-            {
-                from = checkpoint_from.into();
-                break;
-            }
-            checkpoint_from_epoch = next_checkpoint(checkpoint_from_epoch);
-        }
+        while checkpoint_from_epoch < from_epoch {
+            if let Some(checkpoint_from_key) = CACHE.get_cloned(&checkpoint_from_epoch) {
+                if let Some(checkpoint_from) = self.load_tipset(&checkpoint_from_key)? {
+                    from = checkpoint_from;
+                    break;
+                }
+            }
+            checkpoint_from_epoch = next_checkpoint(checkpoint_from_epoch);
+        }

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

MCP integration is disabled by default for public repositories
Jira integration is disabled by default for public repositories
Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between b49d29b and 3c5cc35.

📒 Files selected for processing (5)

src/blocks/tipset.rs (2 hunks)
src/chain/store/chain_store.rs (2 hunks)
src/chain/store/index.rs (4 hunks)
src/shim/mod.rs (1 hunks)
src/shim/policy.rs (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (2)

src/blocks/tipset.rs
src/chain/store/chain_store.rs

🧰 Additional context used

🧠 Learnings (2)

📓 Common learnings

Learnt from: hanabi1224
PR: ChainSafe/forest#5944
File: src/chain/store/index.rs:0-0
Timestamp: 2025-08-18T03:09:47.932Z
Learning: In Forest's tipset_by_height caching implementation, hanabi1224 prefers performance-conscious solutions that leverage finality guarantees rather than expensive chain walking for fork detection. The approach of constraining cache lookups to finalized epochs (using CHECKPOINT_INTERVAL >= CHAIN_FINALITY) provides fork safety without the performance cost of ancestry verification.

Learnt from: hanabi1224
PR: ChainSafe/forest#5930
File: build.rs:64-77
Timestamp: 2025-08-13T09:43:20.301Z
Learning: hanabi1224 prefers hard compile-time errors in build scripts rather than runtime safeguards or collision detection, believing it's better to fail fast and fix root causes of issues like malformed snapshot names.

📚 Learning: 2025-08-18T03:09:47.932Z

Learnt from: hanabi1224
PR: ChainSafe/forest#5944
File: src/chain/store/index.rs:0-0
Timestamp: 2025-08-18T03:09:47.932Z
Learning: In Forest's tipset_by_height caching implementation, hanabi1224 prefers performance-conscious solutions that leverage finality guarantees rather than expensive chain walking for fork detection. The approach of constraining cache lookups to finalized epochs (using CHECKPOINT_INTERVAL >= CHAIN_FINALITY) provides fork safety without the performance cost of ancestry verification.

Applied to files:

src/chain/store/index.rs

🧬 Code Graph Analysis (1)

src/chain/store/index.rs (2)

src/beacon/drand.rs (1)

new (251-269)

src/utils/cache/lru.rs (1)

new_with_default_metrics_registry (96-101)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)

GitHub Check: cargo-publish-dry-run
GitHub Check: Build MacOS
GitHub Check: Build Ubuntu
GitHub Check: tests-release
GitHub Check: tests
GitHub Check: Build forest binaries on Linux AMD64
GitHub Check: All lint checks
GitHub Check: Analyze (rust)
GitHub Check: Analyze (go)

🔇 Additional comments (4)

src/shim/policy.rs (1)

4-4: Straightforward re-export — looks good

Publicly re-exporting policy_constants via the shim is clean and keeps call sites tidy.

src/shim/mod.rs (1)

19-19: Module exposure approved

Adding pub mod policy; properly exposes the new shim module and unblocks downstream CHAIN_FINALITY usage.

src/chain/store/index.rs (2)

163-167: Finality-gated checkpoints are fork-safe and efficient

Caching only when child.epoch() % CHAIN_FINALITY == 0 and child.epoch() + CHAIN_FINALITY <= from_epoch keeps entries finalized and safe to reuse across branches, aligning with your finality-first approach (per your stated preference in recent work). Nice.

4-4: MSRV (1.89.0) supports std::sync::LazyLock
rust-toolchain.toml pins the toolchain to 1.89.0, which includes std::sync::LazyLock. No further action needed.

hanabi1224 · 2025-08-18T09:52:25Z

@elmattic

It's pretty clear what the performance improvement is, but what was the bug that was fixed?

The previous logic won't compute a state when the tipset messages are missing. Now it backfills the messages from p2p network.

Does this PR also resolve the performance issues we've been having with certain RPC endpoints, such as eth_getBlockByNumber, etc.?

Yes, this RPC method uses tipset_by_height internally.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (5)

src/chain/store/index.rs (5)
125-133: Finality-based checkpoint cache is sound; consider moving the static to module scope

The CHAIN_FINALITY-scoped LRU keyed by ChainEpoch→TipsetKey is the right trade-off here. One minor improvement: placing this static inside a generic method yields one cache per monomorphization (e.g., different DB types), which slightly increases memory and dilutes hit rates. Moving it to module scope (single global cache) avoids duplication.

Also, for clarity/consistency with the rest of the file that uses nonzero!, consider using it for the capacity.

Add this near the top of the module (outside the impl):
use std::sync::LazyLock;
use crate::shim::clock::ChainEpoch;
use crate::utils::cache::SizeTrackingLruCache;
use nonzero_ext::nonzero;

static TIPSET_BY_HEIGHT_CHECKPOINTS: LazyLock<SizeTrackingLruCache<ChainEpoch, TipsetKey>> =
    LazyLock::new(|| {
        SizeTrackingLruCache::new_with_default_metrics_registry(
            "tipset_by_height".into(),
            nonzero!(4096usize),
        )
    });
Then inside tipset_by_height, replace uses of CACHE with TIPSET_BY_HEIGHT_CHECKPOINTS and remove the in-function static block.

134-138: Simplify checkpoint alignment and drop the num::Integer dependency

Epochs are non-negative in this context, so you can avoid mod_floor and the extra num import. A simpler align-up is easier to read.

Apply this diff:
-        // use `CHAIN_FINALITY` as checkpoint interval
-        fn next_checkpoint(epoch: ChainEpoch) -> ChainEpoch {
-            epoch - epoch.mod_floor(&CHAIN_FINALITY) + CHAIN_FINALITY
-        }
+        // Align up to the next CHAIN_FINALITY boundary (or stay if already aligned).
+        fn next_checkpoint(epoch: ChainEpoch) -> ChainEpoch {
+            let rem = epoch % CHAIN_FINALITY;
+            if rem == 0 { epoch } else { epoch + (CHAIN_FINALITY - rem) }
+        }
And (if unused elsewhere) remove the now-unneeded import:
- use num::Integer;
141-151: Checkpoint seeding loop: correct clamping and safe adoption

Seeding from the first cached checkpoint between to..from_epoch is correct and efficient. The ≤ from_epoch clamp ensures we never “jump ahead,” and the finality-based cache guarantees canonical-chain checkpoints, so branch ambiguity isn’t a concern.

A tiny improvement: add a short comment stating the invariant that only finalized epochs are cached (and thus safe across branches), to prevent future regressions.

163-167: Cache only finalized epochs: LGTM; minor arithmetic robustness

The finality guard is correct. To avoid potential overflow in child.epoch() + CHAIN_FINALITY (theoretically, if epochs ever approach i64::MAX), compare using subtraction instead.

Apply this diff:
-            if child.epoch() % CHAIN_FINALITY == 0 && child.epoch() + CHAIN_FINALITY <= from_epoch {
+            if child.epoch() % CHAIN_FINALITY == 0 && child.epoch() <= from_epoch - CHAIN_FINALITY {
                 CACHE.push(child.epoch(), child.key().clone());
             }
4-4: Imports: consider trimming num::Integer if you adopt the simplified alignment

If you switch next_checkpoint to the simpler arithmetic, you can drop use num::Integer;. LazyLock import looks good and keeps dependencies minimal.

Also applies to: 17-17

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

MCP integration is disabled by default for public repositories
Jira integration is disabled by default for public repositories
Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 3c5cc35 and cbf06fc.

📒 Files selected for processing (1)

src/chain/store/index.rs (4 hunks)

🧰 Additional context used

🧠 Learnings (2)

📓 Common learnings

Learnt from: hanabi1224
PR: ChainSafe/forest#5944
File: src/chain/store/index.rs:0-0
Timestamp: 2025-08-18T03:09:47.932Z
Learning: In Forest's tipset_by_height caching implementation, hanabi1224 prefers performance-conscious solutions that leverage finality guarantees rather than expensive chain walking for fork detection. The approach of constraining cache lookups to finalized epochs (using CHECKPOINT_INTERVAL >= CHAIN_FINALITY) provides fork safety without the performance cost of ancestry verification.

Learnt from: hanabi1224
PR: ChainSafe/forest#5930
File: build.rs:64-77
Timestamp: 2025-08-13T09:43:20.301Z
Learning: hanabi1224 prefers hard compile-time errors in build scripts rather than runtime safeguards or collision detection, believing it's better to fail fast and fix root causes of issues like malformed snapshot names.

📚 Learning: 2025-08-18T03:09:47.932Z

Learnt from: hanabi1224
PR: ChainSafe/forest#5944
File: src/chain/store/index.rs:0-0
Timestamp: 2025-08-18T03:09:47.932Z
Learning: In Forest's tipset_by_height caching implementation, hanabi1224 prefers performance-conscious solutions that leverage finality guarantees rather than expensive chain walking for fork detection. The approach of constraining cache lookups to finalized epochs (using CHECKPOINT_INTERVAL >= CHAIN_FINALITY) provides fork safety without the performance cost of ancestry verification.

Applied to files:

src/chain/store/index.rs

🧬 Code Graph Analysis (1)

src/chain/store/index.rs (2)

src/blocks/tipset.rs (11)

from (105-110)

from (160-162)

from (166-168)

from (172-177)

from (181-186)

from (205-214)

from (479-484)

new (242-260)

new (494-511)

epoch (306-308)

epoch (543-545)

src/utils/cache/lru.rs (1)

new_with_default_metrics_registry (96-101)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)

GitHub Check: tests-release
GitHub Check: tests
GitHub Check: Build forest binaries on Linux AMD64
GitHub Check: All lint checks
GitHub Check: Build MacOS
GitHub Check: cargo-publish-dry-run
GitHub Check: Build Ubuntu
GitHub Check: Analyze (go)
GitHub Check: Analyze (rust)

🔇 Additional comments (1)

src/chain/store/index.rs (1)

122-126: Good call making from mutable to allow checkpoint reseeding

Allowing from to be reseeded from a cached checkpoint is necessary for the fast path and keeps the external behavior intact.

elmattic · 2025-08-18T12:07:28Z

@hanabi1224 Any reasons not to add the RPC label?

hanabi1224 added 3 commits August 15, 2025 22:13

fix: add cache to improve tipset_by_height performance

f0fdbf6

fix

0500263

fix

8fb563f

refactor

518014f

hanabi1224 force-pushed the hm/tipset_by_height_cache branch from 2d94f03 to 518014f Compare August 18, 2025 02:12

hanabi1224 marked this pull request as ready for review August 18, 2025 02:12

hanabi1224 requested a review from a team as a code owner August 18, 2025 02:12

hanabi1224 requested review from elmattic and sudo-shashank and removed request for a team August 18, 2025 02:12

coderabbitai bot reviewed Aug 18, 2025

View reviewed changes

src/chain_sync/network_context.rs Show resolved Hide resolved

src/chain/store/index.rs Outdated Show resolved Hide resolved

hanabi1224 added 2 commits August 18, 2025 10:25

resolve AI comments

1e0a4da

resolve AI comment

b49d29b

coderabbitai bot reviewed Aug 18, 2025

View reviewed changes

elmattic reviewed Aug 18, 2025

View reviewed changes

hanabi1224 added 2 commits August 18, 2025 17:37

resolve comments

54c67d6

Merge branch 'main' into hm/tipset_by_height_cache

3c5cc35

coderabbitai bot reviewed Aug 18, 2025

View reviewed changes

fix

cbf06fc

coderabbitai bot reviewed Aug 18, 2025

View reviewed changes

hanabi1224 added the RPC requires calibnet RPC checks to run on CI label Aug 18, 2025

Merge branch 'main' into hm/tipset_by_height_cache

c289198

elmattic approved these changes Aug 18, 2025

View reviewed changes

sudo-shashank approved these changes Aug 18, 2025

View reviewed changes

elmattic added this pull request to the merge queue Aug 18, 2025

Merged via the queue into main with commit 82f70a2 Aug 18, 2025
43 checks passed

elmattic deleted the hm/tipset_by_height_cache branch August 18, 2025 14:27

coderabbitai bot mentioned this pull request Aug 19, 2025

feat: add --n-epochs to forest-cli state compute #5946

Merged

4 tasks

This was referenced Sep 4, 2025

eth_getTransactionReceipt: Performance issue #5599

Closed

eth_GetBlockByNumber: performance issue #5633

Closed

coderabbitai bot mentioned this pull request Sep 19, 2025

feat: populate tipset_by_height cache on startup #6100

Merged

4 tasks

coderabbitai bot mentioned this pull request Oct 9, 2025

Add new subcommands for forest-cli snasphot #6128

Merged

4 tasks

This was referenced Jan 14, 2026

fix(rpc): backfill tipset messages from p2p network in a few RPC methods #6421

Merged

chore: more derives, less manual boilerplate #6434

Merged

coderabbitai bot mentioned this pull request Jan 28, 2026

test: unit tests for state migrations #6486

Merged

4 tasks

Conversation

hanabi1224 commented Aug 18, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary of changes

Reference issue to close (if applicable)

Other information and links

Change checklist

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Aug 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Chat

Support

CodeRabbit Commands (Invoked using PR/Issue comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Status, Documentation and Community

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

elmattic Aug 18, 2025

Choose a reason for hiding this comment

Uh oh!

hanabi1224 Aug 18, 2025

Choose a reason for hiding this comment

Uh oh!

elmattic Aug 18, 2025

Choose a reason for hiding this comment

Uh oh!

hanabi1224 Aug 18, 2025

Choose a reason for hiding this comment

Uh oh!

elmattic Aug 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

hanabi1224 Aug 18, 2025

Choose a reason for hiding this comment

Uh oh!

elmattic commented Aug 18, 2025

Uh oh!

elmattic left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

hanabi1224 commented Aug 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

elmattic commented Aug 18, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

hanabi1224 commented Aug 18, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Aug 18, 2025 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)

elmattic Aug 18, 2025 •

edited

Loading

hanabi1224 commented Aug 18, 2025 •

edited

Loading