Skip to content

fix: add cache to speed up fn tipset_by_height#5944

Merged
elmattic merged 10 commits intomainfrom
hm/tipset_by_height_cache
Aug 18, 2025
Merged

fix: add cache to speed up fn tipset_by_height#5944
elmattic merged 10 commits intomainfrom
hm/tipset_by_height_cache

Conversation

@hanabi1224
Copy link
Contributor

@hanabi1224 hanabi1224 commented Aug 18, 2025

Summary of changes

This is to facilitate @ADobrodey to generate historical lite snapshots with bug fixes and performance improvements

Changes introduced in this pull request:

  • add cache to speed up fn tipset_by_height
  • (Forest.StateCompute) load tipset messages from network if absent in datastore

Reference issue to close (if applicable)

Closes

Other information and links

Change checklist

  • I have performed a self-review of my own code,
  • I have made corresponding changes to the documentation. All new code adheres to the team's documentation standards,
  • I have added tests that prove my fix is effective or that my feature works (if possible),
  • I have made sure the CHANGELOG is up-to-date. All user-facing changes should be reflected in this document.

Summary by CodeRabbit

  • New Features

    • One-step persistence for complete tipsets; public APIs to load/fetch full tipsets and exchange chain messages.
    • RPC state computations now auto-load or fetch missing full tipsets when needed.
  • Refactor

    • Checkpoint-based caching for tipset-by-height to speed lookups.
    • Reduced data copying and more borrowing in sync paths for efficiency.
    • Message-root lookup now uses richer error context.
  • Chores

    • Renamed a metrics label for clearer observability.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Aug 18, 2025

Walkthrough

Adds FullTipset.persist to validate and store tipsets, changes read_msg_cids to accept a block header, introduces a CHAIN_FINALITY-based LRU checkpoint cache for tipset_by_height, refactors chain follower to borrow TipsetKey and expose loaders, extends network context for full-tipset/messages exchange, and generalizes chain-exchange conversions.

Changes

Cohort / File(s) Summary of modifications
Tipset persistence & validation
src/blocks/tipset.rs, src/chain_sync/validation.rs
Added pub fn persist(&self, db: &impl Blockstore) -> anyhow::Result<()> on FullTipset that validates per-block message-root metadata via TipsetValidator::validate_msg_root and persists blocks. validate_msg_root changed to an associated function (removed &self) and call sites updated.
Message CID reading API
src/chain/store/chain_store.rs, src/libp2p/chain_exchange/provider.rs, src/rpc/methods/chain.rs
read_msg_cids signature changed to accept &CachingBlockHeader instead of a messages CID/ref; callers updated to pass the block header. read_msg_cids derives msg_cid internally and improves error text with epoch and block CID.
Chain index checkpoint cache
src/chain/store/index.rs
Added CHAIN_FINALITY-based checkpoint caching: static CACHE: LazyLock<SizeTrackingLruCache<ChainEpoch, TipsetKey>>, next_checkpoint helper using mod_floor, seeding from from cached tipsets via load_tipset, and caching finalized child tipsets during backward traversal. Changed from to mut.
Chain follower / API & ownership refactor
src/chain_sync/chain_follower.rs, src/chain_sync/mod.rs
Call sites and functions now take &TipsetKey (borrowed). get_full_tipset and load_full_tipset made public and adjusted to load/persist FullTipset via FullTipset::persist. Re-exported load_full_tipset from chain_sync.
Network context: full-tipset/messages fetch
src/chain_sync/network_context.rs, src/rpc/methods/f3.rs, src/rpc/methods/state.rs
Added chain_exchange_messages(&self, peer_id, ts) to request local-tipset messages and renamed/exposed chain_exchange_ftschain_exchange_full_tipset(&self, peer_id, tsk). RPCs updated to use new calls; state RPC ensures full tipset is loaded (store or network) and persists it. Minor metrics label rename in F3.
Chain-exchange conversion generalization
src/libp2p/chain_exchange/message.rs
ChainExchangeResponse::into_result generalized to accept any TryFrom<TipsetBundle> where the associated Error: Display; conversion errors mapped to strings with to_string().
Provider & RPC callers updated
src/libp2p/chain_exchange/provider.rs, src/rpc/methods/chain.rs
Call sites updated to pass block_header to read_msg_cids instead of &block_header.messages; surrounding logic unchanged.
Shim policy module
src/shim/mod.rs, src/shim/policy.rs
Added pub mod policy; and new src/shim/policy.rs that re-exports policy_constants from fil_actors_shared::v16::runtime.

Sequence Diagram(s)

sequenceDiagram
  participant Caller
  participant ChainFollower
  participant ChainStore
  participant Network
  participant Blockstore

  Caller->>ChainFollower: get_full_tipset(&TipsetKey)
  ChainFollower->>ChainStore: load_full_tipset(&TipsetKey)
  alt present in store
    ChainStore-->>ChainFollower: FullTipset
  else missing
    ChainFollower->>Network: chain_exchange_full_tipset(peer, &TipsetKey)
    Network-->>ChainFollower: FullTipset
    ChainFollower->>Blockstore: FullTipset.persist(db)
  end
  ChainFollower-->>Caller: FullTipset
Loading
sequenceDiagram
  participant RPC
  participant ChainStore
  participant Network
  participant Blockstore
  participant VM

  RPC->>ChainStore: try load_full_tipset(ts.key())
  alt present
    ChainStore-->>RPC: FullTipset
  else missing
    RPC->>Network: chain_exchange_messages(None, &ts)
    Network-->>RPC: FullTipset
    RPC->>Blockstore: FullTipset.persist(db)
  end
  RPC->>VM: compute_tipset_state(ts)
  VM-->>RPC: Result
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested labels

RPC

Suggested reviewers

  • akaladarshi
  • LesnyRumcajs
  • elmattic

Tip

🔌 Remote MCP (Model Context Protocol) integration is now available!

Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats.

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch hm/tipset_by_height_cache

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@hanabi1224 hanabi1224 force-pushed the hm/tipset_by_height_cache branch from 2d94f03 to 518014f Compare August 18, 2025 02:12
@hanabi1224 hanabi1224 marked this pull request as ready for review August 18, 2025 02:12
@hanabi1224 hanabi1224 requested a review from a team as a code owner August 18, 2025 02:12
@hanabi1224 hanabi1224 requested review from elmattic and sudo-shashank and removed request for a team August 18, 2025 02:12
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (8)
src/chain/store/index.rs (2)

153-156: Consider caching both child and parent on boundaries

You only cache child at boundaries. Caching parent when it lands on a boundary can improve hit rates for lookups that resolve to older epochs in the same call.

Optional tweak:

 for (child, parent) in self.chain(from).tuple_windows() {
     if child.epoch() % CHECKPOINT_INTERVAL == 0 {
         CACHE.push(child.epoch(), child.clone());
     }
+    if parent.epoch() % CHECKPOINT_INTERVAL == 0 {
+        CACHE.push(parent.epoch(), parent.clone());
+    }
     ...
 }

208-308: Add tests that exercise checkpoint cache across forks

Current tests don’t hit the checkpoint path (epochs are < 1000) and don’t cover fork-selection correctness with cached checkpoints. Please add:

  • A scenario that populates the cache at a checkpoint epoch on branch A, then calls tipset_by_height on branch B that passes through the same checkpoint epoch, asserting the result remains on branch B.
  • A scenario verifying we never select a checkpoint > from.epoch().

I can draft these tests if helpful.

src/chain_sync/validation.rs (1)

72-75: Nit: fix typo in the comment (“mst root” → “msg root”)

Minor typo in the validation comment.

-        // 1. Calculating the message root using all of the messages to ensure it
-        // matches the mst root in the block header 2. Ensuring it has not
+        // 1. Calculating the message root using all of the messages to ensure it
+        // matches the msg root in the block header 2. Ensuring it has not
src/blocks/tipset.rs (1)

550-558: Persist full tipset with pre-persist msg-root validation — clarify side effects

The implementation validates msg roots before persisting each block, which is a good safety check. One concern: TipsetValidator::validate_msg_root likely computes and persists TxMeta/AMTs as a side effect. If subsequent block.persist fails, the store may end up with TxMeta referencing messages not yet present. This is acceptable if failures are practically impossible or writes are idempotent, but the side effect is non-obvious from the name.

  • Consider documenting validate_msg_root’s persistence side effects here, or switching to a side-effect-free check (if available), followed by an explicit persist of TxMeta.
  • Alternatively, rename validate_msg_root to something that reflects its effect (e.g., ensure_msg_root_persisted) to make intent clear at call sites.

Would you like me to draft doc comments or a small refactor to make this explicit?

src/rpc/methods/f3.rs (1)

584-601: Persist after full validation and use FullTipset::persist to centralize logic

Currently, blocks are persisted before full tipset validation, and the per-block loop duplicates logic now encapsulated by FullTipset::persist (which also validates msg roots). To reduce the risk of persisting invalid data and to keep logic centralized, validate first, then persist the whole tipset via FullTipset::persist.

Proposed change:

  • Run validator.validate first (already present).
  • Replace the per-block loop with fts.persist(ctx.store())? after successful validation.

Apply this diff:

-                let fts = ctx
+                let fts = ctx
                     .sync_network_context
                     .chain_exchange_full_tipset(None, &tsk)
                     .await?;
-                for block in fts.blocks() {
-                    block.persist(ctx.store())?;
-                }
                 let validator = TipsetValidator(&fts);
                 validator.validate(
                     ctx.chain_store(),
                     None,
                     &ctx.chain_store().genesis_tipset(),
                     ctx.chain_config().block_delay_secs,
                 )?;
+                // Persist the tipset after successful validation.
+                fts.persist(ctx.store())?;
                 let ts = Arc::new(Tipset::from(fts));
                 ctx.chain_store().put_tipset(&ts)?;
                 ctx.chain_store().set_heaviest_tipset(finalized_ts)?;
src/rpc/methods/state.rs (1)

1435-1449: Solid fallback: pre-loading FullTipset with network fetch when absent in store

Good addition. Ensures state computation has messages available by first attempting a local load and falling back to ChainExchange if needed. Persisting the fetched FullTipset closes the loop.

Two small improvements would help operability and diagnostics (optional):

  • Add a debug trace when falling back to the network to aid troubleshooting.
  • Add context to persist errors for quicker issue triage.
         // Attempt to load full tipset from the store
-        if crate::chain_sync::load_full_tipset(ctx.chain_store(), ts.key()).is_err() {
+        if crate::chain_sync::load_full_tipset(ctx.chain_store(), ts.key()).is_err() {
+            tracing::debug!(epoch = %ts.epoch(), tsk = %ts.key(), "FullTipset not in store; fetching via ChainExchangeMessages");
             // Load full tipset from the network
             let fts = ctx
                 .sync_network_context
                 .chain_exchange_messages(None, &ts)
                 .await
                 .map_err(|e| anyhow::anyhow!(e))?;
-            fts.persist(ctx.store())?;
+            fts.persist(ctx.store()).context("persisting fetched FullTipset into local store")?;
         }
src/chain_sync/chain_follower.rs (2)

434-452: Consistent, robust FullTipset retrieval and persistence

  • Early return on successful local load is optimal.
  • Network fallback to chain_exchange_full_tipset and persisting the entire tipset is the right behavior.

Minor nit: prefer adding error context for the network call and persist path to improve debuggability.

-    let tipset = network
-        .chain_exchange_full_tipset(peer_id, tipset_keys)
-        .await
-        .map_err(|e| anyhow::anyhow!(e))?;
-    tipset.persist(&chain_store.db)?;
+    let tipset = network
+        .chain_exchange_full_tipset(peer_id, tipset_keys)
+        .await
+        .map_err(|e| anyhow::anyhow!(e))?;
+    tipset
+        .persist(&chain_store.db)
+        .context("persisting FullTipset fetched via ChainExchange")?;

458-477: Use FullTipset::persist instead of per-block persist in batch path

The single-tipset path persists the entire tipset atomically. Here, the batch path persists blocks individually, which is slower and risks subtle inconsistencies if future persist behavior diverges between block and tipset levels. For consistency and maintainability, persist per tipset.

     let tipsets = network
         .chain_exchange_full_tipsets(peer_id, tipset_keys)
         .await
         .map_err(|e| anyhow::anyhow!(e))?;
 
-    for tipset in tipsets.iter() {
-        for block in tipset.blocks() {
-            block.persist(&chain_store.db)?;
-        }
-    }
+    for tipset in &tipsets {
+        tipset.persist(&chain_store.db)?;
+    }
 
     Ok(tipsets)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 17676f6 and 518014f.

📒 Files selected for processing (12)
  • src/blocks/tipset.rs (2 hunks)
  • src/chain/store/chain_store.rs (2 hunks)
  • src/chain/store/index.rs (4 hunks)
  • src/chain_sync/chain_follower.rs (7 hunks)
  • src/chain_sync/mod.rs (1 hunks)
  • src/chain_sync/network_context.rs (3 hunks)
  • src/chain_sync/validation.rs (1 hunks)
  • src/libp2p/chain_exchange/message.rs (1 hunks)
  • src/libp2p/chain_exchange/provider.rs (1 hunks)
  • src/rpc/methods/chain.rs (1 hunks)
  • src/rpc/methods/f3.rs (2 hunks)
  • src/rpc/methods/state.rs (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: hanabi1224
PR: ChainSafe/forest#5930
File: build.rs:64-77
Timestamp: 2025-08-13T09:43:20.301Z
Learning: hanabi1224 prefers hard compile-time errors in build scripts rather than runtime safeguards or collision detection, believing it's better to fail fast and fix root causes of issues like malformed snapshot names.
🧬 Code Graph Analysis (11)
src/chain/store/chain_store.rs (4)
src/message/chain_message.rs (1)
  • cid (29-34)
src/blocks/header.rs (2)
  • cid (69-71)
  • cid (304-306)
src/shim/message.rs (1)
  • cid (195-198)
src/blocks/block.rs (1)
  • cid (41-43)
src/rpc/methods/state.rs (1)
src/chain_sync/chain_follower.rs (1)
  • load_full_tipset (479-501)
src/blocks/tipset.rs (3)
src/blocks/block.rs (1)
  • persist (46-50)
src/chain/store/chain_store.rs (1)
  • db (502-502)
src/chain_sync/validation.rs (1)
  • validate_msg_root (106-116)
src/libp2p/chain_exchange/message.rs (2)
src/shim/sector.rs (1)
  • try_from (243-245)
src/shim/crypto.rs (2)
  • try_from (205-217)
  • try_from (325-332)
src/chain_sync/mod.rs (1)
src/chain_sync/chain_follower.rs (2)
  • chain_follower (134-331)
  • load_full_tipset (479-501)
src/rpc/methods/f3.rs (1)
src/blocks/tipset.rs (1)
  • tsk (265-269)
src/libp2p/chain_exchange/provider.rs (1)
src/chain/store/chain_store.rs (2)
  • read_msg_cids (494-513)
  • db (502-502)
src/chain/store/index.rs (3)
src/beacon/drand.rs (1)
  • new (251-269)
src/blocks/tipset.rs (11)
  • new (242-260)
  • new (494-511)
  • epoch (306-308)
  • epoch (543-545)
  • from (105-110)
  • from (160-162)
  • from (166-168)
  • from (172-177)
  • from (181-186)
  • from (205-214)
  • from (479-484)
src/utils/cache/lru.rs (1)
  • new_with_default_metrics_registry (96-101)
src/rpc/methods/chain.rs (1)
src/chain/store/chain_store.rs (1)
  • read_msg_cids (494-513)
src/chain_sync/network_context.rs (2)
src/blocks/tipset.rs (2)
  • new (242-260)
  • new (494-511)
src/blocks/header.rs (1)
  • new (282-288)
src/chain_sync/chain_follower.rs (4)
src/blocks/tipset.rs (2)
  • key (336-339)
  • key (530-533)
src/rpc/mod.rs (1)
  • chain_store (466-468)
src/state_manager/mod.rs (1)
  • chain_store (296-298)
src/blocks/chain4u.rs (1)
  • tipset (183-185)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: cargo-publish-dry-run
  • GitHub Check: Build MacOS
  • GitHub Check: Build Ubuntu
  • GitHub Check: All lint checks
  • GitHub Check: tests-release
  • GitHub Check: tests
  • GitHub Check: Build forest binaries on Linux AMD64
  • GitHub Check: Analyze (rust)
  • GitHub Check: Analyze (go)
🔇 Additional comments (15)
src/libp2p/chain_exchange/message.rs (1)

128-130: Generic TryFrom bound + error mapping look good

Relaxing the bound to T: TryFrom<TipsetBundle, Error: Display> and mapping errors via to_string() is correct and backward-compatible with existing TryFrom<TipsetBundle, Error = String> impls. Control flow remains unchanged.

Also applies to: 137-141

src/chain/store/chain_store.rs (2)

470-476: Call-site update to pass block header is correct

Delegating to read_msg_cids(db, bh) aligns with the new signature and removes the need to plumb raw CID parameters around.


494-513: Improved diagnostics for missing TxMeta

Passing the full block header allows a much more actionable error: including the epoch and block CID is great for triage. Logic for loading AMTs remains unchanged.

src/chain_sync/validation.rs (1)

76-76: Static call to validate_msg_root is correct

Switching from self.validate_msg_root to Self::validate_msg_root matches the new static method signature and removes a needless receiver.

src/chain_sync/network_context.rs (1)

244-246: Broadened generic bounds align with chain_exchange/message changes

Updating handle_chain_exchange_request to accept T: TryFrom<TipsetBundle, Error: Display> syncs with ChainExchangeResponse::into_result::<T>(). Good cohesion across layers.

src/rpc/methods/chain.rs (1)

556-561: Correct API update: pass the whole block header to read_msg_cids

Switching to read_msg_cids(ctx.store(), &blk) aligns with the new signature and improves error context downstream. Looks good.

src/chain_sync/mod.rs (1)

16-21: Publicly re-exporting load_full_tipset is consistent with new call sites

Re-export looks correct and matches updated RPC/state call sites relying on borrowed TipsetKey. No issues.

src/libp2p/chain_exchange/provider.rs (1)

98-101: read_msg_cids call updated correctly

Passing block_header matches the revised API and enables improved error reporting (epoch + block cid). Change is correct.

src/rpc/methods/f3.rs (1)

166-171: Metric name change may break dashboards/alerts

Renaming the cache metrics key to "get_powertable" can affect existing dashboards/alerts expecting "get_powertable_cache". If this rename is intentional, ensure dashboards and alerts are updated accordingly.

Do you want a follow-up PR checklist for updating observability assets?

src/rpc/methods/state.rs (1)

1453-1453: LGTM: using the pre-loaded tipset for state computation

Reusing the locally bound ts after ensuring messages are available is correct and keeps the control flow cohesive.

src/chain_sync/chain_follower.rs (5)

179-180: LGTM: pass TipsetKey by reference

Borrowing &tipset_keys avoids an extra allocation/copy and aligns with the new API.


191-192: LGTM: consistent borrowing of TipsetKey

Using &key here keeps the call-site consistent with the updated signature and avoids unnecessary moves.


479-485: LGTM: public, borrowed-key load_full_tipset with typed message load

Exposing a borrowed &TipsetKey API and reconstructing a FullTipset via block_messages is clean and efficient.

Also applies to: 501-501


601-601: LGTM: parent readiness check with borrowed-key loader

Using load_full_tipset(&self.cs, tipset.parents()) for readiness keeps the new borrowing conventions consistent and avoids needless allocations.


875-876: LGTM: updated call site to pass borrowed key

The fetch task now correctly passes &key to get_full_tipset_batch.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
src/chain/store/index.rs (1)

125-147: Cache key is ambiguous across forks; guard adoption with a same-branch check

The global LRU is keyed only by ChainEpoch. Unconditionally replacing from with a cached tipset at that epoch can select a tipset from a different fork than the caller’s from. This breaks the contract that from disambiguates branches.

Action:

  • Only adopt a cached checkpoint if it’s an ancestor of the provided from (i.e., on the same branch). This preserves correctness without giving up the performance benefit when the checkpoint matches the branch.

Apply this diff within the shown hunk:

-        let mut checkpoint_from_epoch = to;
-        while checkpoint_from_epoch < from_epoch {
-            if let Some(checkpoint_from) = CACHE.get_cloned(&checkpoint_from_epoch) {
-                from = checkpoint_from;
-                break;
-            }
-            checkpoint_from_epoch = next_checkpoint(checkpoint_from_epoch);
-        }
+        // Try to seed `from` from a checkpoint on the same branch.
+        let mut checkpoint_from_epoch = to;
+        while checkpoint_from_epoch < from_epoch {
+            if let Some(cp) = CACHE.get_cloned(&checkpoint_from_epoch) {
+                // Guard adoption: only if cached cp is an ancestor of current `from`.
+                let cp_epoch = cp.epoch();
+                let same_branch = self
+                    .chain(from.clone())
+                    .take_while(|ts| ts.epoch() >= cp_epoch)
+                    .any(|ts| ts.key() == cp.key());
+                if same_branch {
+                    from = cp;
+                    break;
+                }
+                // else: ignore and keep searching older checkpoints
+            }
+            checkpoint_from_epoch = next_checkpoint(checkpoint_from_epoch);
+        }

If you prefer a helper, I can provide a small is_ancestor_on_same_branch(&self, child, ancestor) utility to encapsulate the check. Let me know.

🧹 Nitpick comments (3)
src/chain/store/index.rs (3)

125-133: Scope of static cache may cross DB/network boundaries

static CACHE: LazyLock<...> is shared across all ChainIndex instances and databases in-process. This can lead to:

  • Cross-network/DB contamination (e.g., testnet/mainnet, separate MemoryDBs in tests).
  • Attempting to walk parents of a cached tipset that do not exist in the current self.db.

Consider scoping this cache to ChainIndex (as a struct field) or including a partitioning dimension (e.g., network/chain ID) in the key to avoid cross-context pollution.


134-136: Minor: clarify checkpoint computation and boundary behavior

next_checkpoint always returns the next multiple strictly greater than epoch (even when already aligned). This is fine given the current loop starts at to and probes to first, but a quick comment here would prevent future misinterpretation.

Apply:

-        fn next_checkpoint(epoch: ChainEpoch) -> ChainEpoch {
-            epoch - epoch.mod_floor(&CHECKPOINT_INTERVAL) + CHECKPOINT_INTERVAL
-        }
+        // Returns the next multiple of CHECKPOINT_INTERVAL strictly greater than `epoch`.
+        fn next_checkpoint(epoch: ChainEpoch) -> ChainEpoch {
+            epoch - epoch.mod_floor(&CHECKPOINT_INTERVAL) + CHECKPOINT_INTERVAL
+        }

160-166: Use saturating subtraction to avoid potential overflow and reflect intent

The condition child.epoch() + CHECKPOINT_INTERVAL <= from_epoch can be expressed without addition, avoiding a potential overflow edge case and matching the “at least one interval behind” intent.

Apply:

-            if child.epoch() % CHECKPOINT_INTERVAL == 0
-                && child.epoch() + CHECKPOINT_INTERVAL <= from_epoch
+            if child.epoch() % CHECKPOINT_INTERVAL == 0
+                && child.epoch() <= from_epoch.saturating_sub(CHECKPOINT_INTERVAL)
             {
                 CACHE.push(child.epoch(), child.clone());
             }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 1e0a4da and b49d29b.

📒 Files selected for processing (1)
  • src/chain/store/index.rs (4 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: hanabi1224
PR: ChainSafe/forest#5944
File: src/chain/store/index.rs:0-0
Timestamp: 2025-08-18T03:09:47.907Z
Learning: In Forest's tipset_by_height caching implementation, hanabi1224 prefers performance-conscious solutions that leverage finality guarantees rather than expensive chain walking for fork detection. The approach of constraining cache lookups to finalized epochs (using CHECKPOINT_INTERVAL >= CHAIN_FINALITY) provides fork safety without the performance cost of ancestry verification.
Learnt from: hanabi1224
PR: ChainSafe/forest#5930
File: build.rs:64-77
Timestamp: 2025-08-13T09:43:20.301Z
Learning: hanabi1224 prefers hard compile-time errors in build scripts rather than runtime safeguards or collision detection, believing it's better to fail fast and fix root causes of issues like malformed snapshot names.
📚 Learning: 2025-08-18T03:09:47.907Z
Learnt from: hanabi1224
PR: ChainSafe/forest#5944
File: src/chain/store/index.rs:0-0
Timestamp: 2025-08-18T03:09:47.907Z
Learning: In Forest's tipset_by_height caching implementation, hanabi1224 prefers performance-conscious solutions that leverage finality guarantees rather than expensive chain walking for fork detection. The approach of constraining cache lookups to finalized epochs (using CHECKPOINT_INTERVAL >= CHAIN_FINALITY) provides fork safety without the performance cost of ancestry verification.

Applied to files:

  • src/chain/store/index.rs
🧬 Code Graph Analysis (1)
src/chain/store/index.rs (2)
src/beacon/drand.rs (1)
  • new (251-269)
src/utils/cache/lru.rs (1)
  • new_with_default_metrics_registry (96-101)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: cargo-publish-dry-run
  • GitHub Check: All lint checks
  • GitHub Check: Build forest binaries on Linux AMD64
  • GitHub Check: Build MacOS
  • GitHub Check: Build Ubuntu
  • GitHub Check: tests-release
  • GitHub Check: tests
  • GitHub Check: Analyze (rust)
  • GitHub Check: Analyze (go)
🔇 Additional comments (2)
src/chain/store/index.rs (2)

122-124: Allowing mut from is appropriate for cache seeding

Changing the parameter to mut is necessary to reseed the traversal starting point from a cached checkpoint. No concerns here.


4-4: std::sync::LazyLock is supported under MSRV 1.89.0
The project’s rust-toolchain.toml pins the channel to 1.89.0, which includes std::sync::LazyLock. No change is needed here.

mut from: Arc<Tipset>,
resolve: ResolveNullTipset,
) -> Result<Arc<Tipset>, Error> {
const CHECKPOINT_INTERVAL: ChainEpoch = 1000;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we use the chain finality constant here, or a value expressed in terms of chain finality?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

resolve: ResolveNullTipset,
) -> Result<Arc<Tipset>, Error> {
const CHECKPOINT_INTERVAL: ChainEpoch = 1000;
static CACHE: LazyLock<SizeTrackingLruCache<ChainEpoch, Arc<Tipset>>> =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why choose Tipset for the value instead of just TipsetKey? I expect a Tipset to have a much larger memory footprint.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Change to using TipsetKey

Copy link
Contributor

@elmattic elmattic Aug 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will a chekpoint tipset value be cached in the ChainStore's TipsetCache? If not, is this possible or do we even want this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change to using self.load_tipset instead which is tipset_cache aware

@elmattic
Copy link
Contributor

with bug fixes and performance improvements

It's pretty clear what the performance improvement is, but what was the bug that was fixed?

Copy link
Contributor

@elmattic elmattic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this PR also resolve the performance issues we've been having with certain RPC endpoints, such as eth_getBlockByNumber, etc.?

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
src/chain/store/index.rs (3)

127-133: LRU capacity: minor ergonomic tweak

Use the existing nonzero! macro (already imported) for clarity instead of try_into().expect("infallible").

 static CACHE: LazyLock<SizeTrackingLruCache<ChainEpoch, TipsetKey>> = LazyLock::new(|| {
     SizeTrackingLruCache::new_with_default_metrics_registry(
         "tipset_by_height".into(),
-        4096.try_into().expect("infallible"),
+        nonzero!(4096_usize),
     )
 });

17-17: Drop num::Integer by using rem_euclid

You can avoid the extra trait import and dependency by using the standard rem_euclid for non-negative modulo. Behavior matches mod_floor here.

- use num::Integer;
+ // use of rem_euclid below removes the need for num::Integer
-        fn next_checkpoint(epoch: ChainEpoch) -> ChainEpoch {
-            epoch - epoch.mod_floor(&CHAIN_FINALITY) + CHAIN_FINALITY
-        }
+        fn next_checkpoint(epoch: ChainEpoch) -> ChainEpoch {
+            epoch - epoch.rem_euclid(CHAIN_FINALITY) + CHAIN_FINALITY
+        }

Also applies to: 134-137


141-150: Leverage the tipset cache when seeding from the checkpoint cache

Use self.load_tipset to populate the primary tipset cache and track metrics instead of calling Tipset::load directly.

-        while checkpoint_from_epoch < from_epoch {
-            if let Some(checkpoint_from_key) = CACHE.get_cloned(&checkpoint_from_epoch)
-                && let Ok(Some(checkpoint_from)) = Tipset::load(&self.db, &checkpoint_from_key)
-            {
-                from = checkpoint_from.into();
-                break;
-            }
-            checkpoint_from_epoch = next_checkpoint(checkpoint_from_epoch);
-        }
+        while checkpoint_from_epoch < from_epoch {
+            if let Some(checkpoint_from_key) = CACHE.get_cloned(&checkpoint_from_epoch) {
+                if let Some(checkpoint_from) = self.load_tipset(&checkpoint_from_key)? {
+                    from = checkpoint_from;
+                    break;
+                }
+            }
+            checkpoint_from_epoch = next_checkpoint(checkpoint_from_epoch);
+        }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between b49d29b and 3c5cc35.

📒 Files selected for processing (5)
  • src/blocks/tipset.rs (2 hunks)
  • src/chain/store/chain_store.rs (2 hunks)
  • src/chain/store/index.rs (4 hunks)
  • src/shim/mod.rs (1 hunks)
  • src/shim/policy.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/blocks/tipset.rs
  • src/chain/store/chain_store.rs
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: hanabi1224
PR: ChainSafe/forest#5944
File: src/chain/store/index.rs:0-0
Timestamp: 2025-08-18T03:09:47.932Z
Learning: In Forest's tipset_by_height caching implementation, hanabi1224 prefers performance-conscious solutions that leverage finality guarantees rather than expensive chain walking for fork detection. The approach of constraining cache lookups to finalized epochs (using CHECKPOINT_INTERVAL >= CHAIN_FINALITY) provides fork safety without the performance cost of ancestry verification.
Learnt from: hanabi1224
PR: ChainSafe/forest#5930
File: build.rs:64-77
Timestamp: 2025-08-13T09:43:20.301Z
Learning: hanabi1224 prefers hard compile-time errors in build scripts rather than runtime safeguards or collision detection, believing it's better to fail fast and fix root causes of issues like malformed snapshot names.
📚 Learning: 2025-08-18T03:09:47.932Z
Learnt from: hanabi1224
PR: ChainSafe/forest#5944
File: src/chain/store/index.rs:0-0
Timestamp: 2025-08-18T03:09:47.932Z
Learning: In Forest's tipset_by_height caching implementation, hanabi1224 prefers performance-conscious solutions that leverage finality guarantees rather than expensive chain walking for fork detection. The approach of constraining cache lookups to finalized epochs (using CHECKPOINT_INTERVAL >= CHAIN_FINALITY) provides fork safety without the performance cost of ancestry verification.

Applied to files:

  • src/chain/store/index.rs
🧬 Code Graph Analysis (1)
src/chain/store/index.rs (2)
src/beacon/drand.rs (1)
  • new (251-269)
src/utils/cache/lru.rs (1)
  • new_with_default_metrics_registry (96-101)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: cargo-publish-dry-run
  • GitHub Check: Build MacOS
  • GitHub Check: Build Ubuntu
  • GitHub Check: tests-release
  • GitHub Check: tests
  • GitHub Check: Build forest binaries on Linux AMD64
  • GitHub Check: All lint checks
  • GitHub Check: Analyze (rust)
  • GitHub Check: Analyze (go)
🔇 Additional comments (4)
src/shim/policy.rs (1)

4-4: Straightforward re-export — looks good

Publicly re-exporting policy_constants via the shim is clean and keeps call sites tidy.

src/shim/mod.rs (1)

19-19: Module exposure approved

Adding pub mod policy; properly exposes the new shim module and unblocks downstream CHAIN_FINALITY usage.

src/chain/store/index.rs (2)

163-167: Finality-gated checkpoints are fork-safe and efficient

Caching only when child.epoch() % CHAIN_FINALITY == 0 and child.epoch() + CHAIN_FINALITY <= from_epoch keeps entries finalized and safe to reuse across branches, aligning with your finality-first approach (per your stated preference in recent work). Nice.


4-4: MSRV (1.89.0) supports std::sync::LazyLock
rust-toolchain.toml pins the toolchain to 1.89.0, which includes std::sync::LazyLock. No further action needed.

@hanabi1224
Copy link
Contributor Author

hanabi1224 commented Aug 18, 2025

@elmattic

It's pretty clear what the performance improvement is, but what was the bug that was fixed?

The previous logic won't compute a state when the tipset messages are missing. Now it backfills the messages from p2p network.

Does this PR also resolve the performance issues we've been having with certain RPC endpoints, such as eth_getBlockByNumber, etc.?

Yes, this RPC method uses tipset_by_height internally.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (5)
src/chain/store/index.rs (5)

125-133: Finality-based checkpoint cache is sound; consider moving the static to module scope

The CHAIN_FINALITY-scoped LRU keyed by ChainEpoch→TipsetKey is the right trade-off here. One minor improvement: placing this static inside a generic method yields one cache per monomorphization (e.g., different DB types), which slightly increases memory and dilutes hit rates. Moving it to module scope (single global cache) avoids duplication.

Also, for clarity/consistency with the rest of the file that uses nonzero!, consider using it for the capacity.

Add this near the top of the module (outside the impl):

use std::sync::LazyLock;
use crate::shim::clock::ChainEpoch;
use crate::utils::cache::SizeTrackingLruCache;
use nonzero_ext::nonzero;

static TIPSET_BY_HEIGHT_CHECKPOINTS: LazyLock<SizeTrackingLruCache<ChainEpoch, TipsetKey>> =
    LazyLock::new(|| {
        SizeTrackingLruCache::new_with_default_metrics_registry(
            "tipset_by_height".into(),
            nonzero!(4096usize),
        )
    });

Then inside tipset_by_height, replace uses of CACHE with TIPSET_BY_HEIGHT_CHECKPOINTS and remove the in-function static block.


134-138: Simplify checkpoint alignment and drop the num::Integer dependency

Epochs are non-negative in this context, so you can avoid mod_floor and the extra num import. A simpler align-up is easier to read.

Apply this diff:

-        // use `CHAIN_FINALITY` as checkpoint interval
-        fn next_checkpoint(epoch: ChainEpoch) -> ChainEpoch {
-            epoch - epoch.mod_floor(&CHAIN_FINALITY) + CHAIN_FINALITY
-        }
+        // Align up to the next CHAIN_FINALITY boundary (or stay if already aligned).
+        fn next_checkpoint(epoch: ChainEpoch) -> ChainEpoch {
+            let rem = epoch % CHAIN_FINALITY;
+            if rem == 0 { epoch } else { epoch + (CHAIN_FINALITY - rem) }
+        }

And (if unused elsewhere) remove the now-unneeded import:

- use num::Integer;

141-151: Checkpoint seeding loop: correct clamping and safe adoption

Seeding from the first cached checkpoint between to..from_epoch is correct and efficient. The ≤ from_epoch clamp ensures we never “jump ahead,” and the finality-based cache guarantees canonical-chain checkpoints, so branch ambiguity isn’t a concern.

A tiny improvement: add a short comment stating the invariant that only finalized epochs are cached (and thus safe across branches), to prevent future regressions.


163-167: Cache only finalized epochs: LGTM; minor arithmetic robustness

The finality guard is correct. To avoid potential overflow in child.epoch() + CHAIN_FINALITY (theoretically, if epochs ever approach i64::MAX), compare using subtraction instead.

Apply this diff:

-            if child.epoch() % CHAIN_FINALITY == 0 && child.epoch() + CHAIN_FINALITY <= from_epoch {
+            if child.epoch() % CHAIN_FINALITY == 0 && child.epoch() <= from_epoch - CHAIN_FINALITY {
                 CACHE.push(child.epoch(), child.key().clone());
             }

4-4: Imports: consider trimming num::Integer if you adopt the simplified alignment

If you switch next_checkpoint to the simpler arithmetic, you can drop use num::Integer;. LazyLock import looks good and keeps dependencies minimal.

Also applies to: 17-17

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 3c5cc35 and cbf06fc.

📒 Files selected for processing (1)
  • src/chain/store/index.rs (4 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: hanabi1224
PR: ChainSafe/forest#5944
File: src/chain/store/index.rs:0-0
Timestamp: 2025-08-18T03:09:47.932Z
Learning: In Forest's tipset_by_height caching implementation, hanabi1224 prefers performance-conscious solutions that leverage finality guarantees rather than expensive chain walking for fork detection. The approach of constraining cache lookups to finalized epochs (using CHECKPOINT_INTERVAL >= CHAIN_FINALITY) provides fork safety without the performance cost of ancestry verification.
Learnt from: hanabi1224
PR: ChainSafe/forest#5930
File: build.rs:64-77
Timestamp: 2025-08-13T09:43:20.301Z
Learning: hanabi1224 prefers hard compile-time errors in build scripts rather than runtime safeguards or collision detection, believing it's better to fail fast and fix root causes of issues like malformed snapshot names.
📚 Learning: 2025-08-18T03:09:47.932Z
Learnt from: hanabi1224
PR: ChainSafe/forest#5944
File: src/chain/store/index.rs:0-0
Timestamp: 2025-08-18T03:09:47.932Z
Learning: In Forest's tipset_by_height caching implementation, hanabi1224 prefers performance-conscious solutions that leverage finality guarantees rather than expensive chain walking for fork detection. The approach of constraining cache lookups to finalized epochs (using CHECKPOINT_INTERVAL >= CHAIN_FINALITY) provides fork safety without the performance cost of ancestry verification.

Applied to files:

  • src/chain/store/index.rs
🧬 Code Graph Analysis (1)
src/chain/store/index.rs (2)
src/blocks/tipset.rs (11)
  • from (105-110)
  • from (160-162)
  • from (166-168)
  • from (172-177)
  • from (181-186)
  • from (205-214)
  • from (479-484)
  • new (242-260)
  • new (494-511)
  • epoch (306-308)
  • epoch (543-545)
src/utils/cache/lru.rs (1)
  • new_with_default_metrics_registry (96-101)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: tests-release
  • GitHub Check: tests
  • GitHub Check: Build forest binaries on Linux AMD64
  • GitHub Check: All lint checks
  • GitHub Check: Build MacOS
  • GitHub Check: cargo-publish-dry-run
  • GitHub Check: Build Ubuntu
  • GitHub Check: Analyze (go)
  • GitHub Check: Analyze (rust)
🔇 Additional comments (1)
src/chain/store/index.rs (1)

122-126: Good call making from mutable to allow checkpoint reseeding

Allowing from to be reseeded from a cached checkpoint is necessary for the fast path and keeps the external behavior intact.

@elmattic
Copy link
Contributor

@hanabi1224 Any reasons not to add the RPC label?

@hanabi1224 hanabi1224 added the RPC requires calibnet RPC checks to run on CI label Aug 18, 2025
@elmattic elmattic added this pull request to the merge queue Aug 18, 2025
Merged via the queue into main with commit 82f70a2 Aug 18, 2025
43 checks passed
@elmattic elmattic deleted the hm/tipset_by_height_cache branch August 18, 2025 14:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

RPC requires calibnet RPC checks to run on CI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants