Skip to content

chain,graph,store: added code to properly resume from a subgraph block ptr and no firehose cursor#3174

Merged
leoyvens merged 4 commits intomasterfrom
maoueh/feature/firehose-resume-from-rpc-synced-subgraph
Mar 3, 2022
Merged

chain,graph,store: added code to properly resume from a subgraph block ptr and no firehose cursor#3174
leoyvens merged 4 commits intomasterfrom
maoueh/feature/firehose-resume-from-rpc-synced-subgraph

Conversation

@maoueh
Copy link
Copy Markdown
Contributor

@maoueh maoueh commented Jan 19, 2022

Firehose is now using the subgraph ptr if no cursor is found. To ensure we do not create inconsistency in the data due to how Firehose resolves current block number, a sanity check has now been added. This check ensure that the first received block's parent from firehose is the same as the subgraph pointer block. If it's not the case, we revert to the last know final block so we are sure to start from a correct chain segment.

What final block to pick is decided by the chain itself. On Ethereum, we use block.number - reorg_threshold (guarded) while on NEAR, we use the last final block height defined by the block itself.

Missing/To Change

I pushed the PR to get an overall feeling of the idea and get early inputs.

Right now I'm using revert_block_operations but this is wrong as it's able to only revert a single block. I need in fact probably to rewind the subgraph but I would like to get advise here on how to do that properly and if there is any risks or stuff I missed with the approach.

Ping @lutter to get your input about the rewind thing above.

Draft

This is based on #3133 for now, so I would like to get the other merged first and rebased + finish this one after.

@maoueh maoueh requested a review from lutter January 19, 2022 21:18
@maoueh maoueh force-pushed the maoueh/feature/firehose-resume-from-rpc-synced-subgraph branch from fcf443c to 49f5815 Compare January 19, 2022 21:20
Base automatically changed from maoueh/fix/firehose-save-cursor-on-revert to master January 20, 2022 19:09
@maoueh maoueh force-pushed the maoueh/feature/firehose-resume-from-rpc-synced-subgraph branch 2 times, most recently from ee6aaa2 to 73ba5f6 Compare January 21, 2022 21:00
@maoueh
Copy link
Copy Markdown
Contributor Author

maoueh commented Jan 21, 2022

This has been updated with the newest discussion we had, and some other improvements:

  • revert_block_operations can now revert more than 1 block at a time (but still asserts that reverts goes backward)
  • The chain continuity is now checked inside FirehoseBlockStream directly
  • Final block from Firehose is actually is now faster and will actually be able to return a response much faster by levering the stop_block_num
  • Cursor is now automatically deleted when starting an RPC subgraph (so that the case Firehose -> RPC -> Firehose works properly)
  • NEAR has now a working block_ptr_for_number.

This was tested and worked (still need to validate the POI of a subgraph synced straight with RPC and when switching from RPC to Firehose and the like):

  • RPC -> Firehose
  • RPC -> Firehose -> RPC
  • Firehose -> RPC

Ready for review and merge.

@maoueh
Copy link
Copy Markdown
Contributor Author

maoueh commented Jan 25, 2022

I have fully tested that resuming from RPC subgraph block then falling back to Firehose on a forked block produces the same POI for a few blocks after the forked block in question.

I've fixed also an edge case where I was allowing to revert back behind manifest's defined start block.

This is ready for a final review and merge.

let mut backoff = ExponentialBackoff::new(30 * SECOND, *SUBGRAPH_ERROR_RETRY_CEIL_SECS);

loop {
match store.delete_block_cursor() {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DB operations are blocking and need to be asyncified so they can be used in an async context, you need to use with_conn within delete_block_cursor to asyncify it.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a copy of other existing methods in the trait like block_ptr, block_cursor and many others. Is the new paradigm to always use async from now on?

I would like to perform that in another PR mainly because this is blocking the integration test workflow and it would be nice to unlock it.


async fn delete_subgraph_firehose_cursor(logger: &Logger, store: &dyn WritableStore) {
debug!(logger, "Deleting any existing Firehose cursor");
let mut backoff = ExponentialBackoff::new(30 * SECOND, *SUBGRAPH_ERROR_RETRY_CEIL_SECS);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This retry looks a lot like

async fn retry_async<T, F, Fut>(&self, op: &str, f: F) -> Result<T, StoreError>
, you could probably extract a reusable fn.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, in the FirehoseBlockIngestor there is a bunch of those also so it would be a good thing to have a re-usable method. Again, would like to postpone that to another PR.

.map_err(|e| e.into())
}

pub fn get_subgraph_firehose_cursor(
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this also needs to be asyncified.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not part of this PR, but I would include that in an upcoming PR that would asyncinfy the various methods found in the WritableStore trait.

@maoueh maoueh force-pushed the maoueh/feature/firehose-resume-from-rpc-synced-subgraph branch from b4540a4 to 6d1b8ba Compare February 24, 2022 15:44
…k ptr and no firehose cursor

Firehose is now using the subgraph ptr if no cursor is found. To ensure we do not create inconsistency in the data due to how Firehose resolves current block number, a sanity check has now been added. This check ensure that the first received block's parent from firehose is the same as the subgraph pointer block. If it's not the case, we revert to the last know final block so we are sure to start from a correct chain segment.

What final block to pick is decided by the chain itself. On Ethereum, we use `block.number - reorg_threshold` (guarded) while on NEAR, we use the last final block height defined by the block itself.
@sduchesneau sduchesneau force-pushed the maoueh/feature/firehose-resume-from-rpc-synced-subgraph branch from 6d1b8ba to 5d8c4f1 Compare February 28, 2022 20:23
@sduchesneau
Copy link
Copy Markdown
Contributor

@otaviopace I just rebased, then double-checked this PR and the behavior is as-expected. Would you kindly go ahead and merge this into master ?

@leoyvens leoyvens merged commit ae1d808 into master Mar 3, 2022
@leoyvens leoyvens deleted the maoueh/feature/firehose-resume-from-rpc-synced-subgraph branch March 3, 2022 16:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants