
Conversation


@chatton chatton commented Jan 13, 2026

Overview

Adds additional tracing within NewServiceHandler.

Add OpenTelemetry tracing for all RPC server methods:

- ExecutorService: InitChain, GetTxs, ExecuteTxs, SetFinal
- StoreService: GetBlock, GetState, GetMetadata, GetGenesisDaHeight, GetP2PStoreInfo
- P2PService: GetPeerInfo, GetNetInfo
- ConfigService: GetNamespace, GetSignerInfo

All implementations follow the decorator pattern, with comprehensive test coverage.
Tracer names: ev-node/{service}-service
@gemini-code-assist

Summary of Changes

Hello @chatton, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed.

This pull request significantly enhances the observability of the system by integrating OpenTelemetry for distributed tracing. It introduces tracing capabilities for the Store, P2P, and Config RPC services, allowing for detailed insights into their execution flow and performance. The tracing can be toggled via configuration, providing flexibility for different environments.

Highlights

  • OpenTelemetry Integration: Introduced OpenTelemetry for distributed tracing across key RPC services.
  • Service Tracing: Added tracing to Store, P2P, and Config RPC service handlers.
  • Conditional Tracing: Tracing is conditionally enabled based on the config.Instrumentation.IsTracingEnabled() flag.
  • Detailed Span Attributes: Spans for RPC calls capture relevant request and response attributes (e.g., block height, hash, peer count, namespace) and record errors.
  • Dependency Updates: Updated go.mod and go.sum to include OpenTelemetry and testify dependencies.
  • Comprehensive Testing: New unit tests ensure the correct functionality and attribute capturing of the tracing wrappers.
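The conditional-tracing highlight can be sketched as follows. Note that `Instrumentation`, the `TracingEnabled` field, and the handler types here are stand-ins modelled on the PR description, not the actual ev-node API:

```go
package main

import "fmt"

// Instrumentation mirrors the config flag mentioned in the PR; the
// field name is an assumption, not the real ev-node struct.
type Instrumentation struct{ TracingEnabled bool }

// IsTracingEnabled is nil-safe so callers can pass an absent config.
func (i *Instrumentation) IsTracingEnabled() bool {
	return i != nil && i.TracingEnabled
}

type Handler interface{ Name() string }

type plainHandler struct{}

func (plainHandler) Name() string { return "plain" }

type tracedHandler struct{ inner Handler }

func (t tracedHandler) Name() string { return "traced(" + t.inner.Name() + ")" }

// buildHandler wraps the handler only when tracing is enabled,
// matching the conditional pattern described in the highlights.
func buildHandler(cfg *Instrumentation) Handler {
	var h Handler = plainHandler{}
	if cfg.IsTracingEnabled() {
		h = tracedHandler{inner: h}
	}
	return h
}

func main() {
	fmt.Println(buildHandler(&Instrumentation{TracingEnabled: true}).Name())
	fmt.Println(buildHandler(nil).Name())
}
```

When tracing is off, the unwrapped handler is served directly, so disabled environments pay no per-call overhead.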




claude bot commented Jan 13, 2026

Claude finished @chatton's task in 2m 13s.


Comprehensive Code Review

I've completed a thorough review of the tracing implementation for Store, P2P, and Config services. The code is generally well-structured, but I've identified several issues across different categories.


Critical Issues

1. Missing Per-Store Attributes in GetP2PStoreInfo (Gemini Finding)

Location: pkg/rpc/server/tracing.go:147-165
Issue: The traced method only sets store_count but doesn't capture individual store heights/labels. This significantly reduces observability value.

Why this matters: When debugging P2P sync issues, operators need to see if header/data stores are in sync. Without per-store attributes, the trace doesn't provide actionable diagnostic information.

Recommendation:

span.SetAttributes(
    attribute.Int("store_count", len(res.Msg.Stores)),
)
for i, store := range res.Msg.Stores {
    span.SetAttributes(
        attribute.String(fmt.Sprintf("store_%d_label", i), store.Label),
        attribute.Int64(fmt.Sprintf("store_%d_height", i), int64(store.Height)),
    )
}

Note: The previous buggy code using string(rune(i)) was correctly removed, but the entire feature was lost instead of being fixed.



2. Missing Test Coverage for Per-Store Attributes (Gemini Finding)

Location: pkg/rpc/server/tracing_test.go:315
Issue: TestTracedStoreService_GetP2PStoreInfo_Success only checks store_count, missing assertions for individual store attributes.

Recommendation: Add test coverage once the above fix is implemented:

attrs := span.Attributes()
requireAttribute(t, attrs, "store_count", 2)
requireAttribute(t, attrs, "store_0_label", "Header Store")
requireAttribute(t, attrs, "store_0_height", int64(100))
requireAttribute(t, attrs, "store_1_label", "Data Store")
requireAttribute(t, attrs, "store_1_height", int64(99))

High Priority Issues

3. Inefficient GetBlock Attribute Setting Pattern (Gemini Finding)

Location: pkg/rpc/server/tracing.go:32-77
Issue: The code switches on req.Msg.Identifier twice - once to extract height (lines 37-42), then again to add hash attribute (lines 52-54). This is both inefficient and harder to maintain.

Recommendation: Use a single switch statement that builds all attributes before starting the span:

func (t *tracedStoreServer) GetBlock(
    ctx context.Context,
    req *connect.Request[pb.GetBlockRequest],
) (*connect.Response[pb.GetBlockResponse], error) {
    var attrs []attribute.KeyValue
    switch identifier := req.Msg.Identifier.(type) {
    case *pb.GetBlockRequest_Height:
        attrs = append(attrs, attribute.Int64("height", int64(identifier.Height)))
    case *pb.GetBlockRequest_Hash:
        if identifier.Hash != nil {
            attrs = append(attrs, attribute.String("hash", hex.EncodeToString(identifier.Hash)))
        }
    }

    ctx, span := t.tracer.Start(ctx, "StoreService.GetBlock", trace.WithAttributes(attrs...))
    defer span.End()

    res, err := t.inner.GetBlock(ctx, req)
    if err != nil {
        span.RecordError(err)
        span.SetStatus(codes.Error, err.Error())
        return nil, err
    }

    span.SetAttributes(
        attribute.Bool("found", res.Msg.Block != nil),
    )
    if res.Msg.Block != nil && res.Msg.Block.Data != nil {
        totalSize := 0
        for _, tx := range res.Msg.Block.Data.Txs {
            totalSize += len(tx)
        }
        span.SetAttributes(
            attribute.Int("block_size_bytes", totalSize),
            attribute.Int("tx_count", len(res.Msg.Block.Data.Txs)),
        )
    }
    return res, nil
}



4. Code Duplication Across All Traced Methods (Gemini Finding)

Location: pkg/rpc/server/tracing.go (multiple locations)
Issue: The error handling pattern is repeated 10+ times:

res, err := t.inner.Method(ctx, req)
if err != nil {
    span.RecordError(err)
    span.SetStatus(codes.Error, err.Error())
    return nil, err
}

Why this matters: This violates the DRY principle and makes the code harder to maintain. If error handling needs to change (e.g., adding error codes to attributes), it requires updating 10+ locations.

Recommendation: Extract a generic helper function using Go generics:

func traceUnaryRPC[Req, Res any](
    ctx context.Context,
    tracer trace.Tracer,
    spanName string,
    attrs []attribute.KeyValue,
    innerFn func(context.Context) (*connect.Response[Res], error),
    successAttrsFn func(*connect.Response[Res]) []attribute.KeyValue,
) (*connect.Response[Res], error) {
    ctx, span := tracer.Start(ctx, spanName, trace.WithAttributes(attrs...))
    defer span.End()

    res, err := innerFn(ctx)
    if err != nil {
        span.RecordError(err)
        span.SetStatus(codes.Error, err.Error())
        return nil, err
    }

    if successAttrsFn != nil {
        span.SetAttributes(successAttrsFn(res)...)
    }

    return res, nil
}

Then each traced method becomes much simpler:

func (t *tracedStoreServer) GetState(
    ctx context.Context,
    req *connect.Request[emptypb.Empty],
) (*connect.Response[pb.GetStateResponse], error) {
    return traceUnaryRPC(
        ctx, t.tracer, "StoreService.GetState", nil,
        func(ctx context.Context) (*connect.Response[pb.GetStateResponse], error) {
            return t.inner.GetState(ctx, req)
        },
        func(res *connect.Response[pb.GetStateResponse]) []attribute.KeyValue {
            if res.Msg.State == nil {
                return nil
            }
            return []attribute.KeyValue{
                attribute.Int64("height", int64(res.Msg.State.LastBlockHeight)),
                attribute.String("app_hash", hex.EncodeToString(res.Msg.State.AppHash)),
                attribute.Int64("da_height", int64(res.Msg.State.DaHeight)),
            }
        },
    )
}



Medium Priority Issues

5. Potential Privacy Leak in Namespace Tracing

Location: pkg/rpc/server/tracing.go:252-256
Issue: The GetNamespace method traces header and data namespaces as span attributes. Depending on your threat model, namespace values might be considered sensitive operational information.

Recommendation: Consider if namespace values should be redacted in traces. If they're sensitive, use a hash or omit them entirely:

span.SetAttributes(
    attribute.Bool("header_namespace_present", res.Msg.HeaderNamespace != ""),
    attribute.Bool("data_namespace_present", res.Msg.DataNamespace != ""),
)

6. Potential Sensitive Data Exposure in Signer Address

Location: pkg/rpc/server/tracing.go:273-276
Issue: The signer address is exported to traces as a hex string. While this is public blockchain data, it directly identifies the operator's validator/sequencer identity.

Recommendation: Consider if this should be omitted or hashed in traces:

// Option 1: Hash the address
hashedAddr := sha256.Sum256(res.Msg.Address)
span.SetAttributes(
    attribute.String("signer_address_hash", hex.EncodeToString(hashedAddr[:8])),
)

// Option 2: Just indicate presence
span.SetAttributes(
    attribute.Bool("has_signer", len(res.Msg.Address) > 0),
)

7. Missing Hash Query Tracing in Tests

Location: pkg/rpc/server/tracing_test.go:142-180
Issue: TestTracedStoreService_GetBlock_Success only tests height-based queries. Hash-based queries have a different code path (lines 52-54 in tracing.go) that's untested.

Recommendation: Add a test for hash-based queries:

func TestTracedStoreService_GetBlock_ByHash_Success(t *testing.T) {
    testHash := []byte{0xaa, 0xbb, 0xcc, 0xdd}
    mock := &mockStoreServiceHandler{
        getBlockFn: func(ctx context.Context, req *connect.Request[pb.GetBlockRequest]) (*connect.Response[pb.GetBlockResponse], error) {
            return connect.NewResponse(&pb.GetBlockResponse{
                Block: &pb.Block{
                    Header: &pb.SignedHeader{
                        Header: &pb.Header{Height: 10},
                    },
                    Data: &pb.Data{Txs: [][]byte{[]byte("tx1")}},
                },
            }), nil
        },
    }
    handler, sr := setupStoreTrace(t, mock)
    ctx := context.Background()

    req := connect.NewRequest(&pb.GetBlockRequest{
        Identifier: &pb.GetBlockRequest_Hash{Hash: testHash},
    })

    res, err := handler.GetBlock(ctx, req)
    require.NoError(t, err)
    require.NotNil(t, res)

    spans := sr.Ended()
    require.Len(t, spans, 1)
    span := spans[0]
    
    attrs := span.Attributes()
    requireAttribute(t, attrs, "hash", "aabbccdd")
    requireAttribute(t, attrs, "found", true)
}



Low Priority Issues

8. Inconsistent Attribute Naming Convention

Location: Throughout pkg/rpc/server/tracing.go
Issue: Attribute names mix conventions:

  • Snake case: block_size_bytes, tx_count, peer_count, store_count
  • Bare, unprefixed names: height, found, hash

Recommendation: Follow OpenTelemetry semantic conventions. For custom attributes, use a consistent prefix:

  • evnode.store.block.height
  • evnode.store.block.size_bytes
  • evnode.store.block.tx_count
  • evnode.p2p.peer.count

This follows the OTel attribute naming guidelines.
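A hedged sketch of what prefixed attribute keys could look like as documented Go constants. The evnode.* names are proposals from this review, not keys that exist in the PR:

```go
package main

import "fmt"

// Proposed attribute keys following the OTel convention of a
// dot-separated, vendor-prefixed hierarchy. These names are
// illustrative suggestions, not existing keys in the codebase.
const (
	AttrBlockHeight    = "evnode.store.block.height"
	AttrBlockSizeBytes = "evnode.store.block.size_bytes"
	AttrBlockTxCount   = "evnode.store.block.tx_count"
	AttrPeerCount      = "evnode.p2p.peer.count"
)

func main() {
	// Centralized constants keep the naming scheme consistent and
	// make backend queries/dashboards easier to maintain.
	for _, k := range []string{AttrBlockHeight, AttrBlockSizeBytes, AttrBlockTxCount, AttrPeerCount} {
		fmt.Println(k)
	}
}
```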

9. Missing Context Cancellation Propagation Test

Location: Tests don't verify that context cancellation is properly propagated through the tracing layer.

Recommendation: Add a test that verifies cancelled contexts are handled correctly:

func TestTracedStoreService_ContextCancellation(t *testing.T) {
    mock := &mockStoreServiceHandler{
        getStateFn: func(ctx context.Context, req *connect.Request[emptypb.Empty]) (*connect.Response[pb.GetStateResponse], error) {
            return nil, ctx.Err()
        },
    }
    handler, sr := setupStoreTrace(t, mock)
    
    ctx, cancel := context.WithCancel(context.Background())
    cancel() // Cancel before calling

    req := connect.NewRequest(&emptypb.Empty{})
    _, err := handler.GetState(ctx, req)
    require.Error(t, err)
    require.ErrorIs(t, err, context.Canceled)
    
    spans := sr.Ended()
    require.Len(t, spans, 1)
    require.Equal(t, codes.Error, spans[0].Status().Code)
}

10. Missing Documentation for Tracer Names

Location: pkg/rpc/server/tracing.go:28, 177, 234
Issue: The tracer names ("ev-node/store-service", "ev-node/p2p-service", "ev-node/config-service") are hardcoded strings with no documentation about the naming convention.

Recommendation: Add package-level constants with documentation:

const (
    // TracerNameStore is the OpenTelemetry tracer name for Store service operations.
    // Used to create spans for all StoreService RPC methods.
    TracerNameStore = "ev-node/store-service"
    
    // TracerNameP2P is the OpenTelemetry tracer name for P2P service operations.
    TracerNameP2P = "ev-node/p2p-service"
    
    // TracerNameConfig is the OpenTelemetry tracer name for Config service operations.
    TracerNameConfig = "ev-node/config-service"
)

Positive Observations

  1. Excellent test coverage structure - Each service has success and error test cases
  2. Proper span lifecycle - All spans use defer span.End() correctly
  3. Good use of decorator pattern - Clean separation of concerns
  4. Conditional tracing - Well-implemented in server.go:385-389
  5. Comprehensive attribute capture - Block size, transaction count, peer info all captured
  6. Clean mock implementations - Tests are well-structured and maintainable

Summary

Priority | Issues | Recommendations
---------|--------|----------------
Critical | 2 | Fix missing store attributes, add test coverage
High | 2 | Refactor GetBlock, eliminate code duplication
Medium | 3 | Review security considerations, add hash query tests
Low | 3 | Improve naming conventions, documentation

Overall Assessment: This is a solid tracing implementation that will significantly improve observability. The decorator pattern is well-executed, and the test coverage is comprehensive. The main concerns are:

  1. Missing per-store attributes (regression from bugfix)
  2. Code duplication that could be eliminated with a helper function
  3. Minor efficiency and security considerations

I recommend addressing the critical and high-priority issues before merging, as they impact the diagnostic value of the traces and long-term maintainability.


Tasks

  • Read all changed files
  • Review code quality & architecture
  • Review error handling & resilience
  • Review security considerations
  • Review performance & resource efficiency
  • Review testing & reliability
  • Review documentation
  • Post comprehensive feedback


github-actions bot commented Jan 13, 2026

The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).

Build | Format | Lint | Breaking | Updated (UTC)
------|--------|------|----------|---------------
✅ passed | ⏩ skipped | ✅ passed | ✅ passed | Jan 13, 2026, 3:32 PM


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces OpenTelemetry tracing for the Store, P2P, and Config services. The implementation uses a decorator pattern to wrap the existing service handlers, which is a good approach. The changes are well-tested, with a new test file covering the tracing logic. I've identified a few areas for improvement: a bug in attribute key generation, an opportunity to simplify the code in one of the traced methods, a suggestion to reduce code duplication across the tracing decorators, and a gap in the test coverage. Overall, this is a great addition for observability.

)
for i, store := range res.Msg.Stores {
span.SetAttributes(
attribute.Int64("store_"+string(rune(i))+"_height", int64(store.Height)),

Severity: high

There's a bug in how the attribute key is constructed. string(rune(i)) will convert the integer index i into its corresponding Unicode character, not its string representation. For example, for i=0, it will produce a null character (\x00), which is likely not what you intend and may cause issues with tracing backends.

You should use strconv.Itoa(i) to convert the integer to a string. This will require adding an import for the strconv package.

			attribute.Int64("store_"+strconv.Itoa(i)+"_height", int64(store.Height)),

Comment on lines +36 to +54
var height uint64
switch identifier := req.Msg.Identifier.(type) {
case *pb.GetBlockRequest_Height:
height = identifier.Height
case *pb.GetBlockRequest_Hash:
// for hash-based queries, we'll add the hash as an attribute
}

ctx, span := t.tracer.Start(ctx, "StoreService.GetBlock",
trace.WithAttributes(
attribute.Int64("height", int64(height)),
),
)
defer span.End()

// add hash attribute if hash-based query
if hashIdentifier, ok := req.Msg.Identifier.(*pb.GetBlockRequest_Hash); ok {
span.SetAttributes(attribute.String("hash", hex.EncodeToString(hashIdentifier.Hash)))
}

Severity: medium

The logic for setting span attributes based on the request identifier can be simplified. Currently, you're using a switch to get the height, and then a separate if to get the hash, which involves checking the type of req.Msg.Identifier twice. This can be done more cleanly in a single switch statement that builds the attributes before starting the span. This will make the code more readable and efficient.

	var attrs []attribute.KeyValue
	switch identifier := req.Msg.Identifier.(type) {
	case *pb.GetBlockRequest_Height:
		attrs = append(attrs, attribute.Int64("height", int64(identifier.Height)))
	case *pb.GetBlockRequest_Hash:
		if identifier.Hash != nil {
			attrs = append(attrs, attribute.String("hash", hex.EncodeToString(identifier.Hash)))
		}
	}

	ctx, span := t.tracer.Start(ctx, "StoreService.GetBlock", trace.WithAttributes(attrs...))
	defer span.End()

Comment on lines +86 to +91
res, err := t.inner.GetState(ctx, req)
if err != nil {
span.RecordError(err)
span.SetStatus(codes.Error, err.Error())
return nil, err
}

Severity: medium

This block of code for handling errors from the inner service call and updating the span is repeated in almost every traced method in this file. To improve maintainability and reduce boilerplate, consider extracting this logic into a generic helper function.

For example, you could create a function that takes the context, request, the inner function call, and returns the response and error, handling the span creation and error recording internally. Here's a conceptual example:

func traceUnary[Req, Res any](
    tracer trace.Tracer,
    ctx context.Context,
    spanName string,
    req *connect.Request[Req],
    call func(context.Context, *connect.Request[Req]) (*connect.Response[Res], error),
    // ... other params for attributes
) (*connect.Response[Res], error) {
    ctx, span := tracer.Start(ctx, spanName)
    defer span.End()

    res, err := call(ctx, req)
    if err != nil {
        span.RecordError(err)
        span.SetStatus(codes.Error, err.Error())
        return nil, err
    }

    // ... set success attributes

    return res, nil
}

Applying this pattern would make the tracing decorators much more concise.

Comment on lines +314 to +315
attrs := span.Attributes()
requireAttribute(t, attrs, "store_count", 2)

Severity: medium

This test is missing assertions for the per-store attributes (store_0_height, store_1_height). The current test only checks for store_count. Adding assertions for individual store heights would have caught the attribute key generation bug in GetP2PStoreInfo and would make the test more robust.

	attrs := span.Attributes()
	requireAttribute(t, attrs, "store_count", 2)
	requireAttribute(t, attrs, "store_0_height", int64(100))
	requireAttribute(t, attrs, "store_1_height", int64(99))


codecov bot commented Jan 13, 2026

Codecov Report

❌ Patch coverage is 82.69231% with 27 lines in your changes missing coverage. Please review.
✅ Project coverage is 58.21%. Comparing base (1ff13f3) to head (2edad3e).

Files with missing lines | Patch % | Lines
-------------------------|---------|------
pkg/rpc/server/tracing.go | 84.56% | 17 Missing and 6 partials ⚠️
pkg/rpc/server/server.go | 42.85% | 3 Missing and 1 partial ⚠️
Additional details and impacted files
@@                     Coverage Diff                     @@
##           cian/add-tracing-part-5    #2972      +/-   ##
===========================================================
+ Coverage                    57.74%   58.21%   +0.46%     
===========================================================
  Files                           98       99       +1     
  Lines                         9394     9547     +153     
===========================================================
+ Hits                          5425     5558     +133     
- Misses                        3365     3378      +13     
- Partials                       604      611       +7     
Flag | Coverage Δ
-----|------------
combined | 58.21% <82.69%> (+0.46%) ⬆️

Flags with carried forward coverage won't be shown.


Base automatically changed from cian/add-tracing-part-5 to main January 13, 2026 16:14
