Skip to content

add: Add exception and optional skipping for recursive protobuf messages#107

Merged
0x26res merged 5 commits intotradewelltech:masterfrom
david-gonzalez-oxb:add-raise-for-cycles-and-pruning
Oct 27, 2025
Merged

add: Add exception and optional skipping for recursive protobuf messages#107
0x26res merged 5 commits intotradewelltech:masterfrom
david-gonzalez-oxb:add-raise-for-cycles-and-pruning

Conversation

@david-gonzalez-oxb
Copy link
Collaborator

@david-gonzalez-oxb david-gonzalez-oxb commented Oct 21, 2025

This PR concerns the parsing of protobuf messages with recursive cycles in its schema definitions, which unfortunately do exist in the wild. The current behaviour of the package with such messages is to go on an infinite recursion loop until the eventual RecursionError: maximum recursion depth exceeded crash.

With the proposed changes, we leverage the current structure of the code to detect cycles by maintaining a trace of the nested messages in the protobuf schema. Upon encountering a cycle (i.e. a message seen twice on the trace) we raise a self-explanatory TypeError, showing the trace that caused it.

Additionally, we introduce a configuration flag skip_recursive_messages (set to False by default) that enables us to avoid the TypeError by simply skipping the second occurrence of the cyclical message. The pruning is enabled only for plain messages, and not for repeated messages or maps. The handling of those cases could be introduced on a subsequent PR.

@codecov-commenter
Copy link

codecov-commenter commented Oct 21, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (1cfa1c8) to head (ddacc27).

Additional details and impacted files
@@            Coverage Diff            @@
##            master      #107   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            6         6           
  Lines          781       797   +16     
  Branches       141       145    +4     
=========================================
+ Hits           781       797   +16     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Collaborator

@0x26res 0x26res left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for sending this PR. It looks good, I've put a lot of comments but they mainly relate to details.

We just need to be careful about naming.

PS: I think the Security Analysis is failing, I'll update the dependency in master and it should work once you rebase.

@david-gonzalez-oxb david-gonzalez-oxb force-pushed the add-raise-for-cycles-and-pruning branch from 84d04ff to 2d35f27 Compare October 24, 2025 18:32
@david-gonzalez-oxb
Copy link
Collaborator Author

Hello, Thanks for the prompt and detailed review. I've learnt a few things!

I've addressed all the comments from the first review. I've also expanded the changes to the functions message_type_to_schema and message_type_to_struct_type so that all the public API functions are consistent.

I will comment on the open "conversations" if there is anything else to add - otherwise I'll leave it up to you to resolve the conversations once you've gone over the changes.

Thanks!

@david-gonzalez-oxb david-gonzalez-oxb changed the title add: Add detection and pruning of cyclical plain messages. add: Add exception and optional skipping for recursive protobuf messages Oct 24, 2025
Copy link
Collaborator

@0x26res 0x26res left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good. Only a few minor details and we can merge. I'll then create a new version. Thanks again.

@david-gonzalez-oxb
Copy link
Collaborator Author

I think I have covered everything now. I noticed that the test test_recursive_self_referential_map_message_handling was arbitrarily failing on some python builds. This was due to comparing tables with pyarrow MapArray in them - it should be resolved now.

Hopefully not connected to this PR, I had the test test_extractor[config12-SuperNestedExampleMessage] fail locally once. I couldn't replicate it after a couple of tries so didn't look further into it.

E           assert nested_example_message {\n  repeated_example_message {\n    double_value: -0.938582031093522\n    float_value: -0.1874189...   nanos: 898107605\n      }\n    }\n    optional_double_value: -0.63017964808599047\n    optional_bool_value: false\n  }\n}\n == nested_example_message {\n  repeated_example_message {\n    double_value: -0.938582031093522\n    float_value: -0.1874189...   nanos: 898107605\n      }\n    }\n    optional_double_value: -0.63017964808599047\n    optional_bool_value: false\n  }\n}\n

tests/test_conversion.py:441: AssertionError

@0x26res
Copy link
Collaborator

0x26res commented Oct 27, 2025

Hopefully not connected to this PR, I had the test test_extractor[config12-SuperNestedExampleMessage] fail locally once. I couldn't replicate it after a couple of tries so didn't look further into it.

I think there is a flaky test somewhere due to floats arithmetic or something like that.

@0x26res 0x26res merged commit f1194c8 into tradewelltech:master Oct 27, 2025
8 checks passed
@0x26res
Copy link
Collaborator

0x26res commented Oct 27, 2025

Now available in pypi https://pypi.org/project/protarrow/0.14.0/

Thanks again @david-gonzalez-oxb

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants