cbb330 added a commit that referenced this pull request on Feb 20, 2026
cbb330 added a commit that referenced this pull request on Feb 20, 2026
- Remove duplicate physical_schema_mutex_ declaration from OrcFileFragment
- The mutex is inherited from Fragment base class (util::Mutex)
- Fix all lock usages to use Lock() method instead of std::lock_guard
- ORC code was incorrectly declaring std::mutex, shadowing base class member
- Now matches Parquet's thread safety pattern exactly

Verified: All 10 mutex usages now use 'auto lock = physical_schema_mutex_.Lock()'

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
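For readers unfamiliar with the shadowing problem this commit describes, here is a minimal sketch in standard C++. The `Mutex`, `Fragment`, `ShadowingFragment`, and `FixedFragment` classes below are hypothetical stand-ins written for illustration only; they are not Arrow's actual classes, and only the member name `physical_schema_mutex_` and the `auto lock = ...Lock()` usage come from the commit message.

```cpp
#include <mutex>

// Hypothetical stand-in for a mutex wrapper that exposes Lock(); not Arrow's class.
class Mutex {
 public:
  // Returns an RAII guard; the mutex is released when the guard is destroyed.
  std::unique_lock<std::mutex> Lock() { return std::unique_lock<std::mutex>(mu_); }

 private:
  std::mutex mu_;
};

// Hypothetical base class that owns the mutex and the state it protects.
class Fragment {
 protected:
  Mutex physical_schema_mutex_;
  int cached_schema_version_ = 0;  // stands in for the cached physical schema
};

// Buggy variant: redeclaring the member hides the inherited one, so this
// class locks a different mutex than code that uses the base class member.
class ShadowingFragment : public Fragment {
 public:
  int Bump() {
    std::lock_guard<std::mutex> lock(physical_schema_mutex_);  // locks the shadow
    return ++cached_schema_version_;
  }
  bool UsesBaseMutex() const {
    return static_cast<const void*>(&physical_schema_mutex_) ==
           static_cast<const void*>(&this->Fragment::physical_schema_mutex_);
  }

 private:
  std::mutex physical_schema_mutex_;  // hides Fragment::physical_schema_mutex_
};

// Fixed variant: no redeclaration, so the name resolves to the inherited mutex.
class FixedFragment : public Fragment {
 public:
  int Bump() {
    auto lock = physical_schema_mutex_.Lock();  // same pattern as the commit
    return ++cached_schema_version_;
  }
  bool UsesBaseMutex() const {
    return static_cast<const void*>(&physical_schema_mutex_) ==
           static_cast<const void*>(&this->Fragment::physical_schema_mutex_);
  }
};
```

In this sketch, `ShadowingFragment::UsesBaseMutex()` returns false while `FixedFragment::UsesBaseMutex()` returns true. This is the practical risk of the shadowed declaration: two code paths can each believe they hold "the" mutex while actually holding different ones.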
cbb330 added a commit that referenced this pull request on Feb 20, 2026
cbb330 added a commit that referenced this pull request on Feb 20, 2026
Implemented OrcTestFileGenerator helper class with three key methods:
- MakeMultiStripeFile: Creates files with controlled stripe boundaries and value ranges
- MakeFileWithNullStripe: Creates files with an all-null stripe for testing null handling
- MakeFileWithSingleValueStripe: Creates files where one stripe has min=max (constant values)

All methods use a small stripe_size (1KB) to force one stripe per batch, enabling precise control over stripe boundaries and statistics.

Also added comprehensive unit tests verifying:
- Correct number of stripes generated
- Values in expected ranges per stripe
- Null handling in all-null stripes
- Single-value stripe generation

This infrastructure is required for testing predicate pushdown functionality in subsequent tasks.

Verified: Code structure follows Parquet test patterns
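To make the three stripe shapes concrete, here is a minimal sketch of the per-stripe statistics that predicate pushdown tests would rely on. It models stripes as in-memory vectors rather than writing ORC files. All names here (`Stripe`, `StripeStats`, `MakeMultiStripeData`, `MakeNullStripe`, `MakeSingleValueStripe`, `ComputeStats`) are hypothetical and are not the OrcTestFileGenerator API. The value layout (stripe i holding a contiguous range) is an assumption, not something the commit states.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// One stripe's worth of nullable int64 values (std::nullopt marks a null).
using Stripe = std::vector<std::optional<int64_t>>;

struct StripeStats {
  std::optional<int64_t> min;  // empty when the stripe has no non-null values
  std::optional<int64_t> max;
  int64_t null_count = 0;
};

// Assumed layout: stripe i holds the values [i * rows, (i + 1) * rows).
std::vector<Stripe> MakeMultiStripeData(int num_stripes, int rows_per_stripe) {
  std::vector<Stripe> stripes;
  for (int s = 0; s < num_stripes; ++s) {
    Stripe stripe;
    for (int i = 0; i < rows_per_stripe; ++i) {
      stripe.push_back(static_cast<int64_t>(s) * rows_per_stripe + i);
    }
    stripes.push_back(std::move(stripe));
  }
  return stripes;
}

// A stripe in which every value is null.
Stripe MakeNullStripe(int rows) {
  return Stripe(static_cast<std::size_t>(rows), std::optional<int64_t>());
}

// A stripe in which every value is the same constant, so min == max.
Stripe MakeSingleValueStripe(int rows, int64_t value) {
  return Stripe(static_cast<std::size_t>(rows), std::optional<int64_t>(value));
}

// Computes min, max, and null count over one stripe.
StripeStats ComputeStats(const Stripe& stripe) {
  StripeStats stats;
  for (const auto& v : stripe) {
    if (!v.has_value()) {
      ++stats.null_count;
      continue;
    }
    stats.min = stats.min ? std::min(*stats.min, *v) : *v;
    stats.max = stats.max ? std::max(*stats.max, *v) : *v;
  }
  return stats;
}
```

Under this sketch, a filter such as `x > 250` could skip any stripe whose `max` is at most 250, and an all-null stripe has no min or max at all. That is why the all-null case needs its own test.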
cbb330 added a commit that referenced this pull request on Feb 20, 2026
cbb330 added a commit that referenced this pull request on Feb 20, 2026
Implements thread safety test for concurrent scans on the same fragment. Validates that caching mechanisms (metadata, manifest, statistics) are properly protected by mutexes.

Test implementation:
- Creates 8-stripe ORC file with 100 rows per stripe
- Builds scanner with threading enabled (UseThreads(true))
- Adds projection (x + 10) to test parallel compute operations
- Uses ScanBatchesUnorderedAsync for concurrent fragment reads
- Verifies all stripes are read (8 batches)
- Verifies total row count (800 rows)
- Verifies projection correctness (values in range 10-809)

This test exercises:
- Concurrent metadata loading
- Thread-safe manifest caching
- Parallel stripe reads
- Concurrent statistics access
- Compute operations on multiple threads

Critical for validating Task #19's mutex protection implementation. Mirrors ParquetFileFormat::MultithreadedScan test pattern.

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
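The shape of this test can be sketched with standard C++ threads. The sketch below does not use Arrow's scanner or ORC reader. `FakeFragment`, `ScanResult`, and `ConcurrentScan` are hypothetical names, and it assumes x runs from 0 to 799 across the file, which is consistent with the stated 10-809 range after the x + 10 projection.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>
#include <mutex>
#include <thread>
#include <vector>

struct ScanResult {
  int batches = 0;
  int64_t rows = 0;
  int64_t min_value = std::numeric_limits<int64_t>::max();
  int64_t max_value = std::numeric_limits<int64_t>::min();
  int metadata_loads = 0;
};

// Hypothetical fragment whose lazily loaded metadata is guarded by a mutex.
class FakeFragment {
 public:
  explicit FakeFragment(int rows_per_stripe) : rows_per_stripe_(rows_per_stripe) {}

  // Loads metadata at most once, even when called from many threads.
  void EnsureMetadata() {
    std::lock_guard<std::mutex> lock(mutex_);
    if (!loaded_) {
      loaded_ = true;
      ++load_count_;
    }
  }

  // Reads one stripe and applies the projection x + 10.
  std::vector<int64_t> ReadStripeProjected(int stripe) {
    EnsureMetadata();
    std::vector<int64_t> out;
    for (int i = 0; i < rows_per_stripe_; ++i) {
      int64_t x = static_cast<int64_t>(stripe) * rows_per_stripe_ + i;
      out.push_back(x + 10);
    }
    return out;
  }

  int load_count() {
    std::lock_guard<std::mutex> lock(mutex_);
    return load_count_;
  }

 private:
  int rows_per_stripe_;
  std::mutex mutex_;
  bool loaded_ = false;
  int load_count_ = 0;
};

// Reads every stripe on its own thread, then aggregates after joining.
ScanResult ConcurrentScan(int num_stripes, int rows_per_stripe) {
  FakeFragment fragment(rows_per_stripe);
  std::vector<std::vector<int64_t>> batches(num_stripes);
  std::vector<std::thread> workers;
  for (int s = 0; s < num_stripes; ++s) {
    // Each worker writes only to its own slot, so no lock is needed here.
    workers.emplace_back(
        [&fragment, &batches, s] { batches[s] = fragment.ReadStripeProjected(s); });
  }
  for (auto& w : workers) w.join();

  ScanResult result;
  for (const auto& batch : batches) {
    ++result.batches;
    result.rows += static_cast<int64_t>(batch.size());
    for (int64_t v : batch) {
      result.min_value = std::min(result.min_value, v);
      result.max_value = std::max(result.max_value, v);
    }
  }
  result.metadata_loads = fragment.load_count();
  return result;
}
```

Calling `ConcurrentScan(8, 100)` corresponds to the 8-stripe, 100-rows-per-stripe setup described above. The checks on batch count, total rows, and value range mirror the three "Verifies" items in the commit message. The single metadata load is an extra check specific to this sketch.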
Thanks for opening a pull request! If this is not a minor PR, could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project. Then could you also rename the pull request title in the following format? or See also:
Session 3 progress update documenting completion of Tasks #4-#8.