Session 3 progress: Tasks #4-#8 complete #19

Merged
cbb330 merged 1 commit into main from session-3-progress
Feb 20, 2026

@cbb330 cbb330 commented Feb 20, 2026

Session 3 progress update documenting completion of Tasks #4-#8.

@cbb330 cbb330 merged commit a6e1250 into main Feb 20, 2026
2 of 4 checks passed
@cbb330 cbb330 deleted the session-3-progress branch February 20, 2026 22:36
cbb330 added a commit that referenced this pull request Feb 20, 2026
cbb330 added a commit that referenced this pull request Feb 20, 2026
- Remove duplicate physical_schema_mutex_ declaration from OrcFileFragment
- The mutex is inherited from Fragment base class (util::Mutex)
- Fix all lock usages to use Lock() method instead of std::lock_guard
- ORC code was incorrectly declaring std::mutex, shadowing base class member
- Now matches Parquet's thread safety pattern exactly

Verified: All 10 mutex usages now use 'auto lock = physical_schema_mutex_.Lock()'

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
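
The shadowing problem described in this commit can be sketched in isolation. The example below is a minimal illustration, not the actual Arrow code: `SketchMutex`, `FragmentSketch`, `ShadowingFragment`, and `FixedFragment` are hypothetical names, and `SketchMutex` only imitates a wrapper that hands out an RAII guard from a `Lock()` method. It shows that a derived class redeclaring `physical_schema_mutex_` ends up locking a different object than the base class does.

```cpp
#include <mutex>

// Hypothetical stand-in for a mutex wrapper exposing Lock(); not Arrow's class.
class SketchMutex {
 public:
  // Returns an RAII guard; the mutex is released when the guard is destroyed.
  std::unique_lock<std::mutex> Lock() {
    return std::unique_lock<std::mutex>(mutex_);
  }

 private:
  std::mutex mutex_;
};

// Hypothetical base class that owns the mutex guarding cached schema state.
class FragmentSketch {
 public:
  virtual ~FragmentSketch() = default;
  const void* BaseMutexAddress() const { return &physical_schema_mutex_; }
  int version() const { return cached_schema_version_; }

 protected:
  SketchMutex physical_schema_mutex_;
  int cached_schema_version_ = 0;
};

// Buggy pattern: the redeclared member hides the base member, so this class
// and the base class guard the same state with two unrelated mutexes.
class ShadowingFragment : public FragmentSketch {
 public:
  const void* UsedMutexAddress() const { return &physical_schema_mutex_; }
  void Touch() {
    std::lock_guard<std::mutex> lock(physical_schema_mutex_);
    ++cached_schema_version_;
  }

 private:
  std::mutex physical_schema_mutex_;  // shadows FragmentSketch's member
};

// Fixed pattern: no redeclaration, so the name resolves to the inherited mutex.
class FixedFragment : public FragmentSketch {
 public:
  const void* UsedMutexAddress() const { return &physical_schema_mutex_; }
  void Touch() {
    auto lock = physical_schema_mutex_.Lock();
    ++cached_schema_version_;
  }
};
```

Comparing `UsedMutexAddress()` with `BaseMutexAddress()` makes the difference visible: the two addresses differ for `ShadowingFragment` and match for `FixedFragment`.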
cbb330 added a commit that referenced this pull request Feb 20, 2026
cbb330 added a commit that referenced this pull request Feb 20, 2026
Implemented OrcTestFileGenerator helper class with three key methods:
- MakeMultiStripeFile: Creates files with controlled stripe boundaries and value ranges
- MakeFileWithNullStripe: Creates files with an all-null stripe for testing null handling
- MakeFileWithSingleValueStripe: Creates files where one stripe has min=max (constant values)

All methods use a small stripe_size (1 KB) to force one stripe per batch, enabling
precise control over stripe boundaries and statistics.

Also added comprehensive unit tests verifying:
- Correct number of stripes generated
- Values in expected ranges per stripe
- Null handling in all-null stripes
- Single-value stripe generation

This infrastructure is required for testing predicate pushdown functionality
in subsequent tasks.

Verified: Code structure follows Parquet test patterns
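
The value layout these three helpers aim for can be sketched without the ORC writer. The example below is a rough sketch under stated assumptions: it models a stripe as a plain vector of nullable integers, and every name in it (`StripeValues`, `StripeStats`, `MakeMultiStripeValues`, `MakeNullStripe`, `MakeSingleValueStripe`, `ComputeStats`) is hypothetical rather than taken from the PR. It only illustrates the per-stripe min/max and null-count statistics that such test files are meant to produce.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// One stripe's worth of nullable values (hypothetical stand-in for an ORC stripe).
using StripeValues = std::vector<std::optional<int64_t>>;

struct StripeStats {
  bool has_values = false;  // stays false when every row is null
  int64_t min = 0;
  int64_t max = 0;
  int64_t null_count = 0;
};

// Stripe i holds the contiguous range [i * rows, (i + 1) * rows), so the
// min/max ranges of different stripes never overlap.
std::vector<StripeValues> MakeMultiStripeValues(int num_stripes, int rows_per_stripe) {
  std::vector<StripeValues> stripes;
  for (int i = 0; i < num_stripes; ++i) {
    StripeValues stripe;
    for (int r = 0; r < rows_per_stripe; ++r) {
      stripe.emplace_back(static_cast<int64_t>(i) * rows_per_stripe + r);
    }
    stripes.push_back(std::move(stripe));
  }
  return stripes;
}

// A stripe in which every row is null.
StripeValues MakeNullStripe(int rows) {
  return StripeValues(static_cast<std::size_t>(rows), std::optional<int64_t>{});
}

// A stripe in which every row holds the same value, so min == max.
StripeValues MakeSingleValueStripe(int rows, int64_t value) {
  return StripeValues(static_cast<std::size_t>(rows), std::optional<int64_t>(value));
}

// Computes the statistics a reader could use to skip a stripe.
StripeStats ComputeStats(const StripeValues& stripe) {
  StripeStats stats;
  for (const auto& v : stripe) {
    if (!v.has_value()) {
      ++stats.null_count;
      continue;
    }
    if (!stats.has_values) {
      stats.min = stats.max = *v;
      stats.has_values = true;
    } else {
      stats.min = std::min(stats.min, *v);
      stats.max = std::max(stats.max, *v);
    }
  }
  return stats;
}
```

With 100 rows per stripe, stripe 1 of `MakeMultiStripeValues` covers 100 through 199, so a predicate such as `x < 100` could skip it based on statistics alone.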
cbb330 added a commit that referenced this pull request Feb 20, 2026
cbb330 added a commit that referenced this pull request Feb 20, 2026
Implements thread safety test for concurrent scans on the same fragment.
Validates that caching mechanisms (metadata, manifest, statistics) are
properly protected by mutexes.

Test implementation:
- Creates 8-stripe ORC file with 100 rows per stripe
- Builds scanner with threading enabled (UseThreads(true))
- Adds projection (x + 10) to test parallel compute operations
- Uses ScanBatchesUnorderedAsync for concurrent fragment reads
- Verifies all stripes are read (8 batches)
- Verifies total row count (800 rows)
- Verifies projection correctness (values in range 10-809)

This test exercises:
- Concurrent metadata loading
- Thread-safe manifest caching
- Parallel stripe reads
- Concurrent statistics access
- Compute operations on multiple threads

Critical for validating Task #19's mutex protection implementation.

Mirrors ParquetFileFormat::MultithreadedScan test pattern.

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
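
The shape of this test can be approximated with the standard library alone. The sketch below is an illustration under assumptions, not the Arrow test itself: `CachedFragmentSketch`, `ScanSummary`, and `ConcurrentScanSketch` are hypothetical names, plain `std::thread` workers stand in for the scanner's thread pool, and a lazily loaded stripe count stands in for cached metadata. It uses the same numbers as the commit message: 8 stripes of 100 rows and a projection of `x + 10`.

```cpp
#include <algorithm>
#include <cstdint>
#include <mutex>
#include <optional>
#include <thread>
#include <vector>

constexpr int kStripes = 8;
constexpr int kRowsPerStripe = 100;

// Hypothetical fragment whose metadata is loaded lazily and cached under a mutex.
class CachedFragmentSketch {
 public:
  int NumStripes() {
    std::lock_guard<std::mutex> lock(mutex_);
    if (!num_stripes_.has_value()) {
      ++load_count_;  // counts how often the "expensive" load actually runs
      num_stripes_ = kStripes;
    }
    return *num_stripes_;
  }

  int load_count() {
    std::lock_guard<std::mutex> lock(mutex_);
    return load_count_;
  }

 private:
  std::mutex mutex_;
  std::optional<int> num_stripes_;
  int load_count_ = 0;
};

struct ScanSummary {
  int batches = 0;
  int64_t rows = 0;
  int64_t min_value = 0;
  int64_t max_value = 0;
  int metadata_loads = 0;
};

// Reads every stripe on its own thread and applies the projection x + 10.
ScanSummary ConcurrentScanSketch() {
  CachedFragmentSketch fragment;
  std::vector<std::vector<int64_t>> batches(kStripes);
  std::vector<std::thread> workers;
  for (int stripe = 0; stripe < kStripes; ++stripe) {
    workers.emplace_back([&fragment, &batches, stripe] {
      // All workers hit the cached metadata concurrently.
      if (stripe >= fragment.NumStripes()) return;
      // Each worker writes only to its own batch, so no extra lock is needed.
      for (int row = 0; row < kRowsPerStripe; ++row) {
        const int64_t x = static_cast<int64_t>(stripe) * kRowsPerStripe + row;
        batches[stripe].push_back(x + 10);
      }
    });
  }
  for (auto& worker : workers) worker.join();

  ScanSummary summary;
  summary.metadata_loads = fragment.load_count();
  bool first = true;
  for (const auto& batch : batches) {
    if (batch.empty()) continue;
    ++summary.batches;
    summary.rows += static_cast<int64_t>(batch.size());
    for (int64_t value : batch) {
      if (first) {
        summary.min_value = summary.max_value = value;
        first = false;
      } else {
        summary.min_value = std::min(summary.min_value, value);
        summary.max_value = std::max(summary.max_value, value);
      }
    }
  }
  return summary;
}
```

The checks listed in the commit message map onto the summary fields: 8 batches, 800 rows, and projected values spanning 10 through 809. The `metadata_loads` field is an extra check in this sketch that the cached value is populated only once despite concurrent callers.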
@github-actions

Thanks for opening a pull request!

If this is not a minor PR, could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose

Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

See also:
