Task #3: Implement GetOrcColumnIndex function #9
Merged
- Implemented GetOrcColumnIndex helper function that:
  - Resolves FieldRef to ORC column index using manifest
  - Uses FieldRef.FindOne() to locate field in schema
  - Traverses manifest tree following field path indices
  - Handles both top-level and nested fields
  - Returns column_index for leaf nodes (primitives with statistics)
  - Returns std::nullopt for containers or not found
- Added necessary includes:
  - <optional> for std::optional return type
  - arrow/compute/api_scalar.h for FieldRef and FieldPath

Implementation details:
- Top-level fields accessed via manifest.schema_fields[index]
- Nested fields traversed via current_field->children[index]
- Validates indices at each level to prevent out-of-bounds
- Only returns column_index if field is leaf (has statistics)
- Containers (struct/list/map) return nullopt

Verified: Manual code review - follows FieldRef resolution pattern

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
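The traversal described in this commit message can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from the PR: `OrcSchemaField` and `OrcSchemaManifest` here are simplified, hypothetical stand-ins for the real manifest types, `GetOrcColumnIndexSketch` is an illustrative name, and a plain `std::vector<int>` plays the role of the `FieldPath` indices that `FieldRef.FindOne()` would produce.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical, simplified stand-in for a manifest node (assumption).
struct OrcSchemaField {
  int column_index = -1;                 // ORC column id of this node
  bool is_leaf = false;                  // true for primitives with statistics
  std::vector<OrcSchemaField> children;  // populated for struct/list/map
};

// Hypothetical, simplified stand-in for the manifest (assumption).
struct OrcSchemaManifest {
  std::vector<OrcSchemaField> schema_fields;  // top-level fields
};

// `path` stands in for the indices of an already resolved FieldPath.
std::optional<int> GetOrcColumnIndexSketch(const OrcSchemaManifest& manifest,
                                           const std::vector<int>& path) {
  if (path.empty()) return std::nullopt;  // nothing was resolved

  const std::vector<OrcSchemaField>* level = &manifest.schema_fields;
  const OrcSchemaField* current = nullptr;
  for (int index : path) {
    // Validate the index at every level to avoid out-of-bounds access.
    if (index < 0 || static_cast<std::size_t>(index) >= level->size()) {
      return std::nullopt;
    }
    current = &(*level)[index];
    level = &current->children;  // descend for the next path index
  }

  // Containers carry no usable statistics, so only leaves yield an index.
  if (!current->is_leaf) return std::nullopt;
  return current->column_index;
}
```

With ORC's pre-order column numbering (root struct is column 0), a schema like `struct<a:int, b:struct<c:bigint>>` would give `a` column 1, `b` column 2, and `c` column 3, so under these assumptions the path `{1, 0}` resolves to 3 while `{1}` alone yields `std::nullopt`.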
Merged
cbb330
added a commit
that referenced
this pull request
Feb 20, 2026
cbb330
added a commit
that referenced
this pull request
Feb 20, 2026
- Added PredicateField struct to hold resolved field information
- Implemented ResolvePredicateFields() helper function
  - Resolves field references in predicates to ORC column indices
  - Uses OrcSchemaManifest for Arrow-to-ORC column mapping
  - Traverses nested field paths (structs only)
  - Filters to leaf nodes only (containers don't have statistics)
  - Type support check (currently int32/int64 only)
  - Returns vector of PredicateField entities

Implementation details:
- Uses compute::FieldsInExpression() to extract field refs
- Uses FieldRef.FindOneOrNone() for schema matching
- Traverses OrcSchemaField tree for nested paths
- Validates field indices and struct types
- PredicateField includes: field_ref, arrow_field_index, orc_column_index, data_type, supports_statistics

Verified: Manual code review following Parquet TestRowGroups pattern (lines 945-960)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
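As a rough illustration of the resolution loop this commit describes, the sketch below walks a set of already extracted field paths through a simplified schema tree. It is not the PR's code: `TypeId` and `OrcSchemaField` are hypothetical stand-ins, `ResolvePredicateFieldsSketch` is an illustrative name, `field_path` substitutes for the real `field_ref`, and the input paths are assumed to have come from something like `compute::FieldsInExpression()` followed by `FieldRef.FindOneOrNone()`. Recording unsupported types with `supports_statistics = false` (rather than dropping them) is also an assumption.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for Arrow type ids (assumption).
enum class TypeId { kInt32, kInt64, kString, kStruct };

// Hypothetical, simplified manifest node (assumption).
struct OrcSchemaField {
  std::string name;
  TypeId type;
  int column_index;
  std::vector<OrcSchemaField> children;
  bool is_leaf() const { return children.empty(); }
};

// Field names follow the commit message; the types are simplified.
struct PredicateField {
  std::vector<int> field_path;  // stands in for field_ref
  int arrow_field_index;        // index of the top-level Arrow field
  int orc_column_index;
  TypeId data_type;
  bool supports_statistics;
};

std::vector<PredicateField> ResolvePredicateFieldsSketch(
    const std::vector<OrcSchemaField>& schema_fields,
    const std::vector<std::vector<int>>& referenced_paths) {
  std::vector<PredicateField> resolved;
  for (const auto& path : referenced_paths) {
    if (path.empty()) continue;
    const std::vector<OrcSchemaField>* level = &schema_fields;
    const OrcSchemaField* current = nullptr;
    bool valid = true;
    for (int index : path) {
      // Nested paths may only descend through structs.
      if (current != nullptr && current->type != TypeId::kStruct) {
        valid = false;
        break;
      }
      if (index < 0 || static_cast<std::size_t>(index) >= level->size()) {
        valid = false;
        break;
      }
      current = &(*level)[index];
      level = &current->children;
    }
    // Containers have no statistics, so keep leaf nodes only.
    if (!valid || !current->is_leaf()) continue;
    const bool supported =
        current->type == TypeId::kInt32 || current->type == TypeId::kInt64;
    resolved.push_back({path, path.front(), current->column_index,
                        current->type, supported});
  }
  return resolved;
}
```

Keeping the flag on each entry lets a later evaluation step decide per field whether statistics can be consulted, at the cost of carrying entries that may never be used.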
Closed
cbb330
added a commit
that referenced
this pull request
Feb 20, 2026
Adds comprehensive task tracking and progress documentation for the ongoing ORC predicate pushdown implementation project.

## Changes
- task_list.json: Complete 35-task breakdown with dependencies
  - Tasks #0, #0.5, #1, #2 marked as complete (on feature branches)
  - Tasks #3-#35 pending implementation
  - Organized by phase: Prerequisites, Core, Metadata, Predicate, Scan, Testing, Future
- claude-progress.txt: Comprehensive project status document
  - Codebase structure and build instructions
  - Work completed on feature branches (not yet merged)
  - Current main branch state
  - Next steps and implementation strategy
  - Parquet mirroring patterns and Allium spec alignment

## Context
This is an initialization session to establish baseline tracking for the ORC predicate pushdown project. Previous sessions (1-4) completed initial tasks on feature branches. This consolidates that progress and provides a clear roadmap for future implementation sessions.

## Related Work
- Allium spec: orc-predicate-pushdown.allium (already on main)
- Feature branches: task-0-statistics-api-v2, task-0.5-stripe-selective-reading, task-1-orc-schema-manifest, task-2-build-orc-schema-manifest (not yet merged)

## Next Steps
Future sessions will implement tasks #3+ via individual feature branch PRs.
cbb330
added a commit
that referenced
this pull request
Feb 20, 2026
- Added SupportsStatistics helper to check if type supports statistics pushdown
- Created PredicateField struct to hold field resolution information
- Implemented ResolvePredicateFields to extract and resolve field references from predicates
- Currently supports int32 and int64 types
- Skips non-leaf fields and unsupported types
- Handles nested field resolution correctly

Verified:
- Resolves field references using FieldsInExpression
- Uses GetOrcColumnIndex for ORC column mapping
- Handles nested structs by traversing match indices
- Returns comprehensive field information for statistics evaluation
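The type gate mentioned above might look something like the following. This is only a sketch under stated assumptions: the `TypeId` enum is a hypothetical stand-in for Arrow's type ids, and `SupportsStatisticsSketch` is an illustrative name rather than the helper's actual signature. The only detail taken from the commit message is that int32 and int64 are the currently supported types.

```cpp
// Hypothetical stand-in for Arrow type ids (assumption).
enum class TypeId { kBool, kInt32, kInt64, kDouble, kString, kStruct, kList, kMap };

// Returns true only for the types the commit message lists as supported.
constexpr bool SupportsStatisticsSketch(TypeId id) {
  switch (id) {
    case TypeId::kInt32:
    case TypeId::kInt64:
      return true;
    default:
      // Everything else, including containers, is treated as unsupported.
      return false;
  }
}
```

An allow-list with a `false` default means a newly added type stays excluded from pushdown until it is explicitly enabled, which is the conservative choice for a filter that must never drop matching rows.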
Summary
Implement column index resolution function that maps FieldRef to ORC column indices using the schema manifest.
Changes
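- Implemented GetOrcColumnIndex helper function:
  - Takes compute::FieldRef and OrcSchemaManifest
  - Uses FieldRef.FindOne() to locate the field in the schema
  - Returns std::optional<int> with column index or nullopt
- Added includes:
  - <optional> for std::optional return type
  - arrow/compute/api_scalar.h for FieldRef/FieldPath
Implementation Details
Resolution Process:
1. Use FieldRef.FindOne() to resolve field in schema → FieldPath
2. Traverse the manifest tree from manifest.schema_fields, following the field path indices
3. Check whether the resolved field is a leaf (is_leaf())
4. Return column_index if leaf, nullopt otherwise
Return Values:
- std::optional<int> containing column index for leaf fields
- std::nullopt if:
  - the field is a container (struct/list/map)
  - the field is not found
Examples: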
Testing
Task Reference
Completes Task #3 from task_list.json - Core Data Structures phase
Depends on Task #2 (complete)
Enables predicate evaluation tasks
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>