Conversation
* Initial commit
* Fix formatting
* Add across partitions check
* Add new test case
* Fix buggy test

…#13909) (apache#13934)
* Set utf8view as return type when input type is the same
* Verify that the returned type from a call to a scalar function matches the return type specified in the return_type function
* Match return type to utf8view

Co-authored-by: Tim Saucer <timsaucer@gmail.com>

This reverts commit 5383d30.

* fix: fetch is missed in the EnforceSorting
* fix conflict
* resolve comments from alamb
* update

…it disabled by default

…e#14415) (apache#14453)
* chore: Fixed CI
* chore
* chore: Fixed clippy
* chore

Co-authored-by: Alex Huang <huangweijun1001@gmail.com>

* Test for string / numeric coercion
* fix tests
* Update tests
* Add tests to stringview
* add numeric coercion
```rust
=======
if let Some(comparison) = scalar.partial_cmp(current_best) {
    let is_better = if find_greater {
        comparison == std::cmp::Ordering::Greater
    } else {
        comparison == std::cmp::Ordering::Less
    };
>>>>>>> origin/branch-51
```
apache#16624 Upstream fix for using try_cmp instead of partial_cmp for scalar values.
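The motivation for the `try_cmp` change can be sketched with plain `f64` (a stand-in, not the DataFusion `ScalarValue` API): `partial_cmp` returns `None` for incomparable values, which the `if let Some(..)` pattern above silently skips, while a `try_cmp`-style API surfaces the incomparable case as an error.

```rust
use std::cmp::Ordering;

// Hypothetical stand-in for ScalarValue comparison: `partial_cmp` yields
// `None` for incomparable values (e.g. NaN); a try_cmp-style API turns
// that silent skip into an explicit error the caller must handle.
fn try_cmp(a: f64, b: f64) -> Result<Ordering, String> {
    a.partial_cmp(&b)
        .ok_or_else(|| format!("values {a} and {b} are not comparable"))
}

fn main() {
    assert_eq!(try_cmp(2.0, 1.0), Ok(Ordering::Greater));
    assert_eq!(try_cmp(1.0, 2.0), Ok(Ordering::Less));
    // NaN is incomparable: partial_cmp would return None and be skipped,
    // try_cmp reports it as an error instead.
    assert!(try_cmp(f64::NAN, 1.0).is_err());
    println!("ok");
}
```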
```rust
let mut new_plan = AnalyzeExec::new(
    self.verbose,
    self.show_statistics,
    self.metric_types.clone(),
```
The AnalyzeExec::new method now takes metric_types: Vec<MetricType> as the third argument. https://docs.rs/datafusion/51.0.0/datafusion/physical_plan/analyze/struct.AnalyzeExec.html#method.new
```diff
 ) -> Result<Option<Arc<dyn ExecutionPlan>>> {
     let mut new_plan =
-        ProjectionExec::try_new(self.expr.clone(), Arc::clone(self.input()))?;
+        ProjectionExec::try_new(self.expr().to_vec(), Arc::clone(self.input()))?;
```
The first argument is now self.expr().to_vec().
https://docs.rs/datafusion/51.0.0/datafusion/physical_plan/projection/struct.ProjectionExec.html#method.expr
```rust
true
}

#[allow(deprecated)]
```
Deprecated since 44.0.0: Use UnionExec::try_new instead
https://docs.rs/datafusion/51.0.0/datafusion/physical_plan/union/struct.UnionExec.html#method.try_new
Maybe we can switch to try_new in a follow-on PR?
Also okay to change it directly in the PR if not a big change
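The switch from `new` to `try_new` follows the usual fallible-constructor migration. A generic sketch (illustrative stand-in types, not the actual DataFusion `UnionExec` API):

```rust
// Sketch of the `new` -> `try_new` migration. `Union` and its fields are
// illustrative stand-ins, not the real UnionExec.
struct Union {
    inputs: Vec<&'static str>,
}

impl Union {
    // Old constructor: panics on invalid input, kept only for compatibility.
    #[allow(dead_code)]
    #[deprecated(note = "use `try_new` instead")]
    fn new(inputs: Vec<&'static str>) -> Self {
        Self::try_new(inputs).expect("UnionExec requires at least one input")
    }

    // try_new reports invalid input as an Err the caller can handle.
    fn try_new(inputs: Vec<&'static str>) -> Result<Self, String> {
        if inputs.is_empty() {
            return Err("UnionExec requires at least one input".to_string());
        }
        Ok(Self { inputs })
    }
}

fn main() {
    let ok = Union::try_new(vec!["scan_a", "scan_b"]).unwrap();
    assert_eq!(ok.inputs.len(), 2);
    // The empty-input case becomes an Err instead of a panic.
    assert!(Union::try_new(vec![]).is_err());
    println!("ok");
}
```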
```rust
=======
match reassign_predicate_columns(filter, &schema, true) {
    Ok(filter) => {
        match Self::add_filter_equivalence_info(
            filter,
            &mut eq_properties,
            &schema,
        ) {
            Ok(()) => {}
            Err(e) => {
                warn!("Failed to add filter equivalence info: {e}");
                #[cfg(debug_assertions)]
                panic!("Failed to add filter equivalence info: {e}");
            }
        }
    }
>>>>>>> origin/branch-51
```
```rust
=======
macro_rules! ignore_dangling_col {
    ($col:expr) => {
        if let Some(col) = $col.as_any().downcast_ref::<Column>() {
            if schema.index_of(col.name()).is_err() {
                continue;
            }
        }
    };
}

let (equal_pairs, _) = collect_columns_from_predicate(&filter);
for (lhs, rhs) in equal_pairs {
    // Ignore any binary expressions that reference non-existent columns in the current schema
    // (e.g. due to unnecessary projections being removed)
    ignore_dangling_col!(lhs);
    ignore_dangling_col!(rhs);
    eq_properties.add_equal_conditions(Arc::clone(lhs), Arc::clone(rhs))?
>>>>>>> origin/branch-51
```
```rust
<<<<<<< HEAD
.with_projection_indices(Some(vec![0, 1, 2]))
=======
.with_projection(Some(vec![0, 1, 2]))
>>>>>>> origin/branch-51
```
The FileScanConfigBuilder::with_projection() method has been deprecated in favor of with_projection_indices()
```rust
<<<<<<< HEAD
/// Number of row groups whose bloom filters were checked, tracked with matched/pruned counts
pub row_groups_pruned_bloom_filter: PruningMetrics,
/// Number of row groups whose statistics were checked, tracked with matched/pruned counts
pub row_groups_pruned_statistics: PruningMetrics,
=======
/// Number of row groups whose bloom filters were checked and matched (not pruned)
pub row_groups_matched_bloom_filter: Count,
/// Number of row groups pruned by bloom filters
pub row_groups_pruned_bloom_filter: Count,
/// Number of row groups pruned due to limit pruning.
pub limit_pruned_row_groups: Count,
/// Number of row groups whose statistics were checked and fully matched
pub row_groups_fully_matched_statistics: Count,
/// Number of row groups whose statistics were checked and matched (not pruned)
pub row_groups_matched_statistics: Count,
/// Number of row groups pruned by statistics
pub row_groups_pruned_statistics: Count,
>>>>>>> origin/branch-51
```
The metric type changed to PruningMetrics.
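The shape of the change can be sketched with a minimal stand-in: a single struct tracks matched and pruned counts together instead of spreading them across separate `Count` fields. Only the `add_matched`/`add_pruned`-style method names follow the diff; the internals here are assumed, not the real DataFusion implementation.

```rust
// Minimal PruningMetrics-style sketch (internals assumed): one value pairs
// the matched and pruned counts for a pruning check, so the total number
// of row groups checked is recoverable from a single field.
#[derive(Default, Debug)]
struct PruningMetrics {
    matched: usize,
    pruned: usize,
}

impl PruningMetrics {
    fn add_matched(&mut self, n: usize) {
        self.matched += n;
    }
    fn add_pruned(&mut self, n: usize) {
        self.pruned += n;
    }
    // Total row groups whose statistics (or bloom filters) were checked.
    fn total(&self) -> usize {
        self.matched + self.pruned
    }
}

fn main() {
    let mut row_groups_pruned_statistics = PruningMetrics::default();
    row_groups_pruned_statistics.add_matched(3);
    row_groups_pruned_statistics.add_pruned(1);
    assert_eq!(row_groups_pruned_statistics.total(), 4);
    println!("ok");
}
```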
```rust
Self {
    files_ranges_pruned_statistics,
    predicate_evaluation_errors,
    row_groups_matched_bloom_filter,
```
The row_groups_matched_bloom_filter field was missing, so I added it back.
```rust
<<<<<<< HEAD
metrics.row_groups_pruned_statistics.add_matched(1);
=======
fully_contained_candidates_original_idx.push(*idx);
metrics.row_groups_matched_statistics.add(1);
>>>>>>> origin/branch-51
```
Keeps fully_contained_candidates_original_idx plus the upstream API change to add_matched.
```rust
<<<<<<< HEAD
=======
#[cfg(feature = "parquet_encryption")]
use datafusion_common::encryption::map_config_decryption_to_decryption;
>>>>>>> origin/branch-51
```
cc @zhuqi-lucas I forgot the discussion result about the decryption
This should be safe; we ultimately disabled encryption by default, and we don't use it.
```rust
<<<<<<< HEAD
truncated_rows__ =
=======

truncated_rows__ =
>>>>>>> origin/branch-51
```
Fixed with the proto-common regen.sh script.
The structs ListingOptions, ListingTable, and ListingTableConfig are now available within the datafusion-catalog-listing crate.
```
=======
- DataSourceExec: file_groups={2 groups: [[test1.parquet], [test2.parquet]]}, projection=[a, b, c], file_type=test, pushdown_supported=true, predicate=DynamicFilterPhysicalExpr [ true ]
>>>>>>> origin/branch-51
```
Rendering format changed to DynamicFilter. Keeping upstream.
```rust
//!
//! // Workaround for `node_id` not being serializable:
//! let mut annotator = NodeIdAnnotator::new();
//! let physical_round_trip = annotate_node_id_for_execution_plan(&physical_round_trip, &mut annotator)?;
//!
```
This is a new doc test added for round-tripping an execution plan. The node_id is not currently serialized, so it had to be manually annotated before the assertion.
Is this ok?
It's ok since node_id is our internal implementation.
```rust
fn add_merge_on_top(input: DistributionContext) -> DistributionContext {
/// Updated node with an execution plan, where desired single
/// distribution is satisfied by adding [`SortPreservingMergeExec`].
fn add_merge_on_top(
```
This was previously named add_spm_on_top per the DF patch tracking doc.
```
ints Map("entries": Struct("key": Utf8, "value": Int64), unsorted) NO
strings Map("entries": Struct("key": Utf8, "value": Utf8), unsorted) NO
timestamp Utf8View NO
timestamp Utf8 NO
```
Other tests also change because of this setting: datafusion.execution.parquet.schema_force_view_types false
```
BinaryView 616161 BinaryView 616161 BinaryView 616161
BinaryView 626262 BinaryView 626262 BinaryView 626262
BinaryView 636363 BinaryView 636363 BinaryView 636363
BinaryView 646464 BinaryView 646464 BinaryView 646464
BinaryView 656565 BinaryView 656565 BinaryView 656565
BinaryView 666666 BinaryView 666666 BinaryView 666666
BinaryView 676767 BinaryView 676767 BinaryView 676767
BinaryView 686868 BinaryView 686868 BinaryView 686868
BinaryView 696969 BinaryView 696969 BinaryView 696969
Binary 616161 LargeBinary 616161 BinaryView 616161
Binary 626262 LargeBinary 626262 BinaryView 626262
Binary 636363 LargeBinary 636363 BinaryView 636363
Binary 646464 LargeBinary 646464 BinaryView 646464
Binary 656565 LargeBinary 656565 BinaryView 656565
Binary 666666 LargeBinary 666666 BinaryView 666666
Binary 676767 LargeBinary 676767 BinaryView 676767
Binary 686868 LargeBinary 686868 BinaryView 686868
Binary 696969 LargeBinary 696969 BinaryView 696969
```
datafusion.execution.parquet.schema_force_view_types false
Upgrade Steps
- Created `branch-51-upstream`, which is upstream DF 51 at commit fd35a09.
- Merged `branch-51` (not `branch-50`) into this PR branch. Last commit in fork is ff301c8 - Add restriction for enabling limit pruning.

Verified DF patches (present in this PR)
Missing patches (not yet applied to this PR)
1. make DefaultSchemaAdapter public. Commit 848fd57.
   Reason: Moved out of core to `datafusion-datasource`. Needs to be patched once more.
   Status: ✅ Fixed.
2. Add partition filters' equivalence classes info to the execution plan if it's DataSourceExec. Commit c7628fb.
   `DataSourceExec` is missing the `add_partition_filter_equivalence_info` method.
   Status: ⏳ To be verified later if this is ported upstream. If not, add it back to the fork.
Merged upstream
TODO