add an example of using DataFrame to create a subquery#5961
add an example of using DataFrame to create a subquery#5961alamb merged 3 commits intoapache:mainfrom
Conversation
alamb
left a comment
There was a problem hiding this comment.
Thank you @jiangzhx -- I do not think the challenge of making subqueries is only felt by you -- the focus has been on handling subqueries from SQL I think rather than with DataFrame.
I have some ideas how to make this better. I'll see what I can do
| ctx.table("t1") | ||
| .await? | ||
| .filter( | ||
| Expr::ScalarSubquery(datafusion_expr::Subquery { |
There was a problem hiding this comment.
I agree this is 🤮 -- let me see if I can come up with some way to make this easier to construct
| ), | ||
| outer_ref_columns: vec![], | ||
| }) | ||
| .gt(lit(ScalarValue::UInt8(Some(0)))), |
There was a problem hiding this comment.
| .gt(lit(ScalarValue::UInt8(Some(0)))), | |
| .gt(lit(0u8)), |
alamb
left a comment
There was a problem hiding this comment.
Here is one way we could simplify the example: jiangzhx#179 (a PR into this branch)
I do think there
ctx.table("t1")
.await?
.filter(
exists(Arc::new(
ctx.table("t2")
.await?
.filter(col("t1.c1").eq(col("t2.c1")))?
.aggregate(vec![], vec![avg(col("t2.c2"))])?
.select(vec![avg(col("t2.c2"))])?
.into_unoptimized_plan(),
))
.gt(lit(0u8)),By implementing some traits, I think we could remove the Arc and into_unoptimized_plan call
ctx.table("t1")
.await?
.filter(
exists( ctx.table("t2")
.await?
.filter(col("t1.c1").eq(col("t2.c1")))?
.aggregate(vec![], vec![avg(col("t2.c2"))])?
.select(vec![avg(col("t2.c2"))])?
))
.gt(lit(0u8)),with something like
pub fn scalar_subquery(subquery: impl IntoSubquery) -> Expr {
...
}
/// Something that can be converted into a plan suitable for a subquery
pub trait IntoSubquery {
fn into_subquery(self) -> Arc<LogicalPlan>
}
// and then implement IntoSubqury for `LogicalPlan`, `Arc<LogicalPlan>` and `DataFrame` Though maybe that is getting to complicated 🤔
Simplify expression examples
|
I think we can refine this example further but it is better than what we have at the moment |
|
Thanks again @jiangzhx |
* add an example of using DataFrame to create a subquery * Simplify expression examples --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Which issue does this PR close?
it is difficult to write the same logic based on the DataFrame API as SQL.
Therefore, I created this example hoping to help others.
The reason for the difficulty is,
Maybe it's just my personal reason.
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?