Add support for first, last aggregate function parsing#880
Add support for first, last aggregate function parsing#880mustafasrepo wants to merge 4 commits intoapache:mainfrom
Conversation
Pull Request Test Coverage Report for Build 4947357822
💛 - Coveralls |
alamb
left a comment
There was a problem hiding this comment.
Thanks @mustafasrepo -- sorry for the delay in reviewing.
| } | ||
| } | ||
|
|
||
| /// An `FIRST` invocation `FIRST( [ DISTINCT ] <expr> [ORDER BY <expr>] [LIMIT <n>] )` |
There was a problem hiding this comment.
I think these new structs follow the existing patterns in this file but they seem very specific to me (as in do we really need two new structs that are almost identical to each other as well as to Function.
Did you consider adding an order_by clause to Function and adjusting the parse_function function?
So something like
pub struct Function {
pub name: ObjectName,
pub args: Vec<FunctionArg>,
pub over: Option<WindowSpec>,
// aggregate functions may specify eg `COUNT(DISTINCT x)`
pub distinct: bool,
// Some functions must be called without trailing parentheses, for example Postgres
// do it for current_catalog, current_schema, etc. This flags is used for formatting.
pub special: bool,
// optional ORDER BY expression <----------- Proposed Addition
pub order_by: Option<Box<OrderByExpr>>,
}I think this would both reduce the code required for this PR and make the codebase easier to maintain, as well as would extend naturally to other aggregate functions (e.g. databricks offers the first_value function in addition to first)
Adding an order_by to Function would allow users to parse any function with an order_by clause which I think would be quite valuable
There was a problem hiding this comment.
I didn't consider, this. However, it seems like a better approach to me. I will experiment with it, then let you know about result. Thanks for the suggestion.
| self.expect_token(&Token::LParen)?; | ||
| let expr = Box::new(self.parse_expr()?); | ||
| // ANSI SQL and BigQuery define ORDER BY inside function. | ||
| if !self.dialect.supports_within_after_array_aggregation() { |
There was a problem hiding this comment.
I don't understand the call to supports_within_after_array_aggregation here as this is not an array aggregate.
|
|
||
| for sql in [ | ||
| "SELECT FIRST(x ORDER BY x) AS a FROM T", | ||
| "SELECT LAST(x ORDER BY x) AS a FROM T", |
There was a problem hiding this comment.
given there is code above to handle WITHIN GROUP during parsing, can you either: 1) Remove that code, or 2) add test coverage for it?
// Snowflake defines ORDERY BY in within group instead of inside the function like
// ANSI SQL.
self.expect_token(&Token::RParen)?;
let within_group = if self.parse_keywords(&[Keyword::WITHIN, Keyword::GROUP]) {
self.expect_token(&Token::LParen)?;
self.expect_keywords(&[Keyword::ORDER, Keyword::BY])?;
let order_by_expr = self.parse_order_by_expr()?;
self.expect_token(&Token::RParen)?;
Some(Box::new(order_by_expr))
} else {
None
};
|
@alamb. I have tried you suggestion. New PR puts order by field inside |
|
I agree #882 looks great and I have merged that one in |
Closes #877
This PR adds support for
FIRST,LASTaggregate functions.As an example, see queries below
or
With this PR we can parse above queries successfully.
Common dialects support this feature, for instance, duckdb (see link), databricks (see link) supports this feature.