fix: escape underscores when simplifying starts_with#19077
fix: escape underscores when simplifying starts_with#19077alamb merged 2 commits intoapache:mainfrom
starts_with#19077Conversation
bb36713 to
352bbff
Compare
|
@alamb / @andygrove : |
352bbff to
d41adfe
Compare
|
I've reviewed the changes. Looks correct to me. |
30507bd to
c55f53d
Compare
|
Fixed the formatting, and rebased on the latest |
|
Should you replace ‘\’ as well ? ‘%’, ‘_’ and ‘\’ are the 3 special characters within a like expression See 'predicate.rs' |
Agh, but of course ... |
42f63c6 to
f59f039
Compare
|
Backslashes are now also escaped. I also adjusted the SLT's: they were not failing before the fix, because the simplification for |
| ---- | ||
| logical_plan | ||
| 01)Projection: test.column1_utf8 LIKE Utf8("foo\%%") AS c1, test.column1_large_utf8 LIKE LargeUtf8("foo\%%") AS c2, test.column1_utf8view LIKE Utf8View("foo\%%") AS c3, test.column1_utf8 LIKE Utf8("f_o%") AS c4, test.column1_large_utf8 LIKE LargeUtf8("f_o%") AS c5, test.column1_utf8view LIKE Utf8View("f_o%") AS c6 | ||
| 01)Projection: test.column1_utf8 LIKE Utf8("foo\%%") AS c1, test.column1_large_utf8 LIKE LargeUtf8("foo\%%") AS c2, test.column1_utf8view LIKE Utf8View("foo\%%") AS c3, test.column1_utf8 LIKE Utf8("f\_o%") AS c4, test.column1_large_utf8 LIKE LargeUtf8("f\_o%") AS c5, test.column1_utf8view LIKE Utf8View("f\_o%") AS c6 |
There was a problem hiding this comment.
🤦 -- _ matches exactly one character -- so escaping them is the right thing to do
| // 1. 'ja%' (input pattern) | ||
| // 2. 'ja\%' (escape special char '%') | ||
| // 3. 'ja\%%' (add suffix for starts_with) | ||
| // Example: starts_with(col, 'j_a%') -> col LIKE 'j\_a\%%' |
There was a problem hiding this comment.
Example: starts_with(col, 'j\_a%') -> col LIKE 'j\\\_a\%%' ?
There was a problem hiding this comment.
Indeed, I adjusted the steps below it but not this line ...
Fixed now
f59f039 to
dd01c71
Compare
|
fmt test failure is unrelated. Merging up to get fix from main |
|
Thanks again @willemv @pepijnve and @bert-beyondloops |
## Which issue does this PR close? - Closes apache#19076. ## What changes are included in this PR? The `simplify` implementation for `StartsWithFunc` is adjusted to also escape underscores instead of only percent signs. ## Are these changes tested? Yes, new sql logic tests were added testing `starts_with` functions with string literals and patterns that contain an underscore. Similar tests were added for `ends_with`, even though that function has no simplification and is not affected by this bug. If simplification is ever introduced there, escaping the underscore cannot be overlooked then. ## Are there any user-facing changes? No Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> (cherry picked from commit 340a28c)
Co-authored-by: zjregee <zjregee@gmail.com>
## Which issue does this PR close? - Closes apache#19076. ## What changes are included in this PR? The `simplify` implementation for `StartsWithFunc` is adjusted to also escape underscores instead of only percent signs. ## Are these changes tested? Yes, new sql logic tests were added testing `starts_with` functions with string literals and patterns that contain an underscore. Similar tests were added for `ends_with`, even though that function has no simplification and is not affected by this bug. If simplification is ever introduced there, escaping the underscore cannot be overlooked then. ## Are there any user-facing changes? No Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> (cherry picked from commit 340a28c)
…#392) v52 ## Which issue does this PR close? - Closes apache#19076. ## What changes are included in this PR? The `simplify` implementation for `StartsWithFunc` is adjusted to also escape underscores instead of only percent signs. ## Are these changes tested? Yes, new sql logic tests were added testing `starts_with` functions with string literals and patterns that contain an underscore. Similar tests were added for `ends_with`, even though that function has no simplification and is not affected by this bug. If simplification is ever introduced there, escaping the underscore cannot be overlooked then. ## Are there any user-facing changes? No Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> (cherry picked from commit 340a28c) Co-authored-by: Willem Verstraeten <willem.verstraeten@gmail.com>
…#392) v52 --- [Cherry-pick summary: v46→v47] Source commit: b4e6d5f (fix: escape underscores when simplifying starts_with (apache#19077) (#392) v52) Strategy: cherry-picked cleanly Upstream PR: apache#19077 (not in v47) Test coverage: adequate (adds slt tests for starts_with and ends_with with underscore patterns) Tests: cargo nextest run -p datafusion-functions passed
…#392) v52 --- [Cherry-pick summary: v46→v47] Source commit: b4e6d5f (fix: escape underscores when simplifying starts_with (apache#19077) (#392) v52) Strategy: cherry-picked cleanly Upstream PR: apache#19077 (not in v47) Test coverage: adequate (adds slt tests for starts_with and ends_with with underscore patterns) Tests: cargo nextest run -p datafusion-functions passed
…#392) v52 --- [Cherry-pick summary: v46→v47] Source commit: b4e6d5f (fix: escape underscores when simplifying starts_with (apache#19077) (#392) v52) Strategy: cherry-picked cleanly Upstream PR: apache#19077 (not in v47) Test coverage: adequate (adds slt tests for starts_with and ends_with with underscore patterns) Tests: cargo nextest run -p datafusion-functions passed
…#392) v52 --- [Cherry-pick summary: v46→v47] Source commit: b4e6d5f (fix: escape underscores when simplifying starts_with (apache#19077) (#392) v52) Strategy: cherry-picked cleanly Upstream PR: apache#19077 (not in v47) Test coverage: adequate (adds slt tests for starts_with and ends_with with underscore patterns) Tests: cargo nextest run -p datafusion-functions passed
…#392) v52 --- [Cherry-pick summary: v46→v47] Source commit: b4e6d5f (fix: escape underscores when simplifying starts_with (apache#19077) (#392) v52) Strategy: cherry-picked cleanly Upstream PR: apache#19077 (not in v47) Test coverage: adequate (adds slt tests for starts_with and ends_with with underscore patterns) Tests: cargo nextest run -p datafusion-functions passed
Which issue does this PR close?
starts_withfunction are simplified incorrectly when the prefix contains an underscore #19076.What changes are included in this PR?
The
simplifyimplementation forStartsWithFuncis adjusted to also escape underscores instead of only percent signs.Are these changes tested?
Yes, new sql logic tests were added testing
starts_withfunctions with string literals and patterns that contain an underscore.Similar tests were added for
ends_with, even though that function has no simplification and is not affected by this bug. If simplification is ever introduced there, escaping the underscore cannot be overlooked then.Are there any user-facing changes?
No