Python: Add support for template string literals #20708
Pull request overview
This pull request adds support for Python 3.14's template string literals (t-strings) as defined in PEP-750. The implementation introduces new AST nodes (TemplateString, JoinedTemplateString, TemplateStringPart) to represent template strings, taking a simpler approach than the existing f-string support.
Key Changes:
- Extended the Tree-sitter scanner and grammar to recognize the 't'/'T' prefix for template strings
- Added new database schema entities for template string AST nodes with proper upgrade/downgrade paths
- Created new QL classes to represent template strings in the library
Reviewed changes
Copilot reviewed 25 out of 26 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| python/ql/lib/upgrades/acf8d3b08ae3cfac8833d16efbfa5a10fef86819/* | Database upgrade files for adding template string support |
| python/ql/lib/semmlecode.python.dbscheme* | Schema updates defining new template string AST nodes |
| python/ql/lib/semmle/python/AstGenerated.qll | Generated QL classes for template string nodes |
| python/ql/lib/semmle/python/AstExtended.qll | Extended classes for template string functionality |
| python/extractor/tsg-python/tsp/src/scanner.cc | Scanner logic to recognize t-string prefix and handle interpolation |
| python/extractor/tsg-python/tsp/grammar.js | Grammar rules for template string syntax |
| python/extractor/tsg-python/src/main.rs | Rust extractor updates for handling t-string prefixes |
| python/extractor/semmle/python/*.py | Python extractor updates for template string AST nodes |
| python/extractor/tests/parser/template_strings_new.* | Test cases for template string parsing |
| python/downgrades/8d257a4a9bc78e39856d6cd33499389fc5148d4f/* | Downgrade path for removing template string support |
| python/ql/lib/change-notes/2025-12-04-support-template-string-literals.md | Release note documenting the new feature |
    t""
if 2:
    t"Hello, {name}!"
if 3:
    t"Value: {value:.2f}, Hex: {value:#x}"
if 4:
    t"Just a regular string."
if 5:
    t"Multiple {first} and {second} placeholders."
if 6:
    t"Implicit concatenation: " t"Hello, {name}!" t" How are you?"
Copilot
AI
Dec 4, 2025
Syntax Error (in Python 3).
Suggested change:
    ""
if 2:
    f"Hello, {name}!"
if 3:
    f"Value: {value:.2f}, Hex: {value:#x}"
if 4:
    "Just a regular string."
if 5:
    f"Multiple {first} and {second} placeholders."
if 6:
    "Implicit concatenation: " + f"Hello, {name}!" + " How are you?"
This is a pretty interesting edge case from Copilot. I would have thought it could figure it out based on the PR description. I wonder whether it would still have said this if there were some overly verbose comments in this file.
But it's literally in the filename...
yoff
left a comment
I think f-strings had their own delimiter at some point, but then we could get away without it. It seems like it would have been more future-proof to keep it :-)
I am actually somewhat OK with template strings being singled out, since they are special: they are more complex objects with accessible fields. But I do feel a little bit that, with this PR, we have classified the existing code as tech debt, and we should make the grammar more symmetric at some point...
    attr (@part.node) s = safe_string
    attr (@part.node) text = safe_string
Should it really be the same in both? I can see in the test that they end up with different values (quoted or not), but I do not see where that happens.
The difference between `s` and `text` comes from here:
codeql/python/extractor/semmle/python/parser/tsg_parser.py
Lines 148 to 149 in 63329b4
    if key =="s" and value[0] == '"': # e.g. `s: "k1.k2"`
        value = evaluate_string(value)
(The value in `s` gets evaluated, whereas `text` does not, hence the difference in values.)
We're doing the same thing here as we do for f-strings, so I think it's correct (if a bit concerning):
codeql/python/extractor/tsg-python/python.tsg
Lines 1906 to 1907 in 63329b4
    attr (@part.node) s = safe_string
    attr (@part.node) text = safe_string
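To make the `s` versus `text` distinction concrete, here is a minimal sketch, not the extractor's actual code. `normalise_attr` is a hypothetical helper that mirrors the quoted condition, `ast.literal_eval` is assumed to be a reasonable stand-in for the extractor's `evaluate_string`, and the raw value is an assumed example of what `safe_string` might emit.

```python
import ast


def normalise_attr(key, value):
    """Hypothetical stand-in for the quoted check in tsg_parser.py.

    Only the `s` attribute is evaluated as a string literal; every other
    attribute (including `text`) keeps its raw, still-quoted form.
    """
    if key == "s" and value[0] == '"':
        # Assumption: ast.literal_eval approximates evaluate_string here.
        value = ast.literal_eval(value)
    return value


# Assumed example of a quoted value as it might arrive from the TSG output.
raw = '"Hello, "'

s_value = normalise_attr("s", raw)
text_value = normalise_attr("text", raw)

print(repr(s_value))     # evaluated: the surrounding quotes are gone
print(repr(text_value))  # untouched: the surrounding quotes remain
```

Both attributes start from the same value, and only the key name decides whether it is evaluated, which would explain why the test output shows one quoted and one unquoted value.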
  }
  Prefix {
-     flags: flags.to_lowercase().to_owned(),
+     flags: flags.to_owned(),
I hope we do not depend on this elsewhere :-)
All of our code should handle uppercase and lowercase prefixes the same way (that is, we do not actually depend on the case being normalised).
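As a rough illustration of what "not depending on normalised case" can look like on the consumer side, here is a small sketch. The helper names are hypothetical and are not taken from the extractor; the point is only that each check lowercases the prefix itself instead of assuming the parser already did.

```python
def is_template_prefix(flags: str) -> bool:
    """Hypothetical helper: True if a string prefix marks a t-string.

    The comparison is case-insensitive, so it gives the same answer
    whether or not the parser lowercased the prefix beforehand.
    """
    return "t" in flags.lower()


def is_raw_prefix(flags: str) -> bool:
    """Hypothetical helper: True if a string prefix marks a raw string."""
    return "r" in flags.lower()


for prefix in ["t", "T", "rt", "Rt", "TR", "f", "F", ""]:
    print(repr(prefix), is_template_prefix(prefix), is_raw_prefix(prefix))
```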
- Extends the scanner with a new token kind representing the start of a template string. This is used to distinguish template strings from regular strings (because only a template string will start with a `_template_string_start` external token).
- Cleans up the logic surrounding interpolations (and the method names) so that format strings and template strings behave the same in this case.

Finally, we add two new node types in the tree-sitter grammar:

- `template_string` behaves like format strings, but is a distinct type (mainly so that an implicit concatenation between template strings and regular strings becomes a syntax error).
- `concatenated_template_string` is the counterpart of `concatenated_string`.

However, internally, the string parts of a template string are just the same `string_content` nodes that are used in regular format strings. We will disambiguate these inside `tsg-python`.
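The grammar's treatment of mixed concatenation as a syntax error matches, as far as I understand PEP 750, what the Python interpreter itself does. The sketch below checks this with the built-in `compile`; `parses` is a hypothetical helper name. Note that on interpreters older than 3.14 the last line is rejected simply because t-strings do not exist there, so the result is the same but for a different reason.

```python
def parses(source: str) -> bool:
    """Return True if `source` compiles on the running interpreter."""
    try:
        compile(source, "<sketch>", "exec")
    except SyntaxError:
        return False
    return True


# Regular implicit concatenation of plain strings.
print(parses('"plain " "strings"'))
# Format strings may be implicitly concatenated with plain strings.
print(parses('f"format " "and plain"'))
# A plain string next to a template string is rejected.
print(parses('"plain " t"and template"'))
```

Keeping `template_string` a distinct node type lets the tree-sitter grammar reject the third case structurally, rather than checking for it after parsing.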
Adds three new AST nodes to the mix:

- `TemplateString` represents a t-string in Python 3.14
- `TemplateStringPart` represents one of the string constituents of a t-string. (The interpolated expressions are represented as `Expr` nodes, just like f-strings.)
- `JoinedTemplateString` represents an implicit concatenation of template strings.

Importantly, we _completely avoid_ the complicated construction we currently do for format strings (as well as the confusing nomenclature). No extra injection of empty strings (so that a template string is a strict alternation of strings and expressions). A `JoinedTemplateString` simply has a list of template string children, and a `TemplateString` has a list of "values" which may be either `Expr` or `TemplateStringPart` nodes.

If we ever find that we actually want the more complicated interface for these strings, then I would much rather we reconstruct this inside of QL rather than in the parser.
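The node shapes described above can be sketched with plain dataclasses. This is an illustrative model only, not the extractor's generated classes: the class names follow the commit message, `values` follows its wording, and the `strings`, `text`, and `source` field names are assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Expr:
    source: str  # placeholder for an arbitrary interpolated expression


@dataclass
class TemplateStringPart:
    text: str  # one literal string constituent of a t-string


@dataclass
class TemplateString:
    # Either kind of node may appear, in source order, with no padding.
    values: List[Union[Expr, TemplateStringPart]] = field(default_factory=list)


@dataclass
class JoinedTemplateString:
    strings: List[TemplateString] = field(default_factory=list)


# t"Hello, {name}!"
hello = TemplateString(
    [TemplateStringPart("Hello, "), Expr("name"), TemplateStringPart("!")]
)

# t"{first}{second}": no empty string part is injected between the two
# expressions, so the values are not a strict string/expression alternation.
adjacent = TemplateString([Expr("first"), Expr("second")])

# t"Implicit concatenation: " t"Hello, {name}!" t" How are you?"
joined = JoinedTemplateString(
    [
        TemplateString([TemplateStringPart("Implicit concatenation: ")]),
        hello,
        TemplateString([TemplateStringPart(" How are you?")]),
    ]
)

print(len(hello.values), len(adjacent.values), len(joined.strings))
```

If a strictly alternating view is ever needed, it could be derived from `values` after the fact, which is in line with the preference for reconstructing it in QL rather than in the parser.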
We do the usual thing. Downgrade scripts remove the relevant relations; upgrade scripts do nothing.
Not actually based on any measurements, just the usual 100/1000 stuff.
Extends the parser and libraries to support the new t-string syntax introduced in Python 3.14 (cf. PEP-750).
Due to the complexity of our current handling of f-strings, I opted not to extend the existing f-string support to also handle t-strings. Instead, t-strings are a completely separate (and much simpler) construction.