@tausbn
Contributor

@tausbn tausbn commented Oct 28, 2025

Extends the parser and libraries to support the new t-string syntax introduced in Python 3.14 (cf. PEP-750).

Due to the complexity of our current handling of f-strings, I opted not to extend the existing f-string support to also handle t-strings. Instead, t-strings are a completely separate (and much simpler) construction.

@tausbn tausbn force-pushed the tausbn/python-add-support-for-template-string-literals branch from 691a54f to b6c5b53 Compare December 4, 2025 13:48
@tausbn tausbn force-pushed the tausbn/python-add-support-for-template-string-literals branch 6 times, most recently from 6d2d0eb to 98279f7 Compare December 4, 2025 21:49
@tausbn tausbn marked this pull request as ready for review December 4, 2025 22:38
@tausbn tausbn requested review from a team as code owners December 4, 2025 22:38
Copilot AI review requested due to automatic review settings December 4, 2025 22:38
Contributor

Copilot AI left a comment

Pull request overview

This pull request adds support for Python 3.14's template string literals (t-strings) as defined in PEP-750. The implementation introduces new AST nodes (TemplateString, JoinedTemplateString, TemplateStringPart) to represent template strings, taking a simpler approach than the existing f-string support.

Key Changes:

  • Extended the Tree-sitter scanner and grammar to recognize the 't'/'T' prefix for template strings
  • Added new database schema entities for template string AST nodes with proper upgrade/downgrade paths
  • Created new QL classes to represent template strings in the library

Reviewed changes

Copilot reviewed 25 out of 26 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| `python/ql/lib/upgrades/acf8d3b08ae3cfac8833d16efbfa5a10fef86819/*` | Database upgrade files for adding template string support |
| `python/ql/lib/semmlecode.python.dbscheme*` | Schema updates defining new template string AST nodes |
| `python/ql/lib/semmle/python/AstGenerated.qll` | Generated QL classes for template string nodes |
| `python/ql/lib/semmle/python/AstExtended.qll` | Extended classes for template string functionality |
| `python/extractor/tsg-python/tsp/src/scanner.cc` | Scanner logic to recognize t-string prefix and handle interpolation |
| `python/extractor/tsg-python/tsp/grammar.js` | Grammar rules for template string syntax |
| `python/extractor/tsg-python/src/main.rs` | Rust extractor updates for handling t-string prefixes |
| `python/extractor/semmle/python/*.py` | Python extractor updates for template string AST nodes |
| `python/extractor/tests/parser/template_strings_new.*` | Test cases for template string parsing |
| `python/downgrades/8d257a4a9bc78e39856d6cd33499389fc5148d4f/*` | Downgrade path for removing template string support |
| `python/ql/lib/change-notes/2025-12-04-support-template-string-literals.md` | Release note documenting the new feature |

Comment on lines +7 to +17
    t""
if 2:
    t"Hello, {name}!"
if 3:
    t"Value: {value:.2f}, Hex: {value:#x}"
if 4:
    t"Just a regular string."
if 5:
    t"Multiple {first} and {second} placeholders."
if 6:
    t"Implicit concatenation: " t"Hello, {name}!" t" How are you?"
Copilot AI Dec 4, 2025

Syntax Error (in Python 3).

Suggested change
-     t""
- if 2:
-     t"Hello, {name}!"
- if 3:
-     t"Value: {value:.2f}, Hex: {value:#x}"
- if 4:
-     t"Just a regular string."
- if 5:
-     t"Multiple {first} and {second} placeholders."
- if 6:
-     t"Implicit concatenation: " t"Hello, {name}!" t" How are you?"
+     ""
+ if 2:
+     f"Hello, {name}!"
+ if 3:
+     f"Value: {value:.2f}, Hex: {value:#x}"
+ if 4:
+     "Just a regular string."
+ if 5:
+     f"Multiple {first} and {second} placeholders."
+ if 6:
+     "Implicit concatenation: " + f"Hello, {name}!" + " How are you?"

This is a pretty interesting edge case from Copilot. I would have thought it could figure it out based on the PR description. I wonder whether it would still have said this if the file had some overly verbose comments.

But it's literally in the filename...
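
Whether the test file counts as a syntax error depends on the Python version doing the parsing. A minimal sketch of a probe, assuming a standard CPython interpreter (the helper name `accepts_t_strings` is hypothetical and not part of this PR):

```python
import sys


def accepts_t_strings() -> bool:
    """Return True if the running interpreter can parse a t-string literal."""
    try:
        # compile() only parses the source; `name` is never looked up,
        # so an undefined variable cannot cause a NameError here.
        compile('t"Hello, {name}!"', "<t-string probe>", "eval")
    except SyntaxError:
        return False
    return True


if __name__ == "__main__":
    print(sys.version_info[:2], accepts_t_strings())
```

On interpreters older than 3.14 the probe should report `False`, which matches what Copilot flagged; a parser that targets 3.14 accepts the same text.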

Contributor

@yoff yoff left a comment

I think f-strings had their own delimiter at some point, but then we could get away without it. It seems like it would have been more future-proof to keep it :-)
I am actually somewhat OK with template strings being singled out since they are special: they are more complex objects with accessible fields. But I do feel a little that, with this PR, we have classified the existing code as tech debt, and we should make the grammar more symmetric at some point...

Comment on lines +2079 to +2080
attr (@part.node) s = safe_string
attr (@part.node) text = safe_string
Contributor

Should it really be the same in both? I can see in the test that they end up with different values (quoted or not), but I do not see where that happens.

Contributor Author

The difference between s and text comes from here:

if key =="s" and value[0] == '"': # e.g. `s: "k1.k2"`
    value = evaluate_string(value)

(where the value in s gets evaluated, whereas text does not, hence the difference in values).

We're doing the same thing here as we do for f-strings, so I think it's correct (if a bit concerning):

attr (@part.node) s = safe_string
attr (@part.node) text = safe_string
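
The difference between `s` and `text` described above can be illustrated with a small sketch. It assumes that `evaluate_string` turns a quoted literal into the string it denotes; the version below is a hypothetical stand-in built on `ast.literal_eval`, not the extractor's actual implementation, and `read_attr` is likewise an illustrative name.

```python
import ast


def evaluate_string(raw: str) -> str:
    """Hypothetical stand-in: evaluate a quoted literal to the string it denotes."""
    return ast.literal_eval(raw)


def read_attr(key: str, value: str) -> str:
    """Mirror the quoted condition: only a quoted `s` value gets evaluated."""
    if key == "s" and value[0] == '"':
        value = evaluate_string(value)
    return value


raw = '"Hello, "'
s_value = read_attr("s", raw)        # quotes removed by evaluation
text_value = read_attr("text", raw)  # left exactly as written

if __name__ == "__main__":
    print(repr(s_value), repr(text_value))
```

Both attributes start from the same `safe_string`, so the declarations can be identical while the stored values differ.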

  }
  Prefix {
-     flags: flags.to_lowercase().to_owned(),
+     flags: flags.to_owned(),
Contributor

I hope we do not depend on this elsewhere :-)

Contributor Author

All of our code should handle uppercase and lowercase prefixes the same way (that is, we do not actually depend on the case being normalised).
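
One way to stay independent of prefix case is to compare case-insensitively at the point of use. A minimal sketch, assuming the flags arrive as the raw prefix text (the helper names `has_flag`, `is_template_prefix` and `is_raw_prefix` are hypothetical, not functions from the extractor):

```python
def has_flag(flags: str, flag: str) -> bool:
    """Case-insensitive membership test on a string prefix such as 'Rb' or 'T'."""
    return flag.lower() in flags.lower()


def is_template_prefix(flags: str) -> bool:
    """True for any prefix containing 't' or 'T'."""
    return has_flag(flags, "t")


def is_raw_prefix(flags: str) -> bool:
    """True for any prefix containing 'r' or 'R'."""
    return has_flag(flags, "r")


if __name__ == "__main__":
    for prefix in ("t", "T", "rT", "f", ""):
        print(repr(prefix), is_template_prefix(prefix), is_raw_prefix(prefix))
```

With checks written like this, it makes no difference whether the producer lowercases the prefix first.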

- Extends the scanner with a new token kind representing the start of a
template string. This is used to distinguish template strings from
regular strings (because only a template string will start with a
`_template_string_start` external token).

- Cleans up the logic surrounding interpolations (and the method names)
so that format strings and template strings behave the same in this
case.
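
The scanner change described above amounts to choosing a start token from the prefix. A rough Python sketch of that decision follows; the real scanner is C++ in `scanner.cc`, and both the function name `string_start_kind` and the `_string_start` token name for ordinary strings are assumptions made for illustration.

```python
def string_start_kind(source: str) -> str:
    """Classify the opening of a string literal by its prefix characters.

    This sketch does not validate the prefix; it only looks for 't'/'T'
    before the first quote character.
    """
    prefix = []
    for ch in source:
        if ch in "\"'":
            break
        prefix.append(ch.lower())
    else:
        # The loop ran to the end without meeting a quote character.
        raise ValueError("no opening quote found")
    if "t" in prefix:
        return "_template_string_start"
    return "_string_start"  # assumed name for the ordinary start token


if __name__ == "__main__":
    for literal in ('t"x"', 'Rt"""x"""', 'f"x"', '"x"'):
        print(literal, string_start_kind(literal))
```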

Finally, we add two new node types in the tree-sitter grammar:

- `template_string` behaves like format strings, but is a distinct type
(mainly so that an implicit concatenation between template strings and
regular strings becomes a syntax error).
- `concatenated_template_string` is the counterpart of
`concatenated_string`.
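
The effect of keeping the two node types distinct can be sketched as a classification over the prefixes of adjacent literals. This is only an illustration of the rule in the list above, not the grammar itself; the function name `concatenation_kind` is hypothetical.

```python
from typing import List


def concatenation_kind(prefixes: List[str]) -> str:
    """Decide which node an implicit concatenation would produce.

    `prefixes` holds the prefix of each adjacent literal, e.g. ["t", "T"].
    """
    if len(prefixes) < 2:
        raise ValueError("a concatenation needs at least two literals")
    is_template = ["t" in p.lower() for p in prefixes]
    if all(is_template):
        return "concatenated_template_string"
    if not any(is_template):
        return "concatenated_string"
    # Mixed template and non-template literals match neither rule.
    raise SyntaxError("cannot mix template strings with other strings")


if __name__ == "__main__":
    print(concatenation_kind(["t", "t", "T"]))
    print(concatenation_kind(["", "f"]))
```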

However, internally, the string parts of a template string are just the
same `string_content` nodes that are used in regular format strings. We
will disambiguate these inside `tsg-python`.

Adds three new AST nodes to the mix:

- `TemplateString` represents a t-string in Python 3.14
- `TemplateStringPart` represents one of the string constituents of a
t-string. (The interpolated expressions are represented as `Expr` nodes,
just like f-strings.)
- `JoinedTemplateString` represents an implicit concatenation of
template strings.

Importantly, we _completely avoid_ the complicated construction we
currently do for format strings (as well as the confusing nomenclature).
In particular, we do not inject extra empty strings to force a template
string into a strict alternation of strings and expressions. A
`JoinedTemplateString` simply has a list of template string children, and
a `TemplateString` has a list of "values" which may be either `Expr` or
`TemplateStringPart` nodes.
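
The shape described here can be sketched with plain dataclasses. This is an illustrative model only: the class names follow the commit message, but the field names other than `values` (for example `strings`, `text` and `source`) are assumptions, and the real nodes are generated from the extractor's schema.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Expr:
    source: str  # hypothetical stand-in for any interpolated expression


@dataclass
class TemplateStringPart:
    text: str  # one literal chunk of the t-string


@dataclass
class TemplateString:
    # Strings and expressions in source order, with no padding between them.
    values: List[Union[Expr, TemplateStringPart]] = field(default_factory=list)


@dataclass
class JoinedTemplateString:
    strings: List[TemplateString] = field(default_factory=list)


# t"{first}{second}!" : two adjacent expressions, no empty string between them
adjacent = TemplateString([Expr("first"), Expr("second"), TemplateStringPart("!")])

# t"Implicit concatenation: " t"Hello, {name}!"
joined = JoinedTemplateString([
    TemplateString([TemplateStringPart("Implicit concatenation: ")]),
    TemplateString([TemplateStringPart("Hello, "), Expr("name"), TemplateStringPart("!")]),
])
```

A consumer that wants the padded, strictly alternating view could rebuild it from `values` afterwards.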

If we ever find that we actually want the more complicated interface for
these strings, then I would much rather we reconstruct this inside of QL
rather than in the parser.

We do the usual thing. Downgrade scripts remove the relevant relations;
upgrade scripts do nothing.

Not actually based on any measurements, just the usual 100/1000 stuff.
@tausbn tausbn force-pushed the tausbn/python-add-support-for-template-string-literals branch from a35fba1 to 4d45b58 Compare December 16, 2025 22:58