Fix OpenQASM 3 exporter to properly escape invalid identifiers #15498

ddri · 2026-01-04T09:47:25Z

Summary

The OpenQASM 3 exporter was producing invalid identifiers in two cases:

1. Register names starting with ASCII digits (e.g., 3qr → should be escaped)
2. Names containing Unicode number characters (e.g., j² → should be escaped)

The root cause was the regex [\w], which also matches digits, so names like 3qr were incorrectly considered valid.
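To make the root cause concrete, here is a minimal sketch using Python's standard `re` module. The pattern `[\w]+` and the name `OLD_STYLE_PATTERN` are assumptions standing in for the old check; the exporter's exact expression is not shown in this description.

```python
import re
import unicodedata

# Assumed stand-in for the old validation; not the exporter's actual pattern.
OLD_STYLE_PATTERN = re.compile(r"[\w]+")

for name in ("3qr", "j²", "qr"):
    # \w accepts letters, underscores, and numeric characters alike,
    # so a leading digit or a superscript digit does not cause a rejection.
    accepted = OLD_STYLE_PATTERN.fullmatch(name) is not None
    categories = [unicodedata.category(char) for char in name]
    print(f"{name!r}: accepted={accepted}, categories={categories}")
```

The printed categories show why a category-based check can tell these cases apart: `3` is `Nd` and `²` is `No`, while ordinary letters fall under `L*`.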

Solution

Replaced the regex-based validation with proper Unicode-aware functions using unicodedata.category() to correctly identify valid identifier characters per the OpenQASM 3 spec:

- First character: Unicode letter (category L*) or underscore
- Subsequent: Unicode letters, underscores, or ASCII digits 0-9

This also simplifies the escaping by only replacing invalid characters rather than always prepending an underscore.

Examples

| Input | Before (invalid) | After (valid) |
|-------|------------------|---------------|
| `3qr` | `3qr` | `_qr` |
| `j²` | `j²` | `j_` |
| `t[0]` | `_t_0_` | `t_0_` |
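The "After" column can be reproduced with a short sketch of the replace-only escaping described in the Solution section. The function names `escape_identifier` and `_is_letter_or_underscore` are hypothetical, as is the choice to map an empty name to `_`; this is not the code from the PR, and it does not handle name collisions or reserved keywords.

```python
import unicodedata

ASCII_DIGITS = frozenset("0123456789")


def _is_letter_or_underscore(char: str) -> bool:
    """Return True for an underscore or any character in a Unicode L* category."""
    return char == "_" or unicodedata.category(char).startswith("L")


def escape_identifier(name: str) -> str:
    """Hypothetical helper: replace each invalid character with an underscore."""
    if not name:
        # Assumption: an empty name becomes a single underscore.
        return "_"
    escaped = []
    for index, char in enumerate(name):
        # ASCII digits are only allowed after the first position.
        valid = _is_letter_or_underscore(char) or (index > 0 and char in ASCII_DIGITS)
        escaped.append(char if valid else "_")
    return "".join(escaped)


for original in ("3qr", "j²", "t[0]"):
    print(original, "->", escape_identifier(original))
```

Because valid characters pass through unchanged, a name that is already a valid identifier comes back unmodified.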

Test plan

- Added 4 new tests for identifier escaping
- All 119 existing QASM3 export tests pass
- Manual verification of escaping behavior

The exporter was producing invalid OpenQASM 3 identifiers in two cases:

1. Register names starting with ASCII digits (e.g., "3qr")
2. Names containing Unicode number characters (e.g., "j²")

Replaced the regex-based validation with proper Unicode-aware functions that use unicodedata to correctly identify valid identifier characters per the OpenQASM 3 spec:

- First character: Unicode letter (category L*) or underscore
- Subsequent: Unicode letters, underscores, or ASCII digits 0-9

This also simplifies the escaping logic by only replacing invalid characters rather than always prepending an underscore.

Fixes Qiskit#15304, fixes Qiskit#15303

qiskit-bot · 2026-01-04T09:47:31Z

Thank you for opening a new pull request.

Before your PR can be merged it will first need to pass continuous integration tests and be reviewed. Sometimes the review process can be slow, so please be patient.

While you're waiting, please feel free to review other open PRs. While only a subset of people are authorized to approve pull requests for merging, everyone is encouraged to review open pull requests. Doing reviews helps reduce the burden on the core team and helps make the project's code better for everyone.

One or more of the following people are relevant to this code:

@Qiskit/terra-core

debasmita2102 · 2026-01-05T05:32:34Z

qiskit/qasm3/exporter.py

+def _is_valid_identifier(name: str) -> bool:
+    """Check if a name is a valid OpenQASM 3 identifier.
+
+    Per the OpenQASM 3 spec, identifiers must:
+    - Start with a Unicode letter (category L*) or underscore
+    - Contain only Unicode letters, underscores, or ASCII digits (0-9)
+
+    This excludes Unicode digit/number characters (categories Nd, Nl, No) from
+    the first position, and excludes non-ASCII digit characters (Nl, No) from
+    all positions.
+    """
+    if not name:
+        return False
+    first = name[0]
+    # First char must be letter (L*) or underscore, not any kind of number
+    first_cat = unicodedata.category(first)
+    if not (first_cat.startswith("L") or first == "_"):
+        return False
+    # Rest can be letters, underscore, or ASCII digits 0-9
+    for char in name[1:]:
+        if char in "0123456789" or char == "_":
+            continue
+        cat = unicodedata.category(char)
+        if not cat.startswith("L"):
+            return False
+    return True


The current code uses startswith('L'). Could you confirm this is the intended behavior, or should we be more explicit (e.g. cat in ('Lu', 'Ll', 'Lt', 'Lm', 'Lo'))?
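One way to check this question empirically is to collect every general category that `unicodedata.category()` returns with a leading `L` and compare it with the explicit tuple. This is a rough sketch, not part of the PR; the names `EXPLICIT_LETTER_CATEGORIES` and `observed_letter_categories` are made up for illustration, and the result depends on the Unicode database bundled with the running Python.

```python
import sys
import unicodedata

# The explicit alternative proposed in the review comment.
EXPLICIT_LETTER_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo"}

# Scan every code point and keep the categories that startswith("L") would accept.
observed_letter_categories = set()
for code_point in range(sys.maxunicode + 1):
    category = unicodedata.category(chr(code_point))
    if category.startswith("L"):
        observed_letter_categories.add(category)

print(sorted(observed_letter_categories))
print(observed_letter_categories == EXPLICIT_LETTER_CATEGORIES)
```

If the two sets match, the prefix check and the explicit tuple accept the same characters, and the choice between them is about readability rather than behavior.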

jakelishman · 2026-01-05T12:45:39Z

Hiya - thanks for the PR, but #15305 was already open and has just needed a final merge for a while now (it got a little lost, apparently); it addresses #15303 and #15304 too. This proposed PR allows more identifiers without escaping, but at the cost of manual Python-space iteration through every character of identifiers and checking the Unicode database, and I'm worried about the performance cost of that. I'd potentially like to keep things simpler/faster on the export rather than trying to be "perfect": if we try and involve too much Unicode, we end up in nasty situations where we have to make decisions about identifier normalisation etc, whereas restricting our export to a much simpler character set avoids all that and lets us be a bit faster.
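The performance concern can be measured rather than guessed. The sketch below times a compiled ASCII-only regex against a per-character `unicodedata` loop using `timeit`. Both validators and the pattern `[A-Za-z_][A-Za-z0-9_]*` are assumptions for illustration; neither is the code from this PR or from #15305, and the timings will vary by machine and Python version.

```python
import re
import timeit
import unicodedata

# Assumed "simpler character set": ASCII letters, digits, and underscore.
ASCII_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_ascii(name: str) -> bool:
    """Regex-based check; the matching loop runs inside the regex engine."""
    return ASCII_IDENTIFIER.fullmatch(name) is not None


def is_valid_unicode(name: str) -> bool:
    """Per-character check that consults the Unicode database in Python space."""
    if not name:
        return False
    for index, char in enumerate(name):
        if char == "_" or unicodedata.category(char).startswith("L"):
            continue
        if index > 0 and char in "0123456789":
            continue
        return False
    return True


names = [f"qr_{i}" for i in range(1000)]
regex_seconds = timeit.timeit(lambda: [is_valid_ascii(n) for n in names], number=20)
loop_seconds = timeit.timeit(lambda: [is_valid_unicode(n) for n in names], number=20)
print(f"regex: {regex_seconds:.4f}s, per-character loop: {loop_seconds:.4f}s")
```

The two checks agree on plain ASCII names but differ on names such as `é1`, which the Unicode-aware check accepts and the ASCII-only check would send to escaping. That is the trade-off between permissiveness and simplicity raised above.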

ddri requested a review from a team as a code owner January 4, 2026 09:47

qiskit-bot added the Community PR PRs from contributors that are not 'members' of the Qiskit repo label Jan 4, 2026

This was referenced Jan 4, 2026

OpenQASM 3 exporter does not escape digits at the start of identifiers #15304

Open

OpenQASM 3 export does not escape unusual Unicode digits #15303

Open

debasmita2102 reviewed Jan 5, 2026
