📝 Walkthrough

The change adjusts `Colbert._post_process_onnx_output` in `fastembed/late_interaction/colbert.py` to separate the document and non-document flows. When `is_doc` is False, outputs are yielded immediately without document-specific post-processing. When `is_doc` is True, `input_ids` and `attention_mask` are validated, the attention mask is applied to the embeddings, features are normalized, and the masked embeddings are yielded. No public API signatures were changed.

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
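The two flows described above can be sketched as follows (a minimal, self-contained illustration with simplified names and a stand-in context class, not the actual fastembed code):

```python
import numpy as np
from dataclasses import dataclass
from typing import Iterable, Iterator

@dataclass
class OnnxOutputContext:
    # Simplified stand-in for fastembed's ONNX output context
    model_output: np.ndarray    # (batch, seq_len, dim)
    input_ids: np.ndarray       # (batch, seq_len)
    attention_mask: np.ndarray  # (batch, seq_len)

def post_process(output: OnnxOutputContext, is_doc: bool) -> Iterator[np.ndarray]:
    # Query flow: yield raw embeddings and stop; no doc post-processing
    if not is_doc:
        yield from output.model_output
        return  # crucial: without this, execution falls through to the doc path

    # Document flow: mask, L2-normalize, then yield only kept tokens
    output.model_output *= np.expand_dims(output.attention_mask, 2)
    norm = np.linalg.norm(output.model_output, ord=2, axis=2, keepdims=True)
    output.model_output /= np.maximum(norm, 1e-12)
    for embedding, mask in zip(output.model_output, output.attention_mask):
        yield embedding[mask == 1]

ctx = OnnxOutputContext(
    model_output=np.random.rand(2, 4, 8),
    input_ids=np.zeros((2, 4), dtype=np.int64),
    attention_mask=np.array([[1, 1, 0, 0], [1, 1, 1, 0]]),
)
queries = list(post_process(ctx, is_doc=False))  # exactly one embedding per query
```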
Actionable comments posted: 1
🧹 Nitpick comments (1)
fastembed/late_interaction/colbert.py (1)
57-61: Vectorize mask construction to remove the nested Python loops.
The current O(B*L) Python loops will be a hotspot on long sequences/batches. Build a boolean mask with numpy instead.

Apply this diff to replace the loop:

```diff
-        for i, token_sequence in enumerate(output.input_ids):
-            for j, token_id in enumerate(token_sequence):  # type: ignore
-                if token_id in self.skip_list or token_id == self.pad_token_id:
-                    output.attention_mask[i, j] = 0
+        # Build boolean mask of tokens to drop (skip-list and pad), shape: (B, L)
+        skip_ids = (
+            np.fromiter(self.skip_list, dtype=output.input_ids.dtype)
+            if self.skip_list
+            else np.array([], dtype=output.input_ids.dtype)
+        )
+        skip_mask = (
+            np.isin(output.input_ids, skip_ids)
+            if skip_ids.size
+            else np.zeros_like(output.input_ids, dtype=bool)
+        )
+        pad_mask = (
+            (output.input_ids == self.pad_token_id)
+            if self.pad_token_id is not None
+            else np.zeros_like(output.input_ids, dtype=bool)
+        )
+        # Keep only tokens where original attention_mask == 1 and not in skip/pad
+        keep_mask = output.attention_mask.astype(bool, copy=False) & ~(skip_mask | pad_mask)
+        # Replace attention_mask with boolean keep mask for downstream steps
+        output.attention_mask = keep_mask
```
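A quick toy check of the vectorized approach (the `skip_list` and `pad_token_id` values here are hypothetical, not taken from the actual model config):

```python
import numpy as np

# Hypothetical values for illustration
skip_list = {101, 102}  # e.g. special-token ids to drop
pad_token_id = 0

input_ids = np.array([[101, 7, 8, 0],
                      [9, 102, 0, 0]], dtype=np.int64)
attention_mask = np.array([[1, 1, 1, 0],
                           [1, 1, 0, 0]], dtype=np.int64)

# Vectorized replacement for the nested loops
skip_ids = np.fromiter(skip_list, dtype=input_ids.dtype)
skip_mask = np.isin(input_ids, skip_ids)      # True where token is in skip_list
pad_mask = input_ids == pad_token_id          # True where token is padding
keep_mask = attention_mask.astype(bool) & ~(skip_mask | pad_mask)
```

`np.isin` checks each token id against the whole skip set in one C-level pass, which is what removes the O(B*L) Python-loop overhead.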
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
fastembed/late_interaction/colbert.py (1 hunk)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (15)
- GitHub Check: Python 3.13.x on windows-latest test
- GitHub Check: Python 3.13.x on ubuntu-latest test
- GitHub Check: Python 3.12.x on ubuntu-latest test
- GitHub Check: Python 3.11.x on macos-latest test
- GitHub Check: Python 3.10.x on macos-latest test
- GitHub Check: Python 3.10.x on ubuntu-latest test
- GitHub Check: Python 3.12.x on macos-latest test
- GitHub Check: Python 3.12.x on windows-latest test
- GitHub Check: Python 3.13.x on macos-latest test
- GitHub Check: Python 3.9.x on macos-latest test
- GitHub Check: Python 3.11.x on windows-latest test
- GitHub Check: Python 3.9.x on ubuntu-latest test
- GitHub Check: Python 3.11.x on ubuntu-latest test
- GitHub Check: Python 3.9.x on windows-latest test
- GitHub Check: Python 3.10.x on windows-latest test
🔇 Additional comments (2)
fastembed/late_interaction/colbert.py (2)
51-56: Early-exit bug is correctly fixed; doc-only validation is now gated.
The control flow now prevents query paths from running doc-specific checks and processing. Looks good.
62-66: Resolve: in-place mutation of `OnnxOutputContext` fields is safe. All uses of `model_output` and `attention_mask` occur immediately within each `_post_process_onnx_output` implementation on freshly created context instances, and no downstream code retains or reuses the original arrays.
```python
output.model_output *= np.expand_dims(output.attention_mask, 2)
norm = np.linalg.norm(output.model_output, ord=2, axis=2, keepdims=True)
norm_clamped = np.maximum(norm, 1e-12)
output.model_output /= norm_clamped

for embedding, attention_mask in zip(output.model_output, output.attention_mask):
    yield embedding[attention_mask == 1]
```
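To see what this mask-then-normalize sequence does, here is the same logic on a tiny toy batch (illustrative values only):

```python
import numpy as np

model_output = np.array([[[3.0, 4.0], [1.0, 0.0], [5.0, 12.0]]])  # (1, 3, 2)
attention_mask = np.array([[1, 1, 0]])  # third token is masked out

model_output *= np.expand_dims(attention_mask, 2)  # zero out masked token
norm = np.linalg.norm(model_output, ord=2, axis=2, keepdims=True)
model_output /= np.maximum(norm, 1e-12)  # clamp avoids divide-by-zero on zeroed rows

# Keep only tokens where the mask is 1
kept = [emb[mask == 1] for emb, mask in zip(model_output, attention_mask)]
```

Each surviving token vector ends up unit-length ([3, 4] becomes [0.6, 0.8]), while the masked token is zeroed first so its zero norm is never divided by thanks to the clamp.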
🛠️ Refactor suggestion
Avoid dtype upcasts and reuse the boolean mask for both zeroing and yielding.
Multiplying float embeddings by an int64 mask risks dtype promotions and extra work. Use a boolean mask cast to the model_output dtype, and use dtype-aware epsilon.
Apply this diff:
```diff
-        output.model_output *= np.expand_dims(output.attention_mask, 2)
-        norm = np.linalg.norm(output.model_output, ord=2, axis=2, keepdims=True)
-        norm_clamped = np.maximum(norm, 1e-12)
-        output.model_output /= norm_clamped
-
-        for embedding, attention_mask in zip(output.model_output, output.attention_mask):
-            yield embedding[attention_mask == 1]
+        # Zero out masked tokens without changing dtype
+        mask_f = output.attention_mask[..., None].astype(output.model_output.dtype, copy=False)
+        output.model_output *= mask_f
+        # Normalize with a dtype-appropriate epsilon
+        norm = np.linalg.norm(output.model_output, ord=2, axis=2, keepdims=True)
+        eps = np.finfo(output.model_output.dtype).eps
+        output.model_output /= np.maximum(norm, eps)
+
+        # Yield only kept tokens
+        for embedding, keep_mask in zip(output.model_output, output.attention_mask):
+            yield embedding[keep_mask]
```

Committable suggestion skipped: line range outside the PR's diff.
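The dtype concern is easy to demonstrate: an out-of-place multiply of a `float32` array by an `int64` mask promotes the result to `float64`, whereas a mask cast to the embedding dtype does not (toy shapes for illustration):

```python
import numpy as np

emb = np.random.rand(2, 4, 8).astype(np.float32)
mask_int = np.ones((2, 4, 1), dtype=np.int64)

promoted = emb * mask_int  # float32 * int64 -> float64 under numpy promotion rules
mask_f = mask_int.astype(emb.dtype, copy=False)
kept_dtype = emb * mask_f  # stays float32
```

The in-place `*=` in the original code keeps the left-hand dtype either way, so the review's suggestion is mainly about avoiding the silent promotion (and the wider intermediate) that appears as soon as the multiply is written out-of-place.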
🤖 Prompt for AI Agents
In fastembed/late_interaction/colbert.py around lines 62 to 68, avoid
integer->float dtype upcasts and redundant mask comparisons: convert
attention_mask once to a boolean array, cast that boolean mask to the
model_output dtype for in-place masking, and use a dtype-matching epsilon when
clamping norms; then reuse the boolean mask for yielding rows instead of
comparing to 1. Concretely: create mask_bool =
output.attention_mask.astype(bool), mask_cast =
mask_bool.astype(output.model_output.dtype), multiply model_output by mask_cast
in-place, compute norm and norm_clamped using eps = np.array(1e-12,
dtype=output.model_output.dtype), divide by norm_clamped, and in the final loop
use for embedding, mask in zip(output.model_output, mask_bool): yield
embedding[mask].
```python
if not is_doc:
    for embedding in output.model_output:
        yield embedding
```
Nit: this could also be fixed by adding an empty return here (outside the loop but inside the if), which would make the code less nested.
Actionable comments posted: 0
🧹 Nitpick comments (1)
tests/test_late_interaction_embeddings.py (1)
222-224: Add a multi-query regression test to prevent future double-yields.

Also assert cardinality for multiple queries so a regression can't slip by when batching queries.
Apply this addition:
```diff
+@pytest.mark.parametrize("model_name", ["answerdotai/answerai-colbert-small-v1"])
+def test_query_embed_emits_one_per_query(model_name: str):
+    is_ci = os.getenv("CI")
+    model = LateInteractionTextEmbedding(model_name=model_name)
+    queries = ["q1", "q2"]
+    results = list(model.query_embed(queries))
+    assert len(results) == len(queries)
+    if is_ci:
+        delete_model_cache(model.model._model_dir)
```
📒 Files selected for processing (1)
tests/test_late_interaction_embeddings.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/test_late_interaction_embeddings.py (2)
- fastembed/late_interaction/colbert.py (2): `embed` (209-243), `query_embed` (245-255)
- fastembed/late_interaction/late_interaction_text_embedding.py (2): `embed` (117-139), `query_embed` (141-153)
🔇 Additional comments (3)
tests/test_late_interaction_embeddings.py (3)
178-178: LGTM (formatting only). The inline list is fine and keeps the test concise.

199-201: Good regression guard for the single-doc path. Materializing the generator and asserting a single output directly detects accidental double-yields from the embed path.

222-224: Correctly verifies a single result for the query path. This catches the original bug (extra yield after query post-processing).
We've introduced a bug in #547.

Due to the replacement of `return` with `yield`, after yielding embeddings in the `not is_doc` branch we no longer leave the function: execution continues and additionally yields embeddings with document post-processing applied.
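The failure mode is easy to reproduce with a stripped-down generator (illustrative only; the real method is `Colbert._post_process_onnx_output`):

```python
def buggy(items, is_doc):
    # After replacing `return` with `yield`, execution falls through...
    if not is_doc:
        for item in items:
            yield item
    # ...and the "document" post-processing also runs for queries
    for item in items:
        yield ("doc-processed", item)

def fixed(items, is_doc):
    if not is_doc:
        for item in items:
            yield item
        return  # the early return stops the generator here
    for item in items:
        yield ("doc-processed", item)

buggy_out = list(buggy([1, 2], is_doc=False))  # queries yielded twice: raw + doc-processed
fixed_out = list(fixed([1, 2], is_doc=False))  # exactly one result per query
```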