Handle blank BLAT results, and fix BLAT results for some targets by sallybg · Pull Request #51 · VariantEffect/dcd_mapping2

sallybg · 2025-07-17T18:38:24Z

Previously, blank BLAT results were not caught immediately after alignment, resulting in an opaque key error.
Catch blank BLAT results for any target and return the error to the client.
Some blank BLAT results were previously due to poor alignment of codon-optimized nucleotide target sequences. This occurs when a target sequence was codon-optimized for a non-human organism and then provided as a nucleotide sequence. If no nucleotide-level variants are reported for such a target, translate the target to an amino acid sequence immediately after score set metadata ingestion, and use this sequence as the target sequence for the mapping job.
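
As a rough illustration of the blank-result check described above, here is a minimal sketch. The `AlignmentError` name comes from the commit message in this PR, but the class definition, the `require_alignment_hits` helper, and the shape of `blat_hits` (a mapping from target label to a list of hits) are assumptions made for this example and may not match the actual dcd_mapping code.

```python
from typing import Any, Mapping, Sequence


class AlignmentError(Exception):
    """Stand-in for the mapper's alignment exception (assumed definition)."""


def require_alignment_hits(
    blat_hits: Mapping[str, Sequence[Any]], target_labels: Sequence[str]
) -> None:
    """Raise immediately if any target has no BLAT hits.

    Hypothetical helper: `blat_hits` maps a target label to its list of hits.
    A missing key and an empty list are both treated as a blank result.
    """
    missing = [label for label in target_labels if not blat_hits.get(label)]
    if missing:
        raise AlignmentError(
            "BLAT returned no alignment results for target(s): " + ", ".join(missing)
        )
```

Failing at this point gives the client a message that names the affected target, instead of a key error raised later when downstream code looks up a result that was never produced.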

If a target has only protein-level variants, but the provided target sequence is a nucleotide sequence, translate the nucleotide sequence to an amino acid sequence immediately after metadata ingestion. This change avoids alignment errors that can occur when a target sequence has been codon-optimized for a non-human organism. Since we do not have sufficient metadata to determine whether a target sequence has been codon-optimized, always perform translation when there are no nucleotide-level variants for a target.
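
The translation rule above could be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: the `Target` dataclass, the `"dna"`/`"protein"` type strings, the `translate` helper, and the way nucleotide-level variants are passed in (a list of optional `hgvs_nt` strings, with `None` or an empty string meaning absent) are all hypothetical. Only the function name `patch_target_sequence_type` comes from the PR. The sketch uses the standard genetic code and keeps stop codons as `*`.

```python
from dataclasses import dataclass
from itertools import product
from typing import Optional, Sequence

# Standard genetic code (NCBI translation table 1), codons ordered over T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


@dataclass
class Target:
    """Hypothetical, simplified stand-in for a target's sequence metadata."""

    sequence: str
    sequence_type: str  # assumed values: "dna" or "protein"


def translate(nt_sequence: str) -> str:
    """Translate an in-frame nucleotide sequence; stop codons become '*'."""
    seq = nt_sequence.upper()
    if len(seq) % 3 != 0:
        raise ValueError("Sequence length is not a multiple of 3")
    codons = [seq[i : i + 3] for i in range(0, len(seq), 3)]
    unknown = [codon for codon in codons if codon not in CODON_TABLE]
    if unknown:
        raise ValueError(f"Cannot translate codon(s): {unknown}")
    return "".join(CODON_TABLE[codon] for codon in codons)


def patch_target_sequence_type(
    target: Target, hgvs_nt_values: Sequence[Optional[str]]
) -> Target:
    """Return a protein target when a DNA target has no nucleotide-level variants."""
    has_nt_variants = any(hgvs_nt_values)  # None and "" count as absent (assumption)
    if target.sequence_type != "dna" or has_nt_variants:
        return target
    return Target(sequence=translate(target.sequence), sequence_type="protein")
```

The sketch returns a new object rather than modifying its input, which matches the point made in review that the change applies only within the mapper and is not persisted.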

bencap

Thanks Sally, code looks good. Left one minor comment that doesn't matter too much.

Question on how this fits into the broader mapping routine: I think we had discussed that this target sequence correction would only happen if the alignment to DNA failed. Is that still true in these changes, or am I misremembering the conversation?

bencap · 2025-07-17T19:58:56Z

src/dcd_mapping/mavedb_data.py

    return _load_scoreset_records(scores_csv, metadata)


+def correct_target_sequence_type(


Pretty minor comment, but maybe we can name this patch_target_sequence_type? I think correct has an implication this change is persistent.

Maybe we should keep it though because it's persistent in the context of the mapper. If you think that argument is the relevant scope, then it seems reasonable to keep it as correct.

I agree that patch is the better term here!

sallybg · 2025-07-18T16:51:23Z

@bencap You are right that we originally discussed only patching the target sequence type if BLAT failed! After that discussion, Alan suggested that we could patch it in every case where there are no nucleotide-level variants, because we would eventually translate the target sequence anyway and return the protein sequence in our metadata, and pre-mapped objects would be against the translated target sequence. So this doesn't impact the output directly, although it does change which sequence is used for BLAT. I don't think it's too risky to use the protein sequence for BLAT, and it allows us to save a little time by not attempting a BLAT that will fail. Given the code structure, it is also easier to adjust it right away instead of within the aligner, because 1. we would need to pass variant records to the align function just for this purpose, and 2. we align all targets at once, so we would need to pull out any failing targets, create a new file for them, and then merge them with the previous BLAT results for the ones that worked.
What do you think?
Also, that reminds me that I should bump the version number for this change, so that we'll know which sequence was used for BLAT for score sets that fall into this category.

bencap · 2025-07-18T19:44:00Z

Gotcha, yeah that all makes sense. Yeah let's merge this into the dev branch and then we can also merge the other changes I'm working on into there, bump the version number, and deploy these next week.

sallybg added 2 commits July 17, 2025 11:28

Raise AlignmentError if no alignment result for target

6dc2781

sallybg requested a review from bencap July 17, 2025 18:38

bencap approved these changes Jul 17, 2025

bencap changed the base branch from mavedb-main to mavedb-dev July 18, 2025 19:44

sallybg added 2 commits July 18, 2025 13:58

Change function name to patch_target_sequence_type

01427c4

Bump version number

d60e81a

This update changes how alignment is performed for some score sets, so bump major version.

sallybg merged commit 340b0f1 into mavedb-dev Jul 18, 2025
6 checks passed

bencap mentioned this pull request Aug 8, 2025

Release 2025.2.0 #56

Merged

bencap deleted the fix_key_errors_after_mapping branch February 6, 2026 19:10

sallybg merged 4 commits into mavedb-dev from fix_key_errors_after_mapping
