
Handle blank BLAT results, and fix BLAT results for some targets #51

Merged
sallybg merged 4 commits into mavedb-dev from fix_key_errors_after_mapping
Jul 18, 2025


@sallybg sallybg commented Jul 17, 2025

Previously, blank BLAT results were not caught immediately after alignment, resulting in an opaque key error.
Catch blank BLAT results for any target and return the error to the client.
Some blank BLAT results were previously due to poor alignment of codon-optimized nucleotide target sequences. This occurs when a target sequence was codon-optimized for a non-human organism and then provided as a nucleotide sequence. If no nucleotide-level variants are reported for such a target, translate the target to an amino acid sequence immediately after score set metadata ingestion, and use this sequence as the target sequence for the mapping job.
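
The blank-result check described in the first two points could look roughly like the sketch below. It is not the code from this PR: `AlignmentError`, `BlatResult`, and `check_blat_results` are hypothetical names chosen for illustration, and the real mapper's result objects and error type may differ. The idea is only to fail right after alignment with a message that names the affected targets, rather than letting a later lookup raise a key error.

```python
from dataclasses import dataclass, field


class AlignmentError(Exception):
    """Hypothetical error type for targets that produced no alignment."""


@dataclass
class BlatResult:
    """Hypothetical stand-in for the parsed BLAT output of one target."""

    target_name: str
    hits: list = field(default_factory=list)


def check_blat_results(results: dict, target_names: list) -> None:
    """Raise immediately if any target has a missing or empty BLAT result."""
    blank = [
        name
        for name in target_names
        if name not in results or not results[name].hits
    ]
    if blank:
        # One clear message for the client instead of a downstream KeyError.
        raise AlignmentError(
            "BLAT returned no alignment for target(s): " + ", ".join(blank)
        )


if __name__ == "__main__":
    results = {"target_a": BlatResult("target_a", hits=["placeholder hit"])}
    try:
        check_blat_results(results, ["target_a", "target_b"])
    except AlignmentError as exc:
        print(exc)
```

Checking every target before raising means a score set with several targets reports all blank results at once.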

sallybg added 2 commits July 17, 2025 11:28
If a target has only protein-level variants, but the provided target sequence
is a nucleotide sequence, translate the nucleotide sequence to an amino acid
sequence immediately after metadata ingestion.
This change avoids alignment errors that can occur when a target sequence has been
codon-optimized for a non-human organism. Since we do not have sufficient metadata
to assume that a target sequence has been codon-optimized, always perform translation
when there are no nucleotide-level variants for a target.
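
The rule in this commit message can be sketched as follows. This is a minimal illustration under stated assumptions, not the function from the PR: the `Target` class, the `hgvs_nt` dictionary key, the treatment of `"NA"` as missing, and the function signature are all hypothetical, and the translation uses the standard genetic code with stop codons kept as `*`.

```python
from dataclasses import dataclass
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


@dataclass
class Target:
    """Hypothetical stand-in for a target's sequence metadata."""

    sequence: str
    sequence_type: str  # assumed values: "dna" or "protein"


def translate(nt_seq: str) -> str:
    """Translate an in-frame nucleotide sequence; stop codons become '*'."""
    seq = nt_seq.upper().replace("U", "T")
    if len(seq) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    try:
        return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))
    except KeyError as exc:
        raise ValueError(f"unrecognized codon: {exc.args[0]}") from None


def patch_target_sequence_type(target: Target, variants: list) -> Target:
    """Return a protein target when a DNA target has no nucleotide-level variants."""
    # Assumption: each variant record is a dict whose "hgvs_nt" entry is
    # None, "", or "NA" when no nucleotide-level variant was reported.
    has_nt = any(v.get("hgvs_nt") not in (None, "", "NA") for v in variants)
    if target.sequence_type == "dna" and not has_nt:
        return Target(translate(target.sequence), "protein")
    return target


if __name__ == "__main__":
    dna_target = Target("ATGGCC", "dna")
    records = [{"hgvs_nt": None, "hgvs_pro": "p.Ala2Gly"}]
    print(patch_target_sequence_type(dna_target, records))
```

Returning a new object leaves the stored metadata alone, so the change only applies within the mapping job.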
@sallybg sallybg requested a review from bencap July 17, 2025 18:38

@bencap bencap left a comment

Thanks Sally, code looks good. Left one minor comment that doesn't matter too much.

Question on how this fits into the broader mapping routine: I think we had discussed that this target sequence correction would only happen if the alignment to DNA failed. Is that still true in these changes, or am I misremembering the conversation?

return _load_scoreset_records(scores_csv, metadata)


def correct_target_sequence_type(
@bencap bencap Jul 17, 2025

Pretty minor comment, but maybe we can name this patch_target_sequence_type? I think "correct" implies that this change is persistent.

Maybe we should keep it, though, because the change is persistent in the context of the mapper. If you think the mapper is the relevant scope, then it seems reasonable to keep "correct".

@sallybg (Collaborator, Author)

I agree that patch is the better term here!

sallybg commented Jul 18, 2025

@bencap You are right that we originally discussed patching the target sequence type only if BLAT failed! After that discussion, Alan suggested that we could patch it in every case where there are no nucleotide-level variants, because we would eventually translate the target sequence anyway and return the protein sequence in our metadata, and pre-mapped objects would be against the translated target sequence. So this doesn't impact the output directly, although it does change which sequence is used for BLAT.

I don't think it's too risky to use the protein sequence for BLAT, and it saves a little time by not attempting a BLAT that will fail. Given the code structure, it is also easier to adjust the sequence right away instead of within the aligner, because:

1. we would need to pass variant records to the align function just for this purpose, and
2. we align all targets at once, so we would need to pull out any failing targets, create a new file for them, and then merge them with the previous BLAT results for the ones that worked.

What do you think?
Also, that reminds me that I should bump the version number for this change, so that we'll know which sequence was used for BLAT for score sets that fall into this category.

bencap commented Jul 18, 2025

Gotcha, yeah that all makes sense. Yeah let's merge this into the dev branch and then we can also merge the other changes I'm working on into there, bump the version number, and deploy these next week.

@bencap bencap changed the base branch from mavedb-main to mavedb-dev July 18, 2025 19:44
sallybg added 2 commits July 18, 2025 13:58
This update changes how alignment is performed for some score sets,
so bump major version.
@sallybg sallybg merged commit 340b0f1 into mavedb-dev Jul 18, 2025
6 checks passed
@bencap bencap mentioned this pull request Aug 8, 2025
@bencap bencap deleted the fix_key_errors_after_mapping branch February 6, 2026 19:10