Add ASR-EOU models and training/eval scripts#14740
Merged
stevehuang52 merged 142 commits into main on Mar 31, 2026
[🤖]: Hi @stevehuang52 👋, We wanted to let you know that a CICD pipeline for this PR just finished successfully. So it might be time to merge this PR or get some approvals.
nithinraok previously approved these changes on Mar 16, 2026
subhankar-ghosh pushed a commit that referenced this pull request on Mar 31, 2026
* initial commit for end-of-utterance detection Signed-off-by: Weiqing Wang <weiqingw@nvidia.com> * change targets to long() type Signed-off-by: Weiqing Wang <weiqingw@nvidia.com> * change output_types() Signed-off-by: Weiqing Wang <weiqingw@nvidia.com> * add random padding and refactor for multiple utterances per sample Signed-off-by: stevehuang52 <heh@nvidia.com> * add handling multiple text groundtruth Signed-off-by: stevehuang52 <heh@nvidia.com> * update and add eval scripts Signed-off-by: stevehuang52 <heh@nvidia.com> * drop sou label and add eob label Signed-off-by: stevehuang52 <heh@nvidia.com> * update hybrid-rnnt-ctc and rnnt models to use eou dataset Signed-off-by: stevehuang52 <heh@nvidia.com> * set default return eou frame label to false Signed-off-by: stevehuang52 <heh@nvidia.com> * handle empty utterance Signed-off-by: stevehuang52 <heh@nvidia.com> * add script for injecting special eou tokens into SPE tokenizer Signed-off-by: stevehuang52 <heh@nvidia.com> * refactor eou eval utils Signed-off-by: stevehuang52 <heh@nvidia.com> * add eou rnnt training Signed-off-by: stevehuang52 <heh@nvidia.com> * update doc Signed-off-by: stevehuang52 <heh@nvidia.com> * update data augmentation Signed-off-by: stevehuang52 <heh@nvidia.com> * update data related functions Signed-off-by: stevehuang52 <heh@nvidia.com> * fix tokenizer with eou tokens Signed-off-by: stevehuang52 <heh@nvidia.com> * adding eou force aligner Signed-off-by: Weiqing Wang <weiqingw@nvidia.com> * update for eou Signed-off-by: stevehuang52 <heh@nvidia.com> * fix the case when 'segments_level_ctm_filepath' is not produced Signed-off-by: Weiqing Wang <weiqingw@nvidia.com> * fix force aligner Signed-off-by: stevehuang52 <heh@nvidia.com> * fix aligner Signed-off-by: stevehuang52 <heh@nvidia.com> * update for asr-eou Signed-off-by: stevehuang52 <heh@nvidia.com> * clean up and update infer Signed-off-by: stevehuang52 <heh@nvidia.com> * update Signed-off-by: stevehuang52 <heh@nvidia.com> * update 
Signed-off-by: stevehuang52 <heh@nvidia.com> * fix rnnt_decoding for empty string Signed-off-by: stevehuang52 <heh@nvidia.com> * update cfg Signed-off-by: stevehuang52 <heh@nvidia.com> * update cfg Signed-off-by: stevehuang52 <heh@nvidia.com> * update padding augment Signed-off-by: stevehuang52 <heh@nvidia.com> * update Signed-off-by: stevehuang52 <heh@nvidia.com> * update Signed-off-by: stevehuang52 <heh@nvidia.com> * update cfg Signed-off-by: stevehuang52 <heh@nvidia.com> * fix eob metric logging Signed-off-by: stevehuang52 <heh@nvidia.com> * refactor and add hybrid model Signed-off-by: stevehuang52 <heh@nvidia.com> * update cfg Signed-off-by: stevehuang52 <heh@nvidia.com> * update EOU models Signed-off-by: stevehuang52 <heh@nvidia.com> * update cfg Signed-off-by: stevehuang52 <heh@nvidia.com> * update Signed-off-by: stevehuang52 <heh@nvidia.com> * refactor percentile calculation Signed-off-by: stevehuang52 <heh@nvidia.com> * update augmentation Signed-off-by: stevehuang52 <heh@nvidia.com> * update cfg Signed-off-by: stevehuang52 <heh@nvidia.com> * update model and cfg Signed-off-by: stevehuang52 <heh@nvidia.com> * update frame eou Signed-off-by: stevehuang52 <heh@nvidia.com> * update cfg Signed-off-by: stevehuang52 <heh@nvidia.com> * add adapter to eou Signed-off-by: stevehuang52 <heh@nvidia.com> * remove pdb Signed-off-by: stevehuang52 <heh@nvidia.com> * update cfg Signed-off-by: stevehuang52 <heh@nvidia.com> * update cfg Signed-off-by: stevehuang52 <heh@nvidia.com> * update cfg Signed-off-by: stevehuang52 <heh@nvidia.com> * update Signed-off-by: stevehuang52 <heh@nvidia.com> * add cfg Signed-off-by: stevehuang52 <heh@nvidia.com> * fix eou metric Signed-off-by: stevehuang52 <heh@nvidia.com> * update adapter Signed-off-by: stevehuang52 <heh@nvidia.com> * add scripts Signed-off-by: stevehuang52 <heh@nvidia.com> * update docstring Signed-off-by: stevehuang52 <heh@nvidia.com> * update Signed-off-by: stevehuang52 <heh@nvidia.com> * update Signed-off-by: stevehuang52 
<heh@nvidia.com> * update generate eval data Signed-off-by: stevehuang52 <heh@nvidia.com> * update eou val Signed-off-by: stevehuang52 <heh@nvidia.com> * update Signed-off-by: stevehuang52 <heh@nvidia.com> * add drop_pnc=true as default for dataloading Signed-off-by: stevehuang52 <heh@nvidia.com> * update Signed-off-by: stevehuang52 <heh@nvidia.com> * update cfg Signed-off-by: stevehuang52 <heh@nvidia.com> * update Signed-off-by: stevehuang52 <heh@nvidia.com> * update Signed-off-by: stevehuang52 <heh@nvidia.com> * update Signed-off-by: stevehuang52 <heh@nvidia.com> * fix miss rate Signed-off-by: stevehuang52 <heh@nvidia.com> * update Signed-off-by: stevehuang52 <heh@nvidia.com> * add ignore_eob_label Signed-off-by: stevehuang52 <heh@nvidia.com> * fix and update Signed-off-by: stevehuang52 <heh@nvidia.com> * improve lhotse augmentation Signed-off-by: stevehuang52 <heh@nvidia.com> * update cfg Signed-off-by: stevehuang52 <heh@nvidia.com> * update cfg Signed-off-by: stevehuang52 <heh@nvidia.com> * update Signed-off-by: stevehuang52 <heh@nvidia.com> * update Signed-off-by: stevehuang52 <heh@nvidia.com> * add debug info Signed-off-by: stevehuang52 <heh@nvidia.com> * improve data augmentation Signed-off-by: stevehuang52 <heh@nvidia.com> * update utils Signed-off-by: stevehuang52 <heh@nvidia.com> * update Signed-off-by: stevehuang52 <heh@nvidia.com> * update Signed-off-by: stevehuang52 <heh@nvidia.com> * update dataloader Signed-off-by: stevehuang52 <heh@nvidia.com> * update oomptimizer Signed-off-by: stevehuang52 <heh@nvidia.com> * update oomptimizer Signed-off-by: stevehuang52 <heh@nvidia.com> * update eou model Signed-off-by: stevehuang52 <heh@nvidia.com> * update eou model Signed-off-by: stevehuang52 <heh@nvidia.com> * update eou model Signed-off-by: stevehuang52 <heh@nvidia.com> * update augmentation Signed-off-by: stevehuang52 <heh@nvidia.com> * update aug Signed-off-by: stevehuang52 <heh@nvidia.com> * update augment Signed-off-by: stevehuang52 <heh@nvidia.com> * 
update Signed-off-by: stevehuang52 <heh@nvidia.com> * update drop pnc func Signed-off-by: stevehuang52 <heh@nvidia.com> * update eou finetune Signed-off-by: stevehuang52 <heh@nvidia.com> * update transcribe Signed-off-by: stevehuang52 <heh@nvidia.com> * update cfg Signed-off-by: stevehuang52 <heh@nvidia.com> * fix cfg Signed-off-by: stevehuang52 <heh@nvidia.com> * clean up for PR Signed-off-by: stevehuang52 <heh@nvidia.com> * clean up Signed-off-by: stevehuang52 <heh@nvidia.com> * Potential fix for code scanning alert no. 16191: Explicit returns mixed with implicit (fall through) returns Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com> * Potential fix for code scanning alert no. 16190: Explicit returns mixed with implicit (fall through) returns Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com> * Apply isort and black reformatting Signed-off-by: stevehuang52 <stevehuang52@users.noreply.github.com> * Potential fix for code scanning alert no. 
16185: File is not always closed Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com> * clean up Signed-off-by: stevehuang52 <heh@nvidia.com> * clean up Signed-off-by: stevehuang52 <heh@nvidia.com> * fix pylint&flake8 Signed-off-by: stevehuang52 <heh@nvidia.com> * fix pylint Signed-off-by: stevehuang52 <heh@nvidia.com> * refactor Signed-off-by: stevehuang52 <heh@nvidia.com> * update pr Signed-off-by: stevehuang52 <heh@nvidia.com> * update adapter Signed-off-by: stevehuang52 <heh@nvidia.com> * clean up Signed-off-by: stevehuang52 <heh@nvidia.com> * update readme, test, etc Signed-off-by: He Huang <heh@nvidia.com> * Apply isort and black reformatting Signed-off-by: stevehuang52 <stevehuang52@users.noreply.github.com> * update doc Signed-off-by: He Huang <heh@nvidia.com> * clean up Signed-off-by: He Huang <heh@nvidia.com> * fix and rename Signed-off-by: He Huang <heh@nvidia.com> * update doc Signed-off-by: He Huang <heh@nvidia.com> * clean up Signed-off-by: He Huang <heh@nvidia.com> * move all length aug to invalid Signed-off-by: He Huang <heh@nvidia.com> * fix typo Signed-off-by: He Huang <heh@nvidia.com> * rename and move to scripts/asr_eou Signed-off-by: He Huang <heh@nvidia.com> * fix ci Signed-off-by: He Huang <heh@nvidia.com> * fix ci Signed-off-by: He Huang <heh@nvidia.com> * clean up Signed-off-by: He Huang <heh@nvidia.com> * clean up Signed-off-by: He Huang <heh@nvidia.com> * fix linting Signed-off-by: He Huang <heh@nvidia.com> * fix ci Signed-off-by: He Huang <heh@nvidia.com> * Apply isort and black reformatting Signed-off-by: stevehuang52 <stevehuang52@users.noreply.github.com> * Potential fix for code scanning alert no. 
17270: Explicit export is not defined Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com> * Potential fix for code scanning alert no. 17271: Explicit export is not defined Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com> * Potential fix for code scanning alert no. 17272: Explicit export is not defined Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com> --------- Signed-off-by: Weiqing Wang <weiqingw@nvidia.com> Signed-off-by: stevehuang52 <heh@nvidia.com> Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com> Signed-off-by: stevehuang52 <stevehuang52@users.noreply.github.com> Signed-off-by: He Huang <heh@nvidia.com> Co-authored-by: Weiqing Wang <weiqingw@nvidia.com> Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> Co-authored-by: stevehuang52 <stevehuang52@users.noreply.github.com>
Important
The Update branch button must only be pressed on very rare occasions. An outdated branch never blocks the merge of a PR.
Please reach out to the automation team before pressing that button.
What does this PR do ?
Add ASR-EOU models, datasets, training/inference scripts, utilities, etc.
Collection: [asr]
Changelog
EOU model:
- `nemo/collections/asr/models/asr_eou_models.py`: extend the RNNT and Hybrid-CTC-RNNT models to the ASR-EOU task by using the new dataloader and new metrics

EOU dataset:
- `nemo/collections/asr/data/audio_to_eou_label_lhotse.py`: new ASR-EOU dataset

EOU examples:
- `examples/asr/asr_eou`: tutorial, training and testing scripts
- `examples/asr/conf/asr_eou`: YAML files for ASR-EOU models

EOU metrics and utils:
- `nemo/collections/asr/parts/utils/eou_utils.py`

EOU test:
- `tests/collections/asr/test_asr_eou.py`: unit test on EOU metrics

EOU auxiliary scripts:
- `scripts/asr_eou`: scripts for data cleaning, creating the ASR-EOU tokenizer, and creating noisy eval data
- `tools/nemo_forced_aligner/align_eou.py`: modified NFA for injecting EOU timestamps into ASR manifests (@weiqingw4ng)

EOU related changes to existing files:
- `nemo/collections/asr/metrics/wer.py`: add the option to configure the decode function to return hypotheses, and the ability to retrieve those hypotheses from outside.
- `nemo/collections/asr/modules/rnnt.py`: add the parameter `keep_hypotheses` to keep the hypotheses of the decoded outputs after forward, so that an outside caller can obtain them.

Bug fixes:
- `examples/asr/asr_hybrid_transducer_ctc/helpers/convert_nemo_asr_hybrid_to_ctc.py`: fix docstring
- `examples/asr/transcribe_speech.py`: fix changing `att_context_size`
- `nemo/collections/asr/losses/ssl_losses/mlm.py`: fix bug when mask is None

ASR improvements:
- `docker/Dockerfile.speech`: update the PyTorch container to 25.12
- `nemo/collections/asr/modules/conformer_encoder.py` and `nemo/collections/asr/modules/ssl_modules/multi_layer_feat.py`: merge duplicate code of `ConformerMultiLayerFeatureExtractor`
- `nemo/collections/asr/modules/lstm_decoder.py`: add the option to not add the blank token to the classifier, so that it can be used as a general frame classifier.
- `nemo/collections/common/data/lhotse/dataloader.py`: resample noise to the same sample rate as the data config.
- `scripts/speech_recognition/convert_to_tarred_audio_dataset.py`: add the option to read from multiple files
- `scripts/speech_recognition/oomptimizer.py`: add the option to load the model from local files
- `tools/nemo_forced_aligner/utils/data_prep.py`: add support for automatically resolving relative file paths.
- `examples/asr/transcribe_speech_distributed.py`: a script that extends `transcribe_speech.py` to transcribe large amounts of audio using multiple nodes and GPUs. The script can split large manifests into small ones and merge them back when all are finished. This avoids the issue in `transcribe_speech_parallel.py`, which restarts from the beginning if it does not finish within one cluster job. The new script also automatically inherits any improvements to `transcribe_speech.py`, while `transcribe_speech_parallel.py` needs separate maintenance. In the future we can update `transcribe_speech_distributed.py` to support loading tarred datasets, so that `transcribe_speech_parallel.py` can be deprecated.
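For readers unfamiliar with EOU scoring, here is a rough sketch of how end-of-utterance predictions might be turned into a miss rate and latency percentiles. It is not the code in `eou_utils.py`: the function names, the greedy matching rule, and the `max_early` / `max_late` tolerance windows are all assumptions made for illustration.

```python
import math


def match_eou(ref_times, pred_times, max_early=0.5, max_late=2.0):
    """Greedily match each reference EOU time (seconds) to the earliest
    unused prediction inside [ref - max_early, ref + max_late].

    Hypothetical helper: the tolerance windows are illustrative defaults.
    Returns (latencies, num_missed, num_false_alarms).
    """
    preds = sorted(pred_times)
    used = [False] * len(preds)
    latencies = []
    missed = 0
    for ref in sorted(ref_times):
        hit = None
        for j, pred in enumerate(preds):
            if not used[j] and ref - max_early <= pred <= ref + max_late:
                hit = j
                break
        if hit is None:
            missed += 1  # no prediction close enough to this reference
        else:
            used[hit] = True
            latencies.append(preds[hit] - ref)  # negative means early cutoff
    false_alarms = used.count(False)  # predictions matched to nothing
    return latencies, missed, false_alarms


def percentile(values, q):
    """Nearest-rank percentile for q in (0, 100]; NaN for empty input."""
    if not values:
        return float("nan")
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]
```

With these helpers, the miss rate would be `missed / len(ref_times)`, and `percentile(latencies, 90)` would summarize tail latency. A real evaluation may also want to treat early and late detections separately.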
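The relative-path item for `data_prep.py` can be illustrated with a small sketch. It assumes that a relative `audio_filepath` is interpreted relative to the directory containing the manifest, which is a common convention but not confirmed by this PR; `resolve_audio_path` is a hypothetical helper name.

```python
from pathlib import Path


def resolve_audio_path(audio_filepath, manifest_path):
    """Return an absolute path for a manifest entry's audio file.

    Assumption: relative entries are resolved against the manifest's
    parent directory. Absolute entries are returned unchanged.
    """
    path = Path(audio_filepath).expanduser()
    if path.is_absolute():
        return path
    return Path(manifest_path).resolve().parent / path
```

For example, an entry `wavs/utt1.wav` in `/data/set/manifest.json` would resolve to `/data/set/wavs/utt1.wav` under this assumption.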
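The split-and-merge idea behind `transcribe_speech_distributed.py` can be sketched as follows. This is a minimal illustration under stated assumptions, not the script's actual code: the function names, the shard file naming, and the `.out.json` result suffix are all hypothetical. The point is that a restarted job only needs to process shards that have no result file yet.

```python
from pathlib import Path


def split_manifest(manifest_path, num_shards, out_dir):
    """Split a JSON-lines manifest into at most num_shards contiguous shards."""
    text = Path(manifest_path).read_text(encoding="utf-8")
    lines = [ln for ln in text.splitlines() if ln.strip()]
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunk = max(1, -(-len(lines) // num_shards))  # ceiling division
    shards = []
    for idx, start in enumerate(range(0, len(lines), chunk)):
        shard = out_dir / f"shard_{idx:04d}.json"
        shard.write_text(
            "".join(ln + "\n" for ln in lines[start:start + chunk]),
            encoding="utf-8",
        )
        shards.append(shard)
    return shards


def result_path(shard_path):
    """Hypothetical naming: shard_0000.json -> shard_0000.out.json."""
    shard_path = Path(shard_path)
    return shard_path.with_name(shard_path.stem + ".out.json")


def pending_shards(shard_paths):
    """Shards without a result file; a resumed job processes only these."""
    return [p for p in shard_paths if not result_path(p).exists()]


def merge_results(shard_paths, merged_path):
    """Concatenate per-shard results in shard order once all are finished."""
    missing = pending_shards(shard_paths)
    if missing:
        raise FileNotFoundError(f"{len(missing)} shard(s) still pending")
    merged_path = Path(merged_path)
    with merged_path.open("w", encoding="utf-8") as out:
        for shard in shard_paths:
            out.write(result_path(shard).read_text(encoding="utf-8"))
    return merged_path
```

Contiguous shards are used here so that the merged output keeps the original manifest order. Each worker would transcribe one pending shard and write its result file when finished.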