Topic: "speech-processing"
speechbrain/speechbrain
A PyTorch-based Speech Toolkit
Language: Python - Size: 98.2 MB - Last synced at: 26 days ago - Pushed at: 26 days ago - Stars: 9,952 - Forks: 1,504

pyannote/pyannote-audio
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
Language: Jupyter Notebook - Size: 252 MB - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 7,671 - Forks: 889

pliang279/awesome-multimodal-ml
Reading list for research topics in multimodal machine learning
Size: 459 KB - Last synced at: 9 days ago - Pushed at: 11 months ago - Stars: 6,516 - Forks: 885

snakers4/silero-vad
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
Language: Python - Size: 100 MB - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 6,032 - Forks: 574

microsoft/torchscale
Foundation Architecture for (M)LLMs
Language: Python - Size: 361 KB - Last synced at: 1 day ago - Pushed at: about 1 year ago - Stars: 3,087 - Forks: 223

linto-ai/whisper-timestamped
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
Language: Python - Size: 4.49 MB - Last synced at: 13 days ago - Pushed at: 3 months ago - Stars: 2,460 - Forks: 189

r9y9/wavenet_vocoder
WaveNet vocoder
Language: Python - Size: 19.7 MB - Last synced at: about 1 month ago - Pushed at: almost 2 years ago - Stars: 2,356 - Forks: 498

r9y9/deepvoice3_pytorch
PyTorch implementation of convolutional neural networks-based text-to-speech synthesis models
Language: Python - Size: 6.78 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 1,980 - Forks: 489

resemble-ai/resemble-enhance
AI powered speech denoising and enhancement
Language: Python - Size: 23.4 KB - Last synced at: 12 days ago - Pushed at: 7 months ago - Stars: 1,850 - Forks: 217

wq2012/awesome-diarization
A curated list of awesome Speaker Diarization papers, libraries, datasets, and other resources.
Size: 81.1 KB - Last synced at: 19 days ago - Pushed at: 9 months ago - Stars: 1,761 - Forks: 231

DigitalPhonetics/IMS-Toucan
Controllable and fast Text-to-Speech for over 7000 languages!
Language: Python - Size: 21.4 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 1,617 - Forks: 184

coqui-ai/open-speech-corpora
💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies
Size: 139 KB - Last synced at: 3 months ago - Pushed at: about 1 year ago - Stars: 1,318 - Forks: 142

haoheliu/voicefixer
General Speech Restoration
Language: Python - Size: 3.76 MB - Last synced at: 8 days ago - Pushed at: 5 months ago - Stars: 1,178 - Forks: 142

mravanelli/SincNet
SincNet is a neural architecture for efficiently processing raw audio samples.
Language: Python - Size: 78.9 MB - Last synced at: about 2 months ago - Pushed at: about 4 years ago - Stars: 1,178 - Forks: 265

midas-research/audino
Open source audio annotation tool for humans
Language: JavaScript - Size: 12.5 MB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 1,094 - Forks: 134

ictnlp/StreamSpeech
StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
Language: Python - Size: 18.2 MB - Last synced at: about 1 month ago - Pushed at: 11 months ago - Stars: 1,078 - Forks: 81

X-LANCE/SLAM-LLM
Speech, Language, Audio, Music Processing with Large Language Model
Language: Python - Size: 169 MB - Last synced at: about 2 hours ago - Pushed at: 20 days ago - Stars: 844 - Forks: 85

Ryuk17/SpeechAlgorithms
You can find the speech algorithms you want here
Language: C - Size: 63.9 MB - Last synced at: 3 months ago - Pushed at: 6 months ago - Stars: 793 - Forks: 248

nanahou/Awesome-Speech-Enhancement
A tutorial for Speech Enhancement researchers and practitioners. The purpose of this repo is to organize the world’s resources for speech enhancement and make them universally accessible and useful.
Language: MATLAB - Size: 25.2 MB - Last synced at: 5 days ago - Pushed at: over 4 years ago - Stars: 775 - Forks: 153

nyrahealth/CrisperWhisper
Verbatim Automatic Speech Recognition with improved word-level timestamps and filler detection
Language: Python - Size: 8.31 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 724 - Forks: 36

drethage/speech-denoising-wavenet
A neural network for end-to-end speech denoising
Language: Python - Size: 57.3 MB - Last synced at: about 1 month ago - Pushed at: almost 2 years ago - Stars: 694 - Forks: 163

TEN-framework/ten-vad
Voice Activity Detector (VAD) from TEN: low-latency, high-performance and lightweight
Language: C - Size: 9.79 MB - Last synced at: 7 days ago - Pushed at: 9 days ago - Stars: 627 - Forks: 59

huawei-noah/Speech-Backbones
This is the main repository of open-sourced speech technology by Huawei Noah's Ark Lab.
Language: Jupyter Notebook - Size: 33.8 MB - Last synced at: about 1 month ago - Pushed at: almost 2 years ago - Stars: 583 - Forks: 125

Audio-WestlakeU/FullSubNet
PyTorch implementation of "FullSubNet: A Full-Band and Sub-Band Fusion Model for Real-Time Single-Channel Speech Enhancement."
Language: Python - Size: 892 KB - Last synced at: 8 months ago - Pushed at: almost 2 years ago - Stars: 552 - Forks: 156

ddlBoJack/Speech-Resources
Speech-related labs, companies, resources, internships, and more; recommendations and self-nominations are welcome
Size: 5.44 MB - Last synced at: 3 months ago - Pushed at: 8 months ago - Stars: 550 - Forks: 68

pliang279/MultiBench
[NeurIPS 2021] Multiscale Benchmarks for Multimodal Representation Learning
Language: HTML - Size: 49.9 MB - Last synced at: about 2 months ago - Pushed at: over 1 year ago - Stars: 541 - Forks: 80

breizhn/DTLN
TensorFlow 2.x implementation of the DTLN real-time speech denoising model, with TF-Lite, ONNX, and real-time audio processing support.
Language: Python - Size: 25.5 MB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 501 - Forks: 143

SuperKogito/spafe
spafe: Simplified Python Audio Features Extraction
Language: Python - Size: 20.7 MB - Last synced at: 9 days ago - Pushed at: 4 months ago - Stars: 475 - Forks: 79

arjo129/uSpeech 📦
Speech recognition toolkit for the Arduino
Language: C++ - Size: 482 KB - Last synced at: about 2 months ago - Pushed at: about 4 years ago - Stars: 474 - Forks: 102

microsoft/UniSpeech
UniSpeech - Large Scale Self-Supervised Learning for Speech
Language: Python - Size: 72.4 MB - Last synced at: 1 day ago - Pushed at: over 1 year ago - Stars: 464 - Forks: 74

gemengtju/Tutorial_Separation
This repo summarizes the tutorials, datasets, papers, code, and tools for the speech separation and speaker extraction tasks. Pull requests are welcome.
Language: MATLAB - Size: 74.6 MB - Last synced at: 3 months ago - Pushed at: over 4 years ago - Stars: 459 - Forks: 95

r9y9/pysptk
A python wrapper for Speech Signal Processing Toolkit (SPTK).
Language: Python - Size: 15.3 MB - Last synced at: 11 days ago - Pushed at: 12 months ago - Stars: 442 - Forks: 78

santi-pdp/pase
Problem Agnostic Speech Encoder
Language: Python - Size: 10.2 MB - Last synced at: 8 months ago - Pushed at: almost 2 years ago - Stars: 439 - Forks: 87

novoic/surfboard 📦
Novoic's audio feature extraction library
Language: Python - Size: 598 KB - Last synced at: 30 days ago - Pushed at: over 3 years ago - Stars: 436 - Forks: 47

SforAiDl/Neural-Voice-Cloning-With-Few-Samples 📦
This repository contains an implementation of "Neural Voice Cloning With Few Samples"
Language: Python - Size: 42.3 MB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 415 - Forks: 121

r9y9/nnmnkwii
Library to build speech synthesis systems designed for easy and fast prototyping.
Language: Python - Size: 79.7 MB - Last synced at: 11 days ago - Pushed at: about 1 year ago - Stars: 397 - Forks: 73

speechbrain/speechbrain.github.io
The SpeechBrain project aims to build a novel speech toolkit fully based on PyTorch. With SpeechBrain users can easily create speech processing systems, ranging from speech recognition (both HMM/DNN and end-to-end), speaker recognition, speech enhancement, speech separation, multi-microphone speech processing, and many others.
Language: HTML - Size: 46.8 MB - Last synced at: 26 days ago - Pushed at: about 1 month ago - Stars: 368 - Forks: 30

Yuan-ManX/audio-development-tools
This is a list of sound, audio and music development tools which contains machine learning, audio generation, audio signal processing, sound synthesis, spatial audio, music information retrieval, music generation, speech recognition, speech synthesis, singing voice synthesis and more.
Size: 2.18 MB - Last synced at: 4 months ago - Pushed at: 10 months ago - Stars: 346 - Forks: 24

NVIDIA/CleanUNet
Official PyTorch Implementation of CleanUNet (ICASSP 2022)
Language: Python - Size: 35.2 KB - Last synced at: 6 days ago - Pushed at: over 1 year ago - Stars: 322 - Forks: 56

seanwood/gcc-nmf
Real-time GCC-NMF Blind Speech Separation and Enhancement
Language: Python - Size: 43.2 MB - Last synced at: about 2 months ago - Pushed at: about 6 years ago - Stars: 319 - Forks: 134

rishikksh20/VocGAN
VocGAN: A High-Fidelity Real-time Vocoder with a Hierarchically-nested Adversarial Network
Language: Python - Size: 187 KB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 318 - Forks: 60

haoxiangsnr/A-Convolutional-Recurrent-Neural-Network-for-Real-Time-Speech-Enhancement
A minimal unofficial PyTorch implementation of "A Convolutional Recurrent Neural Network for Real-Time Speech Enhancement" (CRN)
Language: Python - Size: 43.9 KB - Last synced at: 8 months ago - Pushed at: almost 5 years ago - Stars: 315 - Forks: 58

kahne/NonAutoregGenProgress
Tracking the progress in non-autoregressive generation (translation, transcription, etc.)
Size: 33.2 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 303 - Forks: 31

gtreshchev/RuntimeSpeechRecognizer 📦
Cross-platform, real-time, offline speech recognition plugin for Unreal Engine. Based on Whisper OpenAI technology, whisper.cpp.
Language: C++ - Size: 24.8 MB - Last synced at: 8 days ago - Pushed at: 4 months ago - Stars: 297 - Forks: 47

fgnt/pb_bss
Collection of EM algorithms for blind source separation of audio signals
Language: Python - Size: 635 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 286 - Forks: 61

haoheliu/voicefixer_main
General Speech Restoration
Language: Python - Size: 21.5 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 278 - Forks: 56

haoxiangsnr/Wave-U-Net-for-Speech-Enhancement
A PyTorch implementation of Wave-U-Net, adapted to speech enhancement.
Language: Python - Size: 511 KB - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 278 - Forks: 64

zycv/awesome-keyword-spotting
This repository is a curated list of awesome Speech Keyword Spotting (Wake-Up Word Detection).
Size: 129 KB - Last synced at: 12 days ago - Pushed at: about 3 years ago - Stars: 260 - Forks: 40

r9y9/ttslearn
ttslearn: Library for Pythonで学ぶ音声合成 (Text-to-speech with Python)
Language: Jupyter Notebook - Size: 50.5 MB - Last synced at: 9 days ago - Pushed at: over 2 years ago - Stars: 259 - Forks: 39

AkojimaSLP/Beamforming-for-speech-enhancement
Simple delay-and-sum, MVDR, and CGMM-MVDR
Language: Python - Size: 3.18 MB - Last synced at: 3 months ago - Pushed at: over 6 years ago - Stars: 257 - Forks: 74

Sharad24/Neural-Voice-Cloning-with-Few-Samples 📦
Implementation of Neural Voice Cloning with Few Samples Research Paper by Baidu
Language: Python - Size: 57.7 MB - Last synced at: about 1 year ago - Pushed at: over 4 years ago - Stars: 252 - Forks: 55

swasun/VQ-VAE-Speech 📦
PyTorch implementation of VQ-VAE + WaveNet by [Chorowski et al., 2019] and VQ-VAE on speech signals by [van den Oord et al., 2017]
Language: Python - Size: 82.4 MB - Last synced at: over 1 year ago - Pushed at: almost 6 years ago - Stars: 247 - Forks: 52

sp-nitech/SPTK
A suite of speech signal processing tools
Language: C++ - Size: 5.68 MB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 234 - Forks: 27

tomchang25/whisper-auto-transcribe
Auto transcribe tool based on whisper
Language: Python - Size: 169 MB - Last synced at: 8 months ago - Pushed at: about 2 years ago - Stars: 220 - Forks: 15

gionanide/Speech_Signal_Processing_and_Classification
Front-end speech processing aims at extracting proper features from short-term segments of a speech utterance, known as frames. It is a prerequisite step toward any pattern recognition problem employing speech or audio (e.g., music). Here, we are interested in voice disorder classification, that is, in developing two-class classifiers that can discriminate between utterances of a subject suffering from, say, vocal fold paralysis and utterances of a healthy subject. The mathematical modeling of the speech production system in humans suggests that an all-pole system function is justified [1-3]. As a consequence, linear prediction coefficients (LPCs) constitute a first choice for modeling the magnitude of the short-term spectrum of speech. LPC-derived cepstral coefficients are guaranteed to discriminate between the system (e.g., vocal tract) contribution and that of the excitation. Taking into account the characteristics of the human ear, the mel-frequency cepstral coefficients (MFCCs) emerged as descriptive features of the speech spectral envelope. Similarly to MFCCs, the perceptual linear prediction coefficients (PLPs) could also be derived. These so-called traditional features will be tested against agnostic features extracted by convolutional neural networks (CNNs) (e.g., auto-encoders) [4]. The pattern recognition step will be based on Gaussian Mixture Model based classifiers, K-nearest neighbor classifiers, Bayes classifiers, as well as Deep Neural Networks. The Massachusetts Eye and Ear Infirmary Dataset (MEEI-Dataset) [5] will be exploited. At the application level, a library for feature extraction and classification in Python will be developed. Credible publicly available resources, such as KALDI, will be used toward achieving our goal. Comparisons will be made against [6-8].
Language: Python - Size: 827 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 220 - Forks: 62

kahne/SpeechTransProgress
Tracking the progress in end-to-end speech translation
Size: 121 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 218 - Forks: 26

xmindflow/Awesome_Mamba
Computation-Efficient Era: A Comprehensive Survey of State Space Models in Medical Image Analysis
Size: 133 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 213 - Forks: 14

jtkim-kaist/Speech-enhancement
Deep neural network based speech enhancement toolkit
Language: MATLAB - Size: 187 MB - Last synced at: 3 months ago - Pushed at: about 6 years ago - Stars: 213 - Forks: 62

innFactory/react-native-dialogflow
A React-Native Bridge for the Google Dialogflow (API.AI) SDK
Language: JavaScript - Size: 1.16 MB - Last synced at: 2 days ago - Pushed at: about 2 years ago - Stars: 204 - Forks: 64

rishikksh20/hifigan-denoiser
HiFi-GAN: High Fidelity Denoising and Dereverberation Based on Speech Deep Features in Adversarial Networks
Language: Python - Size: 6 MB - Last synced at: about 1 year ago - Pushed at: about 4 years ago - Stars: 192 - Forks: 43

cvqluu/TDNN
Time delay neural network (TDNN) implementation in PyTorch using the unfold method
Language: Python - Size: 708 KB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 183 - Forks: 40

dqqcasia/awesome-speech-translation (fork of ucaslyc/speech_translation-papers)
Size: 296 KB - Last synced at: about 2 months ago - Pushed at: over 3 years ago - Stars: 178 - Forks: 1

SuyashMore/MevonAI-Speech-Emotion-Recognition
Identify the emotion of multiple speakers in an Audio Segment
Language: C - Size: 63.6 MB - Last synced at: 30 days ago - Pushed at: over 2 years ago - Stars: 171 - Forks: 47

ASR-project/Multilingual-PR
Phoneme recognition using the pre-trained models Wav2vec2, HuBERT, and WavLM. This project compares three self-supervised models, Wav2vec (2019, 2020), HuBERT (2021), and WavLM (2022), each pretrained on a corpus of English speech, and uses them in various ways to perform phoneme recognition for different languages with a network trained with the Connectionist Temporal Classification (CTC) algorithm.
Language: Python - Size: 3.47 MB - Last synced at: about 1 year ago - Pushed at: about 3 years ago - Stars: 171 - Forks: 13

sekiguchi92/SoundSourceSeparation
The code for multi-channel source separation and dereverberation such as FastMNMF1, FastMNMF2, and AR-FastMNMF2.
Language: Python - Size: 31.6 MB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 170 - Forks: 30

Voice-Lab/VoiceLab
Automated Reproducible Acoustical Analysis
Language: Python - Size: 16.5 MB - Last synced at: 2 months ago - Pushed at: 11 months ago - Stars: 152 - Forks: 19

MycroftAI/ZZZ-RETIRED__openstt 📦
RETIRED - OpenSTT is now retired. If you would like more information on Mycroft AI's open source STT projects, please visit:
Size: 26.4 KB - Last synced at: 7 days ago - Pushed at: over 9 years ago - Stars: 142 - Forks: 11

jefflai108/pytorch-kaldi-neural-speaker-embeddings
Lightweight neural speaker embedding extraction based on Kaldi and PyTorch.
Language: Perl - Size: 9.35 MB - Last synced at: 7 months ago - Pushed at: over 5 years ago - Stars: 137 - Forks: 34

ahkarami/Great-Deep-Learning-Books
A Great Collection of Deep Learning (e)Books
Size: 600 KB - Last synced at: about 1 month ago - Pushed at: 7 months ago - Stars: 135 - Forks: 30

NICEElevateAI/ElevateAIJavaSDK
Java SDK for ElevateAI
Language: Java - Size: 67.4 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 121 - Forks: 0

albertaparicio/tfg-voice-conversion
Deep Learning-based Voice Conversion system
Language: Python - Size: 3.25 GB - Last synced at: 3 months ago - Pushed at: over 2 years ago - Stars: 120 - Forks: 39

NICEElevateAI/ElevateAIDotNetSDK
.NET Core 6 SDK for ElevateAI
Language: C# - Size: 934 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 115 - Forks: 0

rishikksh20/SoundStorm-pytorch
Google's SoundStorm: Efficient Parallel Audio Generation
Language: Python - Size: 269 KB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 114 - Forks: 12

NICEElevateAI/ElevateAIPythonSDK
ElevateAI - Speech-to-text API Python SDK
Language: Python - Size: 43.9 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 111 - Forks: 0

huckiyang/QuantumSpeech-QCNN
IEEE ICASSP 21 - Quantum Convolution Neural Networks for Speech Processing and Automatic Speech Recognition
Language: Jupyter Notebook - Size: 859 MB - Last synced at: 21 days ago - Pushed at: over 2 years ago - Stars: 101 - Forks: 20

ga642381/SpeechPrompt
Interspeech 2022: "SpeechPrompt: An Exploration of Prompt Tuning on Generative Spoken Language Model for Speech Processing Tasks". Speech processing with the prompting paradigm.
Language: Python - Size: 49.3 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 99 - Forks: 8

Speech-Interaction-Technology-Aalto-U/itsp
Introduction to Speech Processing
Language: Jupyter Notebook - Size: 254 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 97 - Forks: 16

atosystem/SpeechCLIP
SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model, Accepted to IEEE SLT 2022
Language: Python - Size: 999 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 96 - Forks: 5

mikeroyal/NLP-Guide
Natural Language Processing (NLP). Covering topics such as Tokenization, Part Of Speech tagging (POS), Machine translation, Named Entity Recognition (NER), Classification, and Sentiment analysis.
Language: Python - Size: 315 KB - Last synced at: 16 days ago - Pushed at: over 1 year ago - Stars: 93 - Forks: 15

Appen/UHV-OTS-Speech 📦
A data annotation pipeline to generate high-quality, large-scale speech datasets with machine pre-labeling and fully manual auditing.
Language: Forth - Size: 1.41 GB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 92 - Forks: 15

haoheliu/torchsubband
PyTorch implementation of subband decomposition
Language: HTML - Size: 374 KB - Last synced at: 14 days ago - Pushed at: almost 3 years ago - Stars: 92 - Forks: 13

abikaki/awesome-speech-emotion-recognition
😎 Awesome lists about Speech Emotion Recognition
Size: 6.03 MB - Last synced at: 18 days ago - Pushed at: 6 months ago - Stars: 91 - Forks: 6

shangeth/wavencoder
WavEncoder is a Python library for encoding audio signals, transforms for audio augmentation, and training audio classification models with PyTorch backend.
Language: Python - Size: 5.21 MB - Last synced at: 22 days ago - Pushed at: about 4 years ago - Stars: 91 - Forks: 14

r9y9/SPTK
A modified version of Speech Signal Processing Toolkit (SPTK)
Language: C - Size: 4.31 MB - Last synced at: 3 months ago - Pushed at: about 3 years ago - Stars: 89 - Forks: 18

vocalpy/vak
A neural network framework for researchers studying acoustic communication
Language: Python - Size: 196 MB - Last synced at: 6 days ago - Pushed at: 3 months ago - Stars: 84 - Forks: 17

Lhx94As/Awesome-Spoken-Language-Identification
An awesome spoken language identification (LID) repository. (Work in progress)
Language: Python - Size: 959 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 83 - Forks: 10

NickWilkinson37/voxseg
A python library for voice activity detection (VAD) for speech/non-speech segmentation.
Language: Python - Size: 98.1 MB - Last synced at: 7 months ago - Pushed at: almost 3 years ago - Stars: 83 - Forks: 12

alshell7/vokaturi-android
Speech-based emotion recognition on Android.
Language: C - Size: 2.01 MB - Last synced at: over 1 year ago - Pushed at: almost 8 years ago - Stars: 82 - Forks: 18

vipchengrui/traditional-speech-enhancement
Spectral Subtraction, Wiener Filtering, MMSE
Language: MATLAB - Size: 39.8 MB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 81 - Forks: 34

FlorianKrey/DNC
Discriminative Neural Clustering for Speaker Diarisation
Language: Python - Size: 3.62 GB - Last synced at: 9 days ago - Pushed at: about 3 years ago - Stars: 78 - Forks: 14

stevenhillis/awesome-asr-contextualization
A curated list of awesome papers on contextualizing E2E ASR outputs
Size: 59.6 KB - Last synced at: 15 days ago - Pushed at: about 2 years ago - Stars: 77 - Forks: 9

ga642381/SpeechGen
"SpeechGen: Unlocking the Generative Power of Speech Language Models with Prompts"
Size: 141 KB - Last synced at: 7 months ago - Pushed at: about 2 years ago - Stars: 74 - Forks: 5

alessandroragano/scoreq
SCOREQ: Speech COntrastive REgression for Quality Assessment (NeurIPS 2024)
Language: Python - Size: 1.46 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 73 - Forks: 4

mwv/vad
Voice Activity Detector
Language: Python - Size: 24.4 KB - Last synced at: 2 months ago - Pushed at: over 2 years ago - Stars: 73 - Forks: 13

huckiyang/Voice2Series-Reprogramming
ICML 21 - Voice2Series: Adversarial Reprogramming Acoustic Models for Time Series Classification
Language: TypeScript - Size: 194 MB - Last synced at: 3 months ago - Pushed at: about 1 year ago - Stars: 72 - Forks: 12

kahne/fastwer
A PyPI package for fast word/character error rate (WER/CER) calculation
Language: Python - Size: 432 KB - Last synced at: 5 days ago - Pushed at: about 2 years ago - Stars: 72 - Forks: 16

grausof/keras-sincnet
Keras (TensorFlow) implementation of SincNet (Mirco Ravanelli, Yoshua Bengio - https://github.com/mravanelli/SincNet)
Language: Python - Size: 260 KB - Last synced at: 3 months ago - Pushed at: about 4 years ago - Stars: 72 - Forks: 26

nguyennpa412/vietnamese-speech-to-text-wavenet
Vietnamese speech recognition using WaveNet
Language: Python - Size: 52.9 MB - Last synced at: 23 days ago - Pushed at: over 2 years ago - Stars: 71 - Forks: 36

SIP-Lab/CNN-VAD
A Convolutional Neural Network based Voice Activity Detector for Smartphones
Language: Jupyter Notebook - Size: 25.1 MB - Last synced at: 2 months ago - Pushed at: about 6 years ago - Stars: 71 - Forks: 22

inevolin/DiscordEarsBot
A speech-to-text framework and bot for Discord. Take control of your Discord server using speech and voice commands. Can also be useful for hearing impaired and deaf people.
Language: JavaScript - Size: 38.6 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 70 - Forks: 351
