An open API service providing repository metadata for many open source software ecosystems.

Topic: "speech-processing"

speechbrain/speechbrain

A PyTorch-based Speech Toolkit

Language: Python - Size: 98.2 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 10,137 - Forks: 1,527

pyannote/pyannote-audio

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

Language: Jupyter Notebook - Size: 252 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 7,671 - Forks: 889

pliang279/awesome-multimodal-ml

Reading list for research topics in multimodal machine learning

Size: 459 KB - Last synced at: 10 days ago - Pushed at: 11 months ago - Stars: 6,551 - Forks: 883

snakers4/silero-vad

Silero VAD: pre-trained enterprise-grade Voice Activity Detector

Language: Python - Size: 100 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 6,032 - Forks: 574

microsoft/torchscale

Foundation Architecture for (M)LLMs

Language: Python - Size: 361 KB - Last synced at: 3 days ago - Pushed at: over 1 year ago - Stars: 3,098 - Forks: 222

linto-ai/whisper-timestamped

Multilingual Automatic Speech Recognition with word-level timestamps and confidence

Language: Python - Size: 4.49 MB - Last synced at: 7 days ago - Pushed at: 4 months ago - Stars: 2,515 - Forks: 193

r9y9/wavenet_vocoder

WaveNet vocoder

Language: Python - Size: 19.7 MB - Last synced at: 2 months ago - Pushed at: about 2 years ago - Stars: 2,356 - Forks: 498

r9y9/deepvoice3_pytorch

PyTorch implementation of convolutional neural networks-based text-to-speech synthesis models

Language: Python - Size: 6.78 MB - Last synced at: 21 days ago - Pushed at: over 1 year ago - Stars: 1,980 - Forks: 488

resemble-ai/resemble-enhance

AI powered speech denoising and enhancement

Language: Python - Size: 23.4 KB - Last synced at: 1 day ago - Pushed at: 8 months ago - Stars: 1,895 - Forks: 224

wq2012/awesome-diarization

A curated list of awesome Speaker Diarization papers, libraries, datasets, and other resources.

Size: 82 KB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 1,774 - Forks: 234

DigitalPhonetics/IMS-Toucan

Controllable and fast Text-to-Speech for over 7000 languages!

Language: Python - Size: 21.4 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1,617 - Forks: 184

coqui-ai/open-speech-corpora

💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies

Size: 139 KB - Last synced at: 4 months ago - Pushed at: about 1 year ago - Stars: 1,318 - Forks: 142

haoheliu/voicefixer

General Speech Restoration

Language: Python - Size: 3.76 MB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 1,178 - Forks: 142

mravanelli/SincNet

SincNet is a neural architecture for efficiently processing raw audio samples.

Language: Python - Size: 78.9 MB - Last synced at: 2 months ago - Pushed at: over 4 years ago - Stars: 1,178 - Forks: 265

midas-research/audino

Open source audio annotation tool for humans

Language: JavaScript - Size: 12.5 MB - Last synced at: 2 months ago - Pushed at: 6 months ago - Stars: 1,094 - Forks: 134

ictnlp/StreamSpeech

StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.

Language: Python - Size: 18.2 MB - Last synced at: 2 months ago - Pushed at: 11 months ago - Stars: 1,078 - Forks: 81

TEN-framework/ten-vad

Voice Activity Detector (VAD) from TEN: low-latency, high-performance and lightweight

Language: C - Size: 9.55 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 972 - Forks: 88

X-LANCE/SLAM-LLM

Speech, Language, Audio, Music Processing with Large Language Model

Language: Python - Size: 169 MB - Last synced at: 10 days ago - Pushed at: about 1 month ago - Stars: 854 - Forks: 88

Ryuk17/SpeechAlgorithms

You can find the speech algorithms you want here

Language: C - Size: 63.9 MB - Last synced at: 4 months ago - Pushed at: 7 months ago - Stars: 793 - Forks: 248

nanahou/Awesome-Speech-Enhancement

A tutorial for Speech Enhancement researchers and practitioners. The purpose of this repo is to organize the world’s resources for speech enhancement and make them universally accessible and useful.

Language: MATLAB - Size: 25.2 MB - Last synced at: 3 days ago - Pushed at: over 4 years ago - Stars: 779 - Forks: 153

nyrahealth/CrisperWhisper

Verbatim Automatic Speech Recognition with improved word-level timestamps and filler detection

Language: Python - Size: 8.31 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 724 - Forks: 36

drethage/speech-denoising-wavenet

A neural network for end-to-end speech denoising

Language: Python - Size: 57.3 MB - Last synced at: 16 days ago - Pushed at: about 2 years ago - Stars: 699 - Forks: 164

huawei-noah/Speech-Backbones

This is the main repository of open-sourced speech technology by Huawei Noah's Ark Lab.

Language: Jupyter Notebook - Size: 33.8 MB - Last synced at: 2 months ago - Pushed at: almost 2 years ago - Stars: 583 - Forks: 125

Audio-WestlakeU/FullSubNet

PyTorch implementation of "FullSubNet: A Full-Band and Sub-Band Fusion Model for Real-Time Single-Channel Speech Enhancement."

Language: Python - Size: 892 KB - Last synced at: 16 days ago - Pushed at: almost 2 years ago - Stars: 569 - Forks: 157

ddlBoJack/Speech-Resources

Speech-related labs, companies, resources, internships, and more; recommendations and self-nominations are welcome

Size: 5.44 MB - Last synced at: 4 months ago - Pushed at: 9 months ago - Stars: 550 - Forks: 68

pliang279/MultiBench

[NeurIPS 2021] Multiscale Benchmarks for Multimodal Representation Learning

Language: HTML - Size: 49.9 MB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 541 - Forks: 80

breizhn/DTLN

Tensorflow 2.x implementation of the DTLN real time speech denoising model. With TF-lite, ONNX and real-time audio processing support.

Language: Python - Size: 25.5 MB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 501 - Forks: 143

SuperKogito/spafe

spafe: Simplified Python Audio Features Extraction

Language: Python - Size: 20.7 MB - Last synced at: about 1 hour ago - Pushed at: 4 months ago - Stars: 476 - Forks: 79

arjo129/uSpeech 📦

Speech recognition toolkit for the arduino

Language: C++ - Size: 482 KB - Last synced at: 13 days ago - Pushed at: about 4 years ago - Stars: 474 - Forks: 101

microsoft/UniSpeech

UniSpeech - Large Scale Self-Supervised Learning for Speech

Language: Python - Size: 72.4 MB - Last synced at: 3 days ago - Pushed at: over 1 year ago - Stars: 467 - Forks: 73

gemengtju/Tutorial_Separation

This repo summarizes tutorials, datasets, papers, code and tools for speech separation and speaker extraction tasks. Pull requests are welcome.

Language: MATLAB - Size: 74.6 MB - Last synced at: 4 months ago - Pushed at: over 4 years ago - Stars: 459 - Forks: 95

r9y9/pysptk

A python wrapper for Speech Signal Processing Toolkit (SPTK).

Language: Python - Size: 15.3 MB - Last synced at: 4 days ago - Pushed at: about 1 year ago - Stars: 445 - Forks: 78

santi-pdp/pase

Problem Agnostic Speech Encoder

Language: Python - Size: 10.2 MB - Last synced at: 9 months ago - Pushed at: about 2 years ago - Stars: 439 - Forks: 87

novoic/surfboard 📦

Novoic's audio feature extraction library

Language: Python - Size: 598 KB - Last synced at: 22 days ago - Pushed at: over 3 years ago - Stars: 436 - Forks: 47

SforAiDl/Neural-Voice-Cloning-With-Few-Samples 📦

This repository has implementation for "Neural Voice Cloning With Few Samples"

Language: Python - Size: 42.3 MB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 415 - Forks: 121

r9y9/nnmnkwii

Library to build speech synthesis systems designed for easy and fast prototyping.

Language: Python - Size: 79.7 MB - Last synced at: 4 days ago - Pushed at: about 1 year ago - Stars: 398 - Forks: 73

Yuan-ManX/audio-development-tools

Audio Development Tools (ADT) is a project for advancing sound, speech, and music technologies, featuring components for machine learning, sound synthesis, speech and music generation, signal processing, game audio, digital audio workstations (DAWs), and more.

Size: 904 KB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 376 - Forks: 26

speechbrain/speechbrain.github.io

The SpeechBrain project aims to build a novel speech toolkit fully based on PyTorch. With SpeechBrain users can easily create speech processing systems, ranging from speech recognition (both HMM/DNN and end-to-end), speaker recognition, speech enhancement, speech separation, multi-microphone speech processing, and many others.

Language: HTML - Size: 46.8 MB - Last synced at: about 2 months ago - Pushed at: 2 months ago - Stars: 368 - Forks: 30

haoxiangsnr/A-Convolutional-Recurrent-Neural-Network-for-Real-Time-Speech-Enhancement

A minimal unofficial implementation of "A Convolutional Recurrent Neural Network for Real-Time Speech Enhancement" (CRN) using PyTorch

Language: Python - Size: 43.9 KB - Last synced at: 16 days ago - Pushed at: almost 5 years ago - Stars: 328 - Forks: 62

NVIDIA/CleanUNet

Official PyTorch Implementation of CleanUNet (ICASSP 2022)

Language: Python - Size: 35.2 KB - Last synced at: about 18 hours ago - Pushed at: almost 2 years ago - Stars: 324 - Forks: 56

seanwood/gcc-nmf

Real-time GCC-NMF Blind Speech Separation and Enhancement

Language: Python - Size: 43.2 MB - Last synced at: 2 months ago - Pushed at: over 6 years ago - Stars: 319 - Forks: 134

rishikksh20/VocGAN

VocGAN: A High-Fidelity Real-time Vocoder with a Hierarchically-nested Adversarial Network

Language: Python - Size: 187 KB - Last synced at: about 1 year ago - Pushed at: almost 3 years ago - Stars: 318 - Forks: 60

kahne/NonAutoregGenProgress

Tracking the progress in non-autoregressive generation (translation, transcription, etc.)

Size: 33.2 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 303 - Forks: 31

gtreshchev/RuntimeSpeechRecognizer 📦

Cross-platform, real-time, offline speech recognition plugin for Unreal Engine. Based on Whisper OpenAI technology, whisper.cpp.

Language: C++ - Size: 24.8 MB - Last synced at: 4 days ago - Pushed at: 5 months ago - Stars: 298 - Forks: 46

fgnt/pb_bss

Collection of EM algorithms for blind source separation of audio signals

Language: Python - Size: 635 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 286 - Forks: 61

haoheliu/voicefixer_main

General Speech Restoration

Language: Python - Size: 21.5 MB - Last synced at: 2 months ago - Pushed at: over 1 year ago - Stars: 278 - Forks: 56

haoxiangsnr/Wave-U-Net-for-Speech-Enhancement

A PyTorch implementation of Wave-U-Net, adapted to speech enhancement.

Language: Python - Size: 511 KB - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 278 - Forks: 64

zycv/awesome-keyword-spotting

This repository is a curated list of awesome Speech Keyword Spotting (Wake-Up Word Detection).

Size: 129 KB - Last synced at: 4 days ago - Pushed at: about 3 years ago - Stars: 266 - Forks: 40

r9y9/ttslearn

ttslearn: Library for Pythonで学ぶ音声合成 (Text-to-speech with Python)

Language: Jupyter Notebook - Size: 50.5 MB - Last synced at: about 9 hours ago - Pushed at: over 2 years ago - Stars: 261 - Forks: 39

AkojimaSLP/Beamforming-for-speech-enhancement

Simple delay-and-sum, MVDR and CGMM-MVDR

Language: Python - Size: 3.18 MB - Last synced at: 4 months ago - Pushed at: over 6 years ago - Stars: 257 - Forks: 74

Sharad24/Neural-Voice-Cloning-with-Few-Samples 📦

Implementation of Neural Voice Cloning with Few Samples Research Paper by Baidu

Language: Python - Size: 57.7 MB - Last synced at: about 1 year ago - Pushed at: over 4 years ago - Stars: 252 - Forks: 55

swasun/VQ-VAE-Speech 📦

PyTorch implementation of VQ-VAE + WaveNet by [Chorowski et al., 2019] and VQ-VAE on speech signals by [van den Oord et al., 2017]

Language: Python - Size: 82.4 MB - Last synced at: over 1 year ago - Pushed at: almost 6 years ago - Stars: 247 - Forks: 52

sp-nitech/SPTK

A suite of speech signal processing tools

Language: C++ - Size: 5.7 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 237 - Forks: 27

tomchang25/whisper-auto-transcribe

Auto transcribe tool based on whisper

Language: Python - Size: 169 MB - Last synced at: 21 days ago - Pushed at: over 2 years ago - Stars: 226 - Forks: 16

gionanide/Speech_Signal_Processing_and_Classification

Front-end speech processing aims at extracting proper features from short-term segments of a speech utterance, known as frames. It is a prerequisite step toward any pattern recognition problem employing speech or audio (e.g., music). Here, we are interested in voice disorder classification, that is, in developing two-class classifiers that can discriminate between utterances of a subject suffering from, say, vocal fold paralysis and utterances of a healthy subject. The mathematical modeling of the speech production system in humans suggests that an all-pole system function is justified [1-3]. As a consequence, linear prediction coefficients (LPCs) constitute a first choice for modeling the magnitude of the short-term spectrum of speech. LPC-derived cepstral coefficients are guaranteed to discriminate between the system (e.g., vocal tract) contribution and that of the excitation. Taking into account the characteristics of the human ear, the mel-frequency cepstral coefficients (MFCCs) emerged as descriptive features of the speech spectral envelope. Similarly to MFCCs, the perceptual linear prediction coefficients (PLPs) could also be derived. These so-called traditional features will be tested against agnostic features extracted by convolutional neural networks (CNNs) (e.g., auto-encoders) [4]. The pattern recognition step will be based on Gaussian Mixture Model based classifiers, K-nearest neighbor classifiers, Bayes classifiers, as well as Deep Neural Networks. The Massachusetts Eye and Ear Infirmary Dataset (MEEI-Dataset) [5] will be exploited. At the application level, a library for feature extraction and classification in Python will be developed. Credible publicly available resources, such as KALDI, will be used toward achieving our goal. Comparisons will be made against [6-8].

Language: Python - Size: 827 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 220 - Forks: 62

kahne/SpeechTransProgress

Tracking the progress in end-to-end speech translation

Size: 121 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 218 - Forks: 26

xmindflow/Awesome_Mamba

Computation-Efficient Era: A Comprehensive Survey of State Space Models in Medical Image Analysis

Size: 133 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 213 - Forks: 14

jtkim-kaist/Speech-enhancement

Deep neural network based speech enhancement toolkit

Language: MATLAB - Size: 187 MB - Last synced at: 4 months ago - Pushed at: about 6 years ago - Stars: 213 - Forks: 62

innFactory/react-native-dialogflow

A React-Native Bridge for the Google Dialogflow (API.AI) SDK

Language: JavaScript - Size: 1.16 MB - Last synced at: about 12 hours ago - Pushed at: about 2 years ago - Stars: 204 - Forks: 64

rishikksh20/hifigan-denoiser

HiFi-GAN: High Fidelity Denoising and Dereverberation Based on Speech Deep Features in Adversarial Networks

Language: Python - Size: 6 MB - Last synced at: about 1 year ago - Pushed at: over 4 years ago - Stars: 192 - Forks: 43

cvqluu/TDNN

Time delay neural network (TDNN) implementation in Pytorch using unfold method

Language: Python - Size: 708 KB - Last synced at: almost 2 years ago - Pushed at: over 5 years ago - Stars: 183 - Forks: 40

dqqcasia/awesome-speech-translation (fork of ucaslyc/speech_translation-papers)

Size: 296 KB - Last synced at: about 23 hours ago - Pushed at: over 3 years ago - Stars: 178 - Forks: 1

SuyashMore/MevonAI-Speech-Emotion-Recognition

Identify the emotion of multiple speakers in an Audio Segment

Language: C - Size: 63.6 MB - Last synced at: about 2 months ago - Pushed at: over 2 years ago - Stars: 171 - Forks: 47

ASR-project/Multilingual-PR

Phoneme Recognition using pre-trained models Wav2vec2, HuBERT and WavLM. Throughout this project, we compared specifically three different self-supervised models, Wav2vec (2019, 2020), HuBERT (2021) and WavLM (2022) pretrained on a corpus of English speech that we will use in various ways to perform phoneme recognition for different languages with a network trained with Connectionist Temporal Classification (CTC) algorithm.

Language: Python - Size: 3.47 MB - Last synced at: over 1 year ago - Pushed at: about 3 years ago - Stars: 171 - Forks: 13

sekiguchi92/SoundSourceSeparation

The code for multi-channel source separation and dereverberation such as FastMNMF1, FastMNMF2, and AR-FastMNMF2.

Language: Python - Size: 31.6 MB - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 170 - Forks: 30

Voice-Lab/VoiceLab

Automated Reproducible Acoustical Analysis

Language: Python - Size: 16.5 MB - Last synced at: 3 months ago - Pushed at: 12 months ago - Stars: 152 - Forks: 19

MycroftAI/ZZZ-RETIRED__openstt 📦

RETIRED - OpenSTT is now retired. If you would like more information on Mycroft AI's open source STT projects, please visit:

Size: 26.4 KB - Last synced at: 3 days ago - Pushed at: over 9 years ago - Stars: 142 - Forks: 11

jefflai108/pytorch-kaldi-neural-speaker-embeddings

A lightweight neural speaker embedding extractor based on Kaldi and PyTorch.

Language: Perl - Size: 9.35 MB - Last synced at: 10 days ago - Pushed at: over 5 years ago - Stars: 136 - Forks: 34

ahkarami/Great-Deep-Learning-Books

A Great Collection of Deep Learning (e)Books

Size: 600 KB - Last synced at: 2 months ago - Pushed at: 8 months ago - Stars: 135 - Forks: 30

NICEElevateAI/ElevateAIJavaSDK

Java SDK for ElevateAI

Language: Java - Size: 67.4 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 121 - Forks: 0

albertaparicio/tfg-voice-conversion

Deep Learning-based Voice Conversion system

Language: Python - Size: 3.25 GB - Last synced at: 4 months ago - Pushed at: over 2 years ago - Stars: 120 - Forks: 39

NICEElevateAI/ElevateAIDotNetSDK

.Net core 6 SDK for ElevateAI

Language: C# - Size: 934 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 115 - Forks: 0

rishikksh20/SoundStorm-pytorch

Google's SoundStorm: Efficient Parallel Audio Generation

Language: Python - Size: 269 KB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 114 - Forks: 12

NICEElevateAI/ElevateAIPythonSDK

ElevateAI - Speech-to-text API Python SDK

Language: Python - Size: 43.9 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 111 - Forks: 0

huckiyang/QuantumSpeech-QCNN

IEEE ICASSP 21 - Quantum Convolution Neural Networks for Speech Processing and Automatic Speech Recognition

Language: Jupyter Notebook - Size: 859 MB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 101 - Forks: 20

ga642381/SpeechPrompt

Interspeech 2022: "SpeechPrompt: An Exploration of Prompt Tuning on Generative Spoken Language Model for Speech Processing Tasks". Speech processing with the prompting paradigm

Language: Python - Size: 49.3 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 99 - Forks: 8

kehanlu/DeSTA2

Code and model for ICASSP 2025 Paper "Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data"

Language: HTML - Size: 4.44 MB - Last synced at: 14 days ago - Pushed at: 15 days ago - Stars: 98 - Forks: 7

mikeroyal/NLP-Guide

Natural Language Processing (NLP). Covering topics such as Tokenization, Part Of Speech tagging (POS), Machine translation, Named Entity Recognition (NER), Classification, and Sentiment analysis.

Language: Python - Size: 315 KB - Last synced at: 1 day ago - Pushed at: over 1 year ago - Stars: 98 - Forks: 16

Speech-Interaction-Technology-Aalto-U/itsp

Introduction to Speech Processing

Language: Jupyter Notebook - Size: 254 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 97 - Forks: 16

atosystem/SpeechCLIP

SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model, Accepted to IEEE SLT 2022

Language: Python - Size: 999 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 96 - Forks: 5

Appen/UHV-OTS-Speech 📦

A data annotation pipeline to generate high-quality, large-scale speech datasets with machine pre-labeling and fully manual auditing.

Language: Forth - Size: 1.41 GB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 92 - Forks: 15

haoheliu/torchsubband

Pytorch implementation of subband decomposition

Language: HTML - Size: 374 KB - Last synced at: about 1 month ago - Pushed at: about 3 years ago - Stars: 92 - Forks: 13

abikaki/awesome-speech-emotion-recognition

😎 Awesome lists about Speech Emotion Recognition

Size: 6.03 MB - Last synced at: about 1 month ago - Pushed at: 7 months ago - Stars: 91 - Forks: 6

shangeth/wavencoder

WavEncoder is a Python library for encoding audio signals, transforms for audio augmentation, and training audio classification models with PyTorch backend.

Language: Python - Size: 5.21 MB - Last synced at: 13 days ago - Pushed at: about 4 years ago - Stars: 91 - Forks: 14

r9y9/SPTK

A modified version of Speech Signal Processing Toolkit (SPTK)

Language: C - Size: 4.31 MB - Last synced at: about 22 hours ago - Pushed at: about 3 years ago - Stars: 89 - Forks: 18

vocalpy/vak

A neural network framework for researchers studying acoustic communication

Language: Python - Size: 196 MB - Last synced at: 1 day ago - Pushed at: 7 days ago - Stars: 87 - Forks: 17

Lhx94As/Awesome-Spoken-Language-Identification

An awesome spoken LID repository. (Work in progress)

Language: Python - Size: 959 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 83 - Forks: 10

NickWilkinson37/voxseg

A python library for voice activity detection (VAD) for speech/non-speech segmentation.

Language: Python - Size: 98.1 MB - Last synced at: 8 months ago - Pushed at: almost 3 years ago - Stars: 83 - Forks: 12

alshell7/vokaturi-android

Emotion recognition by speech in android.

Language: C - Size: 2.01 MB - Last synced at: almost 2 years ago - Pushed at: almost 8 years ago - Stars: 82 - Forks: 18

vipchengrui/traditional-speech-enhancement

Spectral Subtraction, Wiener Filtering, MMSE

Language: MATLAB - Size: 39.8 MB - Last synced at: almost 2 years ago - Pushed at: over 5 years ago - Stars: 81 - Forks: 34

alessandroragano/scoreq

SCOREQ: Speech COntrastive REgression for Quality Assessment (NeurIPS 2024)

Language: Python - Size: 1.48 MB - Last synced at: about 6 hours ago - Pushed at: about 1 month ago - Stars: 78 - Forks: 4

FlorianKrey/DNC

Discriminative Neural Clustering for Speaker Diarisation

Language: Python - Size: 3.62 GB - Last synced at: about 1 month ago - Pushed at: over 3 years ago - Stars: 78 - Forks: 14

stevenhillis/awesome-asr-contextualization

A curated list of awesome papers on contextualizing E2E ASR outputs

Size: 59.6 KB - Last synced at: about 1 month ago - Pushed at: about 2 years ago - Stars: 77 - Forks: 9

grausof/keras-sincnet

Keras (tensorflow) implementation of SincNet (Mirco Ravanelli, Yoshua Bengio - https://github.com/mravanelli/SincNet)

Language: Python - Size: 260 KB - Last synced at: 7 days ago - Pushed at: about 4 years ago - Stars: 74 - Forks: 26

ga642381/SpeechGen

《SpeechGen: Unlocking the Generative Power of Speech Language Models with Prompts》

Size: 141 KB - Last synced at: 9 days ago - Pushed at: about 2 years ago - Stars: 73 - Forks: 5

mwv/vad

Voice Activity Detector

Language: Python - Size: 24.4 KB - Last synced at: 3 months ago - Pushed at: over 2 years ago - Stars: 73 - Forks: 13

huckiyang/Voice2Series-Reprogramming

ICML 21 - Voice2Series: Adversarial Reprogramming Acoustic Models for Time Series Classification

Language: TypeScript - Size: 194 MB - Last synced at: 4 months ago - Pushed at: about 1 year ago - Stars: 72 - Forks: 12

kahne/fastwer

A PyPI package for fast word/character error rate (WER/CER) calculation

Language: Python - Size: 432 KB - Last synced at: 29 days ago - Pushed at: about 2 years ago - Stars: 72 - Forks: 16

nguyennpa412/vietnamese-speech-to-text-wavenet

Vietnamese speech recognition using Wavenet

Language: Python - Size: 52.9 MB - Last synced at: about 2 months ago - Pushed at: over 2 years ago - Stars: 71 - Forks: 36

SIP-Lab/CNN-VAD

A Convolutional Neural Network based Voice Activity Detector for Smartphones

Language: Jupyter Notebook - Size: 25.1 MB - Last synced at: 3 months ago - Pushed at: about 6 years ago - Stars: 71 - Forks: 22

Related Topics
speech-recognition (204), speech-to-text (132), speech (115), deep-learning (103), python (95), machine-learning (86), speech-synthesis (67), asr (57), pytorch (53), audio (53), audio-processing (48), signal-processing (45), speech-enhancement (41), nlp (38), speech-analysis (35), text-to-speech (33), natural-language-processing (32), tts (28), deep-neural-networks (25), voice-recognition (24), python3 (22), speaker-recognition (22), voice-activity-detection (22), speaker-verification (21), speech-emotion-recognition (20), matlab (20), emotion-recognition (20), voice (19), artificial-intelligence (19), mfcc (18), tensorflow (18), speaker-diarization (18), convolutional-neural-networks (15), automatic-speech-recognition (15), dataset (14), speaker-identification (14), feature-extraction (14), dsp (14), voice-conversion (13), speech-separation (13), librosa (13), ai (13), cnn (12), neural-network (12), digital-signal-processing (12), neural-networks (12), self-supervised-learning (11), stt (11), audio-analysis (11), computer-vision (11), voice-assistant (11), voice-commands (11), emotion-detection (10), keras (10), vad (10), large-language-models (10), noise-reduction (10), voice-control (10), real-time (10), speech-api (9), nlp-machine-learning (9), kaldi (8), javascript (8), spoken-language-processing (8), ios (8), asr-model (8), android (7), wav (7), machine-translation (7), classification (7), diarization (7), chatbot (7), awesome-list (7), denoising (7), awesome (6), wavenet (6), natural-language-understanding (6), translation (6), whisper (6), flask (6), forced-alignment (6), unsupervised-learning (6), praat (6), tensorflow2 (6), sentiment-analysis (6), mfcc-features (6), language-learning (6), linguistics (6), speech-dataset (6), music (6), c (6), multimodal-learning (6), openai (6), swift (6), representation-learning (6), corpus (5), speech-recognizer (5), bot (5), speech-translation (5), html5 (5)