Topic: "speech-processing"

speechbrain/speechbrain

A PyTorch-based Speech Toolkit

Language: Python - Size: 99.4 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 10,746 - Forks: 1,597

pyannote/pyannote-audio

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

Language: Jupyter Notebook - Size: 252 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 8,653 - Forks: 966

snakers4/silero-vad

Silero VAD: pre-trained enterprise-grade Voice Activity Detector

Language: Python - Size: 106 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 7,261 - Forks: 660

pliang279/awesome-multimodal-ml

Reading list for research topics in multimodal machine learning

Size: 459 KB - Last synced at: 8 days ago - Pushed at: about 1 year ago - Stars: 6,705 - Forks: 896

microsoft/torchscale

Foundation Architecture for (M)LLMs

Language: Python - Size: 361 KB - Last synced at: 23 days ago - Pushed at: over 1 year ago - Stars: 3,119 - Forks: 220

linto-ai/whisper-timestamped

Multilingual Automatic Speech Recognition with word-level timestamps and confidence

Language: Python - Size: 4.48 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 2,581 - Forks: 198

r9y9/wavenet_vocoder

WaveNet vocoder

Language: Python - Size: 19.7 MB - Last synced at: 6 months ago - Pushed at: over 2 years ago - Stars: 2,356 - Forks: 498

r9y9/deepvoice3_pytorch

PyTorch implementation of convolutional neural networks-based text-to-speech synthesis models

Language: Python - Size: 6.78 MB - Last synced at: about 2 months ago - Pushed at: almost 2 years ago - Stars: 1,981 - Forks: 487

resemble-ai/resemble-enhance

AI powered speech denoising and enhancement

Language: Python - Size: 23.4 KB - Last synced at: 3 months ago - Pushed at: 12 months ago - Stars: 1,941 - Forks: 230

wq2012/awesome-diarization

A curated list of awesome Speaker Diarization papers, libraries, datasets, and other resources.

Size: 82 KB - Last synced at: 5 days ago - Pushed at: 4 months ago - Stars: 1,816 - Forks: 237

DigitalPhonetics/IMS-Toucan

Controllable and fast Text-to-Speech for over 7000 languages!

Language: Python - Size: 21.4 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 1,617 - Forks: 184

TEN-framework/ten-vad

Voice Activity Detector (VAD): low-latency, high-performance, and lightweight

Language: C - Size: 9.66 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 1,567 - Forks: 130

coqui-ai/open-speech-corpora

💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies

Size: 139 KB - Last synced at: 2 months ago - Pushed at: over 1 year ago - Stars: 1,356 - Forks: 148

haoheliu/voicefixer

General Speech Restoration

Language: Python - Size: 3.76 MB - Last synced at: 23 days ago - Pushed at: 9 months ago - Stars: 1,223 - Forks: 148

mravanelli/SincNet

SincNet is a neural architecture for efficiently processing raw audio samples.

Language: Python - Size: 78.9 MB - Last synced at: 6 months ago - Pushed at: over 4 years ago - Stars: 1,178 - Forks: 265

midas-research/audino

Open source audio annotation tool for humans

Language: JavaScript - Size: 12.5 MB - Last synced at: 6 months ago - Pushed at: 9 months ago - Stars: 1,094 - Forks: 134

ictnlp/StreamSpeech

StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.

Language: Python - Size: 18.2 MB - Last synced at: 6 months ago - Pushed at: about 1 year ago - Stars: 1,078 - Forks: 81

X-LANCE/SLAM-LLM

A Framework for Speech, Language, Audio, Music Processing with Large Language Model

Language: Python - Size: 169 MB - Last synced at: 29 days ago - Pushed at: 2 months ago - Stars: 904 - Forks: 94

nanahou/Awesome-Speech-Enhancement

A tutorial for Speech Enhancement researchers and practitioners. The purpose of this repo is to organize the world’s resources for speech enhancement and make them universally accessible and useful.

Language: MATLAB - Size: 25.2 MB - Last synced at: 5 days ago - Pushed at: almost 5 years ago - Stars: 805 - Forks: 153

Ryuk17/SpeechAlgorithms

You can find the speech algorithms you want here

Language: C - Size: 63.9 MB - Last synced at: 8 months ago - Pushed at: 11 months ago - Stars: 793 - Forks: 248

nyrahealth/CrisperWhisper

Verbatim Automatic Speech Recognition with improved word-level timestamps and filler detection

Language: Python - Size: 8.31 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 724 - Forks: 36

drethage/speech-denoising-wavenet

A neural network for end-to-end speech denoising

Language: Python - Size: 57.3 MB - Last synced at: 4 months ago - Pushed at: over 2 years ago - Stars: 699 - Forks: 164

huawei-noah/Speech-Backbones

This is the main repository of open-sourced speech technology by Huawei Noah's Ark Lab.

Language: Jupyter Notebook - Size: 33.8 MB - Last synced at: 6 months ago - Pushed at: about 2 years ago - Stars: 583 - Forks: 125

Audio-WestlakeU/FullSubNet

PyTorch implementation of "FullSubNet: A Full-Band and Sub-Band Fusion Model for Real-Time Single-Channel Speech Enhancement."

Language: Python - Size: 892 KB - Last synced at: 4 months ago - Pushed at: over 2 years ago - Stars: 569 - Forks: 157

ddlBoJack/Speech-Resources

Speech-related labs, companies, resources, internships, and more; recommendations and self-nominations are welcome

Size: 5.44 MB - Last synced at: 8 months ago - Pushed at: about 1 year ago - Stars: 550 - Forks: 68

pliang279/MultiBench

[NeurIPS 2021] Multiscale Benchmarks for Multimodal Representation Learning

Language: HTML - Size: 49.9 MB - Last synced at: 6 months ago - Pushed at: almost 2 years ago - Stars: 541 - Forks: 80

breizhn/DTLN

TensorFlow 2.x implementation of the DTLN real-time speech denoising model, with TF-lite, ONNX, and real-time audio processing support.

Language: Python - Size: 25.5 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 501 - Forks: 143

SuperKogito/spafe

spafe: Simplified Python Audio Features Extraction

Language: Python - Size: 20.7 MB - Last synced at: 25 days ago - Pushed at: 8 months ago - Stars: 474 - Forks: 79

arjo129/uSpeech 📦

Speech recognition toolkit for the Arduino

Language: C++ - Size: 482 KB - Last synced at: 4 months ago - Pushed at: over 4 years ago - Stars: 474 - Forks: 101

microsoft/UniSpeech

UniSpeech - Large Scale Self-Supervised Learning for Speech

Language: Python - Size: 72.4 MB - Last synced at: 30 days ago - Pushed at: over 1 year ago - Stars: 472 - Forks: 74

gemengtju/Tutorial_Separation

This repo summarizes tutorials, datasets, papers, code, and tools for the speech separation and speaker extraction tasks. Pull requests are welcome.

Language: MATLAB - Size: 74.6 MB - Last synced at: 8 months ago - Pushed at: almost 5 years ago - Stars: 459 - Forks: 95

r9y9/pysptk

A python wrapper for Speech Signal Processing Toolkit (SPTK).

Language: Python - Size: 15.3 MB - Last synced at: about 2 months ago - Pushed at: over 1 year ago - Stars: 446 - Forks: 78

santi-pdp/pase

Problem Agnostic Speech Encoder

Language: Python - Size: 10.2 MB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 439 - Forks: 87

novoic/surfboard 📦

Novoic's audio feature extraction library

Language: Python - Size: 598 KB - Last synced at: 2 months ago - Pushed at: over 3 years ago - Stars: 437 - Forks: 47

SforAiDl/Neural-Voice-Cloning-With-Few-Samples 📦

This repository has implementation for "Neural Voice Cloning With Few Samples"

Language: Python - Size: 42.3 MB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 415 - Forks: 121

r9y9/nnmnkwii

Library to build speech synthesis systems designed for easy and fast prototyping.

Language: Python - Size: 79.7 MB - Last synced at: 21 days ago - Pushed at: over 1 year ago - Stars: 399 - Forks: 71

Yuan-ManX/audio-development-tools

Audio Development Tools (ADT) is a project for advancing sound, speech, and music technologies, featuring components for machine learning, sound synthesis, speech and music generation, signal processing, game audio, digital audio workstations (DAWs), and more.

Size: 904 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 376 - Forks: 26

speechbrain/speechbrain.github.io

The SpeechBrain project aims to build a novel speech toolkit fully based on PyTorch. With SpeechBrain users can easily create speech processing systems, ranging from speech recognition (both HMM/DNN and end-to-end), speaker recognition, speech enhancement, speech separation, multi-microphone speech processing, and many others.

Language: HTML - Size: 46.8 MB - Last synced at: 5 months ago - Pushed at: 6 months ago - Stars: 368 - Forks: 30

NVIDIA/CleanUNet

Official PyTorch Implementation of CleanUNet (ICASSP 2022)

Language: Python - Size: 35.2 KB - Last synced at: 25 days ago - Pushed at: about 2 years ago - Stars: 334 - Forks: 56

haoxiangsnr/A-Convolutional-Recurrent-Neural-Network-for-Real-Time-Speech-Enhancement

A minimal unofficial PyTorch implementation of "A Convolutional Recurrent Neural Network for Real-Time Speech Enhancement" (CRN)

Language: Python - Size: 43.9 KB - Last synced at: 4 months ago - Pushed at: about 5 years ago - Stars: 328 - Forks: 62

seanwood/gcc-nmf

Real-time GCC-NMF Blind Speech Separation and Enhancement

Language: Python - Size: 43.2 MB - Last synced at: 3 months ago - Pushed at: over 6 years ago - Stars: 324 - Forks: 134

rishikksh20/VocGAN

VocGAN: A High-Fidelity Real-time Vocoder with a Hierarchically-nested Adversarial Network

Language: Python - Size: 187 KB - Last synced at: over 1 year ago - Pushed at: about 3 years ago - Stars: 318 - Forks: 60

kahne/NonAutoregGenProgress

Tracking the progress in non-autoregressive generation (translation, transcription, etc.)

Size: 33.2 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 303 - Forks: 31

gtreshchev/RuntimeSpeechRecognizer 📦

Cross-platform, real-time, offline speech recognition plugin for Unreal Engine. Based on Whisper OpenAI technology, whisper.cpp.

Language: C++ - Size: 24.8 MB - Last synced at: about 1 month ago - Pushed at: 9 months ago - Stars: 300 - Forks: 47

fgnt/pb_bss

Collection of EM algorithms for blind source separation of audio signals

Language: Python - Size: 638 KB - Last synced at: 16 days ago - Pushed at: 6 months ago - Stars: 296 - Forks: 62

haoheliu/voicefixer_main

General Speech Restoration

Language: Python - Size: 21.5 MB - Last synced at: 6 months ago - Pushed at: almost 2 years ago - Stars: 278 - Forks: 56

haoxiangsnr/Wave-U-Net-for-Speech-Enhancement

Wave-U-Net implemented in PyTorch and adapted to speech enhancement.

Language: Python - Size: 511 KB - Last synced at: about 2 years ago - Pushed at: about 3 years ago - Stars: 278 - Forks: 64

zycv/awesome-keyword-spotting

This repository is a curated list of awesome Speech Keyword Spotting (Wake-Up Word Detection).

Size: 129 KB - Last synced at: 11 days ago - Pushed at: over 3 years ago - Stars: 273 - Forks: 42

r9y9/ttslearn

ttslearn: Library for Pythonで学ぶ音声合成 (Text-to-speech with Python)

Language: Jupyter Notebook - Size: 50.5 MB - Last synced at: 3 months ago - Pushed at: over 2 years ago - Stars: 261 - Forks: 39

AkojimaSLP/Beamforming-for-speech-enhancement

Simple delay-and-sum, MVDR, and CGMM-MVDR

Language: Python - Size: 3.18 MB - Last synced at: 8 months ago - Pushed at: almost 7 years ago - Stars: 257 - Forks: 74

Sharad24/Neural-Voice-Cloning-with-Few-Samples 📦

Implementation of Neural Voice Cloning with Few Samples Research Paper by Baidu

Language: Python - Size: 57.7 MB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 252 - Forks: 55

swasun/VQ-VAE-Speech 📦

PyTorch implementation of VQ-VAE + WaveNet by [Chorowski et al., 2019] and VQ-VAE on speech signals by [van den Oord et al., 2017]

Language: Python - Size: 82.4 MB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 247 - Forks: 52

sp-nitech/SPTK

A suite of speech signal processing tools

Language: C++ - Size: 5.97 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 241 - Forks: 28

tomchang25/whisper-auto-transcribe

Auto transcribe tool based on whisper

Language: Python - Size: 169 MB - Last synced at: 4 months ago - Pushed at: over 2 years ago - Stars: 226 - Forks: 16

gionanide/Speech_Signal_Processing_and_Classification

Front-end speech processing aims at extracting proper features from short-term segments of a speech utterance, known as frames. It is a prerequisite step toward any pattern recognition problem employing speech or audio (e.g., music). Here, we are interested in voice disorder classification, that is, in developing two-class classifiers that can discriminate between utterances of a subject suffering from, say, vocal fold paralysis and utterances of a healthy subject. The mathematical modeling of the speech production system in humans suggests that an all-pole system function is justified [1-3]. As a consequence, linear prediction coefficients (LPCs) constitute a first choice for modeling the magnitude of the short-term spectrum of speech. LPC-derived cepstral coefficients are guaranteed to discriminate between the system (e.g., vocal tract) contribution and that of the excitation. Taking into account the characteristics of the human ear, the mel-frequency cepstral coefficients (MFCCs) emerged as descriptive features of the speech spectral envelope. Similarly to MFCCs, the perceptual linear prediction coefficients (PLPs) could also be derived. The aforementioned traditional features will be tested against agnostic features extracted by convolutional neural networks (CNNs) (e.g., auto-encoders) [4]. The pattern recognition step will be based on Gaussian Mixture Model based classifiers, K-nearest neighbor classifiers, Bayes classifiers, as well as Deep Neural Networks. The Massachusetts Eye and Ear Infirmary Dataset (MEEI-Dataset) [5] will be exploited. At the application level, a library for feature extraction and classification in Python will be developed. Credible publicly available resources, such as KALDI, will be used toward achieving our goal. Comparisons will be made against [6-8].

Language: Python - Size: 827 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 220 - Forks: 62

kahne/SpeechTransProgress

Tracking the progress in end-to-end speech translation

Size: 121 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 218 - Forks: 26

xmindflow/Awesome_Mamba

Computation-Efficient Era: A Comprehensive Survey of State Space Models in Medical Image Analysis

Size: 133 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 213 - Forks: 14

jtkim-kaist/Speech-enhancement

Deep neural network based speech enhancement toolkit

Language: MATLAB - Size: 187 MB - Last synced at: 8 months ago - Pushed at: over 6 years ago - Stars: 213 - Forks: 62

innFactory/react-native-dialogflow

A React-Native Bridge for the Google Dialogflow (API.AI) SDK

Language: JavaScript - Size: 1.16 MB - Last synced at: 27 days ago - Pushed at: over 2 years ago - Stars: 204 - Forks: 63

rishikksh20/hifigan-denoiser

HiFi-GAN: High Fidelity Denoising and Dereverberation Based on Speech Deep Features in Adversarial Networks

Language: Python - Size: 6 MB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 192 - Forks: 43

cvqluu/TDNN

Time delay neural network (TDNN) implementation in PyTorch using the unfold method

Language: Python - Size: 708 KB - Last synced at: about 2 years ago - Pushed at: almost 6 years ago - Stars: 183 - Forks: 40

dqqcasia/awesome-speech-translation

Fork of ucaslyc/speech_translation-papers

Size: 296 KB - Last synced at: 9 days ago - Pushed at: about 4 years ago - Stars: 179 - Forks: 1

SuyashMore/MevonAI-Speech-Emotion-Recognition

Identify the emotion of multiple speakers in an Audio Segment

Language: C - Size: 63.6 MB - Last synced at: about 1 month ago - Pushed at: almost 3 years ago - Stars: 177 - Forks: 47

ASR-project/Multilingual-PR

Phoneme recognition using the pre-trained models Wav2vec2, HuBERT, and WavLM. This project compares three self-supervised models, Wav2vec (2019, 2020), HuBERT (2021), and WavLM (2022), each pretrained on a corpus of English speech, and uses them in various ways to perform phoneme recognition for different languages with a network trained using the Connectionist Temporal Classification (CTC) algorithm.

Language: Python - Size: 3.47 MB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 171 - Forks: 13

sekiguchi92/SoundSourceSeparation

The code for multi-channel source separation and dereverberation such as FastMNMF1, FastMNMF2, and AR-FastMNMF2.

Language: Python - Size: 31.6 MB - Last synced at: about 2 years ago - Pushed at: about 3 years ago - Stars: 170 - Forks: 30

ahkarami/Great-Deep-Learning-Books

A Great Collection of Deep Learning (e)Books

Size: 600 KB - Last synced at: about 2 months ago - Pushed at: 12 months ago - Stars: 153 - Forks: 33

Voice-Lab/VoiceLab

Automated Reproducible Acoustical Analysis

Language: Python - Size: 16.5 MB - Last synced at: 7 months ago - Pushed at: over 1 year ago - Stars: 152 - Forks: 19

MycroftAI/ZZZ-RETIRED__openstt 📦

RETIRED - OpenSTT is now retired. If you would like more information on Mycroft AI's open source STT projects, please visit:

Size: 26.4 KB - Last synced at: about 1 month ago - Pushed at: over 9 years ago - Stars: 140 - Forks: 11

jefflai108/pytorch-kaldi-neural-speaker-embeddings

Lightweight neural speaker embedding extraction based on Kaldi and PyTorch.

Language: Perl - Size: 9.35 MB - Last synced at: 4 months ago - Pushed at: almost 6 years ago - Stars: 136 - Forks: 34

NICEElevateAI/ElevateAIJavaSDK

Java SDK for ElevateAI

Language: Java - Size: 67.4 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 121 - Forks: 0

albertaparicio/tfg-voice-conversion

Deep Learning-based Voice Conversion system

Language: Python - Size: 3.25 GB - Last synced at: 25 days ago - Pushed at: almost 3 years ago - Stars: 120 - Forks: 39

NICEElevateAI/ElevateAIDotNetSDK

.NET Core 6 SDK for ElevateAI

Language: C# - Size: 934 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 115 - Forks: 0

rishikksh20/SoundStorm-pytorch

Google's SoundStorm: Efficient Parallel Audio Generation

Language: Python - Size: 269 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 114 - Forks: 12

NICEElevateAI/ElevateAIPythonSDK

ElevateAI - Speech-to-text API Python SDK

Language: Python - Size: 43.9 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 111 - Forks: 0

mikeroyal/NLP-Guide

Natural Language Processing (NLP). Covering topics such as Tokenization, Part Of Speech tagging (POS), Machine translation, Named Entity Recognition (NER), Classification, and Sentiment analysis.

Language: Python - Size: 315 KB - Last synced at: 9 days ago - Pushed at: almost 2 years ago - Stars: 106 - Forks: 18

Speech-Interaction-Technology-Aalto-U/itsp

Introduction to Speech Processing

Language: Jupyter Notebook - Size: 307 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 105 - Forks: 18

huckiyang/QuantumSpeech-QCNN

IEEE ICASSP 21 - Quantum Convolution Neural Networks for Speech Processing and Automatic Speech Recognition

Language: Jupyter Notebook - Size: 859 MB - Last synced at: 5 months ago - Pushed at: almost 3 years ago - Stars: 101 - Forks: 20

ga642381/SpeechPrompt

Interspeech 2022: "SpeechPrompt: An Exploration of Prompt Tuning on Generative Spoken Language Model for Speech Processing Tasks". Speech processing with the prompting paradigm

Language: Python - Size: 49.3 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 99 - Forks: 8

abikaki/awesome-speech-emotion-recognition

😎 Awesome lists about Speech Emotion Recognition

Size: 6.03 MB - Last synced at: 5 days ago - Pushed at: 11 months ago - Stars: 99 - Forks: 6

kehanlu/DeSTA2

Code and model for ICASSP 2025 Paper "Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data"

Language: HTML - Size: 4.44 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 98 - Forks: 7

atosystem/SpeechCLIP

SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model, Accepted to IEEE SLT 2022

Language: Python - Size: 999 KB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 96 - Forks: 5

alessandroragano/scoreq

SCOREQ: Speech COntrastive REgression for Quality Assessment (NeurIPS 2024)

Language: Python - Size: 1.48 MB - Last synced at: 14 days ago - Pushed at: 4 months ago - Stars: 94 - Forks: 7

Appen/UHV-OTS-Speech 📦

A data annotation pipeline to generate high-quality, large-scale speech datasets with machine pre-labeling and fully manual auditing.

Language: Forth - Size: 1.41 GB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 92 - Forks: 15