An open API service providing repository metadata for many open source software ecosystems.

Topic: "speech-processing"

speechbrain/speechbrain

A PyTorch-based Speech Toolkit

Language: Python - Size: 98.2 MB - Last synced at: 26 days ago - Pushed at: 26 days ago - Stars: 9,952 - Forks: 1,504

pyannote/pyannote-audio

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

Language: Jupyter Notebook - Size: 252 MB - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 7,671 - Forks: 889

pliang279/awesome-multimodal-ml

Reading list for research topics in multimodal machine learning

Size: 459 KB - Last synced at: 9 days ago - Pushed at: 11 months ago - Stars: 6,516 - Forks: 885

snakers4/silero-vad

Silero VAD: pre-trained enterprise-grade Voice Activity Detector

Language: Python - Size: 100 MB - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 6,032 - Forks: 574

microsoft/torchscale

Foundation Architecture for (M)LLMs

Language: Python - Size: 361 KB - Last synced at: 1 day ago - Pushed at: about 1 year ago - Stars: 3,087 - Forks: 223

linto-ai/whisper-timestamped

Multilingual Automatic Speech Recognition with word-level timestamps and confidence

Language: Python - Size: 4.49 MB - Last synced at: 13 days ago - Pushed at: 3 months ago - Stars: 2,460 - Forks: 189

r9y9/wavenet_vocoder

WaveNet vocoder

Language: Python - Size: 19.7 MB - Last synced at: about 1 month ago - Pushed at: almost 2 years ago - Stars: 2,356 - Forks: 498

r9y9/deepvoice3_pytorch

PyTorch implementation of convolutional neural networks-based text-to-speech synthesis models

Language: Python - Size: 6.78 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 1,980 - Forks: 489

resemble-ai/resemble-enhance

AI powered speech denoising and enhancement

Language: Python - Size: 23.4 KB - Last synced at: 12 days ago - Pushed at: 7 months ago - Stars: 1,850 - Forks: 217

wq2012/awesome-diarization

A curated list of awesome Speaker Diarization papers, libraries, datasets, and other resources.

Size: 81.1 KB - Last synced at: 19 days ago - Pushed at: 9 months ago - Stars: 1,761 - Forks: 231

DigitalPhonetics/IMS-Toucan

Controllable and fast Text-to-Speech for over 7000 languages!

Language: Python - Size: 21.4 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 1,617 - Forks: 184

coqui-ai/open-speech-corpora

💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies

Size: 139 KB - Last synced at: 3 months ago - Pushed at: about 1 year ago - Stars: 1,318 - Forks: 142

haoheliu/voicefixer

General Speech Restoration

Language: Python - Size: 3.76 MB - Last synced at: 8 days ago - Pushed at: 5 months ago - Stars: 1,178 - Forks: 142

mravanelli/SincNet

SincNet is a neural architecture for efficiently processing raw audio samples.

Language: Python - Size: 78.9 MB - Last synced at: about 2 months ago - Pushed at: about 4 years ago - Stars: 1,178 - Forks: 265

midas-research/audino

Open source audio annotation tool for humans

Language: JavaScript - Size: 12.5 MB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 1,094 - Forks: 134

ictnlp/StreamSpeech

StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.

Language: Python - Size: 18.2 MB - Last synced at: about 1 month ago - Pushed at: 11 months ago - Stars: 1,078 - Forks: 81

X-LANCE/SLAM-LLM

Speech, Language, Audio, Music Processing with Large Language Model

Language: Python - Size: 169 MB - Last synced at: about 2 hours ago - Pushed at: 20 days ago - Stars: 844 - Forks: 85

Ryuk17/SpeechAlgorithms

You can find the speech algorithms you want here

Language: C - Size: 63.9 MB - Last synced at: 3 months ago - Pushed at: 6 months ago - Stars: 793 - Forks: 248

nanahou/Awesome-Speech-Enhancement

A tutorial for Speech Enhancement researchers and practitioners. The purpose of this repo is to organize the world’s resources for speech enhancement and make them universally accessible and useful.

Language: MATLAB - Size: 25.2 MB - Last synced at: 5 days ago - Pushed at: over 4 years ago - Stars: 775 - Forks: 153

nyrahealth/CrisperWhisper

Verbatim Automatic Speech Recognition with improved word-level timestamps and filler detection

Language: Python - Size: 8.31 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 724 - Forks: 36

drethage/speech-denoising-wavenet

A neural network for end-to-end speech denoising

Language: Python - Size: 57.3 MB - Last synced at: about 1 month ago - Pushed at: almost 2 years ago - Stars: 694 - Forks: 163

TEN-framework/ten-vad

Voice Activity Detector (VAD) from TEN: low-latency, high-performance and lightweight

Language: C - Size: 9.79 MB - Last synced at: 7 days ago - Pushed at: 9 days ago - Stars: 627 - Forks: 59

huawei-noah/Speech-Backbones

This is the main repository of open-sourced speech technology by Huawei Noah's Ark Lab.

Language: Jupyter Notebook - Size: 33.8 MB - Last synced at: about 1 month ago - Pushed at: almost 2 years ago - Stars: 583 - Forks: 125

Audio-WestlakeU/FullSubNet

PyTorch implementation of "FullSubNet: A Full-Band and Sub-Band Fusion Model for Real-Time Single-Channel Speech Enhancement."

Language: Python - Size: 892 KB - Last synced at: 8 months ago - Pushed at: almost 2 years ago - Stars: 552 - Forks: 156

ddlBoJack/Speech-Resources

Speech-related labs, companies, resources, internships, and more; recommendations and self-nominations are welcome

Size: 5.44 MB - Last synced at: 3 months ago - Pushed at: 8 months ago - Stars: 550 - Forks: 68

pliang279/MultiBench

[NeurIPS 2021] Multiscale Benchmarks for Multimodal Representation Learning

Language: HTML - Size: 49.9 MB - Last synced at: about 2 months ago - Pushed at: over 1 year ago - Stars: 541 - Forks: 80

breizhn/DTLN

Tensorflow 2.x implementation of the DTLN real time speech denoising model. With TF-lite, ONNX and real-time audio processing support.

Language: Python - Size: 25.5 MB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 501 - Forks: 143

SuperKogito/spafe

spafe: Simplified Python Audio Features Extraction

Language: Python - Size: 20.7 MB - Last synced at: 9 days ago - Pushed at: 4 months ago - Stars: 475 - Forks: 79

arjo129/uSpeech 📦

Speech recognition toolkit for the arduino

Language: C++ - Size: 482 KB - Last synced at: about 2 months ago - Pushed at: about 4 years ago - Stars: 474 - Forks: 102

microsoft/UniSpeech

UniSpeech - Large Scale Self-Supervised Learning for Speech

Language: Python - Size: 72.4 MB - Last synced at: 1 day ago - Pushed at: over 1 year ago - Stars: 464 - Forks: 74

gemengtju/Tutorial_Separation

This repo summarizes the tutorials, datasets, papers, code and tools for the speech separation and speaker extraction tasks. Pull requests are welcome.

Language: MATLAB - Size: 74.6 MB - Last synced at: 3 months ago - Pushed at: over 4 years ago - Stars: 459 - Forks: 95

r9y9/pysptk

A python wrapper for Speech Signal Processing Toolkit (SPTK).

Language: Python - Size: 15.3 MB - Last synced at: 11 days ago - Pushed at: 12 months ago - Stars: 442 - Forks: 78

santi-pdp/pase

Problem Agnostic Speech Encoder

Language: Python - Size: 10.2 MB - Last synced at: 8 months ago - Pushed at: almost 2 years ago - Stars: 439 - Forks: 87

novoic/surfboard 📦

Novoic's audio feature extraction library

Language: Python - Size: 598 KB - Last synced at: 30 days ago - Pushed at: over 3 years ago - Stars: 436 - Forks: 47

SforAiDl/Neural-Voice-Cloning-With-Few-Samples 📦

An implementation of "Neural Voice Cloning With Few Samples"

Language: Python - Size: 42.3 MB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 415 - Forks: 121

r9y9/nnmnkwii

Library to build speech synthesis systems designed for easy and fast prototyping.

Language: Python - Size: 79.7 MB - Last synced at: 11 days ago - Pushed at: about 1 year ago - Stars: 397 - Forks: 73

speechbrain/speechbrain.github.io

The SpeechBrain project aims to build a novel speech toolkit fully based on PyTorch. With SpeechBrain users can easily create speech processing systems, ranging from speech recognition (both HMM/DNN and end-to-end), speaker recognition, speech enhancement, speech separation, multi-microphone speech processing, and many others.

Language: HTML - Size: 46.8 MB - Last synced at: 26 days ago - Pushed at: about 1 month ago - Stars: 368 - Forks: 30

Yuan-ManX/audio-development-tools

This is a list of sound, audio and music development tools which contains machine learning, audio generation, audio signal processing, sound synthesis, spatial audio, music information retrieval, music generation, speech recognition, speech synthesis, singing voice synthesis and more.

Size: 2.18 MB - Last synced at: 4 months ago - Pushed at: 10 months ago - Stars: 346 - Forks: 24

NVIDIA/CleanUNet

Official PyTorch Implementation of CleanUNet (ICASSP 2022)

Language: Python - Size: 35.2 KB - Last synced at: 6 days ago - Pushed at: over 1 year ago - Stars: 322 - Forks: 56

seanwood/gcc-nmf

Real-time GCC-NMF Blind Speech Separation and Enhancement

Language: Python - Size: 43.2 MB - Last synced at: about 2 months ago - Pushed at: about 6 years ago - Stars: 319 - Forks: 134

rishikksh20/VocGAN

VocGAN: A High-Fidelity Real-time Vocoder with a Hierarchically-nested Adversarial Network

Language: Python - Size: 187 KB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 318 - Forks: 60

haoxiangsnr/A-Convolutional-Recurrent-Neural-Network-for-Real-Time-Speech-Enhancement

A minimal unofficial implementation of "A Convolutional Recurrent Neural Network for Real-Time Speech Enhancement" (CRN) using PyTorch

Language: Python - Size: 43.9 KB - Last synced at: 8 months ago - Pushed at: almost 5 years ago - Stars: 315 - Forks: 58

kahne/NonAutoregGenProgress

Tracking the progress in non-autoregressive generation (translation, transcription, etc.)

Size: 33.2 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 303 - Forks: 31

gtreshchev/RuntimeSpeechRecognizer 📦

Cross-platform, real-time, offline speech recognition plugin for Unreal Engine. Based on Whisper OpenAI technology, whisper.cpp.

Language: C++ - Size: 24.8 MB - Last synced at: 8 days ago - Pushed at: 4 months ago - Stars: 297 - Forks: 47

fgnt/pb_bss

Collection of EM algorithms for blind source separation of audio signals

Language: Python - Size: 635 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 286 - Forks: 61

haoheliu/voicefixer_main

General Speech Restoration

Language: Python - Size: 21.5 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 278 - Forks: 56

haoxiangsnr/Wave-U-Net-for-Speech-Enhancement

A PyTorch implementation of Wave-U-Net, adapted to speech enhancement.

Language: Python - Size: 511 KB - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 278 - Forks: 64

zycv/awesome-keyword-spotting

This repository is a curated list of awesome Speech Keyword Spotting (Wake-Up Word Detection).

Size: 129 KB - Last synced at: 12 days ago - Pushed at: about 3 years ago - Stars: 260 - Forks: 40

r9y9/ttslearn

ttslearn: Library for Pythonで学ぶ音声合成 (Text-to-speech with Python)

Language: Jupyter Notebook - Size: 50.5 MB - Last synced at: 9 days ago - Pushed at: over 2 years ago - Stars: 259 - Forks: 39

AkojimaSLP/Beamforming-for-speech-enhancement

simple delaysum, MVDR and CGMM-MVDR

Language: Python - Size: 3.18 MB - Last synced at: 3 months ago - Pushed at: over 6 years ago - Stars: 257 - Forks: 74

Sharad24/Neural-Voice-Cloning-with-Few-Samples 📦

Implementation of Neural Voice Cloning with Few Samples Research Paper by Baidu

Language: Python - Size: 57.7 MB - Last synced at: about 1 year ago - Pushed at: over 4 years ago - Stars: 252 - Forks: 55

swasun/VQ-VAE-Speech 📦

PyTorch implementation of VQ-VAE + WaveNet by [Chorowski et al., 2019] and VQ-VAE on speech signals by [van den Oord et al., 2017]

Language: Python - Size: 82.4 MB - Last synced at: over 1 year ago - Pushed at: almost 6 years ago - Stars: 247 - Forks: 52

sp-nitech/SPTK

A suite of speech signal processing tools

Language: C++ - Size: 5.68 MB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 234 - Forks: 27

tomchang25/whisper-auto-transcribe

Auto transcribe tool based on whisper

Language: Python - Size: 169 MB - Last synced at: 8 months ago - Pushed at: about 2 years ago - Stars: 220 - Forks: 15

gionanide/Speech_Signal_Processing_and_Classification

Front-end speech processing aims at extracting proper features from short-term segments of a speech utterance, known as frames. It is a prerequisite step toward any pattern recognition problem employing speech or audio (e.g., music). Here, we are interested in voice disorder classification, that is, in developing two-class classifiers that can discriminate between utterances of a subject suffering from, say, vocal fold paralysis and utterances of a healthy subject. The mathematical modeling of the speech production system in humans suggests that an all-pole system function is justified [1-3]. As a consequence, linear prediction coefficients (LPCs) constitute a first choice for modeling the magnitude of the short-term spectrum of speech. LPC-derived cepstral coefficients are guaranteed to discriminate between the system (e.g., vocal tract) contribution and that of the excitation. Taking into account the characteristics of the human ear, the mel-frequency cepstral coefficients (MFCCs) emerged as descriptive features of the speech spectral envelope. Similarly to MFCCs, the perceptual linear prediction coefficients (PLPs) could also be derived. The aforementioned, so to speak, traditional features will be tested against agnostic features extracted by convolutional neural networks (CNNs) (e.g., auto-encoders) [4]. The pattern recognition step will be based on Gaussian Mixture Model based classifiers, K-nearest neighbor classifiers, Bayes classifiers, as well as Deep Neural Networks. The Massachusetts Eye and Ear Infirmary Dataset (MEEI-Dataset) [5] will be exploited. At the application level, a library for feature extraction and classification in Python will be developed. Credible publicly available resources, such as KALDI, will be used toward achieving our goal. Comparisons will be made against [6-8].

Language: Python - Size: 827 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 220 - Forks: 62

kahne/SpeechTransProgress

Tracking the progress in end-to-end speech translation

Size: 121 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 218 - Forks: 26

xmindflow/Awesome_Mamba

Computation-Efficient Era: A Comprehensive Survey of State Space Models in Medical Image Analysis

Size: 133 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 213 - Forks: 14

jtkim-kaist/Speech-enhancement

Deep neural network based speech enhancement toolkit

Language: MATLAB - Size: 187 MB - Last synced at: 3 months ago - Pushed at: about 6 years ago - Stars: 213 - Forks: 62

innFactory/react-native-dialogflow

A React-Native Bridge for the Google Dialogflow (API.AI) SDK

Language: JavaScript - Size: 1.16 MB - Last synced at: 2 days ago - Pushed at: about 2 years ago - Stars: 204 - Forks: 64

rishikksh20/hifigan-denoiser

HiFi-GAN: High Fidelity Denoising and Dereverberation Based on Speech Deep Features in Adversarial Networks

Language: Python - Size: 6 MB - Last synced at: about 1 year ago - Pushed at: about 4 years ago - Stars: 192 - Forks: 43

cvqluu/TDNN

Time delay neural network (TDNN) implementation in Pytorch using unfold method

Language: Python - Size: 708 KB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 183 - Forks: 40

dqqcasia/awesome-speech-translation

Fork of ucaslyc/speech_translation-papers

Size: 296 KB - Last synced at: about 2 months ago - Pushed at: over 3 years ago - Stars: 178 - Forks: 1

SuyashMore/MevonAI-Speech-Emotion-Recognition

Identify the emotion of multiple speakers in an Audio Segment

Language: C - Size: 63.6 MB - Last synced at: 30 days ago - Pushed at: over 2 years ago - Stars: 171 - Forks: 47

ASR-project/Multilingual-PR

Phoneme Recognition using pre-trained models Wav2vec2, HuBERT and WavLM. Throughout this project, we compared specifically three different self-supervised models, Wav2vec (2019, 2020), HuBERT (2021) and WavLM (2022) pretrained on a corpus of English speech that we will use in various ways to perform phoneme recognition for different languages with a network trained with Connectionist Temporal Classification (CTC) algorithm.

Language: Python - Size: 3.47 MB - Last synced at: about 1 year ago - Pushed at: about 3 years ago - Stars: 171 - Forks: 13

sekiguchi92/SoundSourceSeparation

The code for multi-channel source separation and dereverberation such as FastMNMF1, FastMNMF2, and AR-FastMNMF2.

Language: Python - Size: 31.6 MB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 170 - Forks: 30

Voice-Lab/VoiceLab

Automated Reproducible Acoustical Analysis

Language: Python - Size: 16.5 MB - Last synced at: 2 months ago - Pushed at: 11 months ago - Stars: 152 - Forks: 19

MycroftAI/ZZZ-RETIRED__openstt 📦

RETIRED - OpenSTT is now retired. If you would like more information on Mycroft AI's open source STT projects, please visit:

Size: 26.4 KB - Last synced at: 7 days ago - Pushed at: over 9 years ago - Stars: 142 - Forks: 11

jefflai108/pytorch-kaldi-neural-speaker-embeddings

Lightweight neural speaker embedding extraction based on Kaldi and PyTorch.

Language: Perl - Size: 9.35 MB - Last synced at: 7 months ago - Pushed at: over 5 years ago - Stars: 137 - Forks: 34

ahkarami/Great-Deep-Learning-Books

A Great Collection of Deep Learning (e)Books

Size: 600 KB - Last synced at: about 1 month ago - Pushed at: 7 months ago - Stars: 135 - Forks: 30

NICEElevateAI/ElevateAIJavaSDK

Java SDK for ElevateAI

Language: Java - Size: 67.4 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 121 - Forks: 0

albertaparicio/tfg-voice-conversion

Deep Learning-based Voice Conversion system

Language: Python - Size: 3.25 GB - Last synced at: 3 months ago - Pushed at: over 2 years ago - Stars: 120 - Forks: 39

NICEElevateAI/ElevateAIDotNetSDK

.Net core 6 SDK for ElevateAI

Language: C# - Size: 934 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 115 - Forks: 0

rishikksh20/SoundStorm-pytorch

Google's SoundStorm: Efficient Parallel Audio Generation

Language: Python - Size: 269 KB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 114 - Forks: 12

NICEElevateAI/ElevateAIPythonSDK

ElevateAI - Speech-to-text API Python SDK

Language: Python - Size: 43.9 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 111 - Forks: 0

huckiyang/QuantumSpeech-QCNN

IEEE ICASSP 21 - Quantum Convolution Neural Networks for Speech Processing and Automatic Speech Recognition

Language: Jupyter Notebook - Size: 859 MB - Last synced at: 21 days ago - Pushed at: over 2 years ago - Stars: 101 - Forks: 20

ga642381/SpeechPrompt

Interspeech 2022 《SpeechPrompt: An Exploration of Prompt Tuning on Generative Spoken Language Model for Speech Processing Tasks》 Speech processing with prompting paradigm

Language: Python - Size: 49.3 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 99 - Forks: 8

Speech-Interaction-Technology-Aalto-U/itsp

Introduction to Speech Processing

Language: Jupyter Notebook - Size: 254 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 97 - Forks: 16

atosystem/SpeechCLIP

SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model, Accepted to IEEE SLT 2022

Language: Python - Size: 999 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 96 - Forks: 5

mikeroyal/NLP-Guide

Natural Language Processing (NLP). Covering topics such as Tokenization, Part Of Speech tagging (POS), Machine translation, Named Entity Recognition (NER), Classification, and Sentiment analysis.

Language: Python - Size: 315 KB - Last synced at: 16 days ago - Pushed at: over 1 year ago - Stars: 93 - Forks: 15

Appen/UHV-OTS-Speech 📦

A data annotation pipeline to generate high-quality, large-scale speech datasets with machine pre-labeling and fully manual auditing.

Language: Forth - Size: 1.41 GB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 92 - Forks: 15

haoheliu/torchsubband

Pytorch implementation of subband decomposition

Language: HTML - Size: 374 KB - Last synced at: 14 days ago - Pushed at: almost 3 years ago - Stars: 92 - Forks: 13

abikaki/awesome-speech-emotion-recognition

😎 Awesome lists about Speech Emotion Recognition

Size: 6.03 MB - Last synced at: 18 days ago - Pushed at: 6 months ago - Stars: 91 - Forks: 6

shangeth/wavencoder

WavEncoder is a Python library for encoding audio signals, transforms for audio augmentation, and training audio classification models with PyTorch backend.

Language: Python - Size: 5.21 MB - Last synced at: 22 days ago - Pushed at: about 4 years ago - Stars: 91 - Forks: 14

r9y9/SPTK

A modified version of Speech Signal Processing Toolkit (SPTK)

Language: C - Size: 4.31 MB - Last synced at: 3 months ago - Pushed at: about 3 years ago - Stars: 89 - Forks: 18

vocalpy/vak

A neural network framework for researchers studying acoustic communication

Language: Python - Size: 196 MB - Last synced at: 6 days ago - Pushed at: 3 months ago - Stars: 84 - Forks: 17

Lhx94As/Awesome-Spoken-Language-Identification

An awesome spoken LID repository. (Work in progress)

Language: Python - Size: 959 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 83 - Forks: 10

NickWilkinson37/voxseg

A python library for voice activity detection (VAD) for speech/non-speech segmentation.

Language: Python - Size: 98.1 MB - Last synced at: 7 months ago - Pushed at: almost 3 years ago - Stars: 83 - Forks: 12

alshell7/vokaturi-android

Emotion recognition by speech in android.

Language: C - Size: 2.01 MB - Last synced at: over 1 year ago - Pushed at: almost 8 years ago - Stars: 82 - Forks: 18

vipchengrui/traditional-speech-enhancement

Spectral Subtraction, Wiener Filtering, MMSE

Language: MATLAB - Size: 39.8 MB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 81 - Forks: 34

FlorianKrey/DNC

Discriminative Neural Clustering for Speaker Diarisation

Language: Python - Size: 3.62 GB - Last synced at: 9 days ago - Pushed at: about 3 years ago - Stars: 78 - Forks: 14

stevenhillis/awesome-asr-contextualization

A curated list of awesome papers on contextualizing E2E ASR outputs

Size: 59.6 KB - Last synced at: 15 days ago - Pushed at: about 2 years ago - Stars: 77 - Forks: 9

ga642381/SpeechGen

《SpeechGen: Unlocking the Generative Power of Speech Language Models with Prompts》

Size: 141 KB - Last synced at: 7 months ago - Pushed at: about 2 years ago - Stars: 74 - Forks: 5

alessandroragano/scoreq

SCOREQ: Speech COntrastive REgression for Quality Assessment (NeurIPS 2024)

Language: Python - Size: 1.46 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 73 - Forks: 4

mwv/vad

Voice Activity Detector

Language: Python - Size: 24.4 KB - Last synced at: 2 months ago - Pushed at: over 2 years ago - Stars: 73 - Forks: 13

huckiyang/Voice2Series-Reprogramming

ICML 21 - Voice2Series: Adversarial Reprogramming Acoustic Models for Time Series Classification

Language: TypeScript - Size: 194 MB - Last synced at: 3 months ago - Pushed at: about 1 year ago - Stars: 72 - Forks: 12

kahne/fastwer

A PyPI package for fast word/character error rate (WER/CER) calculation

Language: Python - Size: 432 KB - Last synced at: 5 days ago - Pushed at: about 2 years ago - Stars: 72 - Forks: 16

grausof/keras-sincnet

Keras (tensorflow) implementation of SincNet (Mirco Ravanelli, Yoshua Bengio - https://github.com/mravanelli/SincNet)

Language: Python - Size: 260 KB - Last synced at: 3 months ago - Pushed at: about 4 years ago - Stars: 72 - Forks: 26

nguyennpa412/vietnamese-speech-to-text-wavenet

Vietnamese speech recognition using Wavenet

Language: Python - Size: 52.9 MB - Last synced at: 23 days ago - Pushed at: over 2 years ago - Stars: 71 - Forks: 36

SIP-Lab/CNN-VAD

A Convolutional Neural Network based Voice Activity Detector for Smartphones

Language: Jupyter Notebook - Size: 25.1 MB - Last synced at: 2 months ago - Pushed at: about 6 years ago - Stars: 71 - Forks: 22

inevolin/DiscordEarsBot

A speech-to-text framework and bot for Discord. Take control of your Discord server using speech and voice commands. Can also be useful for hearing impaired and deaf people.

Language: JavaScript - Size: 38.6 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 70 - Forks: 351

Related Topics
speech-recognition (200), speech-to-text (128), speech (113), deep-learning (102), python (94), machine-learning (85), speech-synthesis (67), asr (57), pytorch (53), audio (52), audio-processing (47), signal-processing (43), speech-enhancement (40), nlp (38), speech-analysis (35), text-to-speech (32), natural-language-processing (32), tts (28), deep-neural-networks (25), voice-recognition (23), python3 (22), speaker-recognition (22), voice-activity-detection (22), speaker-verification (21), emotion-recognition (20), matlab (19), speech-emotion-recognition (19), artificial-intelligence (19), voice (19), speaker-diarization (18), tensorflow (18), mfcc (17), automatic-speech-recognition (15), dataset (14), speaker-identification (14), feature-extraction (14), convolutional-neural-networks (14), speech-separation (13), voice-conversion (13), dsp (13), librosa (13), digital-signal-processing (12), neural-networks (12), ai (12), neural-network (12), cnn (12), stt (11), voice-commands (11), computer-vision (11), self-supervised-learning (11), voice-assistant (11), vad (10), voice-control (10), real-time (10), emotion-detection (10), keras (10), audio-analysis (10), noise-reduction (9), speech-api (9), nlp-machine-learning (9), asr-model (8), kaldi (8), ios (8), javascript (8), large-language-models (8), spoken-language-processing (8), denoising (7), diarization (7), android (7), wav (7), machine-translation (7), classification (7), chatbot (7), awesome-list (7), mfcc-features (6), multimodal-learning (6), wavenet (6), openai (6), representation-learning (6), natural-language-understanding (6), language-learning (6), sentiment-analysis (6), awesome (6), linguistics (6), praat (6), unsupervised-learning (6), translation (6), forced-alignment (6), speech-dataset (6), whisper (6), swift (6), music (6), tensorflow2 (6), flask (6), corpus (5), speech-recognizer (5), speech-denoising (5), transformers (5), reinforcement-learning (5), bot (5)