GitHub topics: speech-processing
benkhelifamohamedtaher/speech-emotion-recognition
Deep learning system for emotion recognition from speech, achieving 50.5% accuracy on 8-class classification using transformer architecture and real-time analysis
Language: Python - Size: 1.56 MB - Last synced at: about 16 hours ago - Pushed at: about 17 hours ago - Stars: 1 - Forks: 0

EveryVoiceTTS/EveryVoice
The EveryVoice TTS Toolkit - Text To Speech for your language
Language: Python - Size: 9.98 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 34 - Forks: 2

ryota-komatsu/slp2025
Materials for the tutorial "Introduction to Multimodal Large Language Models" at the 音学シンポジウム2025 symposium
Language: Jupyter Notebook - Size: 12.9 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 1 - Forks: 0

mende237/Nda-Nda-Force-Aligner
Forced alignment for Nda'Nda', a Cameroonian language
Language: Shell - Size: 603 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 2 - Forks: 0

TEN-framework/ten-vad
TEN VAD: low-latency high-performance Voice Activity Detector
Language: C - Size: 9.58 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 435 - Forks: 38

nyrahealth/CrisperWhisper
Verbatim Automatic Speech Recognition with improved word-level timestamps and filler detection
Language: Python - Size: 8.31 MB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 724 - Forks: 36

microsoft/torchscale
Foundation Architecture for (M)LLMs
Language: Python - Size: 361 KB - Last synced at: 3 days ago - Pushed at: about 1 year ago - Stars: 3,079 - Forks: 219

ryota-komatsu/speech_resynth
Speech Resynthesis and Language Modeling Using Flow Matching and Llama
Language: Python - Size: 4.81 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 17 - Forks: 4

fulldecent/formant-analyzer
iOS application for finding formants in spoken sounds
Language: Swift - Size: 8.79 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 59 - Forks: 15

speechbrain/speechbrain.github.io
The SpeechBrain project aims to build a novel speech toolkit fully based on PyTorch. With SpeechBrain users can easily create speech processing systems, ranging from speech recognition (both HMM/DNN and end-to-end), speaker recognition, speech enhancement, speech separation, multi-microphone speech processing, and many others.
Language: HTML - Size: 46.8 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 366 - Forks: 30

bunyaminergen/awesome-speech-dataset
Awesome Speech Dataset, including download links and a brief explanation for each resource. These datasets provide diverse and high-quality speech data covering various domains such as conversational, academic, political, and more.
Size: 249 KB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 10 - Forks: 0

ryota-komatsu/speaker_disentangled_hubert
Official repository of the IEEE SLT 2024 paper "Self-Supervised Syllable Discovery Based on Speaker-Disentangled HuBERT"
Language: Python - Size: 464 KB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 38 - Forks: 8

aaivu/KuralNet
A deep learning-based Speech Emotion Recognition (SER) model trained primarily on Indian languages. Designed for applications in call centers, sentiment analysis, and accessibility tools.
Language: Python - Size: 69.8 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 3 - Forks: 0

X-LANCE/SLAM-LLM
Speech, Language, Audio, Music Processing with Large Language Model
Language: Python - Size: 169 MB - Last synced at: 10 days ago - Pushed at: about 1 month ago - Stars: 811 - Forks: 79

SuperKogito/spafe
spafe: Simplified Python Audio Features Extraction
Language: Python - Size: 20.7 MB - Last synced at: 6 days ago - Pushed at: 3 months ago - Stars: 474 - Forks: 79

microsoft/UniSpeech
UniSpeech - Large Scale Self-Supervised Learning for Speech
Language: Python - Size: 72.4 MB - Last synced at: 3 days ago - Pushed at: about 1 year ago - Stars: 463 - Forks: 74

midas-research/audino
Open source audio annotation tool for humans
Language: JavaScript - Size: 12.5 MB - Last synced at: 10 days ago - Pushed at: 4 months ago - Stars: 1,094 - Forks: 134

drethage/speech-denoising-wavenet
A neural network for end-to-end speech denoising
Language: Python - Size: 57.3 MB - Last synced at: 3 days ago - Pushed at: almost 2 years ago - Stars: 694 - Forks: 163

r9y9/wavenet_vocoder
WaveNet vocoder
Language: Python - Size: 19.7 MB - Last synced at: 10 days ago - Pushed at: almost 2 years ago - Stars: 2,356 - Forks: 498

aliyzd95/project-dnn-ser-pipeline
This repository contains a complete machine learning pipeline for Speech Emotion Recognition (SER) using Deep Neural Networks (DNNs).
Language: Python - Size: 6.84 KB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 0 - Forks: 0

01Zhangbw/Speech-and-audio-papers-Top-Conference
A collection of papers in the speech and audio field. Currently covers: ICLR2025-2023, ICML2025-2023, NeurIPS2024-2023, ACMMM2024, AAAI2025-2024, ACL2025-2024, EMNLP2024, NAACL2025, IJCAI2024, ECCV2024
Size: 290 KB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 61 - Forks: 1

pratyusha972/AccentAI
Accent prediction from videos
Language: HTML - Size: 11.7 KB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 0 - Forks: 0

raj-sutariya/indic-num2words
Python library for converting numbers to words for all Indian Languages.
Language: Python - Size: 117 KB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 35 - Forks: 13

gtreshchev/RuntimeSpeechRecognizer 📦
Cross-platform, real-time, offline speech recognition plugin for Unreal Engine. Based on OpenAI's Whisper technology via whisper.cpp.
Language: C++ - Size: 24.8 MB - Last synced at: 4 days ago - Pushed at: 3 months ago - Stars: 295 - Forks: 46

ahkarami/Great-Deep-Learning-Books
A Great Collection of Deep Learning (e)Books
Size: 600 KB - Last synced at: 14 days ago - Pushed at: 7 months ago - Stars: 135 - Forks: 30

r9y9/deepvoice3_pytorch
PyTorch implementation of convolutional neural networks-based text-to-speech synthesis models
Language: Python - Size: 6.78 MB - Last synced at: 10 days ago - Pushed at: over 1 year ago - Stars: 1,980 - Forks: 489

MontrealCorpusTools/PolyglotDB
Language data store and linguistic query API
Language: Python - Size: 15.1 MB - Last synced at: 13 days ago - Pushed at: 16 days ago - Stars: 40 - Forks: 15

ictnlp/StreamSpeech
StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
Language: Python - Size: 18.2 MB - Last synced at: 15 days ago - Pushed at: 10 months ago - Stars: 1,078 - Forks: 81

aliyzd95/Emotion-Recognition-In-Persian-Speech-Using-Deep-Neural-Networks
This project aims to perform Emotion Recognition in Speech using Deep Neural Networks (DNNs)
Language: Python - Size: 29.3 KB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 0 - Forks: 0

clement-pages/gryannote
Provides custom Gradio components that make diarization-based audio labeling easier and faster.
Language: Svelte - Size: 2.66 MB - Last synced at: 13 days ago - Pushed at: 18 days ago - Stars: 62 - Forks: 7

MahtaFetrat/ManaTTS-Persian-Speech-Dataset
ManaTTS is the largest open Persian speech dataset with 100+ hours of transcribed audio. Includes data collection pipeline and tools. Suitable for Persian text-to-speech models.
Language: Jupyter Notebook - Size: 16.4 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 26 - Forks: 1

MahtaFetrat/GPTInformal-Persian-Speech-Dataset
A freely licensed Persian TTS dataset including 6+ hours of audio-text pairs with subject
Size: 4.88 KB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 7 - Forks: 0

haoheliu/voicefixer
General Speech Restoration
Language: Python - Size: 3.76 MB - Last synced at: 18 days ago - Pushed at: 4 months ago - Stars: 1,149 - Forks: 139

snakers4/silero-vad
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
Language: Python - Size: 100 MB - Last synced at: 18 days ago - Pushed at: 3 months ago - Stars: 5,837 - Forks: 557

speechbrain/speechbrain
A PyTorch-based Speech Toolkit
Language: Python - Size: 98 MB - Last synced at: 18 days ago - Pushed at: 19 days ago - Stars: 9,838 - Forks: 1,492

pyannote/pyannote-audio
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
Language: Jupyter Notebook - Size: 252 MB - Last synced at: 18 days ago - Pushed at: about 1 month ago - Stars: 7,529 - Forks: 878

linto-ai/whisper-timestamped
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
Language: Python - Size: 4.49 MB - Last synced at: 18 days ago - Pushed at: 2 months ago - Stars: 2,410 - Forks: 185

resemble-ai/resemble-enhance
AI powered speech denoising and enhancement
Language: Python - Size: 23.4 KB - Last synced at: 18 days ago - Pushed at: 6 months ago - Stars: 1,782 - Forks: 205

NVIDIA/CleanUNet
Official PyTorch Implementation of CleanUNet (ICASSP 2022)
Language: Python - Size: 35.2 KB - Last synced at: 15 days ago - Pushed at: over 1 year ago - Stars: 322 - Forks: 56

sp-nitech/SPTK
A suite of speech signal processing tools
Language: C++ - Size: 5.65 MB - Last synced at: 15 days ago - Pushed at: 19 days ago - Stars: 233 - Forks: 27

huawei-noah/Speech-Backbones
This is the main repository of open-sourced speech technology by Huawei Noah's Ark Lab.
Language: Jupyter Notebook - Size: 33.8 MB - Last synced at: 14 days ago - Pushed at: over 1 year ago - Stars: 583 - Forks: 125

fgnt/pb_bss
Collection of EM algorithms for blind source separation of audio signals
Language: Python - Size: 635 KB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 286 - Forks: 61

DigitalPhonetics/IMS-Toucan
Controllable and fast Text-to-Speech for over 7000 languages!
Language: Python - Size: 21.3 MB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 1,590 - Forks: 182

mexca/mexca
Multimodal Emotion eXpression Capture Amsterdam. Pipeline for capturing emotion expressions from multiple modalities (video, audio, text) in the wild.
Language: Python - Size: 24.8 MB - Last synced at: 17 days ago - Pushed at: 2 months ago - Stars: 34 - Forks: 6

haoheliu/voicefixer_main
General Speech Restoration
Language: Python - Size: 21.5 MB - Last synced at: 13 days ago - Pushed at: over 1 year ago - Stars: 278 - Forks: 56

lukaszliniewicz/breath-removal
Detect and remove or lower the volume of breathing in speech recordings.
Language: Python - Size: 21.1 MB - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 9 - Forks: 3

SuyashMore/MevonAI-Speech-Emotion-Recognition
Identify the emotion of multiple speakers in an Audio Segment
Language: C - Size: 63.6 MB - Last synced at: 1 day ago - Pushed at: over 2 years ago - Stars: 171 - Forks: 47

mravanelli/SincNet
SincNet is a neural architecture for efficiently processing raw audio samples.
Language: Python - Size: 78.9 MB - Last synced at: 22 days ago - Pushed at: about 4 years ago - Stars: 1,178 - Forks: 265

daanzu/py-silero-vad-lite
Lightweight wrapper for Silero VAD using an internal ONNX Runtime, with no Python package dependencies
Language: Python - Size: 1.9 MB - Last synced at: 6 days ago - Pushed at: 6 months ago - Stars: 14 - Forks: 1

freds0/free-svc
[ICASSP 2025] FreeSVC: Towards Zero-shot Multilingual Singing Voice Conversion
Language: Python - Size: 2.5 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 62 - Forks: 7

pliang279/MultiBench
[NeurIPS 2021] Multiscale Benchmarks for Multimodal Representation Learning
Language: HTML - Size: 49.9 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 541 - Forks: 80

zycv/awesome-keyword-spotting
This repository is a curated list of awesome Speech Keyword Spotting (Wake-Up Word Detection) resources.
Size: 129 KB - Last synced at: 30 days ago - Pushed at: about 3 years ago - Stars: 257 - Forks: 40

pliang279/awesome-multimodal-ml
Reading list for research topics in multimodal machine learning
Size: 459 KB - Last synced at: about 1 month ago - Pushed at: 10 months ago - Stars: 6,421 - Forks: 879

Erangamadhushan/EM956-Community-Assistant
EM956 Community Assistant for the EM956 Community Support web portal
Language: JavaScript - Size: 5.86 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

wq2012/awesome-diarization
A curated list of awesome Speaker Diarization papers, libraries, datasets, and other resources.
Size: 81.1 KB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 1,735 - Forks: 232

dqqcasia/awesome-speech-translation Fork of ucaslyc/speech_translation-papers
Size: 296 KB - Last synced at: 25 days ago - Pushed at: over 3 years ago - Stars: 178 - Forks: 1

nanahou/Awesome-Speech-Enhancement
A tutorial for Speech Enhancement researchers and practitioners. The purpose of this repo is to organize the world’s resources for speech enhancement and make them universally accessible and useful.
Language: MATLAB - Size: 25.2 MB - Last synced at: about 1 month ago - Pushed at: over 4 years ago - Stars: 762 - Forks: 151

abikaki/awesome-speech-emotion-recognition
😎 Awesome lists about Speech Emotion Recognition
Size: 6.03 MB - Last synced at: about 1 month ago - Pushed at: 6 months ago - Stars: 84 - Forks: 4

SalimLouDev/Noise-filtering-of-a-speech-signal
This project is designed for researchers, engineers, and students working in speech processing, machine learning, and signal analysis. By leveraging digital signal processing (DSP) techniques, it provides a hands-on approach to reducing unwanted noise and enhancing speech quality.
Language: Jupyter Notebook - Size: 877 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

gnanesh-16/dhvagna-npi
Advanced voice transcription tool with multi-language support, claimed to outperform current LLM models.
Language: Python - Size: 101 KB - Last synced at: 9 days ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

zhitko/inton-core
Inton Core Library is designed to measure a set of characteristics of oral speech.
Language: C++ - Size: 30.3 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 2

cyrta/awesome-speech-enhancement
A curated list of awesome Speech Enhancement papers, libraries, datasets, and other resources.
Size: 13.7 KB - Last synced at: about 1 month ago - Pushed at: over 5 years ago - Stars: 67 - Forks: 15

vocalpy/vak
A neural network framework for researchers studying acoustic communication
Language: Python - Size: 196 MB - Last synced at: 7 days ago - Pushed at: 2 months ago - Stars: 83 - Forks: 17

spokestack/react-native-spokestack 📦
Spokestack: give your React Native app a voice interface!
Language: TypeScript - Size: 6.52 MB - Last synced at: about 1 month ago - Pushed at: about 3 years ago - Stars: 61 - Forks: 13

kahne/fastwer
A PyPI package for fast word/character error rate (WER/CER) calculation
Language: Python - Size: 432 KB - Last synced at: 7 days ago - Pushed at: almost 2 years ago - Stars: 72 - Forks: 16

amirhosseinghanipour/dasp-rs
DASP-RS is a crate for digital signal processing, speech processing, music analysis, and phonetics.
Language: Rust - Size: 1.06 MB - Last synced at: 26 days ago - Pushed at: 26 days ago - Stars: 1 - Forks: 0

seanwood/gcc-nmf
Real-time GCC-NMF Blind Speech Separation and Enhancement
Language: Python - Size: 43.2 MB - Last synced at: 19 days ago - Pushed at: about 6 years ago - Stars: 319 - Forks: 134

alessandroragano/scoreq
SCOREQ: Speech COntrastive REgression for Quality Assessment (NeurIPS 2024)
Language: Python - Size: 1.46 MB - Last synced at: 28 days ago - Pushed at: 4 months ago - Stars: 71 - Forks: 4

jcvasquezc/phonet
Keras-based Python framework to compute phonological posterior probabilities from audio files
Language: Python - Size: 23 MB - Last synced at: 11 days ago - Pushed at: over 2 years ago - Stars: 43 - Forks: 18

thibault-roux/metric-evaluator
Metric evaluator for Automatic Speech Recognition using the HATS dataset
Language: Python - Size: 121 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 5 - Forks: 0

arniery/andys-project
Final assignment for the Trinity SLP course "Speech Processing 2: Acoustic Modelling": cascade and parallel formant synthesis, with the end goal of producing vowels using both methods.
Language: Jupyter Notebook - Size: 664 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

shangeth/wavencoder
WavEncoder is a Python library for encoding audio signals, transforms for audio augmentation, and training audio classification models with PyTorch backend.
Language: Python - Size: 5.21 MB - Last synced at: 28 days ago - Pushed at: about 4 years ago - Stars: 90 - Forks: 14

ZygoteCode/VadSharp
Enterprise VAD (Voice Activity Detection) in C#.NET (.NET 6.0+) with Microsoft.ML.Net, ONNXRuntime and DirectML. An easy, efficient, and performant Silero VAD implementation! Always open for PRs.
Language: C# - Size: 354 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 12 - Forks: 1

sidneytma/word-boundary-neural
Locating the start and end-boundaries of one-syllable words (for experimental purposes) using a convolutional neural network
Language: Jupyter Notebook - Size: 1.5 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

mikeroyal/NLP-Guide
Natural Language Processing (NLP). Covering topics such as Tokenization, Part Of Speech tagging (POS), Machine translation, Named Entity Recognition (NER), Classification, and Sentiment analysis.
Language: Python - Size: 315 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 86 - Forks: 15

KennethanCeyer/awesome-audio-speech
Awesome list of Audio, Speech, and DSP (digital signal processing)
Size: 847 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 10 - Forks: 1

Shivashiva07/Proxy_attendance_alert
A smart attendance system that detects proxy attendance using voice recognition and logs results with real-time dashboard monitoring.
Language: Python - Size: 17.6 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

unruli/Real-Time-Feedback-System-for-Student-Presentations
Provides automated, real-time or post-hoc feedback on student oral presentations by analyzing speech clarity, filler word usage, emotions, pacing, and tone.
Language: Python - Size: 3.69 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

r9y9/nnmnkwii
Library to build speech synthesis systems designed for easy and fast prototyping.
Language: Python - Size: 79.7 MB - Last synced at: 16 days ago - Pushed at: 11 months ago - Stars: 397 - Forks: 73

onolab-tmu/libss
A Python library for blind source separation.
Language: Python - Size: 14.7 MB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 2 - Forks: 0

versevo-ai/versevo-ai
Language: Python - Size: 1.83 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 4

Sambit003/versevo-ai Fork of versevo-ai/versevo-ai
Language: Python - Size: 579 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

PrathuashaKB/ASR-Using-Deep-Learning
Automatic Speech Recognition is a technique that processes human speech into readable text, also known as speech-to-text or transcription systems. Mini-Project I at SSIT: Project cycle closed.
Language: Python - Size: 7.22 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 3 - Forks: 2

Voice-Lab/VoiceLab
Automated Reproducible Acoustical Analysis
Language: Python - Size: 16.5 MB - Last synced at: about 2 months ago - Pushed at: 10 months ago - Stars: 152 - Forks: 19

teambits009/Universal-Translator-Culture-Guide-App
A smart travel and communication companion that enables seamless connection across languages and cultures. This AI-powered tool instantly translates text, speech, and signs while offering real-time cultural context to help users navigate new environments with confidence.
Size: 3.91 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

spokestack/spokestack-ios 📦
Spokestack: give your iOS app a voice interface!
Language: Swift - Size: 9.94 MB - Last synced at: about 1 month ago - Pushed at: almost 4 years ago - Stars: 43 - Forks: 8

Nourine-Nadir/Speech_Processing
This repository explores speech processing techniques like noise cancellation and speech segmentation through Python code. (Speech recognition coming soon.)
Language: Jupyter Notebook - Size: 8.39 MB - Last synced at: 4 days ago - Pushed at: 9 months ago - Stars: 3 - Forks: 0

slegroux/nimrod
minimal deep learning framework
Language: Jupyter Notebook - Size: 119 MB - Last synced at: 29 days ago - Pushed at: 29 days ago - Stars: 2 - Forks: 0

t0gae/AI-Dementia-Diagnosis
AI-Driven Multimodal Dementia Diagnosis: 3D MRI morphometry and sensor data using cross-modal attention (LSTM + 3D-ResNet + Transformer). Aims to reduce late-stage diagnosis by 60% through early detection.
Language: Jupyter Notebook - Size: 13.7 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

humanlab/WhiSPA
WhiSPA: Whisper Semantically-Psychologically Aligned with Self-Supervised Contrastive Learning
Language: Python - Size: 4.06 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 6 - Forks: 0

amnydv17/VoiceOverAI
VoiceOver AI is a speech-to-text and text-to-speech pipeline designed to process video files, extract audio, transcribe speech, and translate the text into different languages. The project leverages OpenAI's Whisper model for automatic speech recognition (ASR) and various NLP libraries for transliteration and translation.
Language: Jupyter Notebook - Size: 11.1 MB - Last synced at: about 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

NeonGeckoCom/nsnet2-denoiser
NSNet2 Deep Noise Suppression (DNS) package
Language: Python - Size: 30.8 MB - Last synced at: 10 days ago - Pushed at: over 2 years ago - Stars: 35 - Forks: 8

navalnica/be_nlp_speech_resources
Links to Belarusian NLP and Speech resources
Size: 39.1 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 42 - Forks: 0

EmergenceAI/kotlin_speech_features
This library provides common speech features for ASR including MFCCs and filterbank energies for Android and iOS.
Language: Kotlin - Size: 8.09 MB - Last synced at: 6 days ago - Pushed at: over 2 years ago - Stars: 26 - Forks: 2

AzureMentor/Azure-AI-102-Study-Guide
Study Guide for the AI-102: Designing and Implementing a Microsoft Azure AI Solution Exam
Size: 18.6 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 9 - Forks: 5

KartikJain14/darpg2024
Convert Hindi audio to English and Hindi text using vox
Language: Python - Size: 187 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 2 - Forks: 3

vectominist/spin
Official code for Interspeech 2023 paper "Self-supervised Fine-tuning for Improved Content Representations by Speaker-invariant Clustering"
Language: Python - Size: 634 KB - Last synced at: 2 months ago - Pushed at: about 2 years ago - Stars: 51 - Forks: 6

ddlBoJack/Speech-Resources
Speech-related labs, companies, resources, internships, and more; recommendations and self-nominations are welcome
Size: 5.44 MB - Last synced at: 2 months ago - Pushed at: 7 months ago - Stars: 550 - Forks: 68

actondev/wavelet-denoiser 📦
A wavelet audio denoiser written in Python
Language: Python - Size: 409 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 49 - Forks: 10

huckiyang/QuantumSpeech-QCNN
IEEE ICASSP 21 - Quantum Convolution Neural Networks for Speech Processing and Automatic Speech Recognition
Language: Jupyter Notebook - Size: 859 MB - Last synced at: 2 months ago - Pushed at: over 2 years ago - Stars: 96 - Forks: 19
