GitHub topics: speech-processing

Repositories

benkhelifamohamedtaher/speech-emotion-recognition

Deep learning system for emotion recognition from speech, achieving 50.5% accuracy on 8-class classification using a transformer architecture and real-time analysis

Language: Python - Size: 1.56 MB - Last synced at: about 16 hours ago - Pushed at: about 17 hours ago - Stars: 1 - Forks: 0

EveryVoiceTTS/EveryVoice

The EveryVoice TTS Toolkit - Text To Speech for your language

Language: Python - Size: 9.98 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 34 - Forks: 2

ryota-komatsu/slp2025

Materials for the 音学シンポジウム2025 (Ongaku Symposium 2025) tutorial "Introduction to Multimodal Large Language Models"

Language: Jupyter Notebook - Size: 12.9 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 1 - Forks: 0

mende237/Nda-Nda-Force-Aligner

Forced alignment of Nda'Nda', a Cameroonian language

Language: Shell - Size: 603 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 2 - Forks: 0

TEN-framework/ten-vad

TEN VAD: low-latency high-performance Voice Activity Detector

Language: C - Size: 9.58 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 435 - Forks: 38

nyrahealth/CrisperWhisper

Verbatim Automatic Speech Recognition with improved word-level timestamps and filler detection

Language: Python - Size: 8.31 MB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 724 - Forks: 36

microsoft/torchscale

Foundation Architecture for (M)LLMs

Language: Python - Size: 361 KB - Last synced at: 3 days ago - Pushed at: about 1 year ago - Stars: 3,079 - Forks: 219

ryota-komatsu/speech_resynth

Speech Resynthesis and Language Modeling Using Flow Matching and Llama

Language: Python - Size: 4.81 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 17 - Forks: 4

fulldecent/formant-analyzer

iOS application for finding formants in spoken sounds

Language: Swift - Size: 8.79 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 59 - Forks: 15

speechbrain/speechbrain.github.io

The SpeechBrain project aims to build a novel speech toolkit fully based on PyTorch. With SpeechBrain, users can easily create speech processing systems for tasks including speech recognition (both HMM/DNN and end-to-end), speaker recognition, speech enhancement, speech separation, multi-microphone speech processing, and many others.

Language: HTML - Size: 46.8 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 366 - Forks: 30

bunyaminergen/awesome-speech-dataset

Awesome Speech Dataset, including download links and a brief explanation for each resource. These datasets provide diverse and high-quality speech data covering various domains such as conversational, academic, political, and more.

Size: 249 KB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 10 - Forks: 0

ryota-komatsu/speaker_disentangled_hubert

Official repository of the IEEE SLT 2024 paper "Self-Supervised Syllable Discovery Based on Speaker-Disentangled HuBERT"

Language: Python - Size: 464 KB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 38 - Forks: 8

aaivu/KuralNet

A deep learning-based Speech Emotion Recognition (SER) model trained primarily on Indian languages. Designed for applications in call centers, sentiment analysis, and accessibility tools.

Language: Python - Size: 69.8 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 3 - Forks: 0

X-LANCE/SLAM-LLM

Speech, Language, Audio, Music Processing with Large Language Model

Language: Python - Size: 169 MB - Last synced at: 10 days ago - Pushed at: about 1 month ago - Stars: 811 - Forks: 79

SuperKogito/spafe

spafe: Simplified Python Audio Features Extraction

Language: Python - Size: 20.7 MB - Last synced at: 6 days ago - Pushed at: 3 months ago - Stars: 474 - Forks: 79

microsoft/UniSpeech

UniSpeech - Large Scale Self-Supervised Learning for Speech

Language: Python - Size: 72.4 MB - Last synced at: 3 days ago - Pushed at: about 1 year ago - Stars: 463 - Forks: 74

midas-research/audino

Open source audio annotation tool for humans

Language: JavaScript - Size: 12.5 MB - Last synced at: 10 days ago - Pushed at: 4 months ago - Stars: 1,094 - Forks: 134

drethage/speech-denoising-wavenet

A neural network for end-to-end speech denoising

Language: Python - Size: 57.3 MB - Last synced at: 3 days ago - Pushed at: almost 2 years ago - Stars: 694 - Forks: 163

r9y9/wavenet_vocoder

WaveNet vocoder

Language: Python - Size: 19.7 MB - Last synced at: 10 days ago - Pushed at: almost 2 years ago - Stars: 2,356 - Forks: 498

aliyzd95/project-dnn-ser-pipeline

This repository contains a complete machine learning pipeline for Speech Emotion Recognition (SER) using Deep Neural Networks (DNNs).

Language: Python - Size: 6.84 KB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 0 - Forks: 0

01Zhangbw/Speech-and-audio-papers-Top-Conference

Papers in the speech and audio field from top conferences. Currently covers: ICLR 2023-2025, ICML 2023-2025, NeurIPS 2023-2024, ACM MM 2024, AAAI 2024-2025, ACL 2024-2025, EMNLP 2024, NAACL 2025, IJCAI 2024, ECCV 2024

Size: 290 KB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 61 - Forks: 1

pratyusha972/AccentAI

Accent prediction from videos

Language: HTML - Size: 11.7 KB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 0 - Forks: 0

raj-sutariya/indic-num2words

Python library for converting numbers to words for all Indian Languages.

Language: Python - Size: 117 KB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 35 - Forks: 13

gtreshchev/RuntimeSpeechRecognizer 📦

Cross-platform, real-time, offline speech recognition plugin for Unreal Engine. Based on OpenAI's Whisper technology (whisper.cpp).

Language: C++ - Size: 24.8 MB - Last synced at: 4 days ago - Pushed at: 3 months ago - Stars: 295 - Forks: 46

ahkarami/Great-Deep-Learning-Books

A Great Collection of Deep Learning (e)Books

Size: 600 KB - Last synced at: 14 days ago - Pushed at: 7 months ago - Stars: 135 - Forks: 30

r9y9/deepvoice3_pytorch

PyTorch implementation of convolutional neural networks-based text-to-speech synthesis models

Language: Python - Size: 6.78 MB - Last synced at: 10 days ago - Pushed at: over 1 year ago - Stars: 1,980 - Forks: 489

MontrealCorpusTools/PolyglotDB

Language data store and linguistic query API

Language: Python - Size: 15.1 MB - Last synced at: 13 days ago - Pushed at: 16 days ago - Stars: 40 - Forks: 15

ictnlp/StreamSpeech

StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.

Language: Python - Size: 18.2 MB - Last synced at: 15 days ago - Pushed at: 10 months ago - Stars: 1,078 - Forks: 81

aliyzd95/Emotion-Recognition-In-Persian-Speech-Using-Deep-Neural-Networks

This project aims to perform Emotion Recognition in Speech using Deep Neural Networks (DNNs)

Language: Python - Size: 29.3 KB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 0 - Forks: 0

clement-pages/gryannote

Provide Gradio custom components to make the diarization-based audio labeling process easier and faster.

Language: Svelte - Size: 2.66 MB - Last synced at: 13 days ago - Pushed at: 18 days ago - Stars: 62 - Forks: 7

MahtaFetrat/ManaTTS-Persian-Speech-Dataset

ManaTTS is the largest open Persian speech dataset with 100+ hours of transcribed audio. Includes data collection pipeline and tools. Suitable for Persian text-to-speech models.

Language: Jupyter Notebook - Size: 16.4 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 26 - Forks: 1

MahtaFetrat/GPTInformal-Persian-Speech-Dataset

A freely licensed Persian TTS dataset including 6+ hours of audio-text pairs with subject

Size: 4.88 KB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 7 - Forks: 0

haoheliu/voicefixer

General Speech Restoration

Language: Python - Size: 3.76 MB - Last synced at: 18 days ago - Pushed at: 4 months ago - Stars: 1,149 - Forks: 139

snakers4/silero-vad

Silero VAD: pre-trained enterprise-grade Voice Activity Detector

Language: Python - Size: 100 MB - Last synced at: 18 days ago - Pushed at: 3 months ago - Stars: 5,837 - Forks: 557

speechbrain/speechbrain

A PyTorch-based Speech Toolkit

Language: Python - Size: 98 MB - Last synced at: 18 days ago - Pushed at: 19 days ago - Stars: 9,838 - Forks: 1,492

pyannote/pyannote-audio

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

Language: Jupyter Notebook - Size: 252 MB - Last synced at: 18 days ago - Pushed at: about 1 month ago - Stars: 7,529 - Forks: 878

linto-ai/whisper-timestamped

Multilingual Automatic Speech Recognition with word-level timestamps and confidence

Language: Python - Size: 4.49 MB - Last synced at: 18 days ago - Pushed at: 2 months ago - Stars: 2,410 - Forks: 185

resemble-ai/resemble-enhance

AI powered speech denoising and enhancement

Language: Python - Size: 23.4 KB - Last synced at: 18 days ago - Pushed at: 6 months ago - Stars: 1,782 - Forks: 205

NVIDIA/CleanUNet

Official PyTorch Implementation of CleanUNet (ICASSP 2022)

Language: Python - Size: 35.2 KB - Last synced at: 15 days ago - Pushed at: over 1 year ago - Stars: 322 - Forks: 56

sp-nitech/SPTK

A suite of speech signal processing tools

Language: C++ - Size: 5.65 MB - Last synced at: 15 days ago - Pushed at: 19 days ago - Stars: 233 - Forks: 27

huawei-noah/Speech-Backbones

This is the main repository of open-sourced speech technology by Huawei Noah's Ark Lab.

Language: Jupyter Notebook - Size: 33.8 MB - Last synced at: 14 days ago - Pushed at: over 1 year ago - Stars: 583 - Forks: 125

fgnt/pb_bss

Collection of EM algorithms for blind source separation of audio signals

Language: Python - Size: 635 KB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 286 - Forks: 61

DigitalPhonetics/IMS-Toucan

Controllable and fast Text-to-Speech for over 7000 languages!

Language: Python - Size: 21.3 MB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 1,590 - Forks: 182

mexca/mexca

Multimodal Emotion eXpression Capture Amsterdam. Pipeline for capturing emotion expressions from multiple modalities (video, audio, text) in the wild.

Language: Python - Size: 24.8 MB - Last synced at: 17 days ago - Pushed at: 2 months ago - Stars: 34 - Forks: 6

haoheliu/voicefixer_main

General Speech Restoration

Language: Python - Size: 21.5 MB - Last synced at: 13 days ago - Pushed at: over 1 year ago - Stars: 278 - Forks: 56

lukaszliniewicz/breath-removal

Detect and remove or lower the volume of breathing in speech recordings.

Language: Python - Size: 21.1 MB - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 9 - Forks: 3

SuyashMore/MevonAI-Speech-Emotion-Recognition

Identify the emotion of multiple speakers in an Audio Segment

Language: C - Size: 63.6 MB - Last synced at: 1 day ago - Pushed at: over 2 years ago - Stars: 171 - Forks: 47

mravanelli/SincNet

SincNet is a neural architecture for efficiently processing raw audio samples.

Language: Python - Size: 78.9 MB - Last synced at: 22 days ago - Pushed at: about 4 years ago - Stars: 1,178 - Forks: 265

daanzu/py-silero-vad-lite

Lightweight wrapper for Silero VAD using internal ONNX Runtime and with no python package dependencies

Language: Python - Size: 1.9 MB - Last synced at: 6 days ago - Pushed at: 6 months ago - Stars: 14 - Forks: 1

freds0/free-svc

[ICASSP 2025] FreeSVC: Towards Zero-shot Multilingual Singing Voice Conversion

Language: Python - Size: 2.5 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 62 - Forks: 7

pliang279/MultiBench

[NeurIPS 2021] Multiscale Benchmarks for Multimodal Representation Learning

Language: HTML - Size: 49.9 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 541 - Forks: 80

zycv/awesome-keyword-spotting

This repository is a curated list of awesome Speech Keyword Spotting (Wake-Up Word Detection).

Size: 129 KB - Last synced at: 30 days ago - Pushed at: about 3 years ago - Stars: 257 - Forks: 40

pliang279/awesome-multimodal-ml

Reading list for research topics in multimodal machine learning

Size: 459 KB - Last synced at: about 1 month ago - Pushed at: 10 months ago - Stars: 6,421 - Forks: 879

Erangamadhushan/EM956-Community-Assistant

EM956 Community Assistant for the EM956 Community Support web portal

Language: JavaScript - Size: 5.86 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

wq2012/awesome-diarization

A curated list of awesome Speaker Diarization papers, libraries, datasets, and other resources.

Size: 81.1 KB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 1,735 - Forks: 232

dqqcasia/awesome-speech-translation Fork of ucaslyc/speech_translation-papers

Size: 296 KB - Last synced at: 25 days ago - Pushed at: over 3 years ago - Stars: 178 - Forks: 1

nanahou/Awesome-Speech-Enhancement

A tutorial for Speech Enhancement researchers and practitioners. The purpose of this repo is to organize the world’s resources for speech enhancement and make them universally accessible and useful.

Language: MATLAB - Size: 25.2 MB - Last synced at: about 1 month ago - Pushed at: over 4 years ago - Stars: 762 - Forks: 151

abikaki/awesome-speech-emotion-recognition

😎 Awesome lists about Speech Emotion Recognition

Size: 6.03 MB - Last synced at: about 1 month ago - Pushed at: 6 months ago - Stars: 84 - Forks: 4

SalimLouDev/Noise-filtering-of-a-speech-signal

This project is designed for researchers, engineers, and students working in speech processing, machine learning, and signal analysis. By leveraging digital signal processing (DSP) techniques, it provides a hands-on approach to reducing unwanted noise and enhancing speech quality.

Language: Jupyter Notebook - Size: 877 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

gnanesh-16/dhvagna-npi

Advanced voice transcription tool with multi-language support, reported to outperform current LLMs.

Language: Python - Size: 101 KB - Last synced at: 9 days ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

zhitko/inton-core

Inton Core Library is designed to measure a set of characteristics of oral speech.

Language: C++ - Size: 30.3 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 2

cyrta/awesome-speech-enhancement

A curated list of awesome Speech Enhancement papers, libraries, datasets, and other resources.

Size: 13.7 KB - Last synced at: about 1 month ago - Pushed at: over 5 years ago - Stars: 67 - Forks: 15

vocalpy/vak

A neural network framework for researchers studying acoustic communication

Language: Python - Size: 196 MB - Last synced at: 7 days ago - Pushed at: 2 months ago - Stars: 83 - Forks: 17

spokestack/react-native-spokestack 📦

Spokestack: give your React Native app a voice interface!

Language: TypeScript - Size: 6.52 MB - Last synced at: about 1 month ago - Pushed at: about 3 years ago - Stars: 61 - Forks: 13

kahne/fastwer

A PyPI package for fast word/character error rate (WER/CER) calculation

Language: Python - Size: 432 KB - Last synced at: 7 days ago - Pushed at: almost 2 years ago - Stars: 72 - Forks: 16

amirhosseinghanipour/dasp-rs

DASP-RS is a crate for digital signal processing, speech processing, music analysis, and phonetics.

Language: Rust - Size: 1.06 MB - Last synced at: 26 days ago - Pushed at: 26 days ago - Stars: 1 - Forks: 0

seanwood/gcc-nmf

Real-time GCC-NMF Blind Speech Separation and Enhancement

Language: Python - Size: 43.2 MB - Last synced at: 19 days ago - Pushed at: about 6 years ago - Stars: 319 - Forks: 134

alessandroragano/scoreq

SCOREQ: Speech COntrastive REgression for Quality Assessment (NeurIPS 2024)

Language: Python - Size: 1.46 MB - Last synced at: 28 days ago - Pushed at: 4 months ago - Stars: 71 - Forks: 4

jcvasquezc/phonet

Keras-based Python framework to compute phonological posterior probabilities from audio files

Language: Python - Size: 23 MB - Last synced at: 11 days ago - Pushed at: over 2 years ago - Stars: 43 - Forks: 18

thibault-roux/metric-evaluator

Metric evaluator for Automatic Speech Recognition using the HATS dataset

Language: Python - Size: 121 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 5 - Forks: 0

arniery/andys-project

Final assignment for the Trinity SLP course "Speech Processing 2: Acoustic Modelling": cascade and parallel formant synthesis, with the end goal of producing vowels using both methods.

Language: Jupyter Notebook - Size: 664 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

shangeth/wavencoder

WavEncoder is a Python library for encoding audio signals, transforms for audio augmentation, and training audio classification models with PyTorch backend.

Language: Python - Size: 5.21 MB - Last synced at: 28 days ago - Pushed at: about 4 years ago - Stars: 90 - Forks: 14

ZygoteCode/VadSharp

Enterprise VAD (Voice Activity Detection) in C#.NET (.NET 6.0+) with Microsoft.ML.Net, ONNXRuntime and DirectML. An easy, efficient, and performant Silero VAD implementation! Always open for PRs.

Language: C# - Size: 354 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 12 - Forks: 1

sidneytma/word-boundary-neural

Locating the start and end boundaries of one-syllable words (for experimental purposes) using a convolutional neural network

Language: Jupyter Notebook - Size: 1.5 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

mikeroyal/NLP-Guide

Natural Language Processing (NLP). Covering topics such as Tokenization, Part Of Speech tagging (POS), Machine translation, Named Entity Recognition (NER), Classification, and Sentiment analysis.

Language: Python - Size: 315 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 86 - Forks: 15

KennethanCeyer/awesome-audio-speech

Awesome list of Audio, Speech, and DSP (Digital Signal Processing)

Size: 847 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 10 - Forks: 1

Shivashiva07/Proxy_attendance_alert

A smart attendance system that detects proxy attendance using voice recognition and logs results with real-time dashboard monitoring.

Language: Python - Size: 17.6 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

unruli/Real-Time-Feedback-System-for-Student-Presentations

Provides automated, real-time or post-hoc feedback on student oral presentations by analyzing speech clarity, filler word usage, emotions, pacing, and tone.

Language: Python - Size: 3.69 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

r9y9/nnmnkwii

Library to build speech synthesis systems designed for easy and fast prototyping.

Language: Python - Size: 79.7 MB - Last synced at: 16 days ago - Pushed at: 11 months ago - Stars: 397 - Forks: 73

onolab-tmu/libss

A Python library for blind source separation.

Language: Python - Size: 14.7 MB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 2 - Forks: 0

versevo-ai/versevo-ai

Language: Python - Size: 1.83 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 4

Sambit003/versevo-ai Fork of versevo-ai/versevo-ai

Language: Python - Size: 579 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

PrathuashaKB/ASR-Using-Deep-Learning

Automatic Speech Recognition is a technique that processes human speech into readable text, also known as speech-to-text or transcription systems. Mini-Project I at SSIT: Project cycle closed.

Language: Python - Size: 7.22 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 3 - Forks: 2

Voice-Lab/VoiceLab

Automated Reproducible Acoustical Analysis

Language: Python - Size: 16.5 MB - Last synced at: about 2 months ago - Pushed at: 10 months ago - Stars: 152 - Forks: 19

teambits009/Universal-Translator-Culture-Guide-App

A smart travel and communication companion that enables seamless connection across languages and cultures. This AI-powered tool instantly translates text, speech, and signs while offering real-time cultural context to help users navigate new environments with confidence.

Size: 3.91 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

spokestack/spokestack-ios 📦

Spokestack: give your iOS app a voice interface!

Language: Swift - Size: 9.94 MB - Last synced at: about 1 month ago - Pushed at: almost 4 years ago - Stars: 43 - Forks: 8

Nourine-Nadir/Speech_Processing

This repository explores speech processing techniques like noise cancellation and speech segmentation through Python code. (Speech recognition coming soon.)

Language: Jupyter Notebook - Size: 8.39 MB - Last synced at: 4 days ago - Pushed at: 9 months ago - Stars: 3 - Forks: 0

slegroux/nimrod

minimal deep learning framework

Language: Jupyter Notebook - Size: 119 MB - Last synced at: 29 days ago - Pushed at: 29 days ago - Stars: 2 - Forks: 0

t0gae/AI-Dementia-Diagnosis

AI-Driven Multimodal Dementia Diagnosis: 3D MRI morphometry, and sensor data using cross-modal attention (LSTM + 3D-ResNet + Transformer). Aims to reduce late-stage diagnosis by 60% through early detection.

Language: Jupyter Notebook - Size: 13.7 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

humanlab/WhiSPA

WhiSPA: Whisper Semantically-Psychologically Aligned with Self-Supervised Contrastive Learning

Language: Python - Size: 4.06 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 6 - Forks: 0

amnydv17/VoiceOverAI

VoiceOver AI is a speech-to-text and text-to-speech pipeline designed to process video files, extract audio, transcribe speech, and translate the text into different languages. The project leverages OpenAI's Whisper model for automatic speech recognition (ASR) and various NLP libraries for transliteration and translation.

Language: Jupyter Notebook - Size: 11.1 MB - Last synced at: about 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

NeonGeckoCom/nsnet2-denoiser

NSNet2 Deep Noise Suppression (DNS) package

Language: Python - Size: 30.8 MB - Last synced at: 10 days ago - Pushed at: over 2 years ago - Stars: 35 - Forks: 8

navalnica/be_nlp_speech_resources

Links to Belarusian NLP and Speech resources

Size: 39.1 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 42 - Forks: 0

EmergenceAI/kotlin_speech_features

This library provides common speech features for ASR including MFCCs and filterbank energies for Android and iOS.

Language: Kotlin - Size: 8.09 MB - Last synced at: 6 days ago - Pushed at: over 2 years ago - Stars: 26 - Forks: 2

AzureMentor/Azure-AI-102-Study-Guide

Study Guide for the AI-102: Designing and Implementing a Microsoft Azure AI Solution Exam

Size: 18.6 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 9 - Forks: 5

KartikJain14/darpg2024

Convert Hindi audio to English and Hindi text using vox

Language: Python - Size: 187 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 2 - Forks: 3

vectominist/spin

Official code for Interspeech 2023 paper "Self-supervised Fine-tuning for Improved Content Representations by Speaker-invariant Clustering"

Language: Python - Size: 634 KB - Last synced at: 2 months ago - Pushed at: about 2 years ago - Stars: 51 - Forks: 6

ddlBoJack/Speech-Resources

Speech-related labs, companies, resources, internships, and more; recommendations and self-nominations are welcome

Size: 5.44 MB - Last synced at: 2 months ago - Pushed at: 7 months ago - Stars: 550 - Forks: 68

actondev/wavelet-denoiser 📦

A wavelet audio denoiser written in Python

Language: Python - Size: 409 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 49 - Forks: 10

huckiyang/QuantumSpeech-QCNN

IEEE ICASSP 21 - Quantum Convolution Neural Networks for Speech Processing and Automatic Speech Recognition

Language: Jupyter Notebook - Size: 859 MB - Last synced at: 2 months ago - Pushed at: over 2 years ago - Stars: 96 - Forks: 19

Related Keywords

speech-processing (602), speech-recognition (197), speech-to-text (126), speech (113), deep-learning (102), python (93), machine-learning (84), speech-synthesis (67), asr (55), pytorch (52), audio (50), audio-processing (45), signal-processing (41), speech-enhancement (40), nlp (38), speech-analysis (35), natural-language-processing (32), text-to-speech (32), tts (28), deep-neural-networks (25), voice-recognition (23), voice-activity-detection (22), speaker-recognition (22), python3 (21), speaker-verification (21), artificial-intelligence (19), matlab (19), emotion-recognition (19), voice (19), speech-emotion-recognition (19), speaker-diarization (18), tensorflow (18), mfcc (17), automatic-speech-recognition (15), dataset (14), feature-extraction (14), speaker-identification (14), convolutional-neural-networks (14), voice-conversion (13), speech-separation (13), cnn (12), digital-signal-processing (12), neural-networks (12), neural-network (12), librosa (12), ai (12), voice-commands (11), computer-vision (11), self-supervised-learning (11), voice-assistant (11), dsp (11), audio-analysis (10), stt (10), keras (10), emotion-detection (10), voice-control (10), vad (10), speech-api (9), noise-reduction (9), nlp-machine-learning (9), kaldi (8), asr-model (8), javascript (8), spoken-language-processing (8), large-language-models (8), real-time (8), classification (7), diarization (7), android (7), awesome-list (7), machine-translation (7), denoising (7), chatbot (7), wav (7), representation-learning (6), multimodal-learning (6), music (6), sentiment-analysis (6), awesome (6), praat (6), linguistics (6), speech-dataset (6), forced-alignment (6), openai (6), natural-language-understanding (6), unsupervised-learning (6), mfcc-features (6), language-learning (6), flask (6), wavenet (6), tensorflow2 (6), translation (6), c (5), html5 (5), rnn (5), corpus (5), music-information-retrieval (5), data-science (5), transformers (5), language-identification (5)