GitHub topics: voice-activity-detection

Repositories

smacke/ffsubsync

Automagically synchronize subtitles with video.

Language: Python - Size: 3.7 MB - Last synced at: about 5 hours ago - Pushed at: 7 days ago - Stars: 7,262 - Forks: 296

gbibbo/vad_benchmark

Privacy‑preserving VAD benchmark on domestic audio (CHiME‑Home): 8 models, accuracy vs efficiency.

Language: Python - Size: 18.4 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

FluidInference/FluidAudio

A Fully Native Solution with Swift and CoreML Models Offering Speaker Diarization, VAD, and Speech-to-Text.

Language: Swift - Size: 65.3 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 412 - Forks: 44

amsehili/auditok

An audio/acoustic activity detection and audio segmentation tool

Language: Python - Size: 3.68 MB - Last synced at: 3 days ago - Pushed at: 8 months ago - Stars: 790 - Forks: 97

Real-time speech recognition and voice activity detection (VAD) using next-gen Kaldi with ncnn without Internet connection. Support iOS, Android, Linux, macOS, Windows, Raspberry Pi, VisionFive2, LicheePi4A etc.

Language: C++ - Size: 2.08 MB - Last synced at: 4 days ago - Pushed at: about 2 months ago - Stars: 1,418 - Forks: 187

modelscope/FunASR

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

Language: Python - Size: 100 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 11,641 - Forks: 1,179

baochuquan/ios-vad

iOS Voice Activity Detection (VAD). Supports WebRTC VAD GMM, Silero VAD DNN, Yamnet VAD DNN models.

Language: Swift - Size: 4.5 MB - Last synced at: 1 day ago - Pushed at: 9 months ago - Stars: 21 - Forks: 2

gtreshchev/RuntimeAudioImporter 📦

Runtime Audio Importer plugin for Unreal Engine. Importing audio of various formats at runtime.

Language: C++ - Size: 10.1 MB - Last synced at: 2 days ago - Pushed at: 5 months ago - Stars: 391 - Forks: 80

juanmc2005/diart

A python package to build AI-powered real-time audio applications

Language: Python - Size: 34.8 MB - Last synced at: 6 days ago - Pushed at: 6 months ago - Stars: 1,369 - Forks: 103

nianlonggu/WhisperSeg

Code for ICASSP 2024 paper WhisperSeg: Positive Transfer of the Whisper Speech Transformer to Human and Animal Voice Activity Detection

Language: Python - Size: 243 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 33 - Forks: 12

Saga9103/t2yLLM

A voice assistant with local LLM as a backend

Language: Python - Size: 213 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 6 - Forks: 0

noisetorch/NoiseTorch

Real-time microphone noise suppression on Linux.

Language: Go - Size: 5.87 MB - Last synced at: 8 days ago - Pushed at: 7 months ago - Stars: 9,765 - Forks: 240

TEN-framework/ten-vad

Voice Activity Detector (VAD) from TEN: low-latency, high-performance and lightweight

Language: C - Size: 9.55 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 972 - Forks: 88

techAli1996/wakeword

ESP32S3 Wakeword/Keyword Spotting starter project with ready to go ML model

Language: C - Size: 4.68 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 2 - Forks: 0

Picovoice/cobra

On-device voice activity detection (VAD) powered by deep learning

Language: Python - Size: 43 MB - Last synced at: 11 days ago - Pushed at: 12 days ago - Stars: 220 - Forks: 15

gkonovalov/android-vad

Android Voice Activity Detection (VAD) library. Supports WebRTC VAD GMM, Silero VAD DNN, Yamnet VAD DNN models.

Language: C - Size: 5.16 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 374 - Forks: 79

duj12/ASR-2Pass

ASR 2Pass onnxruntime and websocket server, based on FunASR (https://github.com/alibaba-damo-academy/FunASR).

Language: HTML - Size: 86.9 MB - Last synced at: 1 day ago - Pushed at: 7 days ago - Stars: 74 - Forks: 9

pykeio/earshot

Ridiculously fast voice activity detection in pure #[no_std] Rust

Language: Rust - Size: 879 KB - Last synced at: 9 days ago - Pushed at: 10 months ago - Stars: 19 - Forks: 1

pmbstyle/Alice

Alice is a smart desktop AI assistant application built with Vue.js, Vite, and Electron. Advanced memory system, function calling, MCP support and more.

Language: TypeScript - Size: 78.7 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 92 - Forks: 11

baxtree/subaligner

Automatically synchronize and translate subtitles, or create new ones by transcribing, using pre-trained DNNs, Forced Alignments and Transformers. https://subaligner.readthedocs.io/

Language: Python - Size: 103 MB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 479 - Forks: 19

shashikg/WhisperS2T

An Optimized Speech-to-Text Pipeline for the Whisper Model Supporting Multiple Inference Engines

Language: Jupyter Notebook - Size: 1.16 MB - Last synced at: 16 days ago - Pushed at: 11 months ago - Stars: 435 - Forks: 59

Paradeluxe/Praditor

Praditor: A DBSCAN-Based Automation for Speech Onset Detection

Language: Python - Size: 195 MB - Last synced at: 24 days ago - Pushed at: 24 days ago - Stars: 3 - Forks: 0

daanzu/py-silero-vad-lite

Lightweight wrapper for Silero VAD using internal ONNX Runtime and with no python package dependencies

Language: Python - Size: 1.9 MB - Last synced at: 9 days ago - Pushed at: 8 months ago - Stars: 15 - Forks: 1

zhenghuatan/rVADfast

This is the Python library for an unsupervised, fast method for robust voice activity detection (rVAD), as in the paper rVAD: An Unsupervised Segment-Based Robust Voice Activity Detection Method.

Language: Python - Size: 3.63 MB - Last synced at: 17 days ago - Pushed at: about 2 months ago - Stars: 140 - Forks: 24

mgonzs13/whisper_ros

Speech-to-Text based on SileroVAD + whisper.cpp (GGML Whisper) for ROS 2

Language: C++ - Size: 1.93 MB - Last synced at: 29 days ago - Pushed at: 29 days ago - Stars: 78 - Forks: 19

bigcash/awesome-vad

A curated list of awesome voice activity detection

Size: 9.77 KB - Last synced at: 11 days ago - Pushed at: 8 months ago - Stars: 59 - Forks: 3

Speech-Interaction-Technology-Aalto-U/itsp

Introduction to Speech Processing

Language: Jupyter Notebook - Size: 254 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 97 - Forks: 16

RimAmarat/RealTimeSpeechRec

Real Time Speech Recognition with Voice Activity Detection using Pytorch

Language: Python - Size: 13.7 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

spokestack/spokestack-ios 📦

Spokestack: give your iOS app a voice interface!

Language: Swift - Size: 9.94 MB - Last synced at: 20 days ago - Pushed at: almost 4 years ago - Stars: 44 - Forks: 8

ina-foss/inaSpeechSegmenter

CNN-based audio segmentation toolkit. Detects speech, music, noise and speaker gender. Designed for large-scale gender equality studies based on speech time per gender.

Language: Python - Size: 36.6 MB - Last synced at: about 1 month ago - Pushed at: 7 months ago - Stars: 815 - Forks: 139

tomchang25/whisper-auto-transcribe

Auto transcribe tool based on whisper

Language: Python - Size: 169 MB - Last synced at: 19 days ago - Pushed at: over 2 years ago - Stars: 226 - Forks: 16

pyannote/pyannote-audio

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

Language: Jupyter Notebook - Size: 252 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 7,671 - Forks: 889

snakers4/silero-vad

Silero VAD: pre-trained enterprise-grade Voice Activity Detector

Language: Python - Size: 100 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 6,032 - Forks: 574

OpenVoiceOS/ovos-vad-plugin-silero

ovos plugin for voice activity detection using silero vad

Language: Python - Size: 1.8 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 2

OpenVoiceOS/ovos-vad-plugin-webrtcvad

ovos plugin for voice activity detection using webrtcvad

Language: Python - Size: 27.3 KB - Last synced at: 19 days ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

thurti/vad-audio-worklet

Voice Activity Detection (VAD) AudioWorklet

Language: JavaScript - Size: 762 KB - Last synced at: 22 days ago - Pushed at: about 1 year ago - Stars: 16 - Forks: 5

ZygoteCode/VadSharp

Enterprise VAD (Voice Activity Detection) in C#.NET (.NET 6.0+) with Microsoft.ML.Net, ONNXRuntime and DirectML. The easiest, efficient, and performant Silero VAD implementation! Always open for PRs.

Language: C# - Size: 354 KB - Last synced at: 24 days ago - Pushed at: 3 months ago - Stars: 0 - Forks: 1

stefanwebb/open-voice-activity-detection

Fully open-source and state-of-the-art Voice Activity Detection (VAD) models for academic research and commercial applications.

Language: Python - Size: 27.3 KB - Last synced at: 17 days ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

pranjal-pravesh/stt-silero-whisper

Real-time speech to text using voice activity detection (with silero-VAD) and transcriptions using faster-whisper model

Language: Python - Size: 35.2 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

aidayang/FunASR-OneClick

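FunASR real-time speech recognition edition: recognizes audio from the microphone and sound played on the computer; voice-typing software for PC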

Size: 22.5 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 4 - Forks: 0

Swanand-Wagh/Socraitive

Language: TypeScript - Size: 14 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 2

edyamza/Voice-Activity-Detection-WebRTC-Silero

A Python project that compares the metrics of two pre-trained AI models: WebRTC and Silero.

Language: Python - Size: 11.7 MB - Last synced at: about 1 month ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

ggeop/Python-ai-assistant

Python AI assistant 🧠

Language: Python - Size: 2.99 MB - Last synced at: 2 months ago - Pushed at: 8 months ago - Stars: 977 - Forks: 247

jim-schwoebel/voicebook

🗣️ A book and repo to get you started programming voice computing applications in Python (10 chapters and 200+ scripts).

Language: Python - Size: 299 MB - Last synced at: 2 months ago - Pushed at: over 2 years ago - Stars: 381 - Forks: 86

egorsmkv/marblenet-inference

Inference code for Frame MarbleNet (VAD from NeMo)

Language: Python - Size: 57.6 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

itmo-mbss-lab/sr_labs_book

The project is related to the development of labs for the ITMO Speaker Recognition Course.

Language: Jupyter Notebook - Size: 3.25 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 10 - Forks: 8

gooofy/py-nltools

A collection of basic python modules for spoken natural language processing

Language: Python - Size: 413 KB - Last synced at: 5 days ago - Pushed at: over 5 years ago - Stars: 56 - Forks: 15

sepnic/litevad

Speech-end detection library, based on WebRTC's VAD engine

Language: C - Size: 453 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 22 - Forks: 5

krithicswaroopan/AI-Voice-Assistance-Pipeline

A real-time voice-to-text and text-to-speech AI pipeline using Whisper, an LLM, and Edge-TTS with tunable parameters for low-latency audio processing and response generation.

Language: Python - Size: 80.2 MB - Last synced at: about 1 month ago - Pushed at: 10 months ago - Stars: 4 - Forks: 1

RicherMans/GPV

Repository for our Interspeech2020 general-purpose voice activity detection (GPVAD) paper

Language: Python - Size: 8.85 MB - Last synced at: 3 months ago - Pushed at: almost 2 years ago - Stars: 142 - Forks: 29

ina-foss/InaGVAD

Voice activity detection and speaker gender segmentation audiovisual corpus

Language: Jupyter Notebook - Size: 1.4 MB - Last synced at: 3 months ago - Pushed at: 6 months ago - Stars: 13 - Forks: 1

Ave-Sergeev/Dictator

Speech-to-Text translation service (Rust, Tonic) (2025)

Language: Rust - Size: 49.3 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 6 - Forks: 0

spokestack/react-native-spokestack 📦

Spokestack: give your React Native app a voice interface!

Language: TypeScript - Size: 6.52 MB - Last synced at: 4 days ago - Pushed at: about 3 years ago - Stars: 61 - Forks: 13

spokestack/spokestack-android 📦

Extensible Android mobile voice framework: wakeword, ASR, NLU, and TTS. Easily add voice to any Android app!

Language: Java - Size: 1.25 MB - Last synced at: 24 days ago - Pushed at: almost 4 years ago - Stars: 74 - Forks: 10

sudydtdtgxdjchdyfghxyfgjcj/MLP-From-Scratch

A C++ implementation of a Multilayer Perceptron (MLP) neural network using Eigen, supporting multiple activation and loss functions, mini-batch gradient descent, and backpropagation for training.

Language: C++ - Size: 23.4 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

filippogiruzzi/voice_activity_detection

Voice Activity Detection based on Deep Learning & TensorFlow

Language: Python - Size: 238 KB - Last synced at: 3 months ago - Pushed at: over 2 years ago - Stars: 363 - Forks: 69

nicklashansen/voice-activity-detection

Voice Activity Detection (VAD) using deep learning.

Language: Jupyter Notebook - Size: 2.41 MB - Last synced at: 3 months ago - Pushed at: almost 6 years ago - Stars: 196 - Forks: 33

nosoy77/logitech_bcc950

A talking eyeball on a stick - Logitech BCC950 PTZ camera control scripts

Language: Python - Size: 14.6 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

xaionaro-go/audio

A package for Go to playback, record and process audio

Language: Go - Size: 168 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 3 - Forks: 0

mvalancy-mt/logitech_bcc950

A talking eyeball on a stick - Logitech BCC950 PTZ camera control scripts

Language: Python - Size: 15.6 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

KarthikaRajagopal44/Beyond-Voice-Activity-Detection-Advanced-Turn-End-Prediction-in-Conversational-Agents

Moving Past VAD: Smarter Turn-Taking in Voice Assistants

Language: Python - Size: 1.95 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

webyneter/speech-to-console

A voice-controlled tool that converts spoken commands to text in your terminal

Language: Python - Size: 267 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

egorsmkv/pyannote-onnx-rust

Run Voice Activity Detection model PyAnnote using ONNX and Rust

Language: Rust - Size: 5.17 MB - Last synced at: about 2 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

jtkim-kaist/VAD

Voice activity detection (VAD) toolkit including DNN, bDNN, LSTM and ACAM based VAD. We also provide our directly recorded dataset.

Language: MATLAB - Size: 261 MB - Last synced at: 3 months ago - Pushed at: about 4 years ago - Stars: 854 - Forks: 234

BingLingGroup/autosub (fork of iWangJiaxiang/autosub)

Command-line utility to transcribe/translate from video/audio/subtitles to subtitles

Language: Python - Size: 1.29 MB - Last synced at: 4 months ago - Pushed at: over 1 year ago - Stars: 1,986 - Forks: 245

Related Keywords

voice-activity-detection (169), vad (50), speech-recognition (37), voice-recognition (29), voice-assistant (23), speech-to-text (23), speech-processing (22), voice (20), python (20), machine-learning (19), voice-control (19), deep-learning (18), speech (18), pytorch (17), voice-commands (17), audio-processing (14), audio (14), asr (12), speech-activity-detection (10), whisper (10), silero-vad (10), voice-chat (9), text-to-speech (9), speaker-diarization (9), speech-detection (8), webrtc (8), voice-synthesis (8), speaker-recognition (8), stt (7), tts (6), real-time (6), silero (6), android (6), speech-segmentation (6), tensorflow (6), voice-detection (6), speaker-identification (5), voice-computing (5), transcription (5), speaker-verification (5), neural-networks (5), ai (5), automatic-speech-recognition (5), openai (5), deep-neural-networks (5), subtitles (5), voice-activity-detector (4), natural-language-processing (4), wake-word-detection (4), onnxruntime (4), wakeword (4), voice-conversion (4), speech-api (4), cpp (4), onnx (4), audio-segmentation (4), speech-synthesis (4), cnn (4), mfcc (4), hacktoberfest (4), ios (4), speaker-embedding (4), rust (3), offline (3), dataset (3), voice-changer-download (3), gender-classification (3), acoustic-features (3), audio-analysis (3), pulseaudio (3), data (3), conversational-ai (3), speech-analysis (3), faster-whisper (3), ovos (3), lstm (3), fastapi (3), mlp (3), speech-emotion-recognition (3), csharp (3), transformer (3), c (3), noise-robust (3), signal-processing (3), mfcc-features (3), voice-changer (3), keras (3), dnn (3), gmm (3), embodied-ai (2), logitech (2), noise-cancellation (2), gender (2), camera (2), wakeword-activation (2), ptz (2), sound (2), gender-equality (2), data-science (2), libfvad (2)