An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: voice-activity-detection

gbibbo/vad_benchmark

Privacy‑preserving VAD benchmark on domestic audio (CHiME‑Home): 8 models, accuracy vs efficiency.

Language: Python - Size: 18.4 MB - Last synced at: about 15 hours ago - Pushed at: about 15 hours ago - Stars: 0 - Forks: 0

pyannote/pyannote-audio

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

Language: Jupyter Notebook - Size: 252 MB - Last synced at: about 15 hours ago - Pushed at: about 16 hours ago - Stars: 8,225 - Forks: 934

connorlyon10/MScProject

The repo for my MSc Data Science project, "Speech Processing in Real-Time with Convolutional Neural Networks".

Language: Jupyter Notebook - Size: 121 MB - Last synced at: about 18 hours ago - Pushed at: about 21 hours ago - Stars: 0 - Forks: 0

snakers4/silero-vad

Silero VAD: pre-trained enterprise-grade Voice Activity Detector

Language: Python - Size: 104 MB - Last synced at: 1 day ago - Pushed at: 11 days ago - Stars: 6,749 - Forks: 627

ricky0123/vad

Voice activity detector (VAD) for the browser with a simple API

Language: TypeScript - Size: 4.9 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 1,571 - Forks: 220

pmbstyle/EchoTap

Transcribe anything locally, fast and safe.

Language: Python - Size: 360 KB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0

juanmc2005/diart

A python package to build AI-powered real-time audio applications

Language: Python - Size: 34.8 MB - Last synced at: 2 days ago - Pushed at: 7 months ago - Stars: 1,440 - Forks: 105

iamsrikanthnani/pluely

The Open Source Alternative to Cluely - A lightning-fast, privacy-first AI assistant that works seamlessly during meetings, interviews, and conversations without anyone knowing. Built with Tauri for native performance, just 10MB. Completely undetectable in video calls, screen shares, and recordings.

Language: TypeScript - Size: 113 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 551 - Forks: 65

stefanwebb/open-voice-activity-detection

Fully open-source and state-of-the-art Voice Activity Detection (VAD) models for academic research and commercial applications.

Language: Python - Size: 27.3 KB - Last synced at: 3 days ago - Pushed at: 3 months ago - Stars: 2 - Forks: 0

k2-fsa/sherpa-ncnn

Real-time speech recognition and voice activity detection (VAD) using next-gen Kaldi with ncnn, without an Internet connection. Supports iOS, Android, Linux, macOS, Windows, Raspberry Pi, VisionFive2, LicheePi4A, etc.

Language: C++ - Size: 2.11 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 1,474 - Forks: 194

zhenghuatan/rVADfast

This is the Python library for an unsupervised, fast method for robust voice activity detection (rVAD), as in the paper rVAD: An Unsupervised Segment-Based Robust Voice Activity Detection Method.

Language: Python - Size: 3.63 MB - Last synced at: 1 day ago - Pushed at: 3 months ago - Stars: 145 - Forks: 24

smacke/ffsubsync

Automagically synchronize subtitles with video.

Language: Python - Size: 3.7 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 7,332 - Forks: 297

baxtree/subaligner

Automatically synchronize and translate subtitles, or create new ones by transcribing, using pre-trained DNNs, Forced Alignments and Transformers. https://subaligner.readthedocs.io/

Language: Python - Size: 103 MB - Last synced at: 1 day ago - Pushed at: about 1 month ago - Stars: 483 - Forks: 20

noisetorch/NoiseTorch

Real-time microphone noise suppression on Linux.

Language: Go - Size: 5.87 MB - Last synced at: 7 days ago - Pushed at: 8 months ago - Stars: 9,814 - Forks: 242

FluidInference/FluidAudio

Native Swift and CoreML SDK for local speaker diarization, VAD, and speech-to-text for real-time workloads. Works on iOS and macOS.

Language: Swift - Size: 13.7 MB - Last synced at: 7 days ago - Pushed at: 8 days ago - Stars: 560 - Forks: 65

dangvansam/livekit-plugins-tenvad

LiveKit plugin for TEN VAD: low-latency voice activity detection for real-time streaming, integrated with livekit-agents

Language: Python - Size: 3.45 MB - Last synced at: 1 day ago - Pushed at: 11 days ago - Stars: 3 - Forks: 1

modelscope/FunASR

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

Language: Python - Size: 100 MB - Last synced at: 7 days ago - Pushed at: 22 days ago - Stars: 12,305 - Forks: 1,233

amsehili/auditok

An audio/acoustic activity detection and audio segmentation tool

Language: Python - Size: 3.68 MB - Last synced at: 8 days ago - Pushed at: 9 months ago - Stars: 798 - Forks: 98

mgonzs13/whisper_ros

Speech-to-Text based on SileroVAD + whisper.cpp (GGML Whisper) for ROS 2

Language: C++ - Size: 1.94 MB - Last synced at: 7 days ago - Pushed at: 2 months ago - Stars: 81 - Forks: 19

pykeio/earshot

Ridiculously fast voice activity detection in pure #[no_std] Rust

Language: Rust - Size: 879 KB - Last synced at: 7 days ago - Pushed at: about 1 month ago - Stars: 22 - Forks: 1

pmbstyle/Alice

Alice is a smart desktop AI assistant application built with Vue.js, Vite, and Electron. Advanced memory system, function calling, MCP support and more.

Language: TypeScript - Size: 79.5 MB - Last synced at: 9 days ago - Pushed at: 10 days ago - Stars: 162 - Forks: 17

techAli1996/wakeword

ESP32S3 Wakeword/Keyword Spotting starter project with ready to go ML model

Language: C - Size: 4.68 MB - Last synced at: about 12 hours ago - Pushed at: 11 days ago - Stars: 2 - Forks: 0

TEN-framework/ten-vad

Voice Activity Detector (VAD) from TEN: low-latency, high-performance and lightweight

Language: C - Size: 9.6 MB - Last synced at: 11 days ago - Pushed at: 26 days ago - Stars: 1,350 - Forks: 114

duj12/ASR-2Pass

ASR 2Pass onnxruntime and websocket server, based on FunASR (https://github.com/alibaba-damo-academy/FunASR).

Language: HTML - Size: 86.9 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 76 - Forks: 10

ina-foss/inaSpeechSegmenter

CNN-based audio segmentation toolkit. Detects speech, music, noise and speaker gender. Designed for large-scale gender equality studies based on speech time per gender.

Language: Python - Size: 36.6 MB - Last synced at: 9 days ago - Pushed at: 8 months ago - Stars: 832 - Forks: 141

PranavMishra17/Streaming-Digit-Detector

A real-time audio digit classification system that recognizes spoken numbers (0-9) through live microphone streaming. Features multiple classification approaches including TTS APIs, Fourier analysis, MFCC, and MEL features with performance benchmarking and inference time tracking.

Language: Python - Size: 279 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 0 - Forks: 0

bigcash/awesome-vad

A curated list of awesome voice activity detection

Size: 9.77 KB - Last synced at: about 24 hours ago - Pushed at: 10 months ago - Stars: 62 - Forks: 3

Picovoice/cobra

On-device voice activity detection (VAD) powered by deep learning

Language: Python - Size: 43.2 MB - Last synced at: 15 days ago - Pushed at: 24 days ago - Stars: 227 - Forks: 14

spokestack/react-native-spokestack 📦

Spokestack: give your React Native app a voice interface!

Language: TypeScript - Size: 6.52 MB - Last synced at: 15 days ago - Pushed at: over 3 years ago - Stars: 62 - Forks: 13

RimAmarat/RealTimeSpeechRec

Real Time Speech Recognition with Voice Activity Detection using Pytorch

Language: Python - Size: 37.1 KB - Last synced at: 28 days ago - Pushed at: 28 days ago - Stars: 0 - Forks: 0

bincrafters/conan-libfvad

Conan.io package for libfvad project

Language: Python - Size: 33.2 KB - Last synced at: 16 days ago - Pushed at: almost 3 years ago - Stars: 2 - Forks: 0

Paradeluxe/Praditor

Praditor: A DBSCAN-Based Automation for Speech Onset Detection

Language: Python - Size: 195 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 4 - Forks: 0

RustedBytes/pyannote-onnx-rust

Run Voice Activity Detection model PyAnnote using ONNX and Rust

Language: Rust - Size: 5.17 MB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

gtreshchev/RuntimeAudioImporter 📦

Runtime Audio Importer plugin for Unreal Engine. Importing audio of various formats at runtime.

Language: C++ - Size: 10.1 MB - Last synced at: 5 days ago - Pushed at: 6 months ago - Stars: 392 - Forks: 82

gooofy/py-nltools

A collection of basic python modules for spoken natural language processing

Language: Python - Size: 413 KB - Last synced at: 11 days ago - Pushed at: almost 6 years ago - Stars: 55 - Forks: 15

baochuquan/ios-vad

iOS Voice Activity Detection (VAD). Supports WebRTC VAD GMM, Silero VAD DNN, Yamnet VAD DNN models.

Language: Swift - Size: 4.5 MB - Last synced at: about 1 month ago - Pushed at: 10 months ago - Stars: 21 - Forks: 2

jim-schwoebel/voice_gender_detection

♂️♀️ Detect a person's gender from a voice file (90.7% +/- 1.3% accuracy).

Language: Python - Size: 9.57 MB - Last synced at: 21 days ago - Pushed at: about 1 year ago - Stars: 87 - Forks: 25

nianlonggu/WhisperSeg

Code for ICASSP 2024 paper WhisperSeg: Positive Transfer of the Whisper Speech Transformer to Human and Animal Voice Activity Detection

Language: Python - Size: 243 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 33 - Forks: 12

Saga9103/t2yLLM

A voice assistant with local LLM as a backend

Language: Python - Size: 213 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 6 - Forks: 0

spokestack/spokestack-ios 📦

Spokestack: give your iOS app a voice interface!

Language: Swift - Size: 9.94 MB - Last synced at: 26 days ago - Pushed at: about 4 years ago - Stars: 45 - Forks: 9

gkonovalov/android-vad

Android Voice Activity Detection (VAD) library. Supports WebRTC VAD GMM, Silero VAD DNN, Yamnet VAD DNN models.

Language: C - Size: 5.16 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 374 - Forks: 79

shashikg/WhisperS2T

An Optimized Speech-to-Text Pipeline for the Whisper Model Supporting Multiple Inference Engines

Language: Jupyter Notebook - Size: 1.16 MB - Last synced at: about 2 months ago - Pushed at: about 1 year ago - Stars: 435 - Forks: 59

daanzu/py-silero-vad-lite

Lightweight wrapper for Silero VAD using internal ONNX Runtime and with no python package dependencies

Language: Python - Size: 1.9 MB - Last synced at: about 11 hours ago - Pushed at: 9 months ago - Stars: 15 - Forks: 1

Speech-Interaction-Technology-Aalto-U/itsp

Introduction to Speech Processing

Language: Jupyter Notebook - Size: 254 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 97 - Forks: 16

tomchang25/whisper-auto-transcribe

Auto transcribe tool based on whisper

Language: Python - Size: 169 MB - Last synced at: about 2 months ago - Pushed at: over 2 years ago - Stars: 226 - Forks: 16

OpenVoiceOS/ovos-vad-plugin-silero

ovos plugin for voice activity detection using silero vad

Language: Python - Size: 1.81 MB - Last synced at: about 7 hours ago - Pushed at: about 9 hours ago - Stars: 0 - Forks: 2

OpenVoiceOS/ovos-vad-plugin-webrtcvad

ovos plugin for voice activity detection using webrtcvad

Language: Python - Size: 27.3 KB - Last synced at: 19 days ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

thurti/vad-audio-worklet

Voice Activity Detection (VAD) AudioWorklet

Language: JavaScript - Size: 762 KB - Last synced at: 25 days ago - Pushed at: about 1 year ago - Stars: 16 - Forks: 5

ZygoteCode/VadSharp

Enterprise VAD (Voice Activity Detection) in C#.NET (.NET 6.0+) with Microsoft.ML.Net, ONNXRuntime and DirectML. The easiest, efficient, and performant Silero VAD implementation! Always open for PRs.

Language: C# - Size: 354 KB - Last synced at: 2 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 1

pranjal-pravesh/stt-silero-whisper

Real-time speech to text using voice activity detection (with silero-VAD) and transcriptions using faster-whisper model

Language: Python - Size: 35.2 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

aidayang/FunASR-OneClick

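FunASR real-time speech recognition edition: recognizes audio from the microphone and audio played on the computer; voice typing software for PCs.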

Size: 22.5 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 4 - Forks: 0

Swanand-Wagh/Socraitive

Language: TypeScript - Size: 14 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 2

idiap/zff_vad

Unsupervised Voice Activity Detection by Modeling Source and System Information using Zero Frequency Filtering

Language: Python - Size: 631 KB - Last synced at: 1 day ago - Pushed at: almost 2 years ago - Stars: 21 - Forks: 2

edyamza/Voice-Activity-Detection-WebRTC-Silero

This is a python project. We compare the metrics of 2 already trained AI models - WebRTC & Silero.

Language: Python - Size: 11.7 MB - Last synced at: 2 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

ggeop/Python-ai-assistant

Python AI assistant 🧠

Language: Python - Size: 2.99 MB - Last synced at: 3 months ago - Pushed at: 10 months ago - Stars: 977 - Forks: 247

jim-schwoebel/voicebook

🗣️ A book and repo to get you started programming voice computing applications in Python (10 chapters and 200+ scripts).

Language: Python - Size: 299 MB - Last synced at: 3 months ago - Pushed at: over 2 years ago - Stars: 381 - Forks: 86

egorsmkv/marblenet-inference

Inference code for Frame MarbleNet (VAD from NeMo)

Language: Python - Size: 57.6 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

itmo-mbss-lab/sr_labs_book

The project is related to the development of labs for the ITMO Speaker Recognition Course.

Language: Jupyter Notebook - Size: 3.25 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 10 - Forks: 8

sepnic/litevad

Speech-end detection library, based on WebRTC's VAD engine

Language: C - Size: 453 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 22 - Forks: 5

krithicswaroopan/AI-Voice-Assistance-Pipeline

A real-time voice-to-text and text-to-speech AI pipeline using Whisper, an LLM, and Edge-TTS with tunable parameters for low-latency audio processing and response generation.

Language: Python - Size: 80.2 MB - Last synced at: 3 months ago - Pushed at: 12 months ago - Stars: 4 - Forks: 1

RicherMans/GPV

Repository for our Interspeech2020 general-purpose voice activity detection (GPVAD) paper

Language: Python - Size: 8.85 MB - Last synced at: 4 months ago - Pushed at: about 2 years ago - Stars: 142 - Forks: 29

ina-foss/InaGVAD

Voice activity detection and speaker gender segmentation audiovisual corpus

Language: Jupyter Notebook - Size: 1.4 MB - Last synced at: 4 months ago - Pushed at: 8 months ago - Stars: 13 - Forks: 1

Ave-Sergeev/Dictator

Speech-to-Text translation service (Rust, Tonic) (2025)

Language: Rust - Size: 49.3 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 6 - Forks: 0

spokestack/spokestack-android 📦

Extensible Android mobile voice framework: wakeword, ASR, NLU, and TTS. Easily add voice to any Android app!

Language: Java - Size: 1.25 MB - Last synced at: 2 months ago - Pushed at: almost 4 years ago - Stars: 74 - Forks: 10

sudydtdtgxdjchdyfghxyfgjcj/MLP-From-Scratch

A C++ implementation of a Multilayer Perceptron (MLP) neural network using Eigen, supporting multiple activation and loss functions, mini-batch gradient descent, and backpropagation for training.

Language: C++ - Size: 23.4 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

filippogiruzzi/voice_activity_detection

Voice Activity Detection based on Deep Learning & TensorFlow

Language: Python - Size: 238 KB - Last synced at: 4 months ago - Pushed at: over 2 years ago - Stars: 363 - Forks: 69

nicklashansen/voice-activity-detection

Voice Activity Detection (VAD) using deep learning.

Language: Jupyter Notebook - Size: 2.41 MB - Last synced at: 4 months ago - Pushed at: almost 6 years ago - Stars: 196 - Forks: 33

nosoy77/logitech_bcc950

A talking eyeball on a stick - Logitech BCC950 PTZ camera control scripts

Language: Python - Size: 14.6 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

xaionaro-go/audio

A package for Go to playback, record and process audio

Language: Go - Size: 168 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 3 - Forks: 0

mvalancy-mt/logitech_bcc950

A talking eyeball on a stick - Logitech BCC950 PTZ camera control scripts

Language: Python - Size: 15.6 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

KarthikaRajagopal44/Beyond-Voice-Activity-Detection-Advanced-Turn-End-Prediction-in-Conversational-Agents

Moving Past VAD: Smarter Turn-Taking in Voice Assistants

Language: Python - Size: 1.95 KB - Last synced at: 4 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

webyneter/speech-to-console

A voice-controlled tool that converts spoken commands to text in your terminal

Language: Python - Size: 267 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

jtkim-kaist/VAD

Voice activity detection (VAD) toolkit including DNN, bDNN, LSTM and ACAM based VAD. We also provide our directly recorded dataset.

Language: MATLAB - Size: 261 MB - Last synced at: 4 months ago - Pushed at: about 4 years ago - Stars: 854 - Forks: 234

BingLingGroup/autosub Fork of iWangJiaxiang/autosub

Command-line utility to transcribe/translate from video/audio/subtitles to subtitles

Language: Python - Size: 1.29 MB - Last synced at: 5 months ago - Pushed at: over 1 year ago - Stars: 1,986 - Forks: 245

eesungkim/Voice_Activity_Detector

A statistical model-based Voice Activity Detection

Language: Jupyter Notebook - Size: 168 KB - Last synced at: 4 months ago - Pushed at: almost 7 years ago - Stars: 192 - Forks: 41

rohanprichard/fastrtc-demo

A simple POC of FastRTC, a framework to use voice mode in python!

Language: TypeScript - Size: 89.8 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 24 - Forks: 9

DictationDaddy/VAD_WEB_DEMO

In this repository, I show you how to use SILERO VAD with ONNX-WEB runtime to run the VAD completely in the browser.

Language: JavaScript - Size: 2 MB - Last synced at: 5 months ago - Pushed at: 8 months ago - Stars: 20 - Forks: 1

coqui-ai/open-speech-corpora

💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies

Size: 139 KB - Last synced at: 5 months ago - Pushed at: over 1 year ago - Stars: 1,318 - Forks: 142

jim-schwoebel/voice_datasets

🔊 A comprehensive list of open-source datasets for voice and sound computing (95+ datasets).

Size: 136 KB - Last synced at: 5 months ago - Pushed at: over 1 year ago - Stars: 1,875 - Forks: 237

jsvir/vad

[Tiny VAD] SG-VAD: Stochastic Gates Based Speech Activity Detection

Language: Python - Size: 1.71 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 26 - Forks: 3

bunyaminergen/Callytics

Callytics is an advanced call analytics solution that leverages speech recognition and large language models (LLMs) technologies to analyze phone conversations from customer service and call centers.

Language: Python - Size: 23.9 MB - Last synced at: 5 months ago - Pushed at: 6 months ago - Stars: 65 - Forks: 10

numq/voice-activity-detection

JVM library for voice activity detection written in Kotlin based on C library fvad and Silero

Language: Kotlin - Size: 2.94 MB - Last synced at: 7 days ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

oadultradeepfield/healthhack-vad

A VAD service analyzing factors linked to cognitive decline, developed for the HealthHack 2025 project.

Size: 0 Bytes - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

ganlvtech/bing-stt

Rust implementation of bing "Search using voice" button speech recognition API (similar to Azure real-time speech to text API)

Language: Rust - Size: 17.6 KB - Last synced at: 4 months ago - Pushed at: 7 months ago - Stars: 3 - Forks: 1

alexnaughtonjr/Real-Time-Voice-Cloning Fork of CorentinJ/Real-Time-Voice-Cloning

Clone a voice in 5 seconds to generate arbitrary speech in real-time

Size: 352 MB - Last synced at: 5 months ago - Pushed at: about 4 years ago - Stars: 7 - Forks: 0

ElmiraGhorbani/gpt-speaker-diarization

Conversational Speaker Diarization using OpenAI language models (GPT-4) and OpenAI Whisper.

Language: Jupyter Notebook - Size: 39.1 KB - Last synced at: 5 months ago - Pushed at: about 2 years ago - Stars: 12 - Forks: 0

panmasuo/voice-activity-detection

Voice activity detection algorithm written in C

Language: C - Size: 43.9 KB - Last synced at: 5 months ago - Pushed at: over 1 year ago - Stars: 13 - Forks: 3

itmo-mbss-lab/sr_lectures_book

The project is related to the development of Basics of Voice Biometrics lecture book for the ITMO Speaker Recognition Course.

Language: TeX - Size: 1.28 MB - Last synced at: 4 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

bannawandoor27/Teletrix

Smart audio mixer that automatically mutes test tones when you speak - perfect for audio testing and development.

Language: Go - Size: 2.36 MB - Last synced at: 5 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

jim-schwoebel/pauses

🎤 quick library to extract pause lengths from audio files.

Language: Python - Size: 2.13 MB - Last synced at: 5 months ago - Pushed at: over 6 years ago - Stars: 31 - Forks: 7

nyumaya/libnyumaya_esp32

Experimental support for nyumaya audio recognition on ESP32

Language: C++ - Size: 5.35 MB - Last synced at: 5 months ago - Pushed at: over 2 years ago - Stars: 4 - Forks: 1

HolgerBovbjerg/SSL-PVAD

A repository for code used to produce the results of the ICASSP 2024 paper: "SELF-SUPERVISED PRETRAINING FOR ROBUST PERSONALIZED VOICE ACTIVITY DETECTION IN ADVERSE CONDITIONS"

Language: Python - Size: 5.3 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 6 - Forks: 0

kristofferv98/SemanthaVoiceAssistant

A comprehensive AI companion leveraging advanced semantic analysis, sentiment detection, and voice processing to provide personalized and context-aware interactions using Autogen, semantic-router, and VoiceProcessingToolkit.

Language: Python - Size: 85 KB - Last synced at: 5 months ago - Pushed at: over 1 year ago - Stars: 5 - Forks: 0

NickWilkinson37/voxseg

A python library for voice activity detection (VAD) for speech/non-speech segmentation.

Language: Python - Size: 98.1 MB - Last synced at: 9 months ago - Pushed at: almost 3 years ago - Stars: 83 - Forks: 12

sooftware/End-to-End-Speech-Recognition-Models

PyTorch implementation of automatic speech recognition models.

Language: Python - Size: 84 KB - Last synced at: 5 months ago - Pushed at: over 4 years ago - Stars: 38 - Forks: 5

mechanicalsea/spectra

Spectra extraction tutorials based on torch and torchaudio.

Language: Jupyter Notebook - Size: 3.31 MB - Last synced at: 10 months ago - Pushed at: about 2 years ago - Stars: 40 - Forks: 4

IntendedConsequence/vadc

Uses the excellent Silero VAD with the onnxruntime C API for fast detection of audio segments with speech

Language: C++ - Size: 8.45 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 4 - Forks: 0

developerdaya/MyFirstJarvisApp

Experience the power of a personal assistant with MyFirstJarvisApp! Perform tasks, get information, and manage your daily life seamlessly with voice commands. Perfect for anyone looking to simplify their routine and harness AI technology.

Language: Kotlin - Size: 24.7 MB - Last synced at: about 1 year ago - Pushed at: about 4 years ago - Stars: 1 - Forks: 0

Yifei-ZHAO96/Tr-VAD

Tr-VAD: An Efficient Transformer based Voice Activity Detection Model

Language: Python - Size: 5.55 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 2 - Forks: 0

binglel/vad_lrt_hmm

A statistical model-based Voice Activity Detector

Language: Python - Size: 725 KB - Last synced at: about 1 year ago - Pushed at: about 5 years ago - Stars: 5 - Forks: 1
