speech-to-text | Topic | Ecosyste.ms: Repos

Topic: "speech-to-text"

Picovoice/leopard

On-device speech-to-text engine powered by deep learning

Language: Python - Size: 419 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 454 - Forks: 28

edenai/edenai-apis

Eden AI: simplify the use and deployment of AI technologies by providing a unique API that connects to the best possible AI engines

Language: Python - Size: 158 MB - Last synced at: about 10 hours ago - Pushed at: about 11 hours ago - Stars: 448 - Forks: 67

jonatasgrosman/huggingsound

HuggingSound: A toolkit for speech-related tasks based on Hugging Face's tools

Language: Python - Size: 598 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 447 - Forks: 45

AdolfVonKleist/Phonetisaurus

Phonetisaurus G2P

Language: Shell - Size: 2.24 MB - Last synced at: 11 months ago - Pushed at: 12 months ago - Stars: 440 - Forks: 122

toverainc/willow-inference-server

Open source, local, and self-hosted highly optimized language inference server supporting ASR/STT, TTS, and LLM across WebRTC, REST, and WS

Language: Python - Size: 3.27 MB - Last synced at: 6 days ago - Pushed at: 12 months ago - Stars: 440 - Forks: 47

ccoreilly/vosk-browser

A speech recognition library running in the browser thanks to a WebAssembly build of Vosk

Language: JavaScript - Size: 707 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 423 - Forks: 66

OpenNewsLabs/autoEdit_2

Fast text-based video editing: a Node Electron OS X desktop app with a Backbone front end.

Language: JavaScript - Size: 111 MB - Last synced at: 6 months ago - Pushed at: about 1 year ago - Stars: 421 - Forks: 56

double22a/speech_dataset

The dataset of Speech Recognition

Size: 74.2 KB - Last synced at: 8 days ago - Pushed at: 5 months ago - Stars: 413 - Forks: 77

DeutscheKI/tevr-asr-tool

State-of-the-art (ranked #1 Aug 2022) German Speech Recognition in 284 lines of C++. This is a 100% private 100% offline 100% free CLI tool.

Language: C - Size: 289 KB - Last synced at: 30 days ago - Pushed at: almost 3 years ago - Stars: 413 - Forks: 18

VolcanicArts/VRCOSC

Modular OSC program creator, toolkit, and router made for VRChat. Show your heart rate, time, hardware stats, speech to text, control Spotify, and more! Includes drag-and-drop prefabs for your avatar.

Language: C# - Size: 8.49 MB - Last synced at: 1 day ago - Pushed at: 22 days ago - Stars: 408 - Forks: 31

haydenbleasel/orate

The AI toolkit for speech.

Language: TypeScript - Size: 4.49 MB - Last synced at: 19 days ago - Pushed at: about 1 month ago - Stars: 407 - Forks: 23

ElishaAz/Sayboard

An open-source on-device voice IME (keyboard) for Android using the Vosk library.

Language: Kotlin - Size: 271 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 406 - Forks: 24

shashikg/WhisperS2T

An Optimized Speech-to-Text Pipeline for the Whisper Model Supporting Multiple Inference Engines

Language: Jupyter Notebook - Size: 1.16 MB - Last synced at: 5 days ago - Pushed at: 9 months ago - Stars: 406 - Forks: 49

rtzr/Awesome-Korean-Speech-Recognition

A list of Korean speech recognition (STT) APIs, with performance benchmarks for each.

Size: 86.9 KB - Last synced at: 11 days ago - Pushed at: 19 days ago - Stars: 403 - Forks: 22

revdotcom/reverb

Open source inference code for Rev's model

Language: Python - Size: 507 KB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 399 - Forks: 25

modelscope/FunCodec

FunCodec is a research-oriented toolkit for audio quantization and downstream applications such as text-to-speech synthesis, music generation, etc.

Language: Python - Size: 1.46 MB - Last synced at: 28 days ago - Pushed at: over 1 year ago - Stars: 396 - Forks: 33

ArthurFDLR/whisper-youtube

🔉 YouTube video transcription with OpenAI's Whisper

Language: Jupyter Notebook - Size: 124 KB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 393 - Forks: 114

bugbakery/transcribee

open source audio and video transcription software

Language: TypeScript - Size: 5.25 MB - Last synced at: 24 days ago - Pushed at: about 1 month ago - Stars: 392 - Forks: 27

egorsmkv/speech-recognition-uk

🇺🇦 Speech Recognition & Synthesis for Ukrainian

Language: Python - Size: 2.42 MB - Last synced at: about 3 hours ago - Pushed at: about 4 hours ago - Stars: 385 - Forks: 22

Evil0ctal/Fast-Powerful-Whisper-AI-Services-API

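⚡ A high-performance asynchronous API for automatic speech recognition (ASR) and translation. No need to purchase the Whisper API: it runs inference with a locally hosted Whisper model, supports multi-GPU concurrency, and is designed for distributed deployment. It also includes built-in crawlers for social media platforms such as TikTok and Douyin, enabling seamless media processing across multiple social platforms and providing a powerful, scalable solution for automated processing of media content data.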

Language: Python - Size: 1.21 MB - Last synced at: 4 days ago - Pushed at: 2 months ago - Stars: 368 - Forks: 42

speechbrain/speechbrain.github.io

The SpeechBrain project aims to build a novel speech toolkit fully based on PyTorch. With SpeechBrain users can easily create speech processing systems, ranging from speech recognition (both HMM/DNN and end-to-end), speaker recognition, speech enhancement, speech separation, multi-microphone speech processing, and many others.

Language: HTML - Size: 46.8 MB - Last synced at: 8 days ago - Pushed at: 5 months ago - Stars: 365 - Forks: 29

echogarden-project/echogarden

Cross-platform speech toolset, used from the command-line or as a Node.js library. Includes a variety of engines for speech synthesis, speech recognition, forced alignment, speech translation, voice isolation, language detection and more.

Language: TypeScript - Size: 1.72 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 359 - Forks: 39

mailong25/self-supervised-speech-recognition

speech to text with self-supervised learning based on wav2vec 2.0 framework

Language: Python - Size: 13.1 MB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 359 - Forks: 113

alphacep/vosk

VOSK Speech Recognition Toolkit

Language: C - Size: 42 KB - Last synced at: about 1 year ago - Pushed at: almost 3 years ago - Stars: 355 - Forks: 43

oliverguhr/wav2vec2-live

Live speech recognition using Facebook's wav2vec 2.0 model.

Language: Python - Size: 2.84 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 348 - Forks: 56

Nikorasu/LiveWhisper

A nearly-live implementation of OpenAI's Whisper, using sounddevice. Requires existing Whisper install.

Language: Python - Size: 54.7 KB - Last synced at: 13 days ago - Pushed at: over 1 year ago - Stars: 345 - Forks: 47

daanzu/kaldi-active-grammar

Python Kaldi speech recognition with grammars that can be set active/inactive dynamically at decode-time

Language: Python - Size: 579 KB - Last synced at: 7 days ago - Pushed at: almost 2 years ago - Stars: 343 - Forks: 51

Xewdy444/Playwright-reCAPTCHA

A Python library for solving reCAPTCHA v2 and v3 with Playwright

Language: Python - Size: 440 KB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 341 - Forks: 43

Carleslc/AudioToText

Transcribe and translate audio to text using Whisper and DeepL.

Language: Jupyter Notebook - Size: 19.4 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 323 - Forks: 44

SergeyShk/Speech-to-Text-Russian

A project for Russian-language speech recognition based on pykaldi.

Language: Python - Size: 72.5 MB - Last synced at: 6 months ago - Pushed at: 9 months ago - Stars: 322 - Forks: 54

Renovamen/Speech-and-Text

Speech to text (PocketSphinx, iFlytek API, Baidu API) and text to speech (pyttsx3)

Language: Python - Size: 63.5 KB - Last synced at: about 1 month ago - Pushed at: almost 6 years ago - Stars: 319 - Forks: 75

yohasebe/openai-chat-api-workflow

🎩 An Alfred 5 Workflow for using OpenAI Chat API to interact with GPT models 🤖💬 It also allows image generation/editing/understanding 🖼️, speech-to-text conversion 🎤, and text-to-speech synthesis 🔈

Size: 113 MB - Last synced at: 2 days ago - Pushed at: 16 days ago - Stars: 315 - Forks: 9

hirofumi0810/tensorflow_end2end_speech_recognition

End-to-end speech recognition implementation based on TensorFlow (CTC, Attention, and MTL training)

Language: Python - Size: 4.17 MB - Last synced at: 6 months ago - Pushed at: over 7 years ago - Stars: 313 - Forks: 120

Adri6336/gpt-voice-conversation-chatbot

Allows you to have an engaging and safely emotive spoken / CLI conversation with the AI ChatGPT / GPT-4 while giving you the option to let it remember things discussed.

Language: Python - Size: 2.92 MB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 305 - Forks: 48

deepgram/deepgram-python-sdk

Official Python SDK for Deepgram.

Language: Python - Size: 16 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 304 - Forks: 82

Migushthe2nd/MsEdgeTTS

A simple Azure Speech Service module that uses the Microsoft Edge Read Aloud API

Language: TypeScript - Size: 265 KB - Last synced at: 29 days ago - Pushed at: 4 months ago - Stars: 302 - Forks: 44

NsLearning/LangHelper

Striving to create a full-featured language-learning application built on ChatGPT, TTS, STT, and other AI models. Supports conversation, speaking assessment, memorizing words in context, listening tests, and so on.

Language: Rust - Size: 31.1 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 297 - Forks: 21

gtreshchev/RuntimeSpeechRecognizer 📦

Cross-platform, real-time, offline speech recognition plugin for Unreal Engine. Based on OpenAI's Whisper technology via whisper.cpp.

Language: C++ - Size: 24.8 MB - Last synced at: 6 days ago - Pushed at: 3 months ago - Stars: 292 - Forks: 45

b4rtaz/voice-assistant

Voice assistant for Visual Studio Code.

Language: TypeScript - Size: 723 KB - Last synced at: 30 days ago - Pushed at: almost 4 years ago - Stars: 292 - Forks: 11

theblackcat102/edgedict

Working online speech recognition based on RNN Transducer. (Trained model available in the releases.)

Language: Python - Size: 5.53 MB - Last synced at: 2 months ago - Pushed at: almost 4 years ago - Stars: 291 - Forks: 44

Kabanosk/whisper-website

A simple web application that converts audio to subtitles using OpenAI's Whisper model

Language: Python - Size: 48.8 KB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 288 - Forks: 71

nanihadesuka/NovelDokusha

Android web novel reader

Language: Kotlin - Size: 10.4 MB - Last synced at: 23 days ago - Pushed at: about 1 month ago - Stars: 284 - Forks: 19

Olney1/ChatGPT-OpenAI-Smart-Speaker

This AI Smart Speaker uses speech recognition, TTS (text-to-speech), and STT (speech-to-text) to enable voice and vision-driven conversations, with additional web search capabilities via OpenAI and Langchain agents.

Language: Python - Size: 145 MB - Last synced at: 4 days ago - Pushed at: 6 months ago - Stars: 281 - Forks: 31

Kaljurand/K6nele

An Android app that offers speech-to-text user interfaces to other apps

Language: Java - Size: 24.5 MB - Last synced at: 6 days ago - Pushed at: 5 months ago - Stars: 280 - Forks: 83

Thiagohgl/ai-pronunciation-trainer

This tool uses AI to evaluate your pronunciation.

Language: Python - Size: 2.04 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 276 - Forks: 77

haoheliu/voicefixer_main

General Speech Restoration

Language: Python - Size: 21.5 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 276 - Forks: 56

cyberofficial/Synthalingua

Synthalingua - Real Time Translation

Language: Python - Size: 1.41 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 274 - Forks: 19

NaomiProject/Naomi

The Naomi Project is an open source, technology agnostic platform for developing always-on, voice-controlled applications!

Language: Python - Size: 5.27 MB - Last synced at: 3 days ago - Pushed at: 4 months ago - Stars: 274 - Forks: 60

SamirPaulb/real-time-voice-translator

A desktop application that uses AI to translate voice between languages in real time, while preserving the speaker's tone and emotion.

Language: Tcl - Size: 248 MB - Last synced at: about 3 hours ago - Pushed at: over 1 year ago - Stars: 274 - Forks: 72

jim60105/docker-whisperX

Dockerfile for WhisperX: Automatic Speech Recognition with Word-Level Timestamps and Speaker Diarization (Dockerfile, CI image build and test)

Language: Dockerfile - Size: 368 KB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 273 - Forks: 37

mmpneo/curses

Speech to Text and KB input captions for OBS, VRChat, Twitch chat and Discord

Language: TypeScript - Size: 2.32 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 261 - Forks: 23

algolia/voice-overlay-android

🗣 An overlay that gets your user’s voice permission and input as text in a customizable UI

Language: Kotlin - Size: 38.5 MB - Last synced at: 18 days ago - Pushed at: about 3 years ago - Stars: 258 - Forks: 36

asticode/go-astibob

Golang framework to build an AI that can understand and speak back to you, and everything else you want

Language: Go - Size: 1.54 MB - Last synced at: 6 months ago - Pushed at: over 5 years ago - Stars: 244 - Forks: 20

robmsmt/KerasDeepSpeech

A Keras CTC implementation of Baidu's DeepSpeech for model experimentation

Language: Python - Size: 150 MB - Last synced at: 3 days ago - Pushed at: about 7 years ago - Stars: 242 - Forks: 76

pythonlessons/mltu

Machine Learning Training Utilities (for TensorFlow and PyTorch)

Language: Python - Size: 1.98 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 235 - Forks: 134

MikeyParton/react-speech-kit

React hooks for Speech Recognition and Speech Synthesis

Language: JavaScript - Size: 983 KB - Last synced at: 10 months ago - Pushed at: almost 2 years ago - Stars: 232 - Forks: 63

nikdanilov/whisper-obsidian-plugin

Speech-to-text in Obsidian using OpenAI Whisper

Language: TypeScript - Size: 236 KB - Last synced at: 5 months ago - Pushed at: about 1 year ago - Stars: 228 - Forks: 32

shenasa-ai/speech2text

A Deep-Learning-Based Persian Speech Recognition System

Language: Jupyter Notebook - Size: 21.9 MB - Last synced at: about 6 hours ago - Pushed at: almost 2 years ago - Stars: 224 - Forks: 30

rolczynski/Automatic-Speech-Recognition 📦

🎧 Automatic Speech Recognition: DeepSpeech & Seq2Seq (TensorFlow)

Language: Python - Size: 3.6 MB - Last synced at: 24 days ago - Pushed at: almost 5 years ago - Stars: 224 - Forks: 63

tomchang25/whisper-auto-transcribe

Auto transcribe tool based on whisper

Language: Python - Size: 169 MB - Last synced at: 6 months ago - Pushed at: about 2 years ago - Stars: 220 - Forks: 15

rakeshvar/rnn_ctc

Recurrent Neural Network and Long Short Term Memory (LSTM) with Connectionist Temporal Classification implemented in Theano. Includes a toy training example.

Language: Python - Size: 604 KB - Last synced at: 6 months ago - Pushed at: almost 9 years ago - Stars: 220 - Forks: 80

Kaljurand/dictate.js

A small JavaScript library for browser-based real-time speech recognition, which uses Recorderjs for audio capture and a WebSocket connection to the Kaldi GStreamer server for speech recognition.

Language: JavaScript - Size: 228 KB - Last synced at: 6 days ago - Pushed at: about 5 years ago - Stars: 217 - Forks: 62

Picovoice/web-voice-processor

A library for real-time voice processing in web browsers

Language: TypeScript - Size: 2.59 MB - Last synced at: 7 days ago - Pushed at: 3 months ago - Stars: 215 - Forks: 22

MainRo/deepspeech-server

A testing server for a speech to text service based on coqui.ai

Language: Python - Size: 80.1 KB - Last synced at: 3 days ago - Pushed at: almost 3 years ago - Stars: 215 - Forks: 71

pszemraj/vid2cleantxt

Python API & command-line tool to easily transcribe speech-based video files into clean text

Language: Jupyter Notebook - Size: 723 MB - Last synced at: 3 days ago - Pushed at: 7 months ago - Stars: 212 - Forks: 29

alphacep/awesome-russian-speech

Russian speech technology links

Size: 120 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 209 - Forks: 14

robmsmt/ASR-Audio-Data-Links

A list of publicly available audio data that anyone can download for ASR or other speech activities

Language: Shell - Size: 25.4 KB - Last synced at: 3 days ago - Pushed at: almost 4 years ago - Stars: 209 - Forks: 22

JosefAlbers/whisper-turbo-mlx

Blazing fast whisper turbo for ASR (speech-to-text) tasks

Language: Python - Size: 527 KB - Last synced at: about 1 month ago - Pushed at: 7 months ago - Stars: 202 - Forks: 9

jmaczan/gdansk-ai

Full stack voice chatbot

Language: TypeScript - Size: 2.98 MB - Last synced at: about 1 month ago - Pushed at: 7 months ago - Stars: 197 - Forks: 21

Kyubyong/expressive_tacotron

TensorFlow Implementation of Expressive Tacotron

Language: Python - Size: 5.8 MB - Last synced at: 19 days ago - Pushed at: over 6 years ago - Stars: 196 - Forks: 34

HenestrosaDev/audiotext

A desktop application that transcribes audio from files, microphone input or YouTube videos with the option to translate the content and create subtitles.

Language: Python - Size: 80.5 MB - Last synced at: about 1 month ago - Pushed at: 7 months ago - Stars: 194 - Forks: 18

bricewalker/Hey-Jetson

Deep Learning based Automatic Speech Recognition with attention for the Nvidia Jetson.

Language: Jupyter Notebook - Size: 2.88 GB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 192 - Forks: 40

ssheng/BentoChain

A voice-enabled chatbot application built using 🦜️🔗 LangChain, text-to-speech and speech-to-text models from 🤗 Hugging Face, and 🍱 BentoML.

Language: Python - Size: 4.63 MB - Last synced at: 5 days ago - Pushed at: over 1 year ago - Stars: 192 - Forks: 24

deepgram/deepgram-js-sdk

Official JavaScript SDK for Deepgram.

Language: TypeScript - Size: 22.4 MB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 189 - Forks: 68

jofizcd/Soul-of-Waifu

Discover the world of artificial intelligence and interact with your favorite characters without needing to learn tons of information. Bring your Waifu to life with Soul of Waifu!

Language: Python - Size: 26.8 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 188 - Forks: 15

balavenkatesh3322/audio-pretrained-model

A collection of Audio and Speech pre-trained models.

Size: 134 KB - Last synced at: about 1 month ago - Pushed at: almost 5 years ago - Stars: 188 - Forks: 26

JigsawStack/insanely-fast-whisper-api

An API to transcribe audio with OpenAI's Whisper Large v3!

Language: Python - Size: 250 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 185 - Forks: 27

felixbade/transcribe

Web UI for OpenAI Whisper API

Language: HTML - Size: 29.3 KB - Last synced at: about 1 month ago - Pushed at: about 2 years ago - Stars: 184 - Forks: 28

Kyubyong/speaker_adapted_tts

Making a TTS model with 1 minute of speech samples within 10 minutes

Size: 5.86 KB - Last synced at: 3 months ago - Pushed at: about 7 years ago - Stars: 184 - Forks: 17

deepgram-starters/nextjs-live-transcription

Get started using Deepgram's Live Transcription with this Next.js demo app

Language: TypeScript - Size: 265 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 182 - Forks: 218

alexruperez/SpeechRecognizerButton

UIButton subclass with push to talk recording, speech recognition and Siri-style waveform view.

Language: Swift - Size: 580 KB - Last synced at: 3 days ago - Pushed at: over 5 years ago - Stars: 182 - Forks: 35

maxent-ai/converse

Conversational text Analysis using various NLP techniques

Language: Jupyter Notebook - Size: 154 KB - Last synced at: 1 day ago - Pushed at: almost 2 years ago - Stars: 181 - Forks: 19

MacKey-255/GoodByeCatpcha (fork of mikeyy/nonoCAPTCHA)

An asynchronous Python library to automate solving ReCAPTCHA v2 using audio and image recognition

Language: Python - Size: 117 MB - Last synced at: 28 days ago - Pushed at: almost 2 years ago - Stars: 180 - Forks: 56

asticode/go-astideepspeech

Golang bindings for Mozilla's DeepSpeech speech-to-text library

Language: Go - Size: 35.2 KB - Last synced at: 11 months ago - Pushed at: over 3 years ago - Stars: 175 - Forks: 23

misyaguziya/VRCT

VRCT(VRChat Chatbox Translator & Transcription)

Language: JavaScript - Size: 58.8 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 174 - Forks: 18

sonngdev/chatgpt-voice

Have a conversation with ChatGPT. Casually 🔈 🤖 ⚡️

Language: TypeScript - Size: 872 KB - Last synced at: 5 days ago - Pushed at: 8 months ago - Stars: 173 - Forks: 41

lucoiso/UEAzSpeech

This plugin integrates Azure Speech Cognitive Services in Unreal Engine.

Language: C++ - Size: 161 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 172 - Forks: 39

smoke-trees/Voice-synthesis

This repository is an implementation of Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis (SV2TTS) with a vocoder that works in real time. SV2TTS is a three-stage deep learning framework that makes it possible to create a numerical representation of a voice from a few seconds of audio, and to use it to condition a text-to-speech model trained to generalize to new voices.

Language: Python - Size: 3.12 MB - Last synced at: about 1 month ago - Pushed at: over 4 years ago - Stars: 170 - Forks: 46