GitHub topics: wav2vec2
VGD3626/English_Accent_Detection Fork of Divyang029/English_Accent_Detection
Audio classification using a transfer-learning-based approach
Language: Jupyter Notebook - Size: 6.52 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

yt-bot-boop/Persian-Voice-Command-System
The Persian Voice Command System lets Persian speakers control their Windows computers using voice commands. With a user-friendly interface and local AI models, this project ensures accurate transcription while maintaining user privacy. 🖥️🎤
Language: Python - Size: 9.77 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

PRITHIVSAKTHIUR/Common-Voice-Gender-Detection
Speech-Emotion-Classification is a fine-tuned version of facebook/wav2vec2-base-960h for multi-class audio classification, specifically trained to detect emotions in speech. This model utilizes the Wav2Vec2ForSequenceClassification architecture to accurately classify speaker emotions from audio signals.
Language: Python - Size: 11.7 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 1 - Forks: 0

hyunnnchoi/tethys-speech
tf2 implementation of whisper & wav2vec2 models w/ distributed training for k8s/kubeflow
Language: Python - Size: 285 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

tuanio/noisy-student-training-asr
PyTorch implementation of Noisy Student Training for Automatic Speech Recognition and Automatic Pronunciation Error Detection problems
Language: Python - Size: 3.08 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 93 - Forks: 15

pszemraj/vid2cleantxt
Python API & command-line tool to easily transcribe speech-based video files into clean text
Language: Jupyter Notebook - Size: 723 MB - Last synced at: 6 days ago - Pushed at: 7 months ago - Stars: 213 - Forks: 29

faizaliyaqat/Speech-emotion-recognition
Speech Emotion Recognition using Wav2Vec 2.0 + Random Forest. Real-time emotion detection system built with Streamlit, trained on the RAVDESS and SAVEE datasets using Wav2Vec 2.0 features and a Random Forest classifier. Includes SHAP explainability and audio waveform visualization.
Language: Python - Size: 17.6 KB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 0 - Forks: 0

PaddlePaddle/PaddleSpeech
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
Language: Python - Size: 69.2 MB - Last synced at: 19 days ago - Pushed at: 22 days ago - Stars: 11,899 - Forks: 1,912

aryanxxvii/lark
Speech Assessment API in FastAPI with HuggingFace 🤗
Language: JavaScript - Size: 184 KB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 11 - Forks: 0

mokshhhhh/AudioCaptchaRecognizer
A conversational AI: speech synthesis project in which we develop and use a model to identify the audio CAPTCHAs often seen in websites' human verification.
Language: Python - Size: 13.6 MB - Last synced at: 22 days ago - Pushed at: 22 days ago - Stars: 0 - Forks: 0

s3prl/s3prl
Self-Supervised Speech Pre-training and Representation Learning Toolkit
Language: Python - Size: 135 MB - Last synced at: 18 days ago - Pushed at: 3 months ago - Stars: 2,396 - Forks: 501

egorsmkv/speech-to-text-using-php
Use PHP for a Speech-to-Text task. Just research.
Language: PHP - Size: 289 KB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

lakshiitakalyanasundaram/DeepSonic
Deepfake audio detection project using Wav2Vec2 for MOMENTA (task for internship)
Language: Python - Size: 64.3 MB - Last synced at: 22 days ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

slinusc/speaker_identification_evaluation
Evaluating the Effectiveness of Transformer Layers in Wav2Vec 2.0, XLS-R, and Whisper for Speaker Identification Tasks
Language: Jupyter Notebook - Size: 8.56 MB - Last synced at: about 2 months ago - Pushed at: 4 months ago - Stars: 3 - Forks: 1

yamahigashi/Wav2Vec2FBX
Recognize speech from an audio file and convert it into an FBX animation
Language: Python - Size: 199 KB - Last synced at: 1 day ago - Pushed at: over 3 years ago - Stars: 21 - Forks: 3

khanld/ASR-Wav2vec-Finetune
⚡ Finetune Wav2vec 2.0 for Speech Recognition
Language: Python - Size: 5.1 MB - Last synced at: about 2 months ago - Pushed at: 4 months ago - Stars: 127 - Forks: 28

ttop32/wav2vec2-live-japanese-translator
Real-time Japanese speech recognition translator using wav2vec2
Language: Jupyter Notebook - Size: 926 KB - Last synced at: about 1 month ago - Pushed at: almost 3 years ago - Stars: 38 - Forks: 3

oliverguhr/wav2vec2-live
Live speech recognition using Facebook's wav2vec 2.0 model.
Language: Python - Size: 2.84 MB - Last synced at: 2 months ago - Pushed at: over 1 year ago - Stars: 348 - Forks: 56

vietai/ASR
End-to-End Vietnamese Speech Recognition using wav2vec 2.0
Size: 10.7 KB - Last synced at: 2 months ago - Pushed at: almost 4 years ago - Stars: 98 - Forks: 9

mahshid1378/ASR-Wav2vec-Finetune
⚡ Finetune Wav2vec 2.0 for Speech Recognition
Language: Python - Size: 5.01 MB - Last synced at: about 1 month ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

notAI-tech/IndicASR
Speech Recognition for Indic languages.
Language: Python - Size: 623 KB - Last synced at: about 14 hours ago - Pushed at: about 4 years ago - Stars: 13 - Forks: 3

pooya-mohammadi/audio-classification-pytorch
In this project, several approaches for training/finetuning an audio gender recognition model are provided. The code can be used for any other audio classification task by simply changing the number of classes and the input dataset.
Language: Jupyter Notebook - Size: 871 KB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 41 - Forks: 4

jmaczan/asr-dysarthria
Research on Automatic Speech Recognition for dysarthric speech
Language: Jupyter Notebook - Size: 2.64 MB - Last synced at: about 2 months ago - Pushed at: 8 months ago - Stars: 11 - Forks: 2

egorsmkv/w2v2-bert-aligner
Aligner for wav2vec2-bert models
Language: Python - Size: 1.95 KB - Last synced at: 27 days ago - Pushed at: 6 months ago - Stars: 3 - Forks: 1

navalnica/wav2vec2-belarusian
Speech to Text model for Belarusian language
Language: Jupyter Notebook - Size: 1.37 MB - Last synced at: 2 months ago - Pushed at: about 3 years ago - Stars: 5 - Forks: 0

lucasgris/wav2vec4bp
Wav2vec resources and models for Brazilian Portuguese
Language: Jupyter Notebook - Size: 1.65 MB - Last synced at: about 1 month ago - Pushed at: almost 3 years ago - Stars: 33 - Forks: 2

louisbrulenaudet/balena
BALanced Execution through Natural Activation: a human-computer interaction methodology for running code.
Language: Python - Size: 229 KB - Last synced at: about 2 months ago - Pushed at: over 1 year ago - Stars: 7 - Forks: 1

khanld/Wav2vec2-Pretraining
Wav2vec 2.0 Self-Supervised Pretraining
Language: Python - Size: 303 KB - Last synced at: 4 months ago - Pushed at: over 2 years ago - Stars: 37 - Forks: 4

jp1924/ASR
🤗 Code for training ASR models
Language: Python - Size: 413 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

moncefbenaicha/spoken-ner
Spoken NER implementation based on Wav2Vec2-XLS-R with experiments on transfer learning
Language: Python - Size: 113 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

skit-ai/Map-Mix
The official implementation of the method discussed in the paper "Improving Spoken Language Identification with Map-Mix" (work accepted at ICASSP 2023)
Size: 18.1 MB - Last synced at: 2 months ago - Pushed at: over 2 years ago - Stars: 16 - Forks: 1

vectominist/MiniASR
A mini, simple, and fast end-to-end automatic speech recognition toolkit.
Language: Jupyter Notebook - Size: 342 KB - Last synced at: about 2 months ago - Pushed at: over 2 years ago - Stars: 50 - Forks: 6

habla-liaa/ser-with-w2v2
Official implementation of INTERSPEECH 2021 paper 'Emotion Recognition from Speech Using Wav2vec 2.0 Embeddings'
Language: Jupyter Notebook - Size: 32.3 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 128 - Forks: 23

moxeeem/ASR-pronunciation-correction
This project presents a system for automatic correction of English pronunciation using the wav2vec2 neural network.
Language: Jupyter Notebook - Size: 9.17 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 1

Nightey3s/Speech-Emotion-Recognition-using-Wav2Vec2
A Speech Emotion Recognition (SER) system using Facebook's Wav2Vec2 model that classifies speech into four emotions (Neutral, Happy, Sad, Angry). Achieves 69.02% accuracy on IEMOCAP dataset using modern transformer architecture and comprehensive data augmentation techniques.
Language: Jupyter Notebook - Size: 1.06 MB - Last synced at: 2 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

nhut-ngnn/Multimodal-Speech-Emotion-Recognition
A multimodal SER project combining BERT and ECAPA-TDNN with cross-attention-based fusion on the IEMOCAP dataset.
Language: Python - Size: 7.07 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 4 - Forks: 0

thiagogre/mimicking
English Pronunciation Improvement App
Language: Python - Size: 852 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 1 - Forks: 0

Not-ML/audio-ml
Standalone Audio ML Application: An innovative Python-based tool integrating Speech Recognition (ASR), Sentiment Analysis (NLP), and Text-to-Speech (TTS) to process audio, analyze sentiment, and generate spoken responses. Features both command-line and GUI interfaces for seamless interaction.
Language: Python - Size: 20.5 KB - Last synced at: 4 days ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

seanghay/kfa
A fast Khmer Forced Aligner powered by Wav2Vec2CTC and Phonetisaurus
Language: Python - Size: 10.1 MB - Last synced at: 11 days ago - Pushed at: about 1 year ago - Stars: 5 - Forks: 0

sugarcane-mk/finetuning_wav2vec2
This repo provides a step-by-step process, starting from scratch, for fine-tuning Facebook's wav2vec2-large model using transformers
Language: Jupyter Notebook - Size: 42 KB - Last synced at: 3 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

aitor-alvarez/acoustic-transformer-models
Acoustic Transformer Models for Audio Classification
Language: Python - Size: 51.8 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 1 - Forks: 0

imsanjoykb/Speech-NLP-Bootcamp
Speech NLP Bootcamp
Language: Jupyter Notebook - Size: 3.1 MB - Last synced at: 3 months ago - Pushed at: almost 2 years ago - Stars: 2 - Forks: 1

Msparihar/Transcriber
An AI tool that automatically generates captions and transcripts for YouTube videos in 67 languages and can generate summarized texts in 133 languages.
Language: Python - Size: 14.6 KB - Last synced at: 2 months ago - Pushed at: over 1 year ago - Stars: 7 - Forks: 1

kingabzpro/WOLOF-ASR-Wav2Vec2
Audio Preprocessing and finetuning of wav2vec2-large-xlsr model on AI4D Baamtu Datamation - Automatic Speech Recognition in WOLOF Data.
Language: Jupyter Notebook - Size: 3.34 MB - Last synced at: about 2 months ago - Pushed at: over 3 years ago - Stars: 17 - Forks: 8

FernandoLpz/SpeechRecognition
This repository contains the implementation of an Automatic Speech Recognition system in Python, using a client-server architecture with WebSockets.
Language: Python - Size: 118 KB - Last synced at: 2 months ago - Pushed at: over 2 years ago - Stars: 14 - Forks: 2

kardSIM/audio2img
Extend the Conditioning of Stable Diffusion to take Audio Embeddings Instead of Text Embeddings using Wav2Vec2-BERT model
Language: Jupyter Notebook - Size: 29.8 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 7 - Forks: 1

aitor-alvarez/large-speech-models
Fine-tuning Multilingual Large Speech Recognition Models: Wav2vec and Whisper
Language: Python - Size: 84 KB - Last synced at: 3 months ago - Pushed at: 10 months ago - Stars: 1 - Forks: 0

dangrebenkin/wav2vec2_speech_markuper
Automatic generation of speech dataset markup using Wav2Vec2 ASR models
Language: Python - Size: 396 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 2 - Forks: 0

piedeboer96/Digital-Assistant-Audio-Processing
Project 2.2 - Speech Recognition and Speaker Identification
Language: Java - Size: 13 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 1 - Forks: 0

gulabpatel/Speech-to-Text
Language: Jupyter Notebook - Size: 1.64 MB - Last synced at: 1 day ago - Pushed at: 10 months ago - Stars: 3 - Forks: 0

egorsmkv/wav2vec2-hidet
A test to run w2v2 with hidet optimizer
Language: Python - Size: 402 KB - Last synced at: 2 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

parvatijay2901/Hindi-ASR-and-TTS
EC499: Major Project
Language: Shell - Size: 68.4 KB - Last synced at: 2 months ago - Pushed at: almost 2 years ago - Stars: 9 - Forks: 2

akash13s/audio-to-image Fork of rishavroy97/audio-to-image
Pipeline for generating images conditioned on input audio
Language: Python - Size: 3.11 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

JingleCate/SpeechEmotionRecog
A simple Speech Emotion Recognition (SER) project based on Wav2Vec2.
Language: Python - Size: 235 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 1 - Forks: 0

kamalesh003/NoiseCancellationTranscriptionModel
Noise Cancellation Transcription Model Using Wav2Vec2
Language: Jupyter Notebook - Size: 240 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

AmirAbaskohi/Automatic-Speech-recognition-for-Speech-Assessment-of-Persian-Preschool-Children
Preschool evaluation is crucial because it gives teachers and parents influential knowledge about children's growth and development. The COVID-19 pandemic has highlighted the necessity of online assessment for preschool children. One of the areas that should be tested is their ability to speak. Employing an Automatic Speech Recognition (ASR) system would not help, since such systems are pre-trained on voices that differ from children's in frequency and amplitude. Because most are pre-trained on data within a specific amplitude range, their objectives do not prepare them for voices at other amplitudes. To overcome this issue, we added a new objective, Random Frequency Pitch (RFP), to the masking objective of the Wav2Vec 2.0 model. In addition, we used our newly introduced dataset to fine-tune our model for the Meaningless Words (MW) and Rapid Automatic Naming (RAN) tests. Combining masking with RFP outperforms the masking objective of Wav2Vec 2.0 alone, reaching a Word Error Rate (WER) of 1.35. Our new approach reaches a WER of 6.45 on the Persian section of the CommonVoice dataset. Furthermore, our methodology produces positive outcomes in zero- and few-shot scenarios.
Language: Jupyter Notebook - Size: 1.23 MB - Last synced at: 7 months ago - Pushed at: about 2 years ago - Stars: 21 - Forks: 1

Sarasadeghii/Sharif-Wav2vec2
This repo shows how to finetune the wav2vec2.0 model along with its prerequisites.
Language: Jupyter Notebook - Size: 297 KB - Last synced at: 12 months ago - Pushed at: about 1 year ago - Stars: 3 - Forks: 0

inboxpraveen/LLM-Minutes-of-Meeting
🎤📄 An innovative tool that transforms audio or video files into text transcripts and generates concise meeting minutes. Stay organized and efficient in your meetings, and get ready for Phase 2 where we'll be open for contributions to enable real-time meeting transcription! 🚀
Language: Python - Size: 7.14 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 57 - Forks: 7

daanzu/wav2vec2_stt_python
Simple Python library, distributed via binary wheels with few direct dependencies, for easily using wav2vec 2.0 models for speech recognition
Language: Python - Size: 88.9 KB - Last synced at: about 2 months ago - Pushed at: almost 4 years ago - Stars: 24 - Forks: 3

tracyreuter/NLP-speech-to-text
Convert speech to text using HuggingFace, comparing Wav2Vec2 versus OpenAI Whisper
Language: Jupyter Notebook - Size: 2.35 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

sebinbenjamin/wav2vec_demo
A Python tool for transcribing speech from audio files using the Wav2Vec 2.0 model. Supports multilingual transcription, automatic audio chunking, and easy setup
Language: Python - Size: 4.88 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

wngh1187/IPET
PyTorch implementation of Integrated Parameter-Efficient Tuning for General-Purpose Audio Models
Language: Python - Size: 4.28 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 10 - Forks: 0

moncefbenaicha/SpokenNER
Spoken NER implementation based on Wav2Vec2-XLS-R with experiments on transfer learning
Language: Python - Size: 627 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

pradeepbatchu/speechtotext
Speech to Text with Wav2Vec2 using torchaudio
Language: Python - Size: 533 KB - Last synced at: about 1 year ago - Pushed at: over 3 years ago - Stars: 6 - Forks: 1

somosnlp/wav2vec2-spanish
Pre-train a Spanish Wav2Vec2 model using the Spanish portion of the Common Voice dataset.
Language: Python - Size: 17.6 KB - Last synced at: about 1 year ago - Pushed at: almost 4 years ago - Stars: 5 - Forks: 1

seb5433/wav2vec2-speaker-recognition
Speaker recognition task using wav2vec2 model.
Language: Python - Size: 21.5 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

ranchlai/wav2vec-2.0
Wav2vec2 English speech recognition in PaddlePaddle
Language: Python - Size: 316 KB - Last synced at: about 1 year ago - Pushed at: about 4 years ago - Stars: 3 - Forks: 1

Sreyan88/Toxicity-Detection-in-Spoken-Utterances
This repository contains the code for the paper: "DeToxy: A Large-Scale Multimodal Dataset for Toxicity Classification in Spoken Utterances"
Language: Jupyter Notebook - Size: 976 KB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 9 - Forks: 5

PeterGilles/Speech-Recognition-Lecture---Data-Science-in-Humanities
Material for my lecture on Automatic Speech Recognition
Language: Jupyter Notebook - Size: 26.4 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

andrejanesic/Voice-Assistant
🤗 Voice assistant built in Python with NLP & wav2vec2.
Language: Python - Size: 36.7 MB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

audeering/w2v2-how-to
How to use our public wav2vec2 dimensional emotion model
Language: Jupyter Notebook - Size: 98.6 KB - Last synced at: about 1 year ago - Pushed at: about 2 years ago - Stars: 398 - Forks: 47

ECNU-Cross-Innovation-Lab/ENT
[ICASSP 2024] Emotion Neural Transducer for Fine-Grained Speech Emotion Recognition
Language: Python - Size: 638 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 7 - Forks: 0

mikezzb/lyrics-sync
A deep learning lyrics-to-audio alignment system, generating synchronized lyrics from a song and its lyrics
Language: Jupyter Notebook - Size: 18.4 MB - Last synced at: about 1 year ago - Pushed at: about 2 years ago - Stars: 8 - Forks: 1

Sreyan88/Indic-ASR
Repository for pre-trained wav2vec 2.0 models on 7 Indian languages
Language: Python - Size: 18.6 KB - Last synced at: about 1 year ago - Pushed at: about 4 years ago - Stars: 4 - Forks: 0

zhu00121/Universal-representation-dynamics-of-deepfake-speech
This repo contains code used in the paper "Characterizing the temporal dynamics of universal speech representations for generalizable deepfake detection"
Language: Python - Size: 339 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

egorsmkv/asr-corpus-creator 📦
This app is intended to automatically create a corpus for ASR systems using pseudo-labeling.
Language: Python - Size: 2.47 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 27 - Forks: 3

nisheethjaiswal/Speech-to-Text
Speech to text implementation using transformers in PyTorch.
Language: Jupyter Notebook - Size: 333 KB - Last synced at: over 1 year ago - Pushed at: about 4 years ago - Stars: 3 - Forks: 0

RubensZimbres/Repo-2022
Python code for PyTorch, TensorFlow, Keras, Wav2Vec2 fine-tuning, and Google Cloud
Language: Jupyter Notebook - Size: 74.7 MB - Last synced at: about 2 months ago - Pushed at: over 1 year ago - Stars: 7 - Forks: 4

ShafakatArnob/Automatic-Bengali-Subtitle-Generation-Deep-Learning
Automatic Subtitle Generation for Bengali Multimedia Using Deep Learning.
Language: Jupyter Notebook - Size: 13.3 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

RajGothi/Improving-Automatic-Speech-Recognition-with-Dialect-Specific-Language-Models
This repository contains the implementation of our published paper titled 'Improving Automatic Speech Recognition with Dialect-Specific Language Models,' presented at SPECOM'23.
Language: Jupyter Notebook - Size: 1000 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

ECNU-Cross-Innovation-Lab/ShiftSER
[ICASSP 2023] Mingling or Misalignment? Temporal Shift for Speech Emotion Recognition with Pre-trained Representations
Language: Python - Size: 53.7 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 19 - Forks: 2

Dhruv16S/Transcribing-Video-to-Text
This repository uses the Wav2Vec2 model to transcribe text from a video file through a pipeline of speech recognition, noise removal, and speech-to-text (STT).
Language: Python - Size: 13.5 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

agustyawan-arif/wav2vec2-large-xlsr-53-id
Performing audio transcription using the Wav2Vec2 model trained on the Common Voice 13 dataset for Indonesian.
Language: Python - Size: 5.18 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

kingabzpro/hindiSpeechPro-Automatic-Speech-Recognization Fork of SakshiRathi77/hindiSpeechPro-Automatic-Speech-Recognization
This project, the final project of the KaggleX BIPOC Mentorship Program, aims to train two separate Hindi ASR models using the Facebook Wav2Vec2 (300M parameters) and OpenAI Whisper-Small models, respectively. The goal is to compare their performance, with a target WER of less than 13%, across various Hindi accents and dialects.
Language: Jupyter Notebook - Size: 2.56 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

SakshiRathi77/hindiSpeechPro-Automatic-Speech-Recognization
This project, the final project of the KaggleX BIPOC Mentorship Program, aims to train two separate Hindi ASR models using the Facebook Wav2Vec2 (300M parameters) and OpenAI Whisper-Small models, respectively. The goal is to compare their performance, with a target WER of less than 13%, across various Hindi accents and dialects.
Language: Jupyter Notebook - Size: 2.11 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 1

appledora/wav2vec2_scripts
A modular codebase to process an audio dataset, generate a custom tokenizer, and fine-tune and run inference with a wav2vec2 model on a custom dataset.
Language: Python - Size: 35.2 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

oswaldoludwig/Pruning-pre-trained-models-using-evolutionary-computation
This repository contains scripts to prune Wav2vec2 using a neuroevolution-based method. More details about this method can be found in the paper Compressing Wav2vec2 for Embedded Applications.
Language: Shell - Size: 4.53 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

hammaad2002/ASRAdversarialAttacks
An ASR (Automatic Speech Recognition) adversarial attack repository.
Language: Jupyter Notebook - Size: 10 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 6 - Forks: 1

JuJu2181/Automatic-Nepali-Speech-Recognition-and-Summarizer
A system capable of converting Nepali speech to text and generating a summary of the text
Language: Jupyter Notebook - Size: 288 MB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 6 - Forks: 2

fatou1526/ASR_wav2vec2
This repo contains code for loading audio data and training a wav2vec2 model on a custom language dataset
Language: Jupyter Notebook - Size: 664 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

baocin/hugging_face_example_STT_api
Demonstration of Hugging Face's (https://huggingface.co/) newly released Wav2Vec2 model for easy, reasonably coherent, Speech to Text!
Language: Python - Size: 41.8 MB - Last synced at: almost 2 years ago - Pushed at: over 4 years ago - Stars: 3 - Forks: 1

trinhtuanvubk/Wav2Vec2-Triton-Serving
Serve Wav2Vec2 model using Triton Inference Server
Language: Python - Size: 623 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

Telegram-Zalo/zac2022-lyric-alignment
Solution for Zalo AI Challenge 2022 - Lyrics Alignment
Language: Python - Size: 949 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 61 - Forks: 18

sotiriskar/audio-note
Python application for taking audio notes and creating summaries of meetings.
Language: Python - Size: 887 KB - Last synced at: 3 days ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 0

trinhtuanvubk/speech-to-text-demo
Speech to Text demo with Wav2Vec2 model
Language: Python - Size: 2.1 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

jvel07/wav2vec2_patho
Fine-tuning wav2vec2 for Pathological Speech Processing
Language: Jupyter Notebook - Size: 4.05 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 5 - Forks: 1

TerboucheHacene/speech-keyword-spotting
Speech Keyword detection using Wav2Vec Model
Language: Python - Size: 314 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 3 - Forks: 0

lstrgar/self-supervised-phone-segmentation
Phoneme segmentation using pre-trained speech models
Language: Python - Size: 106 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 36 - Forks: 4

viksit-siddhant/compare2023
SER and audio classification using both a Wav2Vec2 based model and an ASR->Bert pipeline, as well as utilizing a multimodal late-fusion model
Language: Python - Size: 12.7 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

techiaith/docker-huggingface-stt-cy
Speech Recognition for Welsh with HuggingFace
Language: Python - Size: 321 KB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 13 - Forks: 4
