GitHub topics: wav2vec2
PaddlePaddle/PaddleSpeech
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won the NAACL 2022 Best Demo Award.
Language: Python - Size: 69.4 MB - Last synced at: 1 day ago - Pushed at: 5 days ago - Stars: 11,794 - Forks: 1,904

lakshiitakalyanasundaram/DeepSonic
DeepFake audio detection project using Wav2Vec2 for MOMENTA (internship task)
Language: HTML - Size: 64.3 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0

slinusc/speaker_identification_evaluation
Evaluating the Effectiveness of Transformer Layers in Wav2Vec 2.0, XLS-R, and Whisper for Speaker Identification Tasks
Language: Jupyter Notebook - Size: 8.56 MB - Last synced at: 1 day ago - Pushed at: 3 months ago - Stars: 3 - Forks: 1

khanld/ASR-Wav2vec-Finetune
⚡ Fine-tune Wav2vec 2.0 for Speech Recognition
Language: Python - Size: 5.1 MB - Last synced at: 9 days ago - Pushed at: 2 months ago - Stars: 127 - Forks: 28

aryanxxvii/lark
Speech Assessment API in FastAPI with HuggingFace 🤗
Language: JavaScript - Size: 183 KB - Last synced at: 14 days ago - Pushed at: 3 months ago - Stars: 11 - Forks: 0

oliverguhr/wav2vec2-live
Live speech recognition using Facebook's wav2vec 2.0 model.
Language: Python - Size: 2.84 MB - Last synced at: 16 days ago - Pushed at: about 1 year ago - Stars: 348 - Forks: 56

pszemraj/vid2cleantxt
Python API & command-line tool to easily transcribe speech-based video files into clean text
Language: Jupyter Notebook - Size: 723 MB - Last synced at: 12 days ago - Pushed at: 6 months ago - Stars: 209 - Forks: 29

vietai/ASR
End-to-End Vietnamese Speech Recognition using wav2vec 2.0
Size: 10.7 KB - Last synced at: 17 days ago - Pushed at: over 3 years ago - Stars: 98 - Forks: 9

mahshid1378/ASR-Wav2vec-Finetune
⚡ Fine-tune Wav2vec 2.0 for Speech Recognition
Language: Python - Size: 5.01 MB - Last synced at: 13 days ago - Pushed at: 23 days ago - Stars: 0 - Forks: 0

notAI-tech/IndicASR
Speech Recognition for Indic languages.
Language: Python - Size: 623 KB - Last synced at: 16 days ago - Pushed at: about 4 years ago - Stars: 13 - Forks: 3

pooya-mohammadi/audio-classification-pytorch
This project provides several approaches for training/fine-tuning an audio gender recognition model. The code can be used for any other audio classification task by simply changing the number of classes and the input dataset.
Language: Jupyter Notebook - Size: 871 KB - Last synced at: 9 days ago - Pushed at: 3 months ago - Stars: 41 - Forks: 4

ttop32/wav2vec2-live-japanese-translator
Real-time Japanese speech recognition translator using wav2vec2
Language: Jupyter Notebook - Size: 926 KB - Last synced at: 16 days ago - Pushed at: over 2 years ago - Stars: 37 - Forks: 3

jmaczan/asr-dysarthria
Research on Automatic Speech Recognition for dysarthric speech
Language: Jupyter Notebook - Size: 2.64 MB - Last synced at: 9 days ago - Pushed at: 6 months ago - Stars: 11 - Forks: 2

navalnica/wav2vec2-belarusian
Speech to Text model for Belarusian language
Language: Jupyter Notebook - Size: 1.37 MB - Last synced at: 15 days ago - Pushed at: about 3 years ago - Stars: 5 - Forks: 0

lucasgris/wav2vec4bp
Wav2vec resources and models for Brazilian Portuguese
Language: Jupyter Notebook - Size: 1.65 MB - Last synced at: 20 days ago - Pushed at: almost 3 years ago - Stars: 33 - Forks: 2

louisbrulenaudet/balena
BALanced Execution through Natural Activation: a human-computer interaction methodology for running code.
Language: Python - Size: 229 KB - Last synced at: 8 days ago - Pushed at: about 1 year ago - Stars: 7 - Forks: 1

khanld/Wav2vec2-Pretraining
Wav2vec 2.0 Self-Supervised Pretraining
Language: Python - Size: 303 KB - Last synced at: 3 months ago - Pushed at: over 2 years ago - Stars: 37 - Forks: 4

jp1924/ASR
Code for training ASR models
Language: Python - Size: 413 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

moncefbenaicha/spoken-ner
Spoken NER implementation based on Wav2Vec2-XLS-R with experiments on transfer learning
Language: Python - Size: 113 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

skit-ai/Map-Mix
The official implementation of the method discussed in the paper "Improving Spoken Language Identification with Map-Mix" (accepted at ICASSP 2023)
Size: 18.1 MB - Last synced at: 27 days ago - Pushed at: about 2 years ago - Stars: 16 - Forks: 1

vectominist/MiniASR
A mini, simple, and fast end-to-end automatic speech recognition toolkit.
Language: Jupyter Notebook - Size: 342 KB - Last synced at: 8 days ago - Pushed at: over 2 years ago - Stars: 50 - Forks: 6

habla-liaa/ser-with-w2v2
Official implementation of INTERSPEECH 2021 paper 'Emotion Recognition from Speech Using Wav2vec 2.0 Embeddings'
Language: Jupyter Notebook - Size: 32.3 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 128 - Forks: 23

moxeeem/ASR-pronunciation-correction
This project presents a system for automatic correction of English pronunciation using the wav2vec2 neural network.
Language: Jupyter Notebook - Size: 9.17 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 1

egorsmkv/w2v2-bert-aligner
Aligner for wav2vec2-bert models
Language: Python - Size: 1.95 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

Nightey3s/Speech-Emotion-Recognition-using-Wav2Vec2
A Speech Emotion Recognition (SER) system using Facebook's Wav2Vec2 model that classifies speech into four emotions (Neutral, Happy, Sad, Angry). Achieves 69.02% accuracy on IEMOCAP dataset using modern transformer architecture and comprehensive data augmentation techniques.
Language: Jupyter Notebook - Size: 1.06 MB - Last synced at: 13 days ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

nhut-ngnn/Multimodal-Speech-Emotion-Recognition
A multimodal SER project combining BERT and ECAPA-TDNN with cross-attention-based fusion on the IEMOCAP dataset.
Language: Python - Size: 7.07 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 4 - Forks: 0

thiagogre/mimicking
English Pronunciation Improvement App
Language: Python - Size: 852 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

Not-ML/audio-ml
Standalone Audio ML Application: An innovative Python-based tool integrating Speech Recognition (ASR), Sentiment Analysis (NLP), and Text-to-Speech (TTS) to process audio, analyze sentiment, and generate spoken responses. Features both command-line and GUI interfaces for seamless interaction.
Language: Python - Size: 20.5 KB - Last synced at: 25 days ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

seanghay/kfa
A fast Khmer Forced Aligner powered by Wav2Vec2CTC and Phonetisaurus
Language: Python - Size: 10.1 MB - Last synced at: 14 days ago - Pushed at: 12 months ago - Stars: 5 - Forks: 0

sugarcane-mk/finetuning_wav2vec2
This repo provides a step-by-step process, from scratch, to fine-tune Facebook's wav2vec2-large model using Transformers.
Language: Jupyter Notebook - Size: 42 KB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

s3prl/s3prl
Self-Supervised Speech Pre-training and Representation Learning Toolkit
Language: Python - Size: 135 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 2,253 - Forks: 484

aitor-alvarez/acoustic-transformer-models
Acoustic Transformer Models for Audio Classification
Language: Python - Size: 51.8 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 1 - Forks: 0

imsanjoykb/Speech-NLP-Bootcamp
Speech NLP Bootcamp
Language: Jupyter Notebook - Size: 3.1 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 1

Msparihar/Transcriber
An AI tool that automatically generates captions and transcripts for YouTube videos in 67 languages and can generate summaries in 133 languages.
Language: Python - Size: 14.6 KB - Last synced at: 18 days ago - Pushed at: over 1 year ago - Stars: 7 - Forks: 1

kingabzpro/WOLOF-ASR-Wav2Vec2
Audio Preprocessing and finetuning of wav2vec2-large-xlsr model on AI4D Baamtu Datamation - Automatic Speech Recognition in WOLOF Data.
Language: Jupyter Notebook - Size: 3.34 MB - Last synced at: 22 days ago - Pushed at: over 3 years ago - Stars: 17 - Forks: 8

FernandoLpz/SpeechRecognition
This repository contains the implementation of an Automatic Speech Recognition system in Python, using a client-server architecture with WebSockets.
Language: Python - Size: 118 KB - Last synced at: 18 days ago - Pushed at: over 2 years ago - Stars: 14 - Forks: 2

kardSIM/audio2img
Extend the Conditioning of Stable Diffusion to take Audio Embeddings Instead of Text Embeddings using Wav2Vec2-BERT model
Language: Jupyter Notebook - Size: 29.8 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 7 - Forks: 1

aitor-alvarez/large-speech-models
Fine-tuning Multilingual Large Speech Recognition Models: Wav2vec and Whisper
Language: Python - Size: 84 KB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 1 - Forks: 0

dangrebenkin/wav2vec2_speech_markuper
Automatic generation of speech dataset markup using Wav2Vec2 ASR models
Language: Python - Size: 396 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 2 - Forks: 0

piedeboer96/Digital-Assistant-Audio-Processing
Project 2.2 - Speech Recognition and Speaker Identification
Language: Java - Size: 13 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 1 - Forks: 0

gulabpatel/Speech-to-Text
Language: Jupyter Notebook - Size: 1.64 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 3 - Forks: 0

egorsmkv/wav2vec2-hidet
A test of running wav2vec2 (w2v2) with the hidet optimizer
Language: Python - Size: 402 KB - Last synced at: 24 days ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

parvatijay2901/Hindi-ASR-and-TTS
EC499: Major Project
Language: Shell - Size: 68.4 KB - Last synced at: 17 days ago - Pushed at: almost 2 years ago - Stars: 9 - Forks: 2

akash13s/audio-to-image Fork of rishavroy97/audio-to-image
Pipeline for generating images conditioned on input audio
Language: Python - Size: 3.11 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

JingleCate/SpeechEmotionRecog
A simple Speech Emotion Recognition (SER) project based on Wav2Vec2.
Language: Python - Size: 235 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 1 - Forks: 0

kamalesh003/NoiseCancellationTranscriptionModel
Noise Cancellation Transcription Model Using Wav2Vec2
Language: Jupyter Notebook - Size: 240 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

AmirAbaskohi/Automatic-Speech-recognition-for-Speech-Assessment-of-Persian-Preschool-Children
Preschool evaluation is crucial because it gives teachers and parents valuable knowledge about children's growth and development. The COVID-19 pandemic has highlighted the necessity of online assessment for preschool children. One of the areas that should be tested is their ability to speak. Employing an off-the-shelf Automatic Speech Recognition (ASR) system would not help, since such systems are pre-trained on voices that differ from children's in terms of frequency and amplitude. Because most of them are pre-trained on data within a specific amplitude range, their objectives do not prepare them for voices with different amplitudes. To overcome this issue, we added a new objective, called Random Frequency Pitch (RFP), to the masking objective of the Wav2Vec 2.0 model. In addition, we used our newly introduced dataset to fine-tune our model for Meaningless Words (MW) and Rapid Automatic Naming (RAN) tests. Combining masking with RFP outperforms the masking objective of Wav2Vec 2.0 alone, reaching a Word Error Rate (WER) of 1.35. Our new approach reaches a WER of 6.45 on the Persian section of the CommonVoice dataset. Furthermore, our methodology produces positive outcomes in zero- and few-shot scenarios.
Language: Jupyter Notebook - Size: 1.23 MB - Last synced at: 5 months ago - Pushed at: almost 2 years ago - Stars: 21 - Forks: 1

Sarasadeghii/Sharif-Wav2vec2
This repo shows how to fine-tune the wav2vec 2.0 model, along with its prerequisites.
Language: Jupyter Notebook - Size: 297 KB - Last synced at: 10 months ago - Pushed at: 11 months ago - Stars: 3 - Forks: 0

inboxpraveen/LLM-Minutes-of-Meeting
An innovative tool that transforms audio or video files into text transcripts and generates concise meeting minutes. Stay organized and efficient in your meetings, and get ready for Phase 2, where we'll be open for contributions to enable real-time meeting transcription!
Language: Python - Size: 7.14 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 57 - Forks: 7

daanzu/wav2vec2_stt_python
Simple Python library, distributed via binary wheels with few direct dependencies, for easily using wav2vec 2.0 models for speech recognition
Language: Python - Size: 88.9 KB - Last synced at: 1 day ago - Pushed at: over 3 years ago - Stars: 24 - Forks: 3

tracyreuter/NLP-speech-to-text
Convert speech to text using HuggingFace, comparing Wav2Vec2 versus OpenAI Whisper
Language: Jupyter Notebook - Size: 2.35 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

sebinbenjamin/wav2vec_demo
A Python tool for transcribing speech from audio files using the Wav2Vec 2.0 model. Supports multilingual transcription, automatic audio chunking, and easy setup
Language: Python - Size: 4.88 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

wngh1187/IPET
PyTorch implementation of "Integrated Parameter-Efficient Tuning for General-Purpose Audio Models"
Language: Python - Size: 4.28 MB - Last synced at: 11 months ago - Pushed at: over 1 year ago - Stars: 10 - Forks: 0

moncefbenaicha/SpokenNER
Spoken NER implementation based on Wav2Vec2-XLS-R with experiments on transfer learning
Language: Python - Size: 627 KB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

pradeepbatchu/speechtotext
Speech to Text with Wav2Vec2 using torchaudio
Language: Python - Size: 533 KB - Last synced at: 12 months ago - Pushed at: over 3 years ago - Stars: 6 - Forks: 1

somosnlp/wav2vec2-spanish
Pre-train a Spanish Wav2Vec2 model using the Spanish portion of the Common Voice dataset.
Language: Python - Size: 17.6 KB - Last synced at: 12 months ago - Pushed at: almost 4 years ago - Stars: 5 - Forks: 1

seb5433/wav2vec2-speaker-recognition
Speaker recognition task using wav2vec2 model.
Language: Python - Size: 21.5 KB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

ranchlai/wav2vec-2.0
Wav2vec2 English speech recognition in PaddlePaddle
Language: Python - Size: 316 KB - Last synced at: 12 months ago - Pushed at: almost 4 years ago - Stars: 3 - Forks: 1

Sreyan88/Toxicity-Detection-in-Spoken-Utterances
This repository contains the code for the paper: "DeToxy: A Large-Scale Multimodal Dataset for Toxicity Classification in Spoken Utterances"
Language: Jupyter Notebook - Size: 976 KB - Last synced at: 11 months ago - Pushed at: over 2 years ago - Stars: 9 - Forks: 5

PeterGilles/Speech-Recognition-Lecture---Data-Science-in-Humanities
Material for my lecture on Automatic Speech Recognition
Language: Jupyter Notebook - Size: 26.4 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

andrejanesic/Voice-Assistant
Voice assistant built in Python with NLP & wav2vec2.
Language: Python - Size: 36.7 MB - Last synced at: 12 months ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

audeering/w2v2-how-to
How to use our public wav2vec2 dimensional emotion model
Language: Jupyter Notebook - Size: 98.6 KB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 398 - Forks: 47

ECNU-Cross-Innovation-Lab/ENT
[ICASSP 2024] Emotion Neural Transducer for Fine-Grained Speech Emotion Recognition
Language: Python - Size: 638 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 7 - Forks: 0

mikezzb/lyrics-sync
A deep learning lyrics-to-audio alignment system, generating synchronized lyrics from a song and its lyrics
Language: Jupyter Notebook - Size: 18.4 MB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 8 - Forks: 1

Sreyan88/Indic-ASR
Repository for pre-trained wav2vec 2.0 models on 7 Indian languages
Language: Python - Size: 18.6 KB - Last synced at: 11 months ago - Pushed at: almost 4 years ago - Stars: 4 - Forks: 0

zhu00121/Universal-representation-dynamics-of-deepfake-speech
This repo contains code used in the paper "Characterizing the temporal dynamics of universal speech representations for generalizable deepfake detection"
Language: Python - Size: 339 KB - Last synced at: 12 months ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

egorsmkv/asr-corpus-creator
This app is intended to automatically create a corpus for ASR systems using pseudo-labeling.
Language: Python - Size: 2.47 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 27 - Forks: 3

nisheethjaiswal/Speech-to-Text
Speech to text implementation using transformers in PyTorch.
Language: Jupyter Notebook - Size: 333 KB - Last synced at: over 1 year ago - Pushed at: almost 4 years ago - Stars: 3 - Forks: 0

RubensZimbres/Repo-2022
Python code for PyTorch, TensorFlow, Keras, Wav2Vec2 fine-tuning and Google Cloud
Language: Jupyter Notebook - Size: 74.7 MB - Last synced at: 10 days ago - Pushed at: over 1 year ago - Stars: 7 - Forks: 4

ShafakatArnob/Automatic-Bengali-Subtitle-Generation-Deep-Learning
Automatic Subtitle Generation for Bengali Multimedia Using Deep Learning.
Language: Jupyter Notebook - Size: 13.3 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

RajGothi/Improving-Automatic-Speech-Recognition-with-Dialect-Specific-Language-Models
This repository contains the implementation of our published paper titled 'Improving Automatic Speech Recognition with Dialect-Specific Language Models,' presented at SPECOM'23.
Language: Jupyter Notebook - Size: 1000 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

ECNU-Cross-Innovation-Lab/ShiftSER
[ICASSP 2023] Mingling or Misalignment? Temporal Shift for Speech Emotion Recognition with Pre-trained Representations
Language: Python - Size: 53.7 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 19 - Forks: 2

Dhruv16S/Transcribing-Video-to-Text
This repository implements a Wav2Vec2-based pipeline that transcribes speech from a video file into text through a series of steps: speech recognition, noise removal and speech-to-text (STT).
Language: Python - Size: 13.5 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

agustyawan-arif/wav2vec2-large-xlsr-53-id
Audio transcription using the Wav2Vec2 model trained on the Common Voice 13 dataset for Indonesian.
Language: Python - Size: 5.18 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

kingabzpro/hindiSpeechPro-Automatic-Speech-Recognization Fork of SakshiRathi77/hindiSpeechPro-Automatic-Speech-Recognization
This project, the final project for the KaggleX BIPOC Mentorship Program, aims to train two separate Hindi ASR models using the Facebook Wav2Vec2 (300M parameters) and OpenAI Whisper-Small models, respectively. The goal is to compare their performance, with a target WER of less than 13%, across various Hindi accents and dialects.
Language: Jupyter Notebook - Size: 2.56 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

SakshiRathi77/hindiSpeechPro-Automatic-Speech-Recognization
This project, the final project for the KaggleX BIPOC Mentorship Program, aims to train two separate Hindi ASR models using the Facebook Wav2Vec2 (300M parameters) and OpenAI Whisper-Small models, respectively. The goal is to compare their performance, with a target WER of less than 13%, across various Hindi accents and dialects.
Language: Jupyter Notebook - Size: 2.11 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 1

appledora/wav2vec2_scripts
A modular codebase to process audio dataset, generate custom tokenizer, finetune and infer wav2vec2 model on custom dataset.
Language: Python - Size: 35.2 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

oswaldoludwig/Pruning-pre-trained-models-using-evolutionary-computation
This repository contains scripts to prune Wav2vec2 using a neuroevolution-based method. More details about this method can be found in the paper Compressing Wav2vec2 for Embedded Applications.
Language: Shell - Size: 4.53 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

hammaad2002/ASRAdversarialAttacks
An ASR (Automatic Speech Recognition) adversarial attack repository.
Language: Jupyter Notebook - Size: 10 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 6 - Forks: 1

JuJu2181/Automatic-Nepali-Speech-Recognition-and-Summarizer
A system that converts Nepali speech to text and generates a summary of the text
Language: Jupyter Notebook - Size: 288 MB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 6 - Forks: 2

fatou1526/ASR_wav2vec2
This repo contains code for loading audio data and training a wav2vec2 model on a custom-language dataset
Language: Jupyter Notebook - Size: 664 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

tuanio/noisy-student-training-asr
PyTorch implementation of Noisy Student Training for the Automatic Speech Recognition and Automatic Pronunciation Error Detection problems
Language: Python - Size: 3.07 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 44 - Forks: 7

baocin/hugging_face_example_STT_api
Demonstration of Hugging Face's (https://huggingface.co/) newly released Wav2Vec2 model for easy, reasonably coherent, Speech to Text!
Language: Python - Size: 41.8 MB - Last synced at: over 1 year ago - Pushed at: about 4 years ago - Stars: 3 - Forks: 1

trinhtuanvubk/Wav2Vec2-Triton-Serving
Serve Wav2Vec2 model using Triton Inference Server
Language: Python - Size: 623 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

Telegram-Zalo/zac2022-lyric-alignment
Solution for Zalo AI Challenge 2022 - Lyrics Alignment
Language: Python - Size: 949 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 61 - Forks: 18

sotiriskar/audio-note
Python application for taking audio notes and creating meeting summaries.
Language: Python - Size: 887 KB - Last synced at: 1 day ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 0

trinhtuanvubk/speech-to-text-demo
Speech to Text demo with Wav2Vec2 model
Language: Python - Size: 2.1 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

jvel07/wav2vec2_patho
Fine-tuning wav2vec2 for Pathological Speech Processing
Language: Jupyter Notebook - Size: 4.05 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 5 - Forks: 1

TerboucheHacene/speech-keyword-spotting
Speech Keyword detection using Wav2Vec Model
Language: Python - Size: 314 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 3 - Forks: 0

lstrgar/self-supervised-phone-segmentation
Phoneme segmentation using pre-trained speech models
Language: Python - Size: 106 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 36 - Forks: 4

viksit-siddhant/compare2023
SER and audio classification using both a Wav2Vec2 based model and an ASR->Bert pipeline, as well as utilizing a multimodal late-fusion model
Language: Python - Size: 12.7 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

techiaith/docker-huggingface-stt-cy
Speech Recognition for Welsh with HuggingFace
Language: Python - Size: 321 KB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 13 - Forks: 4

trinhtuanvubk/finetune-wav2vec2
Language: Python - Size: 5.15 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 2 - Forks: 0

HarunoriKawano/Wav2vec2.0
Implementation of the paper "wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations" in PyTorch.
Language: Python - Size: 203 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

nomnomnonono/SoundEffect-Search
Application to search for similar sound effects by voice and title.
Language: Python - Size: 31.3 KB - Last synced at: 20 days ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

keshavbhandari/Audioneme
AI model for speech disorder detection
Language: Python - Size: 56.1 MB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

dsalnikov/wav2vec
Pure NumPy implementation of wav2vec 2.0
Language: Python - Size: 4.88 KB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 3 - Forks: 0

SanchezCris/SDR-Automatic-Speech-Recognition
FM signal capturing system and voice recognition for the assistance of individuals with hearing impairments.
Language: Python - Size: 48 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 0

Hamtech-ai/wav2vec2-fa
Fine-tune Wav2vec2, an ASR model released by Facebook
Language: Jupyter Notebook - Size: 549 KB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 32 - Forks: 3

thisisHJLee/Fine-Tuning-of-XLSR-Wav2Vec2-on-Korean
Language: Jupyter Notebook - Size: 1.35 MB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0
