An open API service providing repository metadata for many open source software ecosystems.

Topic: "wav2vec2"

PaddlePaddle/PaddleSpeech

Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.

Language: Python - Size: 69.4 MB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 11,799 - Forks: 1,903

s3prl/s3prl

Self-Supervised Speech Pre-training and Representation Learning Toolkit

Language: Python - Size: 135 MB - Last synced at: about 5 hours ago - Pushed at: about 1 month ago - Stars: 2,377 - Forks: 499

audeering/w2v2-how-to

How to use our public wav2vec2 dimensional emotion model

Language: Jupyter Notebook - Size: 98.6 KB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 398 - Forks: 47

oliverguhr/wav2vec2-live

Live speech recognition using Facebook's wav2vec 2.0 model.

Language: Python - Size: 2.84 MB - Last synced at: 19 days ago - Pushed at: about 1 year ago - Stars: 348 - Forks: 56

pszemraj/vid2cleantxt

Python API & command-line tool to easily transcribe speech-based video files into clean text

Language: Jupyter Notebook - Size: 723 MB - Last synced at: 15 days ago - Pushed at: 6 months ago - Stars: 209 - Forks: 29

habla-liaa/ser-with-w2v2

Official implementation of INTERSPEECH 2021 paper 'Emotion Recognition from Speech Using Wav2vec 2.0 Embeddings'

Language: Jupyter Notebook - Size: 32.3 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 128 - Forks: 23

khanld/ASR-Wav2vec-Finetune

⚡ Finetune Wav2vec 2.0 For Speech Recognition

Language: Python - Size: 5.1 MB - Last synced at: 12 days ago - Pushed at: 3 months ago - Stars: 127 - Forks: 28

vietai/ASR

End-to-End Vietnamese Speech Recognition using wav2vec 2.0

Size: 10.7 KB - Last synced at: 20 days ago - Pushed at: over 3 years ago - Stars: 98 - Forks: 9

thevasudevgupta/gsoc-wav2vec2

GSoC'2021 | TensorFlow implementation of Wav2Vec2

Language: Jupyter Notebook - Size: 6.67 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 78 - Forks: 29

Telegram-Zalo/zac2022-lyric-alignment

Solution for Zalo AI Challenge 2022 - Lyrics Alignment

Language: Python - Size: 949 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 61 - Forks: 18

inboxpraveen/LLM-Minutes-of-Meeting

🎤📄 An innovative tool that transforms audio or video files into text transcripts and generates concise meeting minutes. Stay organized and efficient in your meetings, and get ready for Phase 2 where we'll be open for contributions to enable real-time meeting transcription! 🚀

Language: Python - Size: 7.14 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 57 - Forks: 7

vectominist/MiniASR

A mini, simple, and fast end-to-end automatic speech recognition toolkit.

Language: Jupyter Notebook - Size: 342 KB - Last synced at: 10 days ago - Pushed at: over 2 years ago - Stars: 50 - Forks: 6

tuanio/noisy-student-training-asr

PyTorch implementation of Noisy Student Training for Automatic Speech Recognition and Automatic Pronunciation Error Detection problems

Language: Python - Size: 3.07 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 44 - Forks: 7

pooya-mohammadi/audio-classification-pytorch

This project provides several approaches for training or fine-tuning an audio gender recognition model. The code can be reused for any other audio classification task by changing the number of classes and the input dataset.

Language: Jupyter Notebook - Size: 871 KB - Last synced at: 12 days ago - Pushed at: 3 months ago - Stars: 41 - Forks: 4

khanld/Wav2vec2-Pretraining

Wav2vec 2.0 Self-Supervised Pretraining

Language: Python - Size: 303 KB - Last synced at: 3 months ago - Pushed at: over 2 years ago - Stars: 37 - Forks: 4

ttop32/wav2vec2-live-japanese-translator

Real-time Japanese speech recognition translator using wav2vec2

Language: Jupyter Notebook - Size: 926 KB - Last synced at: 18 days ago - Pushed at: over 2 years ago - Stars: 37 - Forks: 3

lstrgar/self-supervised-phone-segmentation

Phoneme segmentation using pre-trained speech models

Language: Python - Size: 106 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 36 - Forks: 4

lucasgris/wav2vec4bp

Wav2vec resources and models for Brazilian Portuguese

Language: Jupyter Notebook - Size: 1.65 MB - Last synced at: 23 days ago - Pushed at: almost 3 years ago - Stars: 33 - Forks: 2

Hamtech-ai/wav2vec2-fa

Fine-tune Wav2vec2, an ASR model released by Facebook

Language: Jupyter Notebook - Size: 549 KB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 32 - Forks: 3

egorsmkv/asr-corpus-creator 📦

This app is intended to automatically create a corpus for ASR systems using pseudo-labeling.

Language: Python - Size: 2.47 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 27 - Forks: 3

mmakiuchi/multimodal_emotion_recognition

Scripts used in the research described in the paper "Multimodal Emotion Recognition with High-level Speech and Text Features" accepted in the ASRU 2021 conference.

Language: Python - Size: 77.1 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 25 - Forks: 6

mt-upc/SHAS

SHAS: Approaching optimal Segmentation for End-to-End Speech Translation

Language: Python - Size: 368 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 24 - Forks: 2

daanzu/wav2vec2_stt_python

Simple Python library, distributed via binary wheels with few direct dependencies, for easily using wav2vec 2.0 models for speech recognition

Language: Python - Size: 88.9 KB - Last synced at: 4 days ago - Pushed at: over 3 years ago - Stars: 24 - Forks: 3

AmirAbaskohi/Automatic-Speech-recognition-for-Speech-Assessment-of-Persian-Preschool-Children

Preschool evaluation is crucial because it gives teachers and parents influential knowledge about children's growth and development. The COVID-19 pandemic has highlighted the necessity of online assessment for preschool children. One of the areas that should be tested is their ability to speak. Employing an Automatic Speech Recognition (ASR) system would not help since they are pre-trained on voices that differ from children's in terms of frequency and amplitude. Because most of these are pre-trained with data in a specific range of amplitude, their objectives do not make them ready for voices in different amplitudes. To overcome this issue, we added a new objective to the masking objective of the Wav2Vec 2.0 model called Random Frequency Pitch (RFP). In addition, we used our newly introduced dataset to fine-tune our model for Meaningless Words (MW) and Rapid Automatic Naming (RAN) tests. Using masking in concatenation with RFP outperforms the masking objective of Wav2Vec 2.0 by reaching a Word Error Rate (WER) of 1.35. Our new approach reaches a WER of 6.45 on the Persian section of the CommonVoice dataset. Furthermore, our novel methodology produces positive outcomes in zero- and few-shot scenarios.

Language: Jupyter Notebook - Size: 1.23 MB - Last synced at: 5 months ago - Pushed at: almost 2 years ago - Stars: 21 - Forks: 1

ECNU-Cross-Innovation-Lab/ShiftSER

[ICASSP 2023] Mingling or Misalignment? Temporal Shift for Speech Emotion Recognition with Pre-trained Representations

Language: Python - Size: 53.7 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 19 - Forks: 2

kingabzpro/WOLOF-ASR-Wav2Vec2

Audio Preprocessing and finetuning of wav2vec2-large-xlsr model on AI4D Baamtu Datamation - Automatic Speech Recognition in WOLOF Data.

Language: Jupyter Notebook - Size: 3.34 MB - Last synced at: about 11 hours ago - Pushed at: over 3 years ago - Stars: 17 - Forks: 8

skit-ai/Map-Mix

The official implementation of the method discussed in the paper "Improving Spoken Language Identification with Map-Mix" (work accepted at ICASSP 2023)

Size: 18.1 MB - Last synced at: 30 days ago - Pushed at: about 2 years ago - Stars: 16 - Forks: 1

FernandoLpz/SpeechRecognition

This repository contains the implementation of an Automatic Speech Recognition system in python, using a client-server architecture with Web Sockets.

Language: Python - Size: 118 KB - Last synced at: 21 days ago - Pushed at: over 2 years ago - Stars: 14 - Forks: 2

techiaith/docker-huggingface-stt-cy

Adnabod lleferydd Cymraeg i'r Gymraeg gyda HuggingFace // Speech Recognition for Welsh with HuggingFace

Language: Python - Size: 321 KB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 13 - Forks: 4

notAI-tech/IndicASR

Speech Recognition for Indic languages.

Language: Python - Size: 623 KB - Last synced at: 19 days ago - Pushed at: about 4 years ago - Stars: 13 - Forks: 3

yamahigashi/Wav2Vec2FBX

Recognize speech from an audio file and convert it into animation FBX

Language: Python - Size: 199 KB - Last synced at: about 2 years ago - Pushed at: about 3 years ago - Stars: 12 - Forks: 3

aryanxxvii/lark

Speech Assessment API in FastAPI with HuggingFace 🤗

Language: JavaScript - Size: 183 KB - Last synced at: 17 days ago - Pushed at: 3 months ago - Stars: 11 - Forks: 0

jmaczan/asr-dysarthria

Research on Automatic Speech Recognition for dysarthric speech

Language: Jupyter Notebook - Size: 2.64 MB - Last synced at: 12 days ago - Pushed at: 7 months ago - Stars: 11 - Forks: 2

wngh1187/IPET

Pytorch implementation of INTEGRATED PARAMETER-EFFICIENT TUNING FOR GENERAL-PURPOSE AUDIO MODELS

Language: Python - Size: 4.28 MB - Last synced at: 11 months ago - Pushed at: over 1 year ago - Stars: 10 - Forks: 0

parvatijay2901/Hindi-ASR-and-TTS

EC499: Major Project

Language: Shell - Size: 68.4 KB - Last synced at: 20 days ago - Pushed at: almost 2 years ago - Stars: 9 - Forks: 2

Sreyan88/Toxicity-Detection-in-Spoken-Utterances

This repository contains the code for the paper: "DeToxy: A Large-Scale Multimodal Dataset for Toxicity Classification in Spoken Utterances"

Language: Jupyter Notebook - Size: 976 KB - Last synced at: 11 months ago - Pushed at: over 2 years ago - Stars: 9 - Forks: 5

mikezzb/lyrics-sync

A deep learning lyrics-to-audio alignment system, generating synchronized lyrics from a song and its lyrics

Language: Jupyter Notebook - Size: 18.4 MB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 8 - Forks: 1

kardSIM/audio2img

Extend the Conditioning of Stable Diffusion to take Audio Embeddings Instead of Text Embeddings using Wav2Vec2-BERT model

Language: Jupyter Notebook - Size: 29.8 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 7 - Forks: 1

ECNU-Cross-Innovation-Lab/ENT

[ICASSP 2024] Emotion Neural Transducer for Fine-Grained Speech Emotion Recognition

Language: Python - Size: 638 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 7 - Forks: 0

louisbrulenaudet/balena

BALanced Execution through Natural Activation : a human-computer interaction methodology for code running.

Language: Python - Size: 229 KB - Last synced at: 11 days ago - Pushed at: about 1 year ago - Stars: 7 - Forks: 1

Msparihar/Transcriber

An AI tool that automatically generates captions and transcripts for YouTube videos in 67 languages and summarized text in 133 languages.

Language: Python - Size: 14.6 KB - Last synced at: 21 days ago - Pushed at: over 1 year ago - Stars: 7 - Forks: 1

RubensZimbres/Repo-2022

Python codes on PyTorch, Tensorflow, Keras, Wav2Vec2 Fine-Tuning and Google Cloud

Language: Jupyter Notebook - Size: 74.7 MB - Last synced at: 13 days ago - Pushed at: over 1 year ago - Stars: 7 - Forks: 4

scottykwok/cantonese-selfish-project

Cantonese Selfish Project 廣東話自肥企劃 at PYCON HK 2021

Language: Jupyter Notebook - Size: 8.44 MB - Last synced at: about 2 years ago - Pushed at: about 3 years ago - Stars: 7 - Forks: 1

hammaad2002/ASRAdversarialAttacks

An ASR (Automatic Speech Recognition) adversarial attack repository.

Language: Jupyter Notebook - Size: 10 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 6 - Forks: 1

JuJu2181/Automatic-Nepali-Speech-Recognition-and-Summarizer

A system capable of converting Nepali speech to text and generating a summary of the text

Language: Jupyter Notebook - Size: 288 MB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 6 - Forks: 2

pradeepbatchu/speechtotext

Speech to Text with Wav2Vec2 using torchaudio

Language: Python - Size: 533 KB - Last synced at: 12 months ago - Pushed at: over 3 years ago - Stars: 6 - Forks: 1

seanghay/kfa

A fast Khmer Forced Aligner powered by Wav2Vec2CTC and Phonetisaurus

Language: Python - Size: 10.1 MB - Last synced at: 17 days ago - Pushed at: 12 months ago - Stars: 5 - Forks: 0

jvel07/wav2vec2_patho

Fine-tuning wav2vec2 for Pathological Speech Processing

Language: Jupyter Notebook - Size: 4.05 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 5 - Forks: 1

navalnica/wav2vec2-belarusian

Speech to Text model for Belarusian language

Language: Jupyter Notebook - Size: 1.37 MB - Last synced at: 18 days ago - Pushed at: about 3 years ago - Stars: 5 - Forks: 0

somosnlp/wav2vec2-spanish

Pre-train a Spanish Wav2Vec2 model using the Spanish portion of the Common Voice dataset.

Language: Python - Size: 17.6 KB - Last synced at: 12 months ago - Pushed at: almost 4 years ago - Stars: 5 - Forks: 1

nhut-ngnn/Multimodal-Speech-Emotion-Recognition

A multimodal SER project combining BERT and ECAPA-TDNN with cross-attention-based fusion on the IEMOCAP dataset.

Language: Python - Size: 7.07 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 4 - Forks: 0

Sreyan88/Indic-ASR

Repository for pre-trained wav2vec 2.0 models on 7 Indian languages

Language: Python - Size: 18.6 KB - Last synced at: 11 months ago - Pushed at: almost 4 years ago - Stars: 4 - Forks: 0

slinusc/speaker_identification_evaluation

Evaluating the Effectiveness of Transformer Layers in Wav2Vec 2.0, XLS-R, and Whisper for Speaker Identification Tasks

Language: Jupyter Notebook - Size: 8.56 MB - Last synced at: 4 days ago - Pushed at: 3 months ago - Stars: 3 - Forks: 1

gulabpatel/Speech-to-Text

Language: Jupyter Notebook - Size: 1.64 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 3 - Forks: 0

Sarasadeghii/Sharif-Wav2vec2

This repo shows how to finetune the wav2vec2.0 model along with its prerequisites.

Language: Jupyter Notebook - Size: 297 KB - Last synced at: 11 months ago - Pushed at: 12 months ago - Stars: 3 - Forks: 0

dsalnikov/wav2vec

pure numpy implementation of wav2vec 2.0

Language: Python - Size: 4.88 KB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 3 - Forks: 0

TerboucheHacene/speech-keyword-spotting

Speech Keyword detection using Wav2Vec Model

Language: Python - Size: 314 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 3 - Forks: 0

ranchlai/wav2vec-2.0

Wav2vec2 English speech recognition in PaddlePaddle

Language: Python - Size: 316 KB - Last synced at: 12 months ago - Pushed at: almost 4 years ago - Stars: 3 - Forks: 1

zlab-foss/Wav2Vec2-XLSR-Finetune

Ready-to-use notebook to fine-tune wav2vec2 on Persian

Language: Jupyter Notebook - Size: 9.27 MB - Last synced at: about 2 years ago - Pushed at: almost 4 years ago - Stars: 3 - Forks: 2

nisheethjaiswal/Speech-to-Text

Speech to text implementation using transformers in PyTorch.

Language: Jupyter Notebook - Size: 333 KB - Last synced at: over 1 year ago - Pushed at: almost 4 years ago - Stars: 3 - Forks: 0

baocin/hugging_face_example_STT_api

Demonstration of Hugging Face's (https://huggingface.co/) newly released Wav2Vec2 model for easy, reasonably coherent, Speech to Text!

Language: Python - Size: 41.8 MB - Last synced at: over 1 year ago - Pushed at: about 4 years ago - Stars: 3 - Forks: 1

dangrebenkin/wav2vec2_speech_markuper

Automatic generation of speech dataset markup using Wav2Vec2 ASR models

Language: Python - Size: 396 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 2 - Forks: 0

trinhtuanvubk/finetune-wav2vec2

Language: Python - Size: 5.15 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 2 - Forks: 0

SakshiRathi77/hindiSpeechPro-Automatic-Speech-Recognization

This project, the final project for the KaggleX BIPOC Mentorship Program, aims to train two separate Hindi ASR models using the Facebook Wav2Vec2 (300M parameters) and OpenAI Whisper-Small models, respectively. The goal is to compare their performance, with a target WER of less than 13%, across various Hindi accents and dialects.

Language: Jupyter Notebook - Size: 2.11 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 1

oswaldoludwig/Pruning-pre-trained-models-using-evolutionary-computation

This repository contains scripts to prune Wav2vec2 using a neuroevolution-based method. More details about this method can be found in the paper Compressing Wav2vec2 for Embedded Applications.

Language: Shell - Size: 4.53 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

imsanjoykb/Speech-NLP-Bootcamp

Speech NLP Bootcamp

Language: Jupyter Notebook - Size: 3.1 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 1

manthanthakker/AudioClassification

This repository contains code/papers/research on Speech or Audio Classification

Language: Jupyter Notebook - Size: 143 KB - Last synced at: almost 2 years ago - Pushed at: about 2 years ago - Stars: 2 - Forks: 0

ZahraRahimii/AirTrafficControl-AutomaticSpeechRecognition-Project

ATC ASR; internship at Asr Gooyesh Company

Language: Jupyter Notebook - Size: 4.2 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0

lectly/wav2vec2-large-xlsr-53-egyptian-arabic

Fine-tuning XLS-R for Multi-Lingual ASR with 🤗 Transformers

Language: Jupyter Notebook - Size: 1.89 MB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 2 - Forks: 0

ahammedrohit/Speech-Recognition-using-wav2vec2-with-minimum-GPU

Python Colab for speech recognition with wav2vec2. Since wav2vec2 requires heavy GPU I've come up with a way to run this on Google Colab as well as local machines with minimum GPU.

Language: Jupyter Notebook - Size: 22.5 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 2 - Forks: 0

phanxuanphucnd/wav2asr

A library version of wav2vec 2.0 framework for Automatic Speech Recognition task.

Language: Python - Size: 8.5 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 2 - Forks: 4

kipmccharen/sys6016_DL_project

pretrained SpeechBrain wav2vec seq2seq+CTC model trained on TIMIT dataset. Created by Kip McCharen, Siddharth Surapaneni, and Pavan Bondalapati

Language: Python - Size: 1.21 MB - Last synced at: over 1 year ago - Pushed at: almost 4 years ago - Stars: 2 - Forks: 1

egorsmkv/speech-to-text-using-php

Use PHP for Speech-to-Text task. Just a research.

Language: PHP - Size: 289 KB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 1 - Forks: 0

thiagogre/mimicking

English Pronunciation Improvement App

Language: Python - Size: 852 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

aitor-alvarez/acoustic-transformer-models

Acoustic Transformer Models for Audio Classification

Language: Python - Size: 51.8 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 1 - Forks: 0

aitor-alvarez/large-speech-models

Fine-tuning Multilingual Large Speech Recognition Models: Wav2vec and Whisper

Language: Python - Size: 84 KB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 1 - Forks: 0

piedeboer96/Digital-Assistant-Audio-Processing

Project 2.2 - Speech Recognition and Speaker Identification

Language: Java - Size: 13 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 1 - Forks: 0

JingleCate/SpeechEmotionRecog

A simple Speech Emotion Recognition (SER) project based on Wav2Vec2.

Language: Python - Size: 235 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 1 - Forks: 0

zhu00121/Universal-representation-dynamics-of-deepfake-speech

This repo contains code used in the paper "Characterizing the temporal dynamics of universal speech representations for generalizable deepfake detection"

Language: Python - Size: 339 KB - Last synced at: 12 months ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

SanchezCris/SDR-Automatic-Speech-Recognition

FM signal capturing system and voice recognition for the assistance of individuals with hearing impairments.

Language: Python - Size: 48 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 0

sotiriskar/audio-note

Python application for taking audio notes and creating summaries of meetings.

Language: Python - Size: 887 KB - Last synced at: 4 days ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 0

kalindasiaminwe/ChitongaASR

A natural language processing and machine learning project for a low-resource language in Zambia.

Language: Jupyter Notebook - Size: 548 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

thisisHJLee/Fine-Tuning-of-XLSR-Wav2Vec2-on-Korean

Language: Jupyter Notebook - Size: 1.35 MB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

keshavbhandari/Audioneme

AI model for speech disorder detection

Language: Python - Size: 56.1 MB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

mead-ml/audio8

Deep audio modeling

Language: Python - Size: 307 KB - Last synced at: about 1 month ago - Pushed at: about 3 years ago - Stars: 1 - Forks: 0

lakshiitakalyanasundaram/DeepSonic

DeepFake audio detection project using Wav2Vec2 for MOMENTA (internship task)

Language: HTML - Size: 64.3 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 0 - Forks: 0

mahshid1378/ASR-Wav2vec-Finetune

⚡ Finetune Wav2vec 2.0 For Speech Recognition

Language: Python - Size: 5.01 MB - Last synced at: 15 days ago - Pushed at: 26 days ago - Stars: 0 - Forks: 0

jp1924/ASR

Code for training 🤗 ASR models

Language: Python - Size: 413 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

moxeeem/ASR-pronunciation-correction

This project presents a system for automatic correction of English pronunciation using the wav2vec2 neural network.

Language: Jupyter Notebook - Size: 9.17 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 1

moncefbenaicha/spoken-ner

Spoken NER implementation based on Wav2Vec2-XLS-R with experiments on transfer learning

Language: Python - Size: 113 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

egorsmkv/w2v2-bert-aligner

Aligner for wav2vec2-bert models

Language: Python - Size: 1.95 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

Nightey3s/Speech-Emotion-Recognition-using-Wav2Vec2

A Speech Emotion Recognition (SER) system using Facebook's Wav2Vec2 model that classifies speech into four emotions (Neutral, Happy, Sad, Angry). Achieves 69.02% accuracy on IEMOCAP dataset using modern transformer architecture and comprehensive data augmentation techniques.

Language: Jupyter Notebook - Size: 1.06 MB - Last synced at: 16 days ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

Not-ML/audio-ml

Standalone Audio ML Application: An innovative Python-based tool integrating Speech Recognition (ASR), Sentiment Analysis (NLP), and Text-to-Speech (TTS) to process audio, analyze sentiment, and generate spoken responses. Features both command-line and GUI interfaces for seamless interaction.

Language: Python - Size: 20.5 KB - Last synced at: 28 days ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

sugarcane-mk/finetuning_wav2vec2

This repo provides a step-by-step process, starting from scratch, for fine-tuning Facebook's wav2vec2-large model using transformers

Language: Jupyter Notebook - Size: 42 KB - Last synced at: about 1 month ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

egorsmkv/wav2vec2-hidet

A test to run w2v2 with hidet optimizer

Language: Python - Size: 402 KB - Last synced at: 27 days ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

akash13s/audio-to-image Fork of rishavroy97/audio-to-image

Pipeline for generating images conditioned on input audio

Language: Python - Size: 3.11 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

kamalesh003/NoiseCancellationTranscriptionModel

Noise Cancellation Transcription Model Using Wav2Vec2

Language: Jupyter Notebook - Size: 240 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

tracyreuter/NLP-speech-to-text

Convert speech to text using HuggingFace, comparing Wav2Vec2 versus OpenAI Whisper

Language: Jupyter Notebook - Size: 2.35 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

sebinbenjamin/wav2vec_demo

A Python tool for transcribing speech from audio files using the Wav2Vec 2.0 model. Supports multilingual transcription, automatic audio chunking, and easy setup

Language: Python - Size: 4.88 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

moncefbenaicha/SpokenNER

Spoken NER implementation based on Wav2Vec2-XLS-R with experiments on transfer learning

Language: Python - Size: 627 KB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

Related Topics
speech-recognition 48 speech-to-text 35 asr 34 pytorch 22 deep-learning 18 huggingface 17 transformers 17 python 16 whisper 13 nlp 13 automatic-speech-recognition 13 wav2vec 12 machine-learning 11 fine-tuning 9 audio-processing 9 audio 9 speech 9 hubert 8 speech-emotion-recognition 7 transformer 7 asr-model 6 self-supervised-learning 5 huggingface-transformers 5 tts 5 bert 4 speech-processing 4 text-to-speech 4 xlsr 4 natural-language-processing 4 stt 4 audio-classification 3 onnx 3 forced-alignment 3 fastapi 3 ai 3 dataset 3 gradio 3 wavlm 3 emotion-recognition 3 pytorch-lightning 3 transformer-models 3 deepspeech 3 spoken-language-understanding 3 transcription 3 contrastive-learning 2 llm 2 spoken-ner 2 fairseq 2 ctc 2 facebook 2 docker 2 classification 2 distilhubert 2 finetuning 2 vietnamese 2 wav2vec2-large-960h 2 multimodal 2 audio-analysis 2 kenlm 2 transfer-learning 2 conformer 2 wolof 2 hindi-language 2 torch 2 wav 2 common-voice-dataset 2 language-model 2 deep-neural-networks 2 indian-language 2 speech-representation 2 tensorflow 2 speech-translation 2 audio-segmentation 2 pyaudio 2 self-supervised 2 voice-recognition 2 speaker-recognition 2 translation 2 speech-recognition-model 1 apr 1 deepfake 1 ukrainian 1 speaker-identification 1 signal-processing 1 java 1 hugging-face 1 toxicity-classification 1 speech-classification 1 numpy 1 phoneme-recognition 1 educational 1 gtts 1 wer 1 farsi-datasets 1 valence 1 large-speech-models 1 finetuning-whisper 1 finetuning-wav2vec 1 deep-fake-audio 1 arabic-speech-recognition 1