GitHub topics: speech-language-model

Repositories

ryota-komatsu/speaker_disentangled_hubert

Official repository of the IEEE SLT 2024 paper "Self-Supervised Syllable Discovery Based on Speaker-Disentangled HuBERT"

Language: Python - Size: 1.36 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 38 - Forks: 9

ryota-komatsu/slp2025

Materials for the Ongaku Symposium 2025 (音学シンポジウム2025) tutorial "Introduction to Multimodal Large Language Models"

Language: Jupyter Notebook - Size: 19.6 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 16 - Forks: 2

ryota-komatsu/speech_resynth

Speech Resynthesis and Language Modeling

Language: Python - Size: 4.86 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 17 - Forks: 4

Ereboas/MagiCodec

A single-layer, streaming codec model providing SOTA audio quality and discrete tokens designed for superior downstream modelability.

Language: Python - Size: 216 KB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 35 - Forks: 3

lucadellalib/audiocodecs

A collection of audio codecs with a standardized API

Language: Python - Size: 851 KB - Last synced at: 27 days ago - Pushed at: 27 days ago - Stars: 20 - Forks: 3

ictnlp/LLaMA-Omni

LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.

Language: Python - Size: 3.28 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 2,923 - Forks: 197

ictnlp/SLED-TTS

Streamable Text-to-Speech model using a language modeling approach, without vector quantization

Language: Python - Size: 378 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 58 - Forks: 5

slp-rl/slamkit

SlamKit is an open-source toolkit for efficient training of SpeechLMs. It was used for "Slamming: Training a Speech Language Model on One GPU in a Day"

Language: Python - Size: 1.18 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 210 - Forks: 9

zhenye234/xcodec

AAAI 2025: Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model

Language: Python - Size: 1.77 MB - Last synced at: 29 days ago - Pushed at: 2 months ago - Stars: 209 - Forks: 13

slp-rl/salmon

The official code for the SALMon🍣 benchmark (ICASSP 2025 - Oral)

Language: Python - Size: 215 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 45 - Forks: 0

OmniMMI/OpenOmniNexus

A fully open-source implementation of a GPT-4o-like speech-to-speech video understanding model.

Language: Python - Size: 39.6 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 3 - Forks: 0

OmniMMI/OmniMMI

[CVPR 2025] OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts

Language: Python - Size: 25.3 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 8 - Forks: 0

jishengpeng/WavTokenizer

[ICLR 2025] SOTA discrete acoustic codec models with 40/75 tokens per second for audio language modeling

Language: Python - Size: 390 KB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 1,096 - Forks: 84

kehanlu/DeSTA2

Code and model for ICASSP 2025 Paper "Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data"

Language: HTML - Size: 4.43 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 41 - Forks: 3

hhguo/SoCodec

Ultra-low-bitrate Speech Codec for Speech Language Modeling Applications

Language: Python - Size: 1.29 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 75 - Forks: 4

jishengpeng/WavChat

A Survey of Spoken Dialogue Models (60 pages)

Size: 2.24 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 93 - Forks: 3

Related Keywords

speech-language-model (16), speech (6), text-to-speech (5), self-supervised-learning (4), speech-processing (4), speech-representation (4), codec (4), multimodal-large-language-models (3), speech-synthesis (3), speech-interaction (3), large-language-models (3), encodec (3), multi-modal (2), semantic (2), audio (2), language-model (2), dac (2), tts (2), pytorch (2), llms (2), mimi (1), audio-processing (1), spoken-language-processing (1), omni-language-model (1), acoustic (1), audio-representation (1), gpt4o (1), music-representation-learning (1), soundstream (1), speech-codec (1), duplex (1), gpt-4o (1), intreaction (1), llama-omni (1), mini-omni (1), modal-alignment (1), moshi (1), spoken-dialogue-models (1), streaming (1), wavtokenizer (1), quantization (1), speech-coding (1), speechtokenizer (1), wavlm (1), speech-to-speech (1), speech-to-text (1), streaming-inference (1), efficient-training (1), transformers (1), audio-codec (1), gpt (1), music (1), llm (1), sound (1), text-to-music (1), text-to-sound (1), tokenizer (1), vall-e (1), acoustic-model (1)
