GitHub topics: speech-language-model
ryota-komatsu/speaker_disentangled_hubert
Official repository of the IEEE SLT 2024 paper "Self-Supervised Syllable Discovery Based on Speaker-Disentangled HuBERT"
Language: Python - Size: 1.36 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 38 - Forks: 9

ryota-komatsu/slp2025
Materials for the 音学シンポジウム2025 (Otogaku Symposium 2025) tutorial "Introduction to Multimodal Large Language Models"
Language: Jupyter Notebook - Size: 19.6 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 16 - Forks: 2

ryota-komatsu/speech_resynth
Speech Resynthesis and Language Modeling
Language: Python - Size: 4.86 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 17 - Forks: 4

Ereboas/MagiCodec
A single-layer, streaming codec model providing SOTA audio quality and discrete tokens designed for superior downstream modelability.
Language: Python - Size: 216 KB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 35 - Forks: 3

lucadellalib/audiocodecs
A collection of audio codecs with a standardized API
Language: Python - Size: 851 KB - Last synced at: 27 days ago - Pushed at: 27 days ago - Stars: 20 - Forks: 3

ictnlp/LLaMA-Omni
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
Language: Python - Size: 3.28 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 2,923 - Forks: 197

ictnlp/SLED-TTS
Streamable Text-to-Speech model using a language modeling approach, without vector quantization
Language: Python - Size: 378 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 58 - Forks: 5

slp-rl/slamkit
SlamKit is an open-source toolkit for efficient training of SpeechLMs. It was used for "Slamming: Training a Speech Language Model on One GPU in a Day"
Language: Python - Size: 1.18 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 210 - Forks: 9

zhenye234/xcodec
AAAI 2025: Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model
Language: Python - Size: 1.77 MB - Last synced at: 29 days ago - Pushed at: 2 months ago - Stars: 209 - Forks: 13

slp-rl/salmon
The official code for the SALMon🍣 benchmark (ICASSP 2025 - Oral)
Language: Python - Size: 215 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 45 - Forks: 0

OmniMMI/OpenOmniNexus
A fully open-source implementation of a GPT-4o-like speech-to-speech video understanding model.
Language: Python - Size: 39.6 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 3 - Forks: 0

OmniMMI/OmniMMI
[CVPR 2025] OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts
Language: Python - Size: 25.3 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 8 - Forks: 0

jishengpeng/WavTokenizer
[ICLR 2025] SOTA discrete acoustic codec models with 40/75 tokens per second for audio language modeling
Language: Python - Size: 390 KB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 1,096 - Forks: 84

kehanlu/DeSTA2
Code and model for ICASSP 2025 Paper "Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data"
Language: HTML - Size: 4.43 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 41 - Forks: 3

hhguo/SoCodec
Ultra-low-bitrate Speech Codec for Speech Language Modeling Applications
Language: Python - Size: 1.29 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 75 - Forks: 4

jishengpeng/WavChat
A Survey of Spoken Dialogue Models (60 pages)
Size: 2.24 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 93 - Forks: 3
