An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: speech-language-model

ryota-komatsu/speaker_disentangled_hubert

Official repository of the IEEE SLT 2024 paper "Self-Supervised Syllable Discovery Based on Speaker-Disentangled HuBERT"

Language: Python - Size: 1.36 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 38 - Forks: 9

ryota-komatsu/slp2025

Materials for the tutorial "Introduction to Multimodal Large Language Models" at 音学シンポジウム 2025 (Ongaku Symposium 2025)

Language: Jupyter Notebook - Size: 19.6 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 16 - Forks: 2

ryota-komatsu/speech_resynth

Speech Resynthesis and Language Modeling

Language: Python - Size: 4.86 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 17 - Forks: 4

Ereboas/MagiCodec

A single-layer, streaming codec model providing SOTA audio quality and discrete tokens designed for superior downstream modelability.

Language: Python - Size: 216 KB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 35 - Forks: 3

lucadellalib/audiocodecs

A collection of audio codecs with a standardized API

Language: Python - Size: 851 KB - Last synced at: 27 days ago - Pushed at: 27 days ago - Stars: 20 - Forks: 3

ictnlp/LLaMA-Omni

LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.

Language: Python - Size: 3.28 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 2,923 - Forks: 197

ictnlp/SLED-TTS

Streamable Text-to-Speech model using a language modeling approach, without vector quantization

Language: Python - Size: 378 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 58 - Forks: 5

slp-rl/slamkit

SlamKit is an open-source toolkit for efficient training of SpeechLMs. It was used for "Slamming: Training a Speech Language Model on One GPU in a Day".

Language: Python - Size: 1.18 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 210 - Forks: 9

zhenye234/xcodec

AAAI 2025: Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model

Language: Python - Size: 1.77 MB - Last synced at: 29 days ago - Pushed at: 2 months ago - Stars: 209 - Forks: 13

slp-rl/salmon

The official code for the SALMon🍣 benchmark (ICASSP 2025 - Oral)

Language: Python - Size: 215 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 45 - Forks: 0

OmniMMI/OpenOmniNexus

A fully open-source implementation of a GPT-4o-like speech-to-speech video understanding model.

Language: Python - Size: 39.6 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 3 - Forks: 0

OmniMMI/OmniMMI

[CVPR 2025] OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts

Language: Python - Size: 25.3 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 8 - Forks: 0

jishengpeng/WavTokenizer

[ICLR 2025] SOTA discrete acoustic codec models with 40/75 tokens per second for audio language modeling

Language: Python - Size: 390 KB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 1,096 - Forks: 84

kehanlu/DeSTA2

Code and model for ICASSP 2025 Paper "Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data"

Language: HTML - Size: 4.43 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 41 - Forks: 3

hhguo/SoCodec

Ultra-low-bitrate Speech Codec for Speech Language Modeling Applications

Language: Python - Size: 1.29 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 75 - Forks: 4

jishengpeng/WavChat

A Survey of Spoken Dialogue Models (60 pages)

Size: 2.24 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 93 - Forks: 3