An open API service providing repository metadata for many open source software ecosystems.

Topic: "mixture-of-experts"

deepspeedai/DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Language: Python - Size: 217 MB - Last synced at: 5 days ago - Pushed at: 7 days ago - Stars: 38,300 - Forks: 4,360

dvmazur/mixtral-offloading

Run Mixtral-8x7B models in Colab or on consumer desktops

Language: Python - Size: 261 KB - Last synced at: 2 days ago - Pushed at: about 1 year ago - Stars: 2,311 - Forks: 232

codelion/optillm

Optimizing inference proxy for LLMs

Language: Python - Size: 1.7 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 2,226 - Forks: 174

learning-at-home/hivemind

Decentralized deep learning in PyTorch. Built to train models across thousands of volunteer machines around the world.

Language: Python - Size: 12.1 MB - Last synced at: 4 days ago - Pushed at: 11 days ago - Stars: 2,176 - Forks: 186

PKU-YuanGroup/MoE-LLaVA

Mixture-of-Experts for Large Vision-Language Models

Language: Python - Size: 16.5 MB - Last synced at: 3 days ago - Pushed at: 5 months ago - Stars: 2,158 - Forks: 133

rhymes-ai/Aria

Codebase for Aria - an Open Multimodal Native MoE

Language: Jupyter Notebook - Size: 120 MB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 995 - Forks: 83

pjlab-sys4nlp/llama-moe

⛷️ LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training (EMNLP 2024)

Language: Python - Size: 1.69 MB - Last synced at: 7 days ago - Pushed at: 5 months ago - Stars: 961 - Forks: 56

microsoft/Tutel

Tutel MoE: an optimized Mixture-of-Experts library with support for DeepSeek FP8/FP4

Language: C - Size: 1.11 MB - Last synced at: 2 days ago - Pushed at: 4 days ago - Stars: 820 - Forks: 97

davidmrau/mixture-of-experts

PyTorch Re-Implementation of "The Sparsely-Gated Mixture-of-Experts Layer" by Noam Shazeer et al. https://arxiv.org/abs/1701.06538

Language: Python - Size: 73.2 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 818 - Forks: 88
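To make the entry above concrete: the Shazeer et al. layer routes each token to a small number of experts chosen by a learned gate and mixes their outputs with the renormalized gate weights. The sketch below is only a minimal PyTorch illustration of that idea under simplifying assumptions (no noisy gating, no capacity limits, a dense routing loop); the `TopKMoE` class and its parameters are invented for this example and are not the repository's API.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Minimal sparsely-gated MoE layer: each token is routed to its
    top-k experts and the expert outputs are mixed by the gate weights."""

    def __init__(self, dim: int, num_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(dim, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:    # x: (tokens, dim)
        logits = self.gate(x)                               # (tokens, num_experts)
        weights, indices = logits.topk(self.k, dim=-1)      # keep only the top-k experts
        weights = F.softmax(weights, dim=-1)                # renormalize over the kept experts
        out = torch.zeros_like(x)
        for slot in range(self.k):                          # dense loop; real kernels batch this
            for e, expert in enumerate(self.experts):
                mask = indices[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

moe = TopKMoE(dim=64)
tokens = torch.randn(10, 64)
print(moe(tokens).shape)  # torch.Size([10, 64])
```

Real implementations batch the per-expert computation and add auxiliary losses to keep experts evenly used; see the Switch Transformers and ST-MoE entries further down for the usual load-balancing terms.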

SMTorg/smt

Surrogate Modeling Toolbox

Language: Jupyter Notebook - Size: 163 MB - Last synced at: 16 days ago - Pushed at: 19 days ago - Stars: 755 - Forks: 215

lucidrains/mixture-of-experts

A PyTorch implementation of Sparsely-Gated Mixture of Experts, for massively increasing the parameter count of language models

Language: Python - Size: 136 KB - Last synced at: 1 day ago - Pushed at: over 1 year ago - Stars: 744 - Forks: 59

AviSoori1x/makeMoE

From scratch implementation of a sparse mixture of experts language model inspired by Andrej Karpathy's makemore :)

Language: Jupyter Notebook - Size: 6.96 MB - Last synced at: about 1 month ago - Pushed at: 7 months ago - Stars: 686 - Forks: 73

drawbridge/keras-mmoe

A TensorFlow Keras implementation of "Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts" (KDD 2018)

Language: Python - Size: 9.11 MB - Last synced at: about 1 year ago - Pushed at: about 2 years ago - Stars: 670 - Forks: 219
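MMoE shares one pool of experts across tasks but gives every task its own softmax gate and output tower. The repository above is a TensorFlow Keras implementation; purely for consistency with the other sketches in this listing, the hedged illustration below uses PyTorch, and the `MMoE` class, layer sizes, and single-unit towers are assumptions for the example rather than the repository's code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MMoE(nn.Module):
    """Shared experts, one softmax gate per task (Ma et al., KDD 2018)."""

    def __init__(self, in_dim: int, expert_dim: int, num_experts: int, num_tasks: int):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(in_dim, expert_dim), nn.ReLU()) for _ in range(num_experts)
        )
        self.gates = nn.ModuleList(nn.Linear(in_dim, num_experts) for _ in range(num_tasks))
        self.towers = nn.ModuleList(nn.Linear(expert_dim, 1) for _ in range(num_tasks))

    def forward(self, x: torch.Tensor) -> list[torch.Tensor]:
        expert_out = torch.stack([e(x) for e in self.experts], dim=1)  # (batch, E, expert_dim)
        outputs = []
        for gate, tower in zip(self.gates, self.towers):
            w = F.softmax(gate(x), dim=-1).unsqueeze(-1)               # (batch, E, 1)
            mixed = (w * expert_out).sum(dim=1)                        # task-specific expert mixture
            outputs.append(tower(mixed))                               # one prediction per task
        return outputs

model = MMoE(in_dim=32, expert_dim=16, num_experts=4, num_tasks=2)
ys = model(torch.randn(8, 32))
print([y.shape for y in ys])  # [torch.Size([8, 1]), torch.Size([8, 1])]
```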

ymcui/Chinese-Mixtral

Chinese Mixtral mixture-of-experts large models (Chinese Mixtral MoE LLMs)

Language: Python - Size: 519 KB - Last synced at: 5 days ago - Pushed at: about 1 year ago - Stars: 603 - Forks: 44

lucidrains/st-moe-pytorch

Implementation of ST-MoE, the latest incarnation of MoE after years of research at Google Brain, in PyTorch

Language: Python - Size: 178 KB - Last synced at: about 1 month ago - Pushed at: 11 months ago - Stars: 326 - Forks: 28
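One of ST-MoE's main practical contributions is the router z-loss, which penalizes large router logits so the gating softmax stays numerically stable. A minimal sketch of that term follows; the function name and the 1e-3 coefficient are typical choices assumed for illustration, not necessarily this repository's defaults.

```python
import torch

def router_z_loss(router_logits: torch.Tensor, coef: float = 1e-3) -> torch.Tensor:
    """ST-MoE router z-loss: mean squared log-sum-exp of the router logits.
    router_logits: (tokens, num_experts)."""
    z = torch.logsumexp(router_logits, dim=-1)
    return coef * (z ** 2).mean()

print(router_z_loss(torch.randn(32, 8)))
```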

lucidrains/soft-moe-pytorch

Implementation of Soft MoE, proposed by Google Brain's vision team, in PyTorch

Language: Python - Size: 1.38 MB - Last synced at: 4 days ago - Pushed at: about 1 month ago - Stars: 290 - Forks: 8
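Soft MoE replaces hard routing entirely: each expert "slot" receives a softmax-weighted average of all tokens (dispatch), the experts process the slots, and each token then takes a softmax-weighted combination of the slot outputs (combine). The sketch below illustrates that mechanism under simplifying assumptions (tiny expert MLPs, no normalization tricks); the `SoftMoE` class and its shapes are invented for the example and are not this repository's API.

```python
import torch
import torch.nn as nn

class SoftMoE(nn.Module):
    """Soft MoE sketch: slots are soft mixtures of tokens, experts process slots,
    and tokens recombine the slot outputs (no hard routing)."""

    def __init__(self, dim: int, num_experts: int = 4, slots_per_expert: int = 1):
        super().__init__()
        self.num_experts, self.slots_per_expert = num_experts, slots_per_expert
        self.phi = nn.Parameter(torch.randn(dim, num_experts * slots_per_expert) * dim ** -0.5)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.GELU()) for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:      # x: (batch, tokens, dim)
        logits = x @ self.phi                                 # (batch, tokens, slots)
        dispatch = logits.softmax(dim=1)                      # normalize over tokens per slot
        combine = logits.softmax(dim=-1)                      # normalize over slots per token
        slots = torch.einsum('btd,bts->bsd', x, dispatch)     # soft slot inputs
        slots = slots.view(x.size(0), self.num_experts, self.slots_per_expert, -1)
        slot_out = torch.stack([e(slots[:, i]) for i, e in enumerate(self.experts)], dim=1)
        slot_out = slot_out.reshape(x.size(0), -1, x.size(-1))  # (batch, slots, dim)
        return torch.einsum('bts,bsd->btd', combine, slot_out)

moe = SoftMoE(dim=64)
print(moe(torch.randn(2, 16, 64)).shape)  # torch.Size([2, 16, 64])
```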

Luodian/Generalizable-Mixture-of-Experts

GMoE could be the next backbone model for many kinds of generalization tasks.

Language: Python - Size: 2.04 MB - Last synced at: 9 days ago - Pushed at: about 2 years ago - Stars: 269 - Forks: 35

inferflow/inferflow

Inferflow is an efficient and highly configurable inference engine for large language models (LLMs).

Language: C++ - Size: 1.89 MB - Last synced at: 6 days ago - Pushed at: about 1 year ago - Stars: 243 - Forks: 25

SkyworkAI/MoH

MoH: Multi-Head Attention as Mixture-of-Head Attention

Language: Python - Size: 5.26 MB - Last synced at: about 1 month ago - Pushed at: 7 months ago - Stars: 233 - Forks: 9

efeslab/fiddler

[ICLR'25] Fast Inference of MoE Models with CPU-GPU Orchestration

Language: Python - Size: 1.72 MB - Last synced at: about 3 hours ago - Pushed at: 6 months ago - Stars: 210 - Forks: 20

EfficientMoE/MoE-Infinity

PyTorch library for cost-effective, fast and easy serving of MoE models.

Language: Python - Size: 457 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 179 - Forks: 13

koayon/awesome-adaptive-computation

A curated reading list of research in Adaptive Computation, Inference-Time Computation & Mixture of Experts (MoE).

Size: 331 KB - Last synced at: 12 days ago - Pushed at: 5 months ago - Stars: 143 - Forks: 9

eduardzamfir/seemoredetails

[ICML 2024] See More Details: Efficient Image Super-Resolution by Experts Mining

Language: Python - Size: 10.2 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 138 - Forks: 2

lucidrains/PEER-pytorch

PyTorch implementation of the PEER block from the paper "Mixture of a Million Experts" by Xu Owen He at DeepMind

Language: Python - Size: 271 KB - Last synced at: 7 days ago - Pushed at: 9 months ago - Stars: 123 - Forks: 3

shufangxun/LLaVA-MoD

[ICLR 2025] LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation

Language: Python - Size: 3.41 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 120 - Forks: 7

lucidrains/mixture-of-attention

Some personal experiments around routing tokens to different autoregressive attention, akin to mixture-of-experts

Language: Python - Size: 34.1 MB - Last synced at: 7 days ago - Pushed at: 7 months ago - Stars: 118 - Forks: 4

Adlith/MoE-Jetpack

[NeurIPS 24] MoE Jetpack: From Dense Checkpoints to Adaptive Mixture of Experts for Vision Tasks

Language: Python - Size: 32.3 MB - Last synced at: about 1 month ago - Pushed at: 6 months ago - Stars: 115 - Forks: 1

YangLing0818/RealCompo

[NeurIPS 2024] RealCompo: Balancing Realism and Compositionality Improves Text-to-Image Diffusion Models

Language: Python - Size: 7.45 MB - Last synced at: about 2 months ago - Pushed at: 6 months ago - Stars: 115 - Forks: 4

arpita8/Awesome-Mixture-of-Experts-Papers

Survey: A collection of AWESOME papers and resources on the latest research in Mixture of Experts.

Size: 2.21 MB - Last synced at: 8 days ago - Pushed at: 9 months ago - Stars: 115 - Forks: 3

relf/egobox

Efficient global optimization toolbox in Rust: Bayesian optimization, mixture of Gaussian processes, sampling methods

Language: Rust - Size: 11.9 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 112 - Forks: 6

liuqidong07/MOELoRA-peft

[SIGIR'24] The official implementation code of MOELoRA.

Language: Python - Size: 10.2 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 105 - Forks: 11
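The general idea behind MoE-style LoRA methods is to keep the pretrained weight frozen and learn several low-rank adapters that a small gate mixes per input. The snippet below is a generic, hedged sketch of that pattern, not the official MOELoRA code; the `MoLoRALinear` class, the input-conditioned gate, and all hyperparameters are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoLoRALinear(nn.Module):
    """Frozen base linear layer plus several low-rank (LoRA) experts mixed by a gate."""

    def __init__(self, base: nn.Linear, num_experts: int = 4, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                          # keep the pretrained weight frozen
        in_f, out_f = base.in_features, base.out_features
        self.A = nn.Parameter(torch.randn(num_experts, in_f, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(num_experts, rank, out_f))  # zero init: no update at start
        self.gate = nn.Linear(in_f, num_experts)
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:      # x: (batch, in_f)
        w = F.softmax(self.gate(x), dim=-1)                   # per-input mixing weights (batch, E)
        delta = torch.einsum('bi,eir,ero->beo', x, self.A, self.B)   # per-expert low-rank updates
        return self.base(x) + self.scale * torch.einsum('be,beo->bo', w, delta)

layer = MoLoRALinear(nn.Linear(64, 64))
print(layer(torch.randn(4, 64)).shape)  # torch.Size([4, 64])
```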

kyegomez/SwitchTransformers

Implementation of Switch Transformers from the paper: "Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity"

Language: Python - Size: 2.42 MB - Last synced at: 6 days ago - Pushed at: about 1 month ago - Stars: 100 - Forks: 12
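Switch routing sends each token to a single expert (top-1) and adds an auxiliary load-balancing loss so experts receive comparable traffic. The sketch below shows just the router and that loss term, following the formulation in the Switch Transformers paper; the `SwitchRouter` class and its names are invented for this illustration, and capacity factors and the actual expert dispatch are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwitchRouter(nn.Module):
    """Top-1 ("switch") routing with the auxiliary load-balancing term:
    aux = num_experts * sum_e (fraction of tokens routed to e) * (mean gate prob of e)."""

    def __init__(self, dim: int, num_experts: int = 8):
        super().__init__()
        self.num_experts = num_experts
        self.gate = nn.Linear(dim, num_experts, bias=False)

    def forward(self, x: torch.Tensor):                       # x: (tokens, dim)
        probs = F.softmax(self.gate(x), dim=-1)               # (tokens, E)
        gate_val, expert_idx = probs.max(dim=-1)              # each token picks one expert
        density = F.one_hot(expert_idx, self.num_experts).float().mean(dim=0)  # actual load
        density_proxy = probs.mean(dim=0)                     # differentiable load proxy
        aux_loss = self.num_experts * (density * density_proxy).sum()
        return expert_idx, gate_val, aux_loss

router = SwitchRouter(dim=64)
idx, scale, aux = router(torch.randn(32, 64))
print(idx.shape, scale.shape, aux.item())
```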

LINs-lab/DynMoE

[ICLR 2025] Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models

Language: Python - Size: 57.3 MB - Last synced at: 6 days ago - Pushed at: 3 months ago - Stars: 89 - Forks: 11

xrsrke/pipegoose

Large-scale 4D-parallel pre-training of Mixture-of-Experts 🤗 transformers *(still a work in progress)*

Language: Python - Size: 1.26 MB - Last synced at: 21 days ago - Pushed at: over 1 year ago - Stars: 82 - Forks: 18

OpenSparseLLMs/LLaMA-MoE-v2

🚀 LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training

Language: Python - Size: 2.21 MB - Last synced at: about 1 month ago - Pushed at: 6 months ago - Stars: 78 - Forks: 11

Leeroo-AI/mergoo

A library for easily merging multiple LLM experts and efficiently training the merged LLM.

Language: Python - Size: 1.61 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 76 - Forks: 3

HLTCHKUST/MoEL

MoEL: Mixture of Empathetic Listeners

Language: Python - Size: 8.52 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 71 - Forks: 14

fkodom/soft-mixture-of-experts

PyTorch implementation of Soft MoE by Google Brain in "From Sparse to Soft Mixtures of Experts" (https://arxiv.org/pdf/2308.00951.pdf)

Language: Python - Size: 152 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 71 - Forks: 5

CASE-Lab-UMD/Unified-MoE-Compression

The official implementation of the paper "Towards Efficient Mixture of Experts: A Holistic Study of Compression Techniques (TMLR)".

Language: Python - Size: 47.1 MB - Last synced at: 6 days ago - Pushed at: about 2 months ago - Stars: 67 - Forks: 5

dmis-lab/Monet

[ICLR 2025] Monet: Mixture of Monosemantic Experts for Transformers

Language: Python - Size: 252 KB - Last synced at: 12 days ago - Pushed at: 4 months ago - Stars: 66 - Forks: 3

UNITES-Lab/MC-SMoE

[ICLR 2024 Spotlight] Code for the paper "Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy"

Language: Python - Size: 1.8 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 56 - Forks: 7

bwconrad/soft-moe

PyTorch implementation of "From Sparse to Soft Mixtures of Experts"

Language: Python - Size: 344 KB - Last synced at: 3 days ago - Pushed at: over 1 year ago - Stars: 56 - Forks: 3

mryab/learning-at-home

"Towards Crowdsourced Training of Large Neural Networks using Decentralized Mixture-of-Experts" (NeurIPS 2020), original PyTorch implementation

Language: Jupyter Notebook - Size: 272 KB - Last synced at: 17 days ago - Pushed at: over 4 years ago - Stars: 54 - Forks: 1

AmazaspShumik/mtlearn

Multi-Task Learning package built with TensorFlow 2 (Multi-Gate Mixture of Experts, Cross-Stitch, Uncertainty Weighting)

Language: Python - Size: 10.1 MB - Last synced at: 13 days ago - Pushed at: over 5 years ago - Stars: 52 - Forks: 6

Leeroo-AI/leeroo_orchestrator

The implementation of "Leeroo Orchestrator: Elevating LLMs Performance Through Model Integration"

Language: Python - Size: 857 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 46 - Forks: 4

AmazaspShumik/Mixture-Models

Hierarchical Mixture of Experts, Mixture Density Neural Network

Language: Jupyter Notebook - Size: 4.57 MB - Last synced at: over 1 year ago - Pushed at: about 8 years ago - Stars: 45 - Forks: 17

LoserCheems/WonderfulMatrices

Wonderful Matrices to Build Small Language Models

Language: Python - Size: 8.78 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 43 - Forks: 0

924973292/DeMo

[AAAI 2025] DeMo: Decoupled Feature-Based Mixture of Experts for Multi-Modal Object Re-Identification

Language: Python - Size: 17 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 41 - Forks: 2

VITA-Group/Neural-Implicit-Dict

[ICML 2022] "Neural Implicit Dictionary via Mixture-of-Expert Training" by Peihao Wang, Zhiwen Fan, Tianlong Chen, Zhangyang Wang

Language: Python - Size: 958 KB - Last synced at: 28 days ago - Pushed at: over 1 year ago - Stars: 40 - Forks: 1

Spico197/MoE-SFT

🍼 Official implementation of Dynamic Data Mixing Maximizes Instruction Tuning for Mixture-of-Experts

Language: Python - Size: 552 KB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 38 - Forks: 0

AIDC-AI/Parrot

🎉 The code repository for "Parrot: Multilingual Visual Instruction Tuning" in PyTorch.

Language: Python - Size: 25.2 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 36 - Forks: 1

lucidrains/sinkhorn-router-pytorch

Self-contained PyTorch implementation of a Sinkhorn-based router, for mixture of experts or otherwise

Language: Python - Size: 27.3 KB - Last synced at: 7 days ago - Pushed at: 9 months ago - Stars: 34 - Forks: 0
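A Sinkhorn-style router balances the token-to-expert assignment by alternately normalizing the score matrix toward target marginals (every token carries unit mass, every expert receives roughly tokens/experts of it) before experts are picked. The function below is a generic log-space sketch of that iteration, not this repository's implementation; the name, iteration count, and marginals are assumptions.

```python
import math
import torch

def sinkhorn_route(logits: torch.Tensor, num_iters: int = 8) -> torch.Tensor:
    """Balance a (tokens x experts) score matrix with Sinkhorn iterations so that
    assignment mass is spread evenly across experts before taking the argmax."""
    t, e = logits.shape
    log_pi = logits.log_softmax(dim=-1)
    for _ in range(num_iters):
        # each expert should receive t/e units of mass in total
        log_pi = log_pi - log_pi.logsumexp(dim=0, keepdim=True) + math.log(t / e)
        # each token should distribute exactly 1 unit of mass
        log_pi = log_pi - log_pi.logsumexp(dim=1, keepdim=True)
    return log_pi.exp()

scores = torch.randn(16, 4)                # 16 tokens, 4 experts
pi = sinkhorn_route(scores)
assignment = pi.argmax(dim=-1)             # balanced expert choice per token
print(pi.sum(dim=1))                       # ~1 per token after the final row normalization
```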

eduardzamfir/MoCE-IR

[CVPR 2025] Complexity Experts are Task-Discriminative Learners for Any Image Restoration

Language: Python - Size: 821 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 33 - Forks: 0

umbertocappellazzo/PETL_AST

This is the official repository of the papers "Parameter-Efficient Transfer Learning of Audio Spectrogram Transformers" and "Efficient Fine-tuning of Audio Spectrogram Transformers via Soft Mixture of Adapters".

Language: Python - Size: 3.09 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 32 - Forks: 1

BorealisAI/MMoEEx-MTL

PyTorch Implementation of the Multi-gate Mixture-of-Experts with Exclusivity (MMoEEx)

Language: Python - Size: 31.4 MB - Last synced at: about 1 month ago - Pushed at: almost 4 years ago - Stars: 32 - Forks: 4

eduardzamfir/DaAIR

GitHub repository for our project "Efficient Degradation-aware Any Image Restoration"

Size: 15.6 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 30 - Forks: 0

james-oldfield/muMoE

[NeurIPS'24] Multilinear Mixture of Experts: Scalable Expert Specialization through Factorization

Language: Python - Size: 2.95 MB - Last synced at: 28 days ago - Pushed at: 8 months ago - Stars: 30 - Forks: 1

RoyalSkye/Routing-MVMoE

[ICML 2024] "MVMoE: Multi-Task Vehicle Routing Solver with Mixture-of-Experts"

Language: Python - Size: 379 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 30 - Forks: 3

OpenSparseLLMs/CLIP-MoE

CLIP-MoE: Mixture of Experts for CLIP

Language: Python - Size: 2.35 MB - Last synced at: about 1 month ago - Pushed at: 7 months ago - Stars: 29 - Forks: 0

kyegomez/LIMoE

Implementation of "the first large-scale multimodal mixture of experts models" from the paper "Multimodal Contrastive Learning with LIMoE: the Language-Image Mixture of Experts"

Language: Python - Size: 2.17 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 28 - Forks: 2

zjukg/MoMoK

[ICLR 2025] Multiple Heads are Better than One: Mixture of Modality Knowledge Experts for Entity Representation Learning

Language: Python - Size: 7 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 27 - Forks: 3

SuperBruceJia/Awesome-Mixture-of-Experts

Awesome Mixture of Experts (MoE): A Curated List of Mixture of Experts (MoE) and Mixture of Multimodal Experts (MoME)

Size: 438 KB - Last synced at: 15 days ago - Pushed at: 4 months ago - Stars: 27 - Forks: 3

sammcj/moa Fork of togethercomputer/MoA

Mixture-of-Ollamas

Language: Python - Size: 1.72 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 25 - Forks: 1

jaisidhsingh/pytorch-mixtures

One-stop solutions for Mixture-of-Experts and Mixture-of-Depths modules in PyTorch.

Language: Python - Size: 366 KB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 22 - Forks: 1

Wuyxin/GraphMETRO

GraphMETRO: Mitigating Complex Graph Distribution Shifts via Mixture of Aligned Experts (NeurIPS 2024)

Language: Python - Size: 36.1 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 21 - Forks: 1

dsy109/mixtools

Tools for Analyzing Finite Mixture Models

Language: R - Size: 499 KB - Last synced at: about 1 month ago - Pushed at: 11 months ago - Stars: 20 - Forks: 4

tsc2017/MIX-GAN

Some recent state-of-the-art generative models in ONE notebook: (MIX-)?(GAN|WGAN|BigGAN|MHingeGAN|AMGAN|StyleGAN|StyleGAN2)(\+ADA|\+CR|\+EMA|\+GP|\+R1|\+SA|\+SN)*

Language: Jupyter Notebook - Size: 771 KB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 20 - Forks: 1

danelpeng/Awesome-Continual-Leaning-with-PTMs

This is a curated list of "Continual Learning with Pretrained Models" research.

Size: 254 KB - Last synced at: 6 days ago - Pushed at: about 2 months ago - Stars: 17 - Forks: 0

checkstep/mole-stance

MoLE: Cross-Domain Label-Adaptive Stance Detection

Language: Python - Size: 47.9 KB - Last synced at: about 2 months ago - Pushed at: about 3 years ago - Stars: 17 - Forks: 5

AdamG012/moe-paper-models

A summary of MoE experimental setups across a number of different papers.

Size: 10.7 KB - Last synced at: 10 days ago - Pushed at: about 2 years ago - Stars: 16 - Forks: 1

dominiquegarmier/grok-pytorch

PyTorch implementation of Grok

Language: Python - Size: 44.9 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 13 - Forks: 0

gaozhitong/MoSE-AUSeg

The official code repo for the paper "Mixture of Stochastic Experts for Modeling Aleatoric Uncertainty in Segmentation". (ICLR 2023)

Language: Python - Size: 939 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 13 - Forks: 1

hanyas/mimo

A toolbox for inference of mixture models

Language: Python - Size: 713 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 13 - Forks: 4

vivamoto/classifier

Machine learning code, derivative calculations, and optimization algorithms developed during the Machine Learning course at Universidade de Sao Paulo. All code is written in Python with NumPy and Matplotlib, with examples at the end of each file.

Language: Python - Size: 13.1 MB - Last synced at: almost 2 years ago - Pushed at: about 4 years ago - Stars: 12 - Forks: 4

cmavro/PackLLM

Pack of LLMs: Model Fusion at Test-Time via Perplexity Optimization

Language: Python - Size: 169 KB - Last synced at: 9 days ago - Pushed at: about 1 year ago - Stars: 10 - Forks: 1

UNITES-Lab/HEXA-MoE

Official code for the paper "HEXA-MoE: Efficient and Heterogeneous-Aware MoE Acceleration with Zero Computation Redundancy"

Language: Python - Size: 19.2 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 9 - Forks: 1

he-h/ST-MoE-BERT

This repository contains the code for the paper "ST-MoE-BERT: A Spatial-Temporal Mixture-of-Experts Framework for Long-Term Cross-City Mobility Prediction".

Language: Python - Size: 872 KB - Last synced at: 5 days ago - Pushed at: 3 months ago - Stars: 9 - Forks: 3

yuzhimanhua/SciMult

Pre-training Multi-task Contrastive Learning Models for Scientific Literature Understanding (Findings of EMNLP'23)

Language: Python - Size: 173 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 9 - Forks: 0

ilyalasy/moe-routing

Analysis of token routing for different implementations of Mixture of Experts

Language: Jupyter Notebook - Size: 882 KB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 9 - Forks: 0

clint-kristopher-morris/llm-guided-evolution

LLM Guided Evolution - The Automation of Models Advancing Models

Language: Python - Size: 345 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 8 - Forks: 5

yanring/Megatron-MoE-ModelZoo

Best practices for testing advanced Mixtral, DeepSeek, and Qwen series MoE models using Megatron Core MoE.

Language: Python - Size: 26.4 KB - Last synced at: 23 days ago - Pushed at: 23 days ago - Stars: 8 - Forks: 1

EfficientMoE/MoE-Gen

High-throughput offline inference for MoE models with limited GPUs

Language: Python - Size: 552 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 8 - Forks: 0

louisbrulenaudet/mergeKit

Tools for merging pretrained Large Language Models and creating Mixture of Experts (MoE) models from open-source models.

Language: Jupyter Notebook - Size: 13.7 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 8 - Forks: 0

RoyZry98/T-REX-Pytorch

[arXiv 2024] Official code for T-REX: Mixture-of-Rank-One-Experts with semantic-aware Intuition for Multi-task Large Language Model Finetuning

Language: Python - Size: 19.2 MB - Last synced at: about 21 hours ago - Pushed at: about 22 hours ago - Stars: 7 - Forks: 0

UNITES-Lab/glider

Official code for the paper "Glider: Global and Local Instruction-Driven Expert Router"

Language: Python - Size: 477 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 7 - Forks: 0

Keefe-Murphy/MoEClust

Gaussian Parsimonious Clustering Models with Gating and Expert Network Covariates

Language: R - Size: 1.67 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 7 - Forks: 0

skiptoniam/ecomix

ecomix is a package implementing model-based grouping of community data at the species level (Species Archetype Models) or the site level (Regions of Common Profile).

Language: R - Size: 63 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 7 - Forks: 2

AhmedMagdyHendawy/MOORE

Official code of the paper "Multi-Task Reinforcement Learning with Mixture of Orthogonal Experts" at ICLR 2024

Language: Python - Size: 813 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 7 - Forks: 3

jyjohnchoi/SMoP

The repository contains the code for our EMNLP 2023 paper "SMoP: Towards Efficient and Effective Prompt Tuning with Sparse Mixture-of-Prompts", written by Joon-Young Choi, Junho Kim, Jun-Hyung Park, Mok-Wing Lam, and SangKeun Lee.

Language: Python - Size: 154 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 7 - Forks: 2

BearCleverProud/MoME

Repository for Mixture of Multimodal Experts

Language: Python - Size: 857 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 7 - Forks: 0

dannyxiaocn/awesome-moe

A repo aggregating MoE papers and systems.

Size: 474 KB - Last synced at: 9 days ago - Pushed at: over 3 years ago - Stars: 7 - Forks: 2

ozyurtf/mixture-of-experts

Training two separate expert neural networks and a gating network that switches between them.

Language: Python - Size: 10.1 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 6 - Forks: 0

antonio-f/mixture-of-experts-from-scratch

Mixture of Experts from scratch

Language: Jupyter Notebook - Size: 234 KB - Last synced at: 12 days ago - Pushed at: about 1 year ago - Stars: 6 - Forks: 1

clementetienam/Ultra-Fast-Mixture-of-Experts-Regression

Language: MATLAB - Size: 1.75 MB - Last synced at: about 1 month ago - Pushed at: over 4 years ago - Stars: 6 - Forks: 0

alexliap/greek_gpt

MoE Decoder Transformer implementation with MLX

Language: Python - Size: 107 KB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 5 - Forks: 1

ZhenbangDu/DSD

[IEEE TAI] Mixture-of-Experts for Open Set Domain Adaptation: A Dual-Space Detection Approach

Language: Python - Size: 643 KB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 5 - Forks: 0

nusnlp/moece

The official code of the "Efficient and Interpretable Grammatical Error Correction with Mixture of Experts" paper

Language: Python - Size: 4.83 MB - Last synced at: 26 days ago - Pushed at: 26 days ago - Stars: 5 - Forks: 0

Keefe-Murphy/MEDseq

Mixtures of Exponential-Distance Models for Clustering Longitudinal Life-Course Sequences with Gating Covariates and Sampling Weights

Language: R - Size: 10.1 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 5 - Forks: 0

ZhenbangDu/Seizure_MoE

The official code for the paper 'Mixture of Experts for EEG-Based Seizure Subtype Classification'.

Language: Python - Size: 150 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 5 - Forks: 0

yamsgithub/modular_deep_learning

This repository contains scripts for implementing various expert-based learning architectures, such as mixture of experts and product of experts, and for running experiments with these architectures.

Language: Jupyter Notebook - Size: 332 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 5 - Forks: 1

Related Topics
deep-learning (30), moe (24), llm (22), machine-learning (22), large-language-models (21), pytorch (21), artificial-intelligence (14), nlp (13), transformer (11), transformers (9), computer-vision (9), ai (9), language-model (6), neural-networks (6), gaussian-processes (6), multi-task-learning (5), llama (5), llm-inference (5), efficiency (5), deep-neural-networks (5), ml (4), transfer-learning (4), llms (4), multimodal-large-language-models (4), attention (4), generative-ai (4), ensemble (4), python (4), natural-language-processing (4), vision-transformer (4), multi-modal (3), pytorch-implementation (3), lora (3), gpt (3), prompt-tuning (3), keras (3), conditional-computation (3), unsupervised-learning (3), ensemble-learning (3), inference (3), graph-neural-networks (3), huggingface (3), instruction-tuning (3), mixture-models (3), low-level-vision (3), neural-network (3), mixtral-8x7b (3), deepseek (3), foundation-models (3), llms-reasoning (3), vision-language-models (3), tensorflow (3), llms-benchmarking (2), peft (2), distributed-systems (2), generative-model (2), clustering (2), low-rank-adaptation (2), gpt4 (2), deep-reinforcement-learning (2), multitask-learning (2), peft-fine-tuning-llm (2), alignment-strategies (2), routing (2), kdd2018 (2), surrogate-models (2), chest-xrays (2), ct-scans (2), feature-pyramid-network (2), medical-image-captioning (2), medical-imaging (2), radiology-report-generation (2), llama3 (2), quantization (2), cnn (2), model-based-clustering (2), machine-learning-algorithms (2), moa (2), mistral-7b (2), agents (2), domain-adaptation (2), genai (2), mergekit (2), regression-algorithms (2), mixture-of-models (2), fine-tuning (2), agent (2), mixtral (2), mixture-of-adapters (2), model (2), parameter-efficient-fine-tuning (2), adaptive-computation (2), mixture-model (2), r-package (2), all-in-one-restoration (2), data-parallelism (2), gpu (2), small-language-models (2), anomaly-detection (2), qwen (2)