An open API service providing repository metadata for many open source software ecosystems.

Topic: "mixture-of-experts"

deepspeedai/DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Language: Python - Size: 217 MB - Last synced at: 5 days ago - Pushed at: 7 days ago - Stars: 38,300 - Forks: 4,360

dvmazur/mixtral-offloading

Run Mixtral-8x7B models in Colab or on consumer desktops

Language: Python - Size: 261 KB - Last synced at: 2 days ago - Pushed at: about 1 year ago - Stars: 2,311 - Forks: 232

codelion/optillm

Optimizing inference proxy for LLMs

Language: Python - Size: 1.7 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 2,226 - Forks: 174

learning-at-home/hivemind

Decentralized deep learning in PyTorch. Built to train models across thousands of volunteer machines around the world.

Language: Python - Size: 12.1 MB - Last synced at: 4 days ago - Pushed at: 11 days ago - Stars: 2,176 - Forks: 186

PKU-YuanGroup/MoE-LLaVA

Mixture-of-Experts for Large Vision-Language Models

Language: Python - Size: 16.5 MB - Last synced at: 3 days ago - Pushed at: 5 months ago - Stars: 2,158 - Forks: 133

rhymes-ai/Aria

Codebase for Aria - an Open Multimodal Native MoE

Language: Jupyter Notebook - Size: 120 MB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 995 - Forks: 83

pjlab-sys4nlp/llama-moe

⛷️ LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training (EMNLP 2024)

Language: Python - Size: 1.69 MB - Last synced at: 7 days ago - Pushed at: 5 months ago - Stars: 961 - Forks: 56

microsoft/Tutel

Tutel MoE: an optimized Mixture-of-Experts library with support for DeepSeek FP8/FP4

Language: C - Size: 1.11 MB - Last synced at: 2 days ago - Pushed at: 4 days ago - Stars: 820 - Forks: 97

davidmrau/mixture-of-experts

PyTorch Re-Implementation of "The Sparsely-Gated Mixture-of-Experts Layer" by Noam Shazeer et al. https://arxiv.org/abs/1701.06538

Language: Python - Size: 73.2 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 818 - Forks: 88
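To make the entry above concrete: the Shazeer et al. layer routes each token to a small number of experts chosen by a learned gate and mixes their outputs with the renormalized gate weights. The sketch below is only a minimal PyTorch illustration of that idea under simplifying assumptions (no noisy gating, no capacity limits, a dense routing loop); the `TopKMoE` class and its parameters are invented for this example and are not the repository's API.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Minimal sparsely-gated MoE layer: each token is routed to its
    top-k experts and the expert outputs are mixed by the gate weights."""

    def __init__(self, dim: int, num_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(dim, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:    # x: (tokens, dim)
        logits = self.gate(x)                               # (tokens, num_experts)
        weights, indices = logits.topk(self.k, dim=-1)      # keep only the top-k experts
        weights = F.softmax(weights, dim=-1)                # renormalize over the kept experts
        out = torch.zeros_like(x)
        for slot in range(self.k):                          # dense loop; real kernels batch this
            for e, expert in enumerate(self.experts):
                mask = indices[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

moe = TopKMoE(dim=64)
tokens = torch.randn(10, 64)
print(moe(tokens).shape)  # torch.Size([10, 64])
```

Real implementations batch the per-expert computation and add auxiliary losses to keep experts evenly used; see the Switch Transformers and ST-MoE entries further down for the usual load-balancing terms.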

SMTorg/smt

Surrogate Modeling Toolbox

Language: Jupyter Notebook - Size: 163 MB - Last synced at: 16 days ago - Pushed at: 19 days ago - Stars: 755 - Forks: 215

lucidrains/mixture-of-experts

A PyTorch implementation of Sparsely-Gated Mixture of Experts, for massively increasing the parameter count of language models

Language: Python - Size: 136 KB - Last synced at: 1 day ago - Pushed at: over 1 year ago - Stars: 744 - Forks: 59

AviSoori1x/makeMoE

From scratch implementation of a sparse mixture of experts language model inspired by Andrej Karpathy's makemore :)

Language: Jupyter Notebook - Size: 6.96 MB - Last synced at: about 1 month ago - Pushed at: 7 months ago - Stars: 686 - Forks: 73

drawbridge/keras-mmoe

A TensorFlow Keras implementation of "Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts" (KDD 2018)

Language: Python - Size: 9.11 MB - Last synced at: about 1 year ago - Pushed at: about 2 years ago - Stars: 670 - Forks: 219
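MMoE shares one pool of experts across tasks but gives every task its own softmax gate and output tower. The repository above is a TensorFlow Keras implementation; purely for consistency with the other sketches in this listing, the hedged illustration below uses PyTorch, and the `MMoE` class, layer sizes, and single-unit towers are assumptions for the example rather than the repository's code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MMoE(nn.Module):
    """Shared experts, one softmax gate per task (Ma et al., KDD 2018)."""

    def __init__(self, in_dim: int, expert_dim: int, num_experts: int, num_tasks: int):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(in_dim, expert_dim), nn.ReLU()) for _ in range(num_experts)
        )
        self.gates = nn.ModuleList(nn.Linear(in_dim, num_experts) for _ in range(num_tasks))
        self.towers = nn.ModuleList(nn.Linear(expert_dim, 1) for _ in range(num_tasks))

    def forward(self, x: torch.Tensor) -> list[torch.Tensor]:
        expert_out = torch.stack([e(x) for e in self.experts], dim=1)  # (batch, E, expert_dim)
        outputs = []
        for gate, tower in zip(self.gates, self.towers):
            w = F.softmax(gate(x), dim=-1).unsqueeze(-1)               # (batch, E, 1)
            mixed = (w * expert_out).sum(dim=1)                        # task-specific expert mixture
            outputs.append(tower(mixed))                               # one prediction per task
        return outputs

model = MMoE(in_dim=32, expert_dim=16, num_experts=4, num_tasks=2)
ys = model(torch.randn(8, 32))
print([y.shape for y in ys])  # [torch.Size([8, 1]), torch.Size([8, 1])]
```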

ymcui/Chinese-Mixtral

Chinese Mixtral mixture-of-experts large models (Chinese Mixtral MoE LLMs)

Language: Python - Size: 519 KB - Last synced at: 5 days ago - Pushed at: about 1 year ago - Stars: 603 - Forks: 44

lucidrains/st-moe-pytorch

Implementation of ST-MoE, the latest incarnation of MoE after years of research at Google Brain, in PyTorch

Language: Python - Size: 178 KB - Last synced at: about 1 month ago - Pushed at: 11 months ago - Stars: 326 - Forks: 28
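One of ST-MoE's main practical contributions is the router z-loss, which penalizes large router logits so the gating softmax stays numerically stable. A minimal sketch of that term follows; the function name and the 1e-3 coefficient are typical choices assumed for illustration, not necessarily this repository's defaults.

```python
import torch

def router_z_loss(router_logits: torch.Tensor, coef: float = 1e-3) -> torch.Tensor:
    """ST-MoE router z-loss: mean squared log-sum-exp of the router logits.
    router_logits: (tokens, num_experts)."""
    z = torch.logsumexp(router_logits, dim=-1)
    return coef * (z ** 2).mean()

print(router_z_loss(torch.randn(32, 8)))
```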

lucidrains/soft-moe-pytorch

Implementation of Soft MoE, proposed by Google Brain's vision team, in PyTorch

Language: Python - Size: 1.38 MB - Last synced at: 4 days ago - Pushed at: about 1 month ago - Stars: 290 - Forks: 8
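Soft MoE replaces hard routing entirely: each expert "slot" receives a softmax-weighted average of all tokens (dispatch), the experts process the slots, and each token then takes a softmax-weighted combination of the slot outputs (combine). The sketch below illustrates that mechanism under simplifying assumptions (tiny expert MLPs, no normalization tricks); the `SoftMoE` class and its shapes are invented for the example and are not this repository's API.

```python
import torch
import torch.nn as nn

class SoftMoE(nn.Module):
    """Soft MoE sketch: slots are soft mixtures of tokens, experts process slots,
    and tokens recombine the slot outputs (no hard routing)."""

    def __init__(self, dim: int, num_experts: int = 4, slots_per_expert: int = 1):
        super().__init__()
        self.num_experts, self.slots_per_expert = num_experts, slots_per_expert
        self.phi = nn.Parameter(torch.randn(dim, num_experts * slots_per_expert) * dim ** -0.5)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.GELU()) for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:      # x: (batch, tokens, dim)
        logits = x @ self.phi                                 # (batch, tokens, slots)
        dispatch = logits.softmax(dim=1)                      # normalize over tokens per slot
        combine = logits.softmax(dim=-1)                      # normalize over slots per token
        slots = torch.einsum('btd,bts->bsd', x, dispatch)     # soft slot inputs
        slots = slots.view(x.size(0), self.num_experts, self.slots_per_expert, -1)
        slot_out = torch.stack([e(slots[:, i]) for i, e in enumerate(self.experts)], dim=1)
        slot_out = slot_out.reshape(x.size(0), -1, x.size(-1))  # (batch, slots, dim)
        return torch.einsum('bts,bsd->btd', combine, slot_out)

moe = SoftMoE(dim=64)
print(moe(torch.randn(2, 16, 64)).shape)  # torch.Size([2, 16, 64])
```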

Luodian/Generalizable-Mixture-of-Experts

GMoE could be the next backbone model for many kinds of generalization tasks.

Language: Python - Size: 2.04 MB - Last synced at: 9 days ago - Pushed at: about 2 years ago - Stars: 269 - Forks: 35

inferflow/inferflow

Inferflow is an efficient and highly configurable inference engine for large language models (LLMs).

Language: C++ - Size: 1.89 MB - Last synced at: 6 days ago - Pushed at: about 1 year ago - Stars: 243 - Forks: 25

SkyworkAI/MoH

MoH: Multi-Head Attention as Mixture-of-Head Attention

Language: Python - Size: 5.26 MB - Last synced at: about 1 month ago - Pushed at: 7 months ago - Stars: 233 - Forks: 9

efeslab/fiddler

[ICLR'25] Fast Inference of MoE Models with CPU-GPU Orchestration

Language: Python - Size: 1.72 MB - Last synced at: about 3 hours ago - Pushed at: 6 months ago - Stars: 210 - Forks: 20

EfficientMoE/MoE-Infinity

PyTorch library for cost-effective, fast and easy serving of MoE models.

Language: Python - Size: 457 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 179 - Forks: 13

koayon/awesome-adaptive-computation

A curated reading list of research in Adaptive Computation, Inference-Time Computation & Mixture of Experts (MoE).

Size: 331 KB - Last synced at: 12 days ago - Pushed at: 5 months ago - Stars: 143 - Forks: 9

eduardzamfir/seemoredetails

[ICML 2024] See More Details: Efficient Image Super-Resolution by Experts Mining

Language: Python - Size: 10.2 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 138 - Forks: 2

lucidrains/PEER-pytorch

PyTorch implementation of the PEER block from the paper "Mixture of a Million Experts" by Xu Owen He at DeepMind

Language: Python - Size: 271 KB - Last synced at: 7 days ago - Pushed at: 9 months ago - Stars: 123 - Forks: 3

shufangxun/LLaVA-MoD

[ICLR 2025] LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation

Language: Python - Size: 3.41 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 120 - Forks: 7

lucidrains/mixture-of-attention

Some personal experiments around routing tokens to different autoregressive attention, akin to mixture-of-experts

Language: Python - Size: 34.1 MB - Last synced at: 7 days ago - Pushed at: 7 months ago - Stars: 118 - Forks: 4

Adlith/MoE-Jetpack

[NeurIPS 24] MoE Jetpack: From Dense Checkpoints to Adaptive Mixture of Experts for Vision Tasks

Language: Python - Size: 32.3 MB - Last synced at: about 1 month ago - Pushed at: 6 months ago - Stars: 115 - Forks: 1

YangLing0818/RealCompo

[NeurIPS 2024] RealCompo: Balancing Realism and Compositionality Improves Text-to-Image Diffusion Models

Language: Python - Size: 7.45 MB - Last synced at: about 2 months ago - Pushed at: 6 months ago - Stars: 115 - Forks: 4

arpita8/Awesome-Mixture-of-Experts-Papers

Survey: A collection of AWESOME papers and resources on the latest research in Mixture of Experts.

Size: 2.21 MB - Last synced at: 8 days ago - Pushed at: 9 months ago - Stars: 115 - Forks: 3

relf/egobox

Efficient global optimization toolbox in Rust: Bayesian optimization, mixture of Gaussian processes, sampling methods

Language: Rust - Size: 11.9 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 112 - Forks: 6

liuqidong07/MOELoRA-peft

[SIGIR'24] The official implementation code of MOELoRA.

Language: Python - Size: 10.2 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 105 - Forks: 11
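The general idea behind MoE-style LoRA methods is to keep the pretrained weight frozen and learn several low-rank adapters that a small gate mixes per input. The snippet below is a generic, hedged sketch of that pattern, not the official MOELoRA code; the `MoLoRALinear` class, the input-conditioned gate, and all hyperparameters are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoLoRALinear(nn.Module):
    """Frozen base linear layer plus several low-rank (LoRA) experts mixed by a gate."""

    def __init__(self, base: nn.Linear, num_experts: int = 4, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                          # keep the pretrained weight frozen
        in_f, out_f = base.in_features, base.out_features
        self.A = nn.Parameter(torch.randn(num_experts, in_f, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(num_experts, rank, out_f))  # zero init: no update at start
        self.gate = nn.Linear(in_f, num_experts)
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:      # x: (batch, in_f)
        w = F.softmax(self.gate(x), dim=-1)                   # per-input mixing weights (batch, E)
        delta = torch.einsum('bi,eir,ero->beo', x, self.A, self.B)   # per-expert low-rank updates
        return self.base(x) + self.scale * torch.einsum('be,beo->bo', w, delta)

layer = MoLoRALinear(nn.Linear(64, 64))
print(layer(torch.randn(4, 64)).shape)  # torch.Size([4, 64])
```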

kyegomez/SwitchTransformers

Implementation of Switch Transformers from the paper: "Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity"

Language: Python - Size: 2.42 MB - Last synced at: 6 days ago - Pushed at: about 1 month ago - Stars: 100 - Forks: 12
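Switch routing sends each token to a single expert (top-1) and adds an auxiliary load-balancing loss so experts receive comparable traffic. The sketch below shows just the router and that loss term, following the formulation in the Switch Transformers paper; the `SwitchRouter` class and its names are invented for this illustration, and capacity factors and the actual expert dispatch are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwitchRouter(nn.Module):
    """Top-1 ("switch") routing with the auxiliary load-balancing term:
    aux = num_experts * sum_e (fraction of tokens routed to e) * (mean gate prob of e)."""

    def __init__(self, dim: int, num_experts: int = 8):
        super().__init__()
        self.num_experts = num_experts
        self.gate = nn.Linear(dim, num_experts, bias=False)

    def forward(self, x: torch.Tensor):                       # x: (tokens, dim)
        probs = F.softmax(self.gate(x), dim=-1)               # (tokens, E)
        gate_val, expert_idx = probs.max(dim=-1)              # each token picks one expert
        density = F.one_hot(expert_idx, self.num_experts).float().mean(dim=0)  # actual load
        density_proxy = probs.mean(dim=0)                     # differentiable load proxy
        aux_loss = self.num_experts * (density * density_proxy).sum()
        return expert_idx, gate_val, aux_loss

router = SwitchRouter(dim=64)
idx, scale, aux = router(torch.randn(32, 64))
print(idx.shape, scale.shape, aux.item())
```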

LINs-lab/DynMoE

[ICLR 2025] Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models

Language: Python - Size: 57.3 MB - Last synced at: 6 days ago - Pushed at: 3 months ago - Stars: 89 - Forks: 11

xrsrke/pipegoose

Large-scale 4D-parallel pre-training of Mixture-of-Experts 🤗 transformers *(still a work in progress)*

Language: Python - Size: 1.26 MB - Last synced at: 21 days ago - Pushed at: over 1 year ago - Stars: 82 - Forks: 18

OpenSparseLLMs/LLaMA-MoE-v2

🚀 LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training

Language: Python - Size: 2.21 MB - Last synced at: about 1 month ago - Pushed at: 6 months ago - Stars: 78 - Forks: 11

Leeroo-AI/mergoo

A library for easily merging multiple LLM experts and efficiently training the merged LLM.

Language: Python - Size: 1.61 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 76 - Forks: 3

HLTCHKUST/MoEL

MoEL: Mixture of Empathetic Listeners

Language: Python - Size: 8.52 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 71 - Forks: 14

fkodom/soft-mixture-of-experts

PyTorch implementation of Soft MoE by Google Brain in "From Sparse to Soft Mixtures of Experts" (https://arxiv.org/pdf/2308.00951.pdf)

Language: Python - Size: 152 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 71 - Forks: 5

CASE-Lab-UMD/Unified-MoE-Compression

The official implementation of the paper "Towards Efficient Mixture of Experts: A Holistic Study of Compression Techniques (TMLR)".

Language: Python - Size: 47.1 MB - Last synced at: 6 days ago - Pushed at: about 2 months ago - Stars: 67 - Forks: 5

dmis-lab/Monet

[ICLR 2025] Monet: Mixture of Monosemantic Experts for Transformers

Language: Python - Size: 252 KB - Last synced at: 12 days ago - Pushed at: 4 months ago - Stars: 66 - Forks: 3

UNITES-Lab/MC-SMoE

[ICLR 2024 Spotlight] Code for the paper "Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy"

Language: Python - Size: 1.8 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 56 - Forks: 7

bwconrad/soft-moe

PyTorch implementation of "From Sparse to Soft Mixtures of Experts"

Language: Python - Size: 344 KB - Last synced at: 3 days ago - Pushed at: over 1 year ago - Stars: 56 - Forks: 3

mryab/learning-at-home

"Towards Crowdsourced Training of Large Neural Networks using Decentralized Mixture-of-Experts" (NeurIPS 2020), original PyTorch implementation

Language: Jupyter Notebook - Size: 272 KB - Last synced at: 17 days ago - Pushed at: over 4 years ago - Stars: 54 - Forks: 1

AmazaspShumik/mtlearn

Multi-Task Learning package built with TensorFlow 2 (Multi-Gate Mixture of Experts, Cross-Stitch, Uncertainty Weighting)

Language: Python - Size: 10.1 MB - Last synced at: 13 days ago - Pushed at: over 5 years ago - Stars: 52 - Forks: 6

Leeroo-AI/leeroo_orchestrator

The implementation of "Leeroo Orchestrator: Elevating LLMs Performance Through Model Integration"

Language: Python - Size: 857 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 46 - Forks: 4

AmazaspShumik/Mixture-Models

Hierarchical Mixture of Experts, Mixture Density Neural Network

Language: Jupyter Notebook - Size: 4.57 MB - Last synced at: over 1 year ago - Pushed at: about 8 years ago - Stars: 45 - Forks: 17

LoserCheems/WonderfulMatrices

Wonderful Matrices to Build Small Language Models

Language: Python - Size: 8.78 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 43 - Forks: 0

924973292/DeMo

[AAAI 2025] DeMo: Decoupled Feature-Based Mixture of Experts for Multi-Modal Object Re-Identification

Language: Python - Size: 17 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 41 - Forks: 2

VITA-Group/Neural-Implicit-Dict

[ICML 2022] "Neural Implicit Dictionary via Mixture-of-Expert Training" by Peihao Wang, Zhiwen Fan, Tianlong Chen, Zhangyang Wang

Language: Python - Size: 958 KB - Last synced at: 28 days ago - Pushed at: over 1 year ago - Stars: 40 - Forks: 1

Spico197/MoE-SFT

🍼 Official implementation of Dynamic Data Mixing Maximizes Instruction Tuning for Mixture-of-Experts

Language: Python - Size: 552 KB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 38 - Forks: 0

AIDC-AI/Parrot

🎉 The code repository for "Parrot: Multilingual Visual Instruction Tuning" in PyTorch.

Language: Python - Size: 25.2 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 36 - Forks: 1

lucidrains/sinkhorn-router-pytorch

Self-contained PyTorch implementation of a Sinkhorn-based router, for mixture of experts or otherwise

Language: Python - Size: 27.3 KB - Last synced at: 7 days ago - Pushed at: 9 months ago - Stars: 34 - Forks: 0
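A Sinkhorn-style router balances the token-to-expert assignment by alternately normalizing the score matrix toward target marginals (every token carries unit mass, every expert receives roughly tokens/experts of it) before experts are picked. The function below is a generic log-space sketch of that iteration, not this repository's implementation; the name, iteration count, and marginals are assumptions.

```python
import math
import torch

def sinkhorn_route(logits: torch.Tensor, num_iters: int = 8) -> torch.Tensor:
    """Balance a (tokens x experts) score matrix with Sinkhorn iterations so that
    assignment mass is spread evenly across experts before taking the argmax."""
    t, e = logits.shape
    log_pi = logits.log_softmax(dim=-1)
    for _ in range(num_iters):
        # each expert should receive t/e units of mass in total
        log_pi = log_pi - log_pi.logsumexp(dim=0, keepdim=True) + math.log(t / e)
        # each token should distribute exactly 1 unit of mass
        log_pi = log_pi - log_pi.logsumexp(dim=1, keepdim=True)
    return log_pi.exp()

scores = torch.randn(16, 4)                # 16 tokens, 4 experts
pi = sinkhorn_route(scores)
assignment = pi.argmax(dim=-1)             # balanced expert choice per token
print(pi.sum(dim=1))                       # ~1 per token after the final row normalization
```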

eduardzamfir/MoCE-IR

[CVPR 2025] Complexity Experts are Task-Discriminative Learners for Any Image Restoration

Language: Python - Size: 821 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 33 - Forks: 0

umbertocappellazzo/PETL_AST

This is the official repository of the papers "Parameter-Efficient Transfer Learning of Audio Spectrogram Transformers" and "Efficient Fine-tuning of Audio Spectrogram Transformers via Soft Mixture of Adapters".

Language: Python - Size: 3.09 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 32 - Forks: 1

BorealisAI/MMoEEx-MTL

PyTorch Implementation of the Multi-gate Mixture-of-Experts with Exclusivity (MMoEEx)

Language: Python - Size: 31.4 MB - Last synced at: about 1 month ago - Pushed at: almost 4 years ago - Stars: 32 - Forks: 4

eduardzamfir/DaAIR

GitHub repository for our project "Efficient Degradation-aware Any Image Restoration"

Size: 15.6 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 30 - Forks: 0

james-oldfield/muMoE

[NeurIPS'24] Multilinear Mixture of Experts: Scalable Expert Specialization through Factorization

Language: Python - Size: 2.95 MB - Last synced at: 28 days ago - Pushed at: 8 months ago - Stars: 30 - Forks: 1

RoyalSkye/Routing-MVMoE

[ICML 2024] "MVMoE: Multi-Task Vehicle Routing Solver with Mixture-of-Experts"

Language: Python - Size: 379 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 30 - Forks: 3

OpenSparseLLMs/CLIP-MoE

CLIP-MoE: Mixture of Experts for CLIP

Language: Python - Size: 2.35 MB - Last synced at: about 1 month ago - Pushed at: 7 months ago - Stars: 29 - Forks: 0

kyegomez/LIMoE

Implementation of "the first large-scale multimodal mixture of experts models" from the paper "Multimodal Contrastive Learning with LIMoE: the Language-Image Mixture of Experts"

Language: Python - Size: 2.17 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 28 - Forks: 2

zjukg/MoMoK

[ICLR 2025] Multiple Heads are Better than One: Mixture of Modality Knowledge Experts for Entity Representation Learning

Language: Python - Size: 7 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 27 - Forks: 3

SuperBruceJia/Awesome-Mixture-of-Experts

Awesome Mixture of Experts (MoE): A Curated List of Mixture of Experts (MoE) and Mixture of Multimodal Experts (MoME)

Size: 438 KB - Last synced at: 15 days ago - Pushed at: 4 months ago - Stars: 27 - Forks: 3

sammcj/moa Fork of togethercomputer/MoA

Mixture-of-Ollamas

Language: Python - Size: 1.72 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 25 - Forks: 1

jaisidhsingh/pytorch-mixtures

One-stop solutions for Mixture-of-Experts and Mixture-of-Depths modules in PyTorch.

Language: Python - Size: 366 KB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 22 - Forks: 1

Wuyxin/GraphMETRO

GraphMETRO: Mitigating Complex Graph Distribution Shifts via Mixture of Aligned Experts (NeurIPS 2024)

Language: Python - Size: 36.1 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 21 - Forks: 1

dsy109/mixtools

Tools for Analyzing Finite Mixture Models

Language: R - Size: 499 KB - Last synced at: about 1 month ago - Pushed at: 11 months ago - Stars: 20 - Forks: 4

tsc2017/MIX-GAN

Some recent state-of-the-art generative models in ONE notebook: (MIX-)?(GAN|WGAN|BigGAN|MHingeGAN|AMGAN|StyleGAN|StyleGAN2)(\+ADA|\+CR|\+EMA|\+GP|\+R1|\+SA|\+SN)*

Language: Jupyter Notebook - Size: 771 KB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 20 - Forks: 1

danelpeng/Awesome-Continual-Leaning-with-PTMs

This is a curated list of "Continual Learning with Pretrained Models" research.

Size: 254 KB - Last synced at: 6 days ago - Pushed at: about 2 months ago - Stars: 17 - Forks: 0

checkstep/mole-stance

MoLE: Cross-Domain Label-Adaptive Stance Detection

Language: Python - Size: 47.9 KB - Last synced at: about 2 months ago - Pushed at: about 3 years ago - Stars: 17 - Forks: 5

AdamG012/moe-paper-models

A summary of MoE experimental setups across a number of different papers.

Size: 10.7 KB - Last synced at: 10 days ago - Pushed at: about 2 years ago - Stars: 16 - Forks: 1

dominiquegarmier/grok-pytorch

PyTorch implementation of Grok

Language: Python - Size: 44.9 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 13 - Forks: 0

gaozhitong/MoSE-AUSeg

The official code repo for the paper "Mixture of Stochastic Experts for Modeling Aleatoric Uncertainty in Segmentation". (ICLR 2023)

Language: Python - Size: 939 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 13 - Forks: 1

hanyas/mimo

A toolbox for inference of mixture models

Language: Python - Size: 713 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 13 - Forks: 4

vivamoto/classifier

Machine learning code, derivative calculations, and optimization algorithms developed during the Machine Learning course at Universidade de Sao Paulo. All code is written in Python with NumPy and Matplotlib, with examples at the end of each file.

Language: Python - Size: 13.1 MB - Last synced at: almost 2 years ago - Pushed at: about 4 years ago - Stars: 12 - Forks: 4

cmavro/PackLLM

Pack of LLMs: Model Fusion at Test-Time via Perplexity Optimization

Language: Python - Size: 169 KB - Last synced at: 9 days ago - Pushed at: about 1 year ago - Stars: 10 - Forks: 1

UNITES-Lab/HEXA-MoE

Official code for the paper "HEXA-MoE: Efficient and Heterogeneous-Aware MoE Acceleration with Zero Computation Redundancy"

Language: Python - Size: 19.2 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 9 - Forks: 1

he-h/ST-MoE-BERT

This repository contains the code for the paper "ST-MoE-BERT: A Spatial-Temporal Mixture-of-Experts Framework for Long-Term Cross-City Mobility Prediction".

Language: Python - Size: 872 KB - Last synced at: 5 days ago - Pushed at: 3 months ago - Stars: 9 - Forks: 3

yuzhimanhua/SciMult

Pre-training Multi-task Contrastive Learning Models for Scientific Literature Understanding (Findings of EMNLP'23)

Language: Python - Size: 173 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 9 - Forks: 0

ilyalasy/moe-routing

Analysis of token routing for different implementations of Mixture of Experts

Language: Jupyter Notebook - Size: 882 KB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 9 - Forks: 0

clint-kristopher-morris/llm-guided-evolution

LLM Guided Evolution - The Automation of Models Advancing Models

Language: Python - Size: 345 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 8 - Forks: 5

yanring/Megatron-MoE-ModelZoo

Best practices for testing advanced Mixtral, DeepSeek, and Qwen series MoE models using Megatron Core MoE.

Language: Python - Size: 26.4 KB - Last synced at: 23 days ago - Pushed at: 23 days ago - Stars: 8 - Forks: 1

EfficientMoE/MoE-Gen

High-throughput offline inference for MoE models with limited GPUs

Language: Python - Size: 552 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 8 - Forks: 0

louisbrulenaudet/mergeKit

Tools for merging pretrained Large Language Models and creating Mixture of Experts (MoE) models from open-source models.

Language: Jupyter Notebook - Size: 13.7 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 8 - Forks: 0

RoyZry98/T-REX-Pytorch

[arXiv 2024] Official code for T-REX: Mixture-of-Rank-One-Experts with semantic-aware Intuition for Multi-task Large Language Model Finetuning

Language: Python - Size: 19.2 MB - Last synced at: about 21 hours ago - Pushed at: about 22 hours ago - Stars: 7 - Forks: 0

UNITES-Lab/glider

Official code for the paper "Glider: Global and Local Instruction-Driven Expert Router"

Language: Python - Size: 477 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 7 - Forks: 0

Keefe-Murphy/MoEClust

Gaussian Parsimonious Clustering Models with Gating and Expert Network Covariates

Language: R - Size: 1.67 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 7 - Forks: 0

skiptoniam/ecomix

ecomix is a package implementing model-based grouping of community data at the species level (Species Archetype Models) or the site level (Regions of Common Profile).

Language: R - Size: 63 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 7 - Forks: 2

AhmedMagdyHendawy/MOORE

Official code of the paper "Multi-Task Reinforcement Learning with Mixture of Orthogonal Experts" at ICLR 2024

Language: Python - Size: 813 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 7 - Forks: 3

jyjohnchoi/SMoP

The repository contains the code for our EMNLP 2023 paper "SMoP: Towards Efficient and Effective Prompt Tuning with Sparse Mixture-of-Prompts", written by Joon-Young Choi, Junho Kim, Jun-Hyung Park, Mok-Wing Lam, and SangKeun Lee.

Language: Python - Size: 154 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 7 - Forks: 2

BearCleverProud/MoME

Repository for Mixture of Multimodal Experts

Language: Python - Size: 857 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 7 - Forks: 0

dannyxiaocn/awesome-moe

A repo aggregating MoE papers and systems.

Size: 474 KB - Last synced at: 9 days ago - Pushed at: over 3 years ago - Stars: 7 - Forks: 2

ozyurtf/mixture-of-experts

Training two separate expert neural networks and a gating network that switches between them.

Language: Python - Size: 10.1 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 6 - Forks: 0

antonio-f/mixture-of-experts-from-scratch

Mixture of Experts from scratch

Language: Jupyter Notebook - Size: 234 KB - Last synced at: 12 days ago - Pushed at: about 1 year ago - Stars: 6 - Forks: 1

clementetienam/Ultra-Fast-Mixture-of-Experts-Regression

Language: MATLAB - Size: 1.75 MB - Last synced at: about 1 month ago - Pushed at: over 4 years ago - Stars: 6 - Forks: 0

alexliap/greek_gpt

MoE Decoder Transformer implementation with MLX

Language: Python - Size: 107 KB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 5 - Forks: 1

ZhenbangDu/DSD

[IEEE TAI] Mixture-of-Experts for Open Set Domain Adaptation: A Dual-Space Detection Approach

Language: Python - Size: 643 KB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 5 - Forks: 0

nusnlp/moece

The official code of the "Efficient and Interpretable Grammatical Error Correction with Mixture of Experts" paper

Language: Python - Size: 4.83 MB - Last synced at: 26 days ago - Pushed at: 26 days ago - Stars: 5 - Forks: 0

Keefe-Murphy/MEDseq

Mixtures of Exponential-Distance Models for Clustering Longitudinal Life-Course Sequences with Gating Covariates and Sampling Weights

Language: R - Size: 10.1 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 5 - Forks: 0

ZhenbangDu/Seizure_MoE

The official code for the paper 'Mixture of Experts for EEG-Based Seizure Subtype Classification'.

Language: Python - Size: 150 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 5 - Forks: 0

yamsgithub/modular_deep_learning

This repository contains scripts for implementing various expert-based learning architectures, such as mixture of experts and product of experts, and for running experiments with these architectures.

Language: Jupyter Notebook - Size: 332 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 5 - Forks: 1

Related Topics
deep-learning (30), moe (24), llm (22), machine-learning (22), large-language-models (21), pytorch (21), artificial-intelligence (14), nlp (13), transformer (11), transformers (9), computer-vision (9), ai (9), language-model (6), neural-networks (6), gaussian-processes (6), multi-task-learning (5), llama (5), llm-inference (5), efficiency (5), deep-neural-networks (5), ml (4), transfer-learning (4), llms (4), multimodal-large-language-models (4), attention (4), generative-ai (4), ensemble (4), python (4), natural-language-processing (4), vision-transformer (4), multi-modal (3), pytorch-implementation (3), lora (3), gpt (3), prompt-tuning (3), keras (3), conditional-computation (3), unsupervised-learning (3), ensemble-learning (3), inference (3), graph-neural-networks (3), huggingface (3), instruction-tuning (3), mixture-models (3), low-level-vision (3), neural-network (3), mixtral-8x7b (3), deepseek (3), foundation-models (3), llms-reasoning (3), vision-language-models (3), tensorflow (3), llms-benchmarking (2), peft (2), distributed-systems (2), generative-model (2), clustering (2), low-rank-adaptation (2), gpt4 (2), deep-reinforcement-learning (2), multitask-learning (2), peft-fine-tuning-llm (2), alignment-strategies (2), routing (2), kdd2018 (2), surrogate-models (2), chest-xrays (2), ct-scans (2), feature-pyramid-network (2), medical-image-captioning (2), medical-imaging (2), radiology-report-generation (2), llama3 (2), quantization (2), cnn (2), model-based-clustering (2), machine-learning-algorithms (2), moa (2), mistral-7b (2), agents (2), domain-adaptation (2), genai (2), mergekit (2), regression-algorithms (2), mixture-of-models (2), fine-tuning (2), agent (2), mixtral (2), mixture-of-adapters (2), model (2), parameter-efficient-fine-tuning (2), adaptive-computation (2), mixture-model (2), r-package (2), all-in-one-restoration (2), data-parallelism (2), gpu (2), small-language-models (2), anomaly-detection (2), qwen (2)