GitHub topics: multi-modal
valhalla/valhalla
Open Source Routing Engine for OpenStreetMap
Language: C++ - Size: 119 MB - Last synced at: about 10 hours ago - Pushed at: about 12 hours ago - Stars: 5,042 - Forks: 762

presidio-oss/cline-based-code-generator
VS Code extension that streamlines development workflows through AI-powered task execution, intelligent file management, and automated code generation. Built on Cline, it integrates with various LLMs to enhance productivity and code quality while simplifying complex development tasks.
Language: TypeScript - Size: 110 MB - Last synced at: about 12 hours ago - Pushed at: about 14 hours ago - Stars: 62 - Forks: 50

jeremy-london/SnowRivals
SnowRivals: AI-Powered Snowboarding Coach
Language: TypeScript - Size: 4.05 MB - Last synced at: about 14 hours ago - Pushed at: about 16 hours ago - Stars: 0 - Forks: 1

SciSharp/LLamaSharp
A C#/.NET library to run LLM (🦙LLaMA/LLaVA) on your local device efficiently.
Language: C# - Size: 393 MB - Last synced at: about 11 hours ago - Pushed at: 6 days ago - Stars: 3,347 - Forks: 465

microsoft/farmvibes-ai
FarmVibes.AI: Multi-Modal GeoSpatial ML Models for Agriculture and Sustainability
Language: Jupyter Notebook - Size: 40 MB - Last synced at: about 9 hours ago - Pushed at: about 1 month ago - Stars: 786 - Forks: 149

kyegomez/Kosmos-X
The Next Generation Multi-Modality Superintelligence
Language: Python - Size: 21.4 MB - Last synced at: about 4 hours ago - Pushed at: about 1 year ago - Stars: 70 - Forks: 11

OpenGVLab/InternVL
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.
Language: Python - Size: 38.7 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 9,029 - Forks: 695

Tebmer/Awesome-Knowledge-Distillation-of-LLMs
This repository collects papers for "A Survey on Knowledge Distillation of Large Language Models". We break down KD into Knowledge Elicitation and Distillation Algorithms, and explore the Skill & Vertical Distillation of LLMs.
Size: 18.6 MB - Last synced at: about 14 hours ago - Pushed at: 6 months ago - Stars: 1,158 - Forks: 68

zai-org/CogVLM
a state-of-the-art-level open visual language model | Multimodal pretrained model
Language: Python - Size: 25.8 MB - Last synced at: about 11 hours ago - Pushed at: over 1 year ago - Stars: 6,658 - Forks: 437

zai-org/CogVLM2
GPT4V-level open-source multi-modal model based on Llama3-8B
Language: Python - Size: 13.9 MB - Last synced at: about 11 hours ago - Pushed at: 6 months ago - Stars: 2,412 - Forks: 156

IntelLabs/fastRAG
Efficient Retrieval Augmentation and Generation Framework
Language: Python - Size: 20.4 MB - Last synced at: 2 days ago - Pushed at: 8 months ago - Stars: 1,657 - Forks: 154

Chiuqyan/arxiv-daily-test (fork of beiyuouo/arxiv-daily)
🎓 Automatically update papers in selected fields daily using GitHub Actions (every 12 hours)
Language: Python - Size: 52.1 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

BrainLesion/preprocessing
preprocessing tools for multi-modal 3D brain imaging
Language: C - Size: 1.19 GB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 20 - Forks: 6

tangxyw/RecSysPapers
A collection of industry classics and cutting-edge papers in the field of recommendation/advertising/search.
Language: Python - Size: 1.6 GB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 1,914 - Forks: 253

zjunlp/DeepKE
[EMNLP 2022] An Open Toolkit for Knowledge Graph Extraction and Construction
Language: Python - Size: 121 MB - Last synced at: 2 days ago - Pushed at: about 2 months ago - Stars: 4,108 - Forks: 727

vercel/modelfusion
The TypeScript library for building AI applications.
Language: TypeScript - Size: 15.6 MB - Last synced at: 3 days ago - Pushed at: about 1 year ago - Stars: 1,298 - Forks: 89

agentscope-ai/agentscope
AgentScope: Agent-Oriented Programming for Building LLM Applications
Language: Python - Size: 303 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 8,138 - Forks: 510

AnswerDotAI/byaldi
Use late-interaction multi-modal models such as ColPali in just a few lines of code.
Language: Python - Size: 1.94 MB - Last synced at: 1 day ago - Pushed at: 7 months ago - Stars: 820 - Forks: 92

dvlab-research/LISA
Project Page for "LISA: Reasoning Segmentation via Large Language Model"
Language: Python - Size: 28.9 MB - Last synced at: 6 days ago - Pushed at: 7 months ago - Stars: 2,381 - Forks: 176

JuliaRobotics/Caesar.jl
Robust robotic localization and mapping, together with NavAbility(TM). Reach out to info@wherewhen.ai for help.
Language: Julia - Size: 40 MB - Last synced at: 2 days ago - Pushed at: 12 days ago - Stars: 196 - Forks: 32

TEN-framework/ten-framework
Open-source framework for conversational voice AI agents.
Language: C - Size: 105 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 7,290 - Forks: 852

open-compass/VLMEvalKit
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
Language: Python - Size: 8.32 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 2,985 - Forks: 489

docarray/docarray
Represent, send, store and search multimodal data
Language: Python - Size: 242 MB - Last synced at: 7 days ago - Pushed at: 3 months ago - Stars: 3,098 - Forks: 234

activeloopai/deeplake
Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop.ai
Language: Python - Size: 65.5 MB - Last synced at: 7 days ago - Pushed at: 9 days ago - Stars: 8,791 - Forks: 675

modelscope/data-juicer
Data processing for and with foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
Language: Python - Size: 437 MB - Last synced at: 7 days ago - Pushed at: 9 days ago - Stars: 5,087 - Forks: 267

kyegomez/zeta
Build high-performance AI models with modular building blocks
Language: Python - Size: 41.4 MB - Last synced at: 6 days ago - Pushed at: 12 days ago - Stars: 545 - Forks: 52

TuGraph-family/chat2graph
Chat2Graph: Graph Native Agentic System.
Language: Python - Size: 18 MB - Last synced at: 7 days ago - Pushed at: 17 days ago - Stars: 340 - Forks: 42

OFA-Sys/Chinese-CLIP
Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
Language: Jupyter Notebook - Size: 2.84 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 5,478 - Forks: 511

OpenBMB/MiniCPM-V
MiniCPM-V 4.5: A GPT-4o Level MLLM for Single Image, Multi Image and Video Understanding on Your Phone
Language: Python - Size: 472 MB - Last synced at: 9 days ago - Pushed at: 10 days ago - Stars: 20,324 - Forks: 1,488

marqo-ai/marqo
Unified embedding generation and search engine. Also available on cloud - cloud.marqo.ai
Language: Python - Size: 80.5 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 4,926 - Forks: 213

kyegomez/SwitchTransformers
Implementation of Switch Transformers from the paper: "Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity"
Language: Python - Size: 2.42 MB - Last synced at: 6 days ago - Pushed at: 20 days ago - Stars: 118 - Forks: 14

MedMNIST/MedMNIST
[pip install medmnist] 18x Standardized Datasets for 2D and 3D Biomedical Image Classification
Language: Python - Size: 13.6 MB - Last synced at: 10 days ago - Pushed at: 8 months ago - Stars: 1,234 - Forks: 183

Ruiyang-061X/Awesome-MLLM-Reasoning
📖 Curated list about the reasoning ability of MLLMs, including OpenAI o1, OpenAI o3-mini, and Slow-Thinking.
Size: 7.81 KB - Last synced at: 8 days ago - Pushed at: 7 months ago - Stars: 9 - Forks: 0

Agora-Lab-AI/Atom
a suite of finetuned LLMs for atomically precise function calling 🧪
Language: Python - Size: 2.35 MB - Last synced at: 6 days ago - Pushed at: 20 days ago - Stars: 15 - Forks: 1

JuliaRobotics/IncrementalInference.jl
Clique recycling non-Gaussian (multi-modal) factor graph solver; also see Caesar.jl.
Language: Julia - Size: 6.68 MB - Last synced at: 2 days ago - Pushed at: 9 days ago - Stars: 74 - Forks: 20

lucidrains/DALLE-pytorch
Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch
Language: Python - Size: 13.5 MB - Last synced at: 10 days ago - Pushed at: over 1 year ago - Stars: 5,626 - Forks: 644

modelscope/modelscope
ModelScope: bring the notion of Model-as-a-Service to life.
Language: Python - Size: 55.1 MB - Last synced at: 11 days ago - Pushed at: 12 days ago - Stars: 8,283 - Forks: 864

saforem2/mmm
Multi-Modal Modeling
Language: Python - Size: 366 KB - Last synced at: 11 days ago - Pushed at: 12 days ago - Stars: 6 - Forks: 0

RS2002/Skip-BART
Official Repository for The Paper, Automatic Stage Lighting Control: Is it a Rule-Driven Process or Generative Task?
Language: Python - Size: 7.4 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 0 - Forks: 0

pingcap/pytidb
TiDB AI SDK: Unified Multi-Modal Data Platform for AI Apps & Agents - https://pingcap.github.io/ai/
Language: Python - Size: 1.78 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 22 - Forks: 11

OpenMotionLab/MotionGPT3
MotionGPT3: Human Motion as a Second Modality, a MoT-based framework for unified motion understanding and generation
Language: Python - Size: 9.22 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 87 - Forks: 5

bytedance/SALMONN
SALMONN family: A suite of advanced multi-modal LLMs
Size: 58.4 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 1,304 - Forks: 101

jokieleung/awesome-visual-question-answering
A curated list of Visual Question Answering (VQA) (Image/Video Question Answering), Visual Question Generation, Visual Dialog, Visual Commonsense Reasoning, and related areas.
Size: 179 KB - Last synced at: about 17 hours ago - Pushed at: about 2 years ago - Stars: 665 - Forks: 94

kenshi7798/awesome-text-to-motion
🤖 Generate human motion from text with our surveys, datasets, and models, focusing on single-person scenarios for clearer analysis and application.
Language: TypeScript - Size: 4.51 MB - Last synced at: 15 days ago - Pushed at: 16 days ago - Stars: 0 - Forks: 0

jina-ai/jina-vdr (fork of illuin-tech/vidore-benchmark)
Jina VDR is a multilingual, multi-domain benchmark for visual document retrieval
Language: Python - Size: 2.99 MB - Last synced at: 9 days ago - Pushed at: about 1 month ago - Stars: 28 - Forks: 1

WisconsinAIVision/ViP-LLaVA
[CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
Language: Python - Size: 17.4 MB - Last synced at: 6 days ago - Pushed at: about 1 year ago - Stars: 330 - Forks: 23

kyegomez/RT-2
Democratization of RT-2 "RT-2: New model translates vision and language into action"
Language: Python - Size: 2.59 MB - Last synced at: 11 days ago - Pushed at: about 1 year ago - Stars: 499 - Forks: 65

quic/cloud-ai-sdk
Qualcomm Cloud AI SDK (Platform and Apps) enables high-performance deep learning inference on Qualcomm Cloud AI platforms, delivering high throughput and low latency across Computer Vision, Object Detection, Natural Language Processing, and Generative AI models.
Language: Jupyter Notebook - Size: 25.3 MB - Last synced at: 16 days ago - Pushed at: about 1 month ago - Stars: 66 - Forks: 13

InternLM/InternEvo
InternEvo is an open-source lightweight training framework that aims to support model pre-training without the need for extensive dependencies.
Language: Python - Size: 6.79 MB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 404 - Forks: 70

harlanhong/ACTalker
ICCV 2025 ACTalker: an end-to-end video diffusion framework for talking head synthesis that supports both single and multi-signal control (e.g., audio, expression).
Language: Python - Size: 125 MB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 374 - Forks: 39

chandan1145/Cog
Tiny HTTP framework built on node:http
Language: TypeScript - Size: 177 KB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 0 - Forks: 0

VectorSpaceLab/OmniGen
OmniGen: Unified Image Generation. https://arxiv.org/pdf/2409.11340
Language: Jupyter Notebook - Size: 399 MB - Last synced at: 17 days ago - Pushed at: 3 months ago - Stars: 4,240 - Forks: 363

tsinghua-fib-lab/SmartAgent
The official repository of "SmartAgent: Chain-of-User-Thought for Embodied Personalized Agent in Cyber World".
Size: 4.69 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 29 - Forks: 1

bayujawir/SmolVLM
SmolVLM 🐙: Ready-to-run SmolVLM2 Docker image with web UI and HTTP API for image-to-text and text-to-text tasks; offline-capable, low GPU needs (>=4GB VRAM).
Language: Python - Size: 1.62 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 0 - Forks: 0

CoderChen01/InterCLIP-MEP
Official repository of the paper "InterCLIP-MEP: Interactive CLIP and Memory-Enhanced Predictor for Multi-modal Sarcasm Detection"
Language: Python - Size: 2.46 MB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 14 - Forks: 0

souradipp76/MM-PoE
Multiple Choice Reasoning via Process of Elimination using Multi-Modal Models
Language: Python - Size: 698 KB - Last synced at: 3 days ago - Pushed at: 20 days ago - Stars: 1 - Forks: 1

Zilize/awesome-text-to-motion
Text-driven human motion generation surveys, datasets and models.
Language: Python - Size: 127 KB - Last synced at: 23 days ago - Pushed at: 23 days ago - Stars: 0 - Forks: 0

zjysteven/VLM-Visualizer
Visualizing the attention of vision-language models
Language: Jupyter Notebook - Size: 3.4 MB - Last synced at: 21 days ago - Pushed at: 6 months ago - Stars: 220 - Forks: 15

neirzhei/ScreenScribe
Offline-first agent that generates spoken, conversational commentary on on-screen activity using a local multi-modal pipeline (Vision-LLM-TTS) with a resource-conscious architecture.
Language: Python - Size: 15.6 KB - Last synced at: 23 days ago - Pushed at: 23 days ago - Stars: 0 - Forks: 0

kyegomez/HLT
Implementation of the transformer from the paper: "Real-World Humanoid Locomotion with Reinforcement Learning"
Language: Python - Size: 2.18 MB - Last synced at: 15 days ago - Pushed at: 26 days ago - Stars: 47 - Forks: 6

AlphaPlusTT/DAOcc
DAOcc: 3D Object Detection Assisted Multi-Sensor Fusion for 3D Occupancy Prediction
Language: Python - Size: 1.51 MB - Last synced at: 24 days ago - Pushed at: 24 days ago - Stars: 72 - Forks: 3

kyegomez/MC-ViT
Implementation of the model: "(MC-ViT)" from the paper: "Memory Consolidation Enables Long-Context Video Understanding"
Language: Python - Size: 2.17 MB - Last synced at: 21 days ago - Pushed at: about 1 month ago - Stars: 23 - Forks: 1

icon-lab/I2I-Mamba
Official implementation of I2I-Mamba, an image-to-image translation model based on selective state spaces
Language: Python - Size: 295 KB - Last synced at: 27 days ago - Pushed at: 27 days ago - Stars: 81 - Forks: 8

chaohaoyuan/PAAG
Source code for Annotation-guided Protein Design with Multi-Level Domain Alignment. (KDD 2025)
Language: Python - Size: 7.74 MB - Last synced at: 27 days ago - Pushed at: 27 days ago - Stars: 7 - Forks: 1

yshinya6/xbm
Code repository for "Explanation Bottleneck Models" (AAAI2025 Oral)
Language: Python - Size: 536 KB - Last synced at: 7 days ago - Pushed at: 5 months ago - Stars: 7 - Forks: 1

Ruiyang-061X/Awesome-MLLM-Uncertainty
✨A curated list of papers on the uncertainty in multi-modal large language model (MLLM).
Size: 381 KB - Last synced at: 16 days ago - Pushed at: 5 months ago - Stars: 52 - Forks: 0

kyegomez/awesome-robotic-foundation-models
A vast array of Multi-Modal Embodied Robotic Foundation Models!
Size: 22.5 KB - Last synced at: 7 days ago - Pushed at: over 1 year ago - Stars: 26 - Forks: 1

kyegomez/AutoRT
Implementation of AutoRT: "AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents"
Language: Python - Size: 2.49 MB - Last synced at: 9 days ago - Pushed at: 10 months ago - Stars: 40 - Forks: 3

alawryaguila/multi-view-AE
Multi-view-AE: An extensive collection of multi-modal autoencoders implemented in a modular, scikit-learn style framework.
Language: Python - Size: 3.14 MB - Last synced at: 3 days ago - Pushed at: about 1 year ago - Stars: 53 - Forks: 5

Imageomics/naturelab
Bridging Digital and Natural Worlds at The Wilds
Size: 9.66 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 2

ShuchangYe-bib/SGSeg
[MICCAI 2024] Official code for "SGSeg: Enabling Text-free Inference in Language-guided Segmentation of Chest X-rays via Self-guidance" (Simplified Version)
Language: Python - Size: 205 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 20 - Forks: 3

ShallowU/VideoGuard
An AI-powered multi-modal content detection system for short videos. Detection across multiple categories (violence, adult content, smoking, etc.), and automated PDF report generation.
Language: Python - Size: 84.9 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

howard-hou/VisualRWKV
VisualRWKV is the visual-enhanced version of the RWKV language model, enabling RWKV to handle various visual tasks.
Language: Python - Size: 14.1 MB - Last synced at: about 17 hours ago - Pushed at: 3 months ago - Stars: 233 - Forks: 18

awslabs/rhubarb
A Python framework for multi-modal document understanding with Amazon Bedrock
Language: Python - Size: 32 MB - Last synced at: 11 days ago - Pushed at: 3 months ago - Stars: 94 - Forks: 12

ashvardanian/usearch-images
Semantic Search demo featuring UForm, USearch, UCall, and Streamlit, to visualize and retrieve from image datasets, similar to "CLIP Retrieval"
Language: Python - Size: 10.5 MB - Last synced at: 5 days ago - Pushed at: over 1 year ago - Stars: 50 - Forks: 5

kyegomez/qformer
Implementation of Qformer from BLIP2 in Zeta Lego blocks.
Language: Python - Size: 2.19 MB - Last synced at: about 22 hours ago - Pushed at: 10 months ago - Stars: 42 - Forks: 1

zai-org/VisualGLM-6B
Chinese and English multimodal conversational language model
Language: Python - Size: 18.1 MB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 4,156 - Forks: 422

ThuCCSLab/FigStep
[AAAI'25 (Oral)] Jailbreaking Large Vision-language Models via Typographic Visual Prompts
Language: Python - Size: 43.3 MB - Last synced at: about 1 month ago - Pushed at: 2 months ago - Stars: 159 - Forks: 7

thu-ml/MMTrustEval
A toolbox for benchmarking trustworthiness of multimodal large language models (MultiTrust, NeurIPS 2024 Track Datasets and Benchmarks)
Language: Python - Size: 15.8 MB - Last synced at: about 1 month ago - Pushed at: 2 months ago - Stars: 156 - Forks: 10

InternRobotics/Aether
[ICCV 2025] Aether: Geometric-Aware Unified World Modeling
Language: Python - Size: 55.2 MB - Last synced at: about 1 month ago - Pushed at: 2 months ago - Stars: 413 - Forks: 4

thu-ml/MLA-Trust
A toolbox for benchmarking Multimodal LLM Agents trustworthiness across truthfulness, controllability, safety and privacy dimensions through 34 interactive tasks
Language: Python - Size: 1.7 MB - Last synced at: about 1 month ago - Pushed at: 2 months ago - Stars: 45 - Forks: 3

xuyang-liu16/GlobalCom2
🚀 Global Compression Commander: Plug-and-Play Inference Acceleration for High-Resolution Large Vision-Language Models
Language: Python - Size: 6.24 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 30 - Forks: 1

AlokTheDataGuy/internship_projects
Multiple chatbots and NLP-based projects completed during my internship. Each project demonstrates different aspects of AI application development, from text summarization to multilingual chatbots.
Language: Python - Size: 11.6 MB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

m2aia/m2aia
Mass spectrometry imaging applications for interactive analysis in MITK (M²aia)
Language: C++ - Size: 2.2 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 21 - Forks: 4

gmartins459/FastLongSpeech
Enhance long-speech processing with FastLongSpeech, a framework for Large Speech-Language Models. Explore our model and dataset on GitHub! 🚀📦
Language: Python - Size: 19.2 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

johndef64/mychatgpt
mychatgpt is a small and useful Python package that provides utilities for creating conversational agents with OpenAI's GPT models. The module lets users chat interactively with GPT models and keeps track of the chat history. Useful in Python projects as a Copilot agent.
Language: Python - Size: 7.07 MB - Last synced at: 9 days ago - Pushed at: about 2 months ago - Stars: 5 - Forks: 0

JerryX1110/awesome-rvos
Referring Video Object Segmentation / Multi-Object Tracking Repo
Language: Python - Size: 79.1 KB - Last synced at: 10 days ago - Pushed at: about 2 years ago - Stars: 88 - Forks: 4

yasshrma/LMMS
Create music effortlessly with LMMS, the free open-source digital audio workstation. Enjoy MIDI support, VST plugins, and powerful beat creation tools. 🎶💻
Size: 1.95 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

BioDT/bfm-model
Multi-modal Foundation Model for Biodiversity dynamics forecasting
Language: Python - Size: 30.5 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

PKU-YuanGroup/MoE-LLaVA
【TMM 2025🔥】 Mixture-of-Experts for Large Vision-Language Models
Language: Python - Size: 16.5 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 2,190 - Forks: 138

kyegomez/HRTX
Multi-Modal Multi-Embodied Hivemind-like Iteration of RTX-2
Language: Python - Size: 2.2 MB - Last synced at: 10 days ago - Pushed at: 2 months ago - Stars: 15 - Forks: 3

kyegomez/MultiModal-ToT
Multi-Modal Tree of thoughts for DALLE-3 like auto self improvement
Language: Python - Size: 81.2 MB - Last synced at: 14 days ago - Pushed at: 10 months ago - Stars: 16 - Forks: 2

kyegomez/MegaVIT
The open source implementation of the model from "Scaling Vision Transformers to 22 Billion Parameters"
Language: Python - Size: 211 KB - Last synced at: about 2 months ago - Pushed at: 5 months ago - Stars: 29 - Forks: 1

kyegomez/Qwen-VL
My personal implementation of the model from "Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities", they haven't released model code yet sooo...
Language: Python - Size: 244 KB - Last synced at: 6 days ago - Pushed at: over 1 year ago - Stars: 12 - Forks: 2

liuyang-ict/awesome-visual-transformers
[TNNLS] A Comprehensive Survey of Awesome Visual Transformer Literatures.
Size: 570 KB - Last synced at: 4 days ago - Pushed at: over 2 years ago - Stars: 263 - Forks: 27

RasmussenLab/MOVE
MOVE (Multi-Omics Variational autoEncoder) for integrating multi-omics data and identifying cross modal associations
Language: Jupyter Notebook - Size: 540 MB - Last synced at: 17 days ago - Pushed at: 10 months ago - Stars: 81 - Forks: 28

Huynwtrnaa/TEN
AI-powered platform for startup founders, offering insights and direction. Navigate your entrepreneurial journey with TEN. 🚀🌐
Language: Vue - Size: 39.5 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

PKU-YuanGroup/Video-LLaVA
【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
Language: Python - Size: 113 MB - Last synced at: 2 months ago - Pushed at: 9 months ago - Stars: 3,294 - Forks: 235

SafeRL-Lab/m4r
🔥 Measuring Massive Multimodal Understanding and Reasoning in Open Space
Language: Python - Size: 39.6 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 2 - Forks: 0

Event-AHU/Mamba_FETrack
[PRCV-2024] State Space Model based Frame-Event Tracking
Language: Python - Size: 3.72 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 38 - Forks: 2
