An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: multi-modal

valhalla/valhalla

Open Source Routing Engine for OpenStreetMap

Language: C++ - Size: 119 MB - Last synced at: about 10 hours ago - Pushed at: about 12 hours ago - Stars: 5,042 - Forks: 762

presidio-oss/cline-based-code-generator

VS Code extension that streamlines development workflows through AI-powered task execution, intelligent file management, and automated code generation. Built on Cline, it integrates with various LLMs to enhance productivity and code quality while simplifying complex development tasks.

Language: TypeScript - Size: 110 MB - Last synced at: about 12 hours ago - Pushed at: about 14 hours ago - Stars: 62 - Forks: 50

jeremy-london/SnowRivals

SnowRivals: AI-Powered Snowboarding Coach

Language: TypeScript - Size: 4.05 MB - Last synced at: about 14 hours ago - Pushed at: about 16 hours ago - Stars: 0 - Forks: 1

SciSharp/LLamaSharp

A C#/.NET library to run LLM (🦙LLaMA/LLaVA) on your local device efficiently.

Language: C# - Size: 393 MB - Last synced at: about 11 hours ago - Pushed at: 6 days ago - Stars: 3,347 - Forks: 465

microsoft/farmvibes-ai

FarmVibes.AI: Multi-Modal GeoSpatial ML Models for Agriculture and Sustainability

Language: Jupyter Notebook - Size: 40 MB - Last synced at: about 9 hours ago - Pushed at: about 1 month ago - Stars: 786 - Forks: 149

kyegomez/Kosmos-X

The Next Generation Multi-Modality Superintelligence

Language: Python - Size: 21.4 MB - Last synced at: about 4 hours ago - Pushed at: about 1 year ago - Stars: 70 - Forks: 11

OpenGVLab/InternVL

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.

Language: Python - Size: 38.7 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 9,029 - Forks: 695

Tebmer/Awesome-Knowledge-Distillation-of-LLMs

This repository collects papers for "A Survey on Knowledge Distillation of Large Language Models". We break down KD into Knowledge Elicitation and Distillation Algorithms, and explore the Skill & Vertical Distillation of LLMs.

Size: 18.6 MB - Last synced at: about 14 hours ago - Pushed at: 6 months ago - Stars: 1,158 - Forks: 68

zai-org/CogVLM

A state-of-the-art-level open visual language model | Multimodal pretrained model

Language: Python - Size: 25.8 MB - Last synced at: about 11 hours ago - Pushed at: over 1 year ago - Stars: 6,658 - Forks: 437

zai-org/CogVLM2

GPT4V-level open-source multi-modal model based on Llama3-8B

Language: Python - Size: 13.9 MB - Last synced at: about 11 hours ago - Pushed at: 6 months ago - Stars: 2,412 - Forks: 156

IntelLabs/fastRAG

Efficient Retrieval Augmentation and Generation Framework

Language: Python - Size: 20.4 MB - Last synced at: 2 days ago - Pushed at: 8 months ago - Stars: 1,657 - Forks: 154

Chiuqyan/arxiv-daily-test Fork of beiyuouo/arxiv-daily

🎓 Automatically update papers in selected fields daily using GitHub Actions (every 12 hours)

Language: Python - Size: 52.1 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

BrainLesion/preprocessing

preprocessing tools for multi-modal 3D brain imaging

Language: C - Size: 1.19 GB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 20 - Forks: 6

tangxyw/RecSysPapers

A collection of industry classics and cutting-edge papers in the field of recommendation/advertising/search.

Language: Python - Size: 1.6 GB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 1,914 - Forks: 253

zjunlp/DeepKE

[EMNLP 2022] An Open Toolkit for Knowledge Graph Extraction and Construction

Language: Python - Size: 121 MB - Last synced at: 2 days ago - Pushed at: about 2 months ago - Stars: 4,108 - Forks: 727

vercel/modelfusion

The TypeScript library for building AI applications.

Language: TypeScript - Size: 15.6 MB - Last synced at: 3 days ago - Pushed at: about 1 year ago - Stars: 1,298 - Forks: 89

agentscope-ai/agentscope

AgentScope: Agent-Oriented Programming for Building LLM Applications

Language: Python - Size: 303 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 8,138 - Forks: 510

AnswerDotAI/byaldi

Use late-interaction multi-modal models such as ColPali in just a few lines of code.

Language: Python - Size: 1.94 MB - Last synced at: 1 day ago - Pushed at: 7 months ago - Stars: 820 - Forks: 92

dvlab-research/LISA

Project Page for "LISA: Reasoning Segmentation via Large Language Model"

Language: Python - Size: 28.9 MB - Last synced at: 6 days ago - Pushed at: 7 months ago - Stars: 2,381 - Forks: 176

JuliaRobotics/Caesar.jl

Robust robotic localization and mapping, together with NavAbility(TM). Reach out to info@wherewhen.ai for help.

Language: Julia - Size: 40 MB - Last synced at: 2 days ago - Pushed at: 12 days ago - Stars: 196 - Forks: 32

TEN-framework/ten-framework

Open-source framework for conversational voice AI agents.

Language: C - Size: 105 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 7,290 - Forks: 852

open-compass/VLMEvalKit

Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks

Language: Python - Size: 8.32 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 2,985 - Forks: 489

docarray/docarray

Represent, send, store and search multimodal data

Language: Python - Size: 242 MB - Last synced at: 7 days ago - Pushed at: 3 months ago - Stars: 3,098 - Forks: 234

activeloopai/deeplake

Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop.ai

Language: Python - Size: 65.5 MB - Last synced at: 7 days ago - Pushed at: 9 days ago - Stars: 8,791 - Forks: 675

modelscope/data-juicer

Data processing for and with foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷

Language: Python - Size: 437 MB - Last synced at: 7 days ago - Pushed at: 9 days ago - Stars: 5,087 - Forks: 267

kyegomez/zeta

Build high-performance AI models with modular building blocks

Language: Python - Size: 41.4 MB - Last synced at: 6 days ago - Pushed at: 12 days ago - Stars: 545 - Forks: 52

TuGraph-family/chat2graph

Chat2Graph: Graph Native Agentic System.

Language: Python - Size: 18 MB - Last synced at: 7 days ago - Pushed at: 17 days ago - Stars: 340 - Forks: 42

OFA-Sys/Chinese-CLIP

Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.

Language: Jupyter Notebook - Size: 2.84 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 5,478 - Forks: 511

OpenBMB/MiniCPM-V

MiniCPM-V 4.5: A GPT-4o Level MLLM for Single Image, Multi Image and Video Understanding on Your Phone

Language: Python - Size: 472 MB - Last synced at: 9 days ago - Pushed at: 10 days ago - Stars: 20,324 - Forks: 1,488

marqo-ai/marqo

Unified embedding generation and search engine. Also available on cloud - cloud.marqo.ai

Language: Python - Size: 80.5 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 4,926 - Forks: 213

kyegomez/SwitchTransformers

Implementation of Switch Transformers from the paper: "Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity"

Language: Python - Size: 2.42 MB - Last synced at: 6 days ago - Pushed at: 20 days ago - Stars: 118 - Forks: 14

MedMNIST/MedMNIST

[pip install medmnist] 18x Standardized Datasets for 2D and 3D Biomedical Image Classification

Language: Python - Size: 13.6 MB - Last synced at: 10 days ago - Pushed at: 8 months ago - Stars: 1,234 - Forks: 183

Ruiyang-061X/Awesome-MLLM-Reasoning

📖 Curated list about the reasoning ability of MLLMs, including OpenAI o1, OpenAI o3-mini, and Slow-Thinking.

Size: 7.81 KB - Last synced at: 8 days ago - Pushed at: 7 months ago - Stars: 9 - Forks: 0

Agora-Lab-AI/Atom

a suite of finetuned LLMs for atomically precise function calling 🧪

Language: Python - Size: 2.35 MB - Last synced at: 6 days ago - Pushed at: 20 days ago - Stars: 15 - Forks: 1

JuliaRobotics/IncrementalInference.jl

Clique recycling non-Gaussian (multi-modal) factor graph solver; also see Caesar.jl.

Language: Julia - Size: 6.68 MB - Last synced at: 2 days ago - Pushed at: 9 days ago - Stars: 74 - Forks: 20

lucidrains/DALLE-pytorch

Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in PyTorch

Language: Python - Size: 13.5 MB - Last synced at: 10 days ago - Pushed at: over 1 year ago - Stars: 5,626 - Forks: 644

modelscope/modelscope

ModelScope: bring the notion of Model-as-a-Service to life.

Language: Python - Size: 55.1 MB - Last synced at: 11 days ago - Pushed at: 12 days ago - Stars: 8,283 - Forks: 864

saforem2/mmm

Multi-Modal Modeling

Language: Python - Size: 366 KB - Last synced at: 11 days ago - Pushed at: 12 days ago - Stars: 6 - Forks: 0

RS2002/Skip-BART

Official Repository for The Paper, Automatic Stage Lighting Control: Is it a Rule-Driven Process or Generative Task?

Language: Python - Size: 7.4 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 0 - Forks: 0

pingcap/pytidb

TiDB AI SDK: Unified Multi-Modal Data Platform for AI Apps & Agents - https://pingcap.github.io/ai/

Language: Python - Size: 1.78 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 22 - Forks: 11

OpenMotionLab/MotionGPT3

MotionGPT3: Human Motion as a Second Modality, a MoT-based framework for unified motion understanding and generation

Language: Python - Size: 9.22 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 87 - Forks: 5

bytedance/SALMONN

SALMONN family: A suite of advanced multi-modal LLMs

Size: 58.4 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 1,304 - Forks: 101

jokieleung/awesome-visual-question-answering

A curated list of Visual Question Answering (VQA) (Image/Video Question Answering), Visual Question Generation, Visual Dialog, Visual Commonsense Reasoning and related areas.

Size: 179 KB - Last synced at: about 17 hours ago - Pushed at: about 2 years ago - Stars: 665 - Forks: 94

kenshi7798/awesome-text-to-motion

🤖 Generate human motion from text with our surveys, datasets, and models, focusing on single-person scenarios for clearer analysis and application.

Language: TypeScript - Size: 4.51 MB - Last synced at: 15 days ago - Pushed at: 16 days ago - Stars: 0 - Forks: 0

jina-ai/jina-vdr Fork of illuin-tech/vidore-benchmark

Jina VDR is a multilingual, multi-domain benchmark for visual document retrieval

Language: Python - Size: 2.99 MB - Last synced at: 9 days ago - Pushed at: about 1 month ago - Stars: 28 - Forks: 1

WisconsinAIVision/ViP-LLaVA

[CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts

Language: Python - Size: 17.4 MB - Last synced at: 6 days ago - Pushed at: about 1 year ago - Stars: 330 - Forks: 23

kyegomez/RT-2

Democratization of RT-2 "RT-2: New model translates vision and language into action"

Language: Python - Size: 2.59 MB - Last synced at: 11 days ago - Pushed at: about 1 year ago - Stars: 499 - Forks: 65

quic/cloud-ai-sdk

Qualcomm Cloud AI SDK (Platform and Apps) enables high-performance deep learning inference on Qualcomm Cloud AI platforms, delivering high throughput and low latency across Computer Vision, Object Detection, Natural Language Processing and Generative AI models.

Language: Jupyter Notebook - Size: 25.3 MB - Last synced at: 16 days ago - Pushed at: about 1 month ago - Stars: 66 - Forks: 13

InternLM/InternEvo

InternEvo is an open-source, lightweight training framework that aims to support model pre-training without the need for extensive dependencies.

Language: Python - Size: 6.79 MB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 404 - Forks: 70

harlanhong/ACTalker

ICCV 2025 ACTalker: an end-to-end video diffusion framework for talking head synthesis that supports both single and multi-signal control (e.g., audio, expression).

Language: Python - Size: 125 MB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 374 - Forks: 39

chandan1145/Cog

Tiny HTTP framework built on node:http

Language: TypeScript - Size: 177 KB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 0 - Forks: 0

VectorSpaceLab/OmniGen

OmniGen: Unified Image Generation. https://arxiv.org/pdf/2409.11340

Language: Jupyter Notebook - Size: 399 MB - Last synced at: 17 days ago - Pushed at: 3 months ago - Stars: 4,240 - Forks: 363

tsinghua-fib-lab/SmartAgent

The official repository of "SmartAgent: Chain-of-User-Thought for Embodied Personalized Agent in Cyber World".

Size: 4.69 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 29 - Forks: 1

bayujawir/SmolVLM

SmolVLM 🐙: Ready-to-run SmolVLM2 Docker image with web UI and HTTP API for image-to-text and text-to-text tasks; offline-capable, low GPU needs (>=4GB VRAM).

Language: Python - Size: 1.62 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 0 - Forks: 0

CoderChen01/InterCLIP-MEP

Official repository of the paper "InterCLIP-MEP: Interactive CLIP and Memory-Enhanced Predictor for Multi-modal Sarcasm Detection"

Language: Python - Size: 2.46 MB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 14 - Forks: 0

souradipp76/MM-PoE

Multiple Choice Reasoning via Process of Elimination using Multi-Modal Models

Language: Python - Size: 698 KB - Last synced at: 3 days ago - Pushed at: 20 days ago - Stars: 1 - Forks: 1

Zilize/awesome-text-to-motion

Text-driven human motion generation surveys, datasets and models.

Language: Python - Size: 127 KB - Last synced at: 23 days ago - Pushed at: 23 days ago - Stars: 0 - Forks: 0

zjysteven/VLM-Visualizer

Visualizing the attention of vision-language models

Language: Jupyter Notebook - Size: 3.4 MB - Last synced at: 21 days ago - Pushed at: 6 months ago - Stars: 220 - Forks: 15

neirzhei/ScreenScribe

Offline-first agent that generates spoken, conversational narration of on-screen activity using a local multi-modal pipeline (Vision-LLM-TTS) with a resource-conscious architecture.

Language: Python - Size: 15.6 KB - Last synced at: 23 days ago - Pushed at: 23 days ago - Stars: 0 - Forks: 0

kyegomez/HLT

Implementation of the transformer from the paper: "Real-World Humanoid Locomotion with Reinforcement Learning"

Language: Python - Size: 2.18 MB - Last synced at: 15 days ago - Pushed at: 26 days ago - Stars: 47 - Forks: 6

AlphaPlusTT/DAOcc

DAOcc: 3D Object Detection Assisted Multi-Sensor Fusion for 3D Occupancy Prediction

Language: Python - Size: 1.51 MB - Last synced at: 24 days ago - Pushed at: 24 days ago - Stars: 72 - Forks: 3

kyegomez/MC-ViT

Implementation of the model MC-ViT from the paper: "Memory Consolidation Enables Long-Context Video Understanding"

Language: Python - Size: 2.17 MB - Last synced at: 21 days ago - Pushed at: about 1 month ago - Stars: 23 - Forks: 1

icon-lab/I2I-Mamba

Official implementation of I2I-Mamba, an image-to-image translation model based on selective state spaces

Language: Python - Size: 295 KB - Last synced at: 27 days ago - Pushed at: 27 days ago - Stars: 81 - Forks: 8

chaohaoyuan/PAAG

Source code for Annotation-guided Protein Design with Multi-Level Domain Alignment. (KDD 2025)

Language: Python - Size: 7.74 MB - Last synced at: 27 days ago - Pushed at: 27 days ago - Stars: 7 - Forks: 1

yshinya6/xbm

Code repository for "Explanation Bottleneck Models" (AAAI2025 Oral)

Language: Python - Size: 536 KB - Last synced at: 7 days ago - Pushed at: 5 months ago - Stars: 7 - Forks: 1

Ruiyang-061X/Awesome-MLLM-Uncertainty

✨A curated list of papers on the uncertainty in multi-modal large language model (MLLM).

Size: 381 KB - Last synced at: 16 days ago - Pushed at: 5 months ago - Stars: 52 - Forks: 0

kyegomez/awesome-robotic-foundation-models

A vast array of Multi-Modal Embodied Robotic Foundation Models!

Size: 22.5 KB - Last synced at: 7 days ago - Pushed at: over 1 year ago - Stars: 26 - Forks: 1

kyegomez/AutoRT

Implementation of AutoRT: "AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents"

Language: Python - Size: 2.49 MB - Last synced at: 9 days ago - Pushed at: 10 months ago - Stars: 40 - Forks: 3

alawryaguila/multi-view-AE

Multi-view-AE: An extensive collection of multi-modal autoencoders implemented in a modular, scikit-learn style framework.

Language: Python - Size: 3.14 MB - Last synced at: 3 days ago - Pushed at: about 1 year ago - Stars: 53 - Forks: 5

Imageomics/naturelab

Bridging Digital and Natural Worlds at The Wilds

Size: 9.66 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 2

ShuchangYe-bib/SGSeg

[MICCAI 2024] Official code for "SGSeg: Enabling Text-free Inference in Language-guided Segmentation of Chest X-rays via Self-guidance" (Simplified Version)

Language: Python - Size: 205 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 20 - Forks: 3

ShallowU/VideoGuard

An AI-powered multi-modal content detection system for short videos. Supports detection across multiple categories (violence, adult content, smoking, etc.) and automated PDF report generation.

Language: Python - Size: 84.9 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

howard-hou/VisualRWKV

VisualRWKV is the visual-enhanced version of the RWKV language model, enabling RWKV to handle various visual tasks.

Language: Python - Size: 14.1 MB - Last synced at: about 17 hours ago - Pushed at: 3 months ago - Stars: 233 - Forks: 18

awslabs/rhubarb

A Python framework for multi-modal document understanding with Amazon Bedrock

Language: Python - Size: 32 MB - Last synced at: 11 days ago - Pushed at: 3 months ago - Stars: 94 - Forks: 12

ashvardanian/usearch-images

Semantic Search demo featuring UForm, USearch, UCall, and StreamLit, to visualize and retrieve from image datasets, similar to "CLIP Retrieval"

Language: Python - Size: 10.5 MB - Last synced at: 5 days ago - Pushed at: over 1 year ago - Stars: 50 - Forks: 5

kyegomez/qformer

Implementation of Qformer from BLIP2 in Zeta Lego blocks.

Language: Python - Size: 2.19 MB - Last synced at: about 22 hours ago - Pushed at: 10 months ago - Stars: 42 - Forks: 1

zai-org/VisualGLM-6B

Chinese and English multimodal conversational language model

Language: Python - Size: 18.1 MB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 4,156 - Forks: 422

ThuCCSLab/FigStep

[AAAI'25 (Oral)] Jailbreaking Large Vision-language Models via Typographic Visual Prompts

Language: Python - Size: 43.3 MB - Last synced at: about 1 month ago - Pushed at: 2 months ago - Stars: 159 - Forks: 7

thu-ml/MMTrustEval

A toolbox for benchmarking trustworthiness of multimodal large language models (MultiTrust, NeurIPS 2024 Track Datasets and Benchmarks)

Language: Python - Size: 15.8 MB - Last synced at: about 1 month ago - Pushed at: 2 months ago - Stars: 156 - Forks: 10

InternRobotics/Aether

[ICCV 2025] Aether: Geometric-Aware Unified World Modeling

Language: Python - Size: 55.2 MB - Last synced at: about 1 month ago - Pushed at: 2 months ago - Stars: 413 - Forks: 4

thu-ml/MLA-Trust

A toolbox for benchmarking the trustworthiness of Multimodal LLM Agents across truthfulness, controllability, safety and privacy dimensions through 34 interactive tasks

Language: Python - Size: 1.7 MB - Last synced at: about 1 month ago - Pushed at: 2 months ago - Stars: 45 - Forks: 3

xuyang-liu16/GlobalCom2

🚀 Global Compression Commander: Plug-and-Play Inference Acceleration for High-Resolution Large Vision-Language Models

Language: Python - Size: 6.24 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 30 - Forks: 1

AlokTheDataGuy/internship_projects

Multiple chatbots and NLP-based projects completed during my internship. Each project demonstrates different aspects of AI application development, from text summarization to multilingual chatbots.

Language: Python - Size: 11.6 MB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

m2aia/m2aia

Mass spectrometry imaging applications for interactive analysis in MITK (M²aia)

Language: C++ - Size: 2.2 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 21 - Forks: 4

gmartins459/FastLongSpeech

Enhance long-speech processing with FastLongSpeech, a framework for Large Speech-Language Models. Explore our model and dataset on GitHub! 🚀📦

Language: Python - Size: 19.2 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

johndef64/mychatgpt

mychatgpt is a small and useful Python package that provides utilities for creating conversational agents with OpenAI's GPT models. This module allows users to have interactive chats with GPT models and keeps track of the chat history. Useful in Python projects as a Copilot agent.

Language: Python - Size: 7.07 MB - Last synced at: 9 days ago - Pushed at: about 2 months ago - Stars: 5 - Forks: 0

JerryX1110/awesome-rvos

Referring Video Object Segmentation / Multi-Object Tracking Repo

Language: Python - Size: 79.1 KB - Last synced at: 10 days ago - Pushed at: about 2 years ago - Stars: 88 - Forks: 4

yasshrma/LMMS

Create music effortlessly with LMMS, the free open-source digital audio workstation. Enjoy MIDI support, VST plugins, and powerful beat creation tools. 🎶💻

Size: 1.95 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

BioDT/bfm-model

Multi-modal Foundation Model for Biodiversity dynamics forecasting

Language: Python - Size: 30.5 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

PKU-YuanGroup/MoE-LLaVA

【TMM 2025🔥】 Mixture-of-Experts for Large Vision-Language Models

Language: Python - Size: 16.5 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 2,190 - Forks: 138

kyegomez/HRTX

Multi-Modal Multi-Embodied Hivemind-like Iteration of RTX-2

Language: Python - Size: 2.2 MB - Last synced at: 10 days ago - Pushed at: 2 months ago - Stars: 15 - Forks: 3

kyegomez/MultiModal-ToT

Multi-Modal Tree of thoughts for DALLE-3 like auto self improvement

Language: Python - Size: 81.2 MB - Last synced at: 14 days ago - Pushed at: 10 months ago - Stars: 16 - Forks: 2

kyegomez/MegaVIT

The open source implementation of the model from "Scaling Vision Transformers to 22 Billion Parameters"

Language: Python - Size: 211 KB - Last synced at: about 2 months ago - Pushed at: 5 months ago - Stars: 29 - Forks: 1

kyegomez/Qwen-VL

My personal implementation of the model from "Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities", since the authors had not yet released the model code.

Language: Python - Size: 244 KB - Last synced at: 6 days ago - Pushed at: over 1 year ago - Stars: 12 - Forks: 2

liuyang-ict/awesome-visual-transformers

[TNNLS] A Comprehensive Survey of Awesome Visual Transformer Literature.

Size: 570 KB - Last synced at: 4 days ago - Pushed at: over 2 years ago - Stars: 263 - Forks: 27

RasmussenLab/MOVE

MOVE (Multi-Omics Variational autoEncoder) for integrating multi-omics data and identifying cross modal associations

Language: Jupyter Notebook - Size: 540 MB - Last synced at: 17 days ago - Pushed at: 10 months ago - Stars: 81 - Forks: 28

Huynwtrnaa/TEN

AI-powered platform for startup founders, offering insights and direction. Navigate your entrepreneurial journey with TEN. 🚀🌐

Language: Vue - Size: 39.5 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

PKU-YuanGroup/Video-LLaVA

【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection

Language: Python - Size: 113 MB - Last synced at: 2 months ago - Pushed at: 9 months ago - Stars: 3,294 - Forks: 235

SafeRL-Lab/m4r

🔥 Measuring Massive Multimodal Understanding and Reasoning in Open Space

Language: Python - Size: 39.6 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 2 - Forks: 0

Event-AHU/Mamba_FETrack

[PRCV-2024] State Space Model based Frame-Event Tracking

Language: Python - Size: 3.72 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 38 - Forks: 2