Ecosyste.ms: Repos

An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: multi-modal

iSiddharth20/Chat-With-OPD Fork of GargiBhise/Chat-With-OPD

Offline Multi-Modal RAG. Execution Scripts optimized for Intel, CUDA.

Language: Python - Size: 11.7 KB - Last synced: about 13 hours ago - Pushed: about 14 hours ago - Stars: 0 - Forks: 0

OpenGVLab/InternVL

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4V. A commercially usable open-source multimodal chat model approaching GPT-4V performance.

Language: Jupyter Notebook - Size: 21.2 MB - Last synced: about 15 hours ago - Pushed: about 15 hours ago - Stars: 2,349 - Forks: 155

SciSharp/LLamaSharp

A C#/.NET library to run LLM models (🦙LLaMA/LLaVA) on your local device efficiently.

Language: C# - Size: 256 MB - Last synced: about 24 hours ago - Pushed: 1 day ago - Stars: 2,027 - Forks: 273

awslabs/rhubarb

A Python framework for multi-modal document understanding with Amazon Bedrock

Language: Python - Size: 16 MB - Last synced: 1 day ago - Pushed: 1 day ago - Stars: 22 - Forks: 2

Kav-K/GPTDiscord

A robust, all-in-one GPT interface for Discord. ChatGPT-style conversations, image generation, AI-moderation, custom indexes/knowledgebase, youtube summarizer, and more!

Language: Python - Size: 1.76 MB - Last synced: 2 days ago - Pushed: 2 days ago - Stars: 1,795 - Forks: 298

THUDM/CogVLM

A state-of-the-art-level open visual language model | Multimodal pre-trained model

Language: Python - Size: 25.8 MB - Last synced: 2 days ago - Pushed: 2 days ago - Stars: 5,221 - Forks: 368

Expl0dingCat/Ame

State-of-the-art, multi-modal virtual assistant framework powered by LLaMA. Ame is under active development.

Language: Python - Size: 193 KB - Last synced: 1 day ago - Pushed: 2 days ago - Stars: 39 - Forks: 6

docarray/docarray

Represent, send, store and search multimodal data

Language: Python - Size: 242 MB - Last synced: 2 days ago - Pushed: 13 days ago - Stars: 2,783 - Forks: 221

RasmussenLab/MOVE

MOVE (Multi-Omics Variational autoEncoder) for integrating multi-omics data and identifying cross modal associations

Language: Jupyter Notebook - Size: 540 MB - Last synced: 2 days ago - Pushed: 3 days ago - Stars: 59 - Forks: 22

IntelLabs/fastRAG

Efficient Retrieval Augmentation and Generation Framework

Language: Python - Size: 20.4 MB - Last synced: 2 days ago - Pushed: 4 days ago - Stars: 936 - Forks: 75

PKU-YuanGroup/MoE-LLaVA

Mixture-of-Experts for Large Vision-Language Models

Language: Python - Size: 16.5 MB - Last synced: 3 days ago - Pushed: 4 days ago - Stars: 1,716 - Forks: 98

modelscope/modelscope

ModelScope: bring the notion of Model-as-a-Service to life.

Language: Python - Size: 52.5 MB - Last synced: 5 days ago - Pushed: 6 days ago - Stars: 6,146 - Forks: 649

PKU-YuanGroup/Video-LLaVA

Video-LLaVA: Learning United Visual Representation by Alignment Before Projection

Language: Python - Size: 112 MB - Last synced: 3 days ago - Pushed: 4 days ago - Stars: 2,469 - Forks: 183

jokieleung/awesome-visual-question-answering

A curated list of Visual Question Answering (VQA) (Image/Video Question Answering), Visual Question Generation, Visual Dialog, Visual Commonsense Reasoning, and related areas.

Size: 179 KB - Last synced: about 7 hours ago - Pushed: 11 months ago - Stars: 644 - Forks: 95

xieyuquanxx/awesome-Large-MultiModal-Hallucination

😎 up-to-date & curated list of awesome LMM hallucinations papers, methods & resources.

Size: 66.4 KB - Last synced: 1 day ago - Pushed: about 2 months ago - Stars: 116 - Forks: 12

ika-rwth-aachen/MultiCorrupt

MultiCorrupt: A benchmark for robust multi-modal 3D object detection, evaluating LiDAR-Camera fusion models in autonomous driving. Includes diverse corruption types (e.g., misalignment, miscalibration, weather) and severity levels. Assess model performance under challenging conditions.

Language: Python - Size: 134 MB - Last synced: 3 days ago - Pushed: 4 days ago - Stars: 25 - Forks: 4

DirtyHarryLYL/Transformer-in-Vision

Recent Transformer-based CV and related works.

Size: 1.84 MB - Last synced: 4 days ago - Pushed: 9 months ago - Stars: 1,295 - Forks: 141

liuyang-ict/awesome-visual-transformers

[TNNLS] A Comprehensive Survey of Awesome Visual Transformer Literatures.

Size: 570 KB - Last synced: 2 days ago - Pushed: about 1 year ago - Stars: 232 - Forks: 21

MedMNIST/MedMNIST

[pip install medmnist] 18x Standardized Datasets for 2D and 3D Biomedical Image Classification

Language: Python - Size: 13.6 MB - Last synced: 5 days ago - Pushed: 5 days ago - Stars: 977 - Forks: 154

open-compass/VLMEvalKit

Open-source evaluation toolkit for large vision-language models (LVLMs), supporting GPT-4V, Gemini, QwenVLPlus, 40+ HF models, and 20+ benchmarks

Language: Python - Size: 1.48 MB - Last synced: 8 days ago - Pushed: 8 days ago - Stars: 424 - Forks: 46

QIN2DIM/hcaptcha-challenger

🥂 Gracefully face hCaptcha challenge with MoE(ONNX) embedded solution.

Language: Python - Size: 67.9 MB - Last synced: 5 days ago - Pushed: 29 days ago - Stars: 1,420 - Forks: 255

OFA-Sys/Chinese-CLIP

Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.

Language: Python - Size: 2.49 MB - Last synced: 5 days ago - Pushed: 6 months ago - Stars: 3,701 - Forks: 402

924973292/TOP-ReID

【AAAI2024】TOP-ReID: Multi-spectral Object Re-Identification with Token Permutation

Language: Python - Size: 12.4 MB - Last synced: 7 days ago - Pushed: 7 days ago - Stars: 31 - Forks: 2

kyegomez/M2PT

Implementation of M2PT in PyTorch from the paper: "Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities"

Language: Python - Size: 2.66 MB - Last synced: 8 days ago - Pushed: 2 months ago - Stars: 11 - Forks: 1

THUDM/VisualGLM-6B

Chinese and English bilingual multimodal conversational language model

Language: Python - Size: 18.1 MB - Last synced: 7 days ago - Pushed: 5 months ago - Stars: 3,965 - Forks: 409

kyegomez/TinyGPTV

Simple Implementation of TinyGPTV in super simple Zeta lego blocks

Language: Python - Size: 2.17 MB - Last synced: 8 days ago - Pushed: 2 months ago - Stars: 15 - Forks: 0

colurw/temporal_CNN

Time-series prediction using a multi-modal 1D Convolutional Neural Network

Language: Jupyter Notebook - Size: 22.7 MB - Last synced: 8 days ago - Pushed: 8 days ago - Stars: 0 - Forks: 0

kyegomez/SwitchTransformers

Implementation of Switch Transformers from the paper: "Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity"

Language: Python - Size: 2.43 MB - Last synced: 8 days ago - Pushed: 2 months ago - Stars: 19 - Forks: 2

924973292/EDITOR

【CVPR2024】Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification

Language: Python - Size: 10.6 MB - Last synced: 9 days ago - Pushed: 9 days ago - Stars: 41 - Forks: 4

lanl/EPBD-BERT

Transcription factor binding site prediction for novel DNA sequence data aiding in mutation identification and drug discovery

Language: Jupyter Notebook - Size: 3.81 MB - Last synced: 10 days ago - Pushed: 11 days ago - Stars: 0 - Forks: 0

kyegomez/AutoRT

Implementation of AutoRT: "AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents"

Language: Python - Size: 2.5 MB - Last synced: 6 days ago - Pushed: 2 months ago - Stars: 28 - Forks: 2

kyegomez/LUMIERE

Implementation of the text to video model LUMIERE from the paper: "A Space-Time Diffusion Model for Video Generation" by Google Research

Language: Python - Size: 2.18 MB - Last synced: 9 days ago - Pushed: 2 months ago - Stars: 46 - Forks: 2

quic/cloud-ai-sdk

Qualcomm Cloud AI SDK (Platform and Apps) enables high-performance deep learning inference on Qualcomm Cloud AI platforms, delivering high throughput and low latency across Computer Vision, Object Detection, Natural Language Processing and Generative AI models.

Language: Jupyter Notebook - Size: 8.24 MB - Last synced: 11 days ago - Pushed: 11 days ago - Stars: 36 - Forks: 4

kyegomez/VisionLLaMA

Implementation of VisionLLaMA from the paper: "VisionLLaMA: A Unified LLaMA Interface for Vision Tasks" in PyTorch and Zeta

Language: Python - Size: 2.19 MB - Last synced: 12 days ago - Pushed: 2 months ago - Stars: 13 - Forks: 0

Event-AHU/COESOT

A large-scale benchmark dataset for color-event based visual tracking

Language: Python - Size: 39.1 MB - Last synced: 7 days ago - Pushed: 21 days ago - Stars: 39 - Forks: 2

asvegah/robopilot

Live Dense Multi Modal 3D Mapping — A robotic and autonomous system designed for real time 3D reconstruction using a fusion of multiple depth and camera sensors simultaneously at real time speed

Language: Python - Size: 4.57 MB - Last synced: 13 days ago - Pushed: 13 days ago - Stars: 7 - Forks: 2

kyegomez/GATS

Implementation of GATS from the paper: "GATS: Gather-Attend-Scatter" in pytorch and zeta

Language: Python - Size: 2.17 MB - Last synced: 13 days ago - Pushed: 2 months ago - Stars: 8 - Forks: 0

yisun98/SOLC

Remote Sensing SAR-Optical Land-use Classification in PyTorch: high-resolution remote sensing semantic segmentation, land-cover segmentation, and land-cover classification

Language: Python - Size: 6.79 MB - Last synced: 13 days ago - Pushed: 13 days ago - Stars: 137 - Forks: 19

kyegomez/qformer

Implementation of Qformer from BLIP2 in Zeta Lego blocks.

Language: Python - Size: 2.19 MB - Last synced: 6 days ago - Pushed: 2 months ago - Stars: 17 - Forks: 0

ashvardanian/TenPack

Fast Tensors Packaging library for text, image, video, and audio data compatible with PyTorch, TensorFlow, & NumPy 🖼️🎵🎥 ➡️ 🧠

Language: C++ - Size: 108 KB - Last synced: 13 days ago - Pushed: 13 days ago - Stars: 6 - Forks: 0

ThuCCSLab/FigStep

Jailbreaking Large Vision-language Models via Typographic Visual Prompts

Language: Python - Size: 43.2 MB - Last synced: 14 days ago - Pushed: 15 days ago - Stars: 52 - Forks: 4

kyegomez/CELESTIAL-1

Omni-Modality Processing, Understanding, and Generation

Language: Python - Size: 2.49 MB - Last synced: 15 days ago - Pushed: 15 days ago - Stars: 6 - Forks: 0

WisconsinAIVision/ViP-LLaVA

[CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts

Language: Python - Size: 17.4 MB - Last synced: 15 days ago - Pushed: 18 days ago - Stars: 165 - Forks: 8

lucidrains/DALLE-pytorch

Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch

Language: Python - Size: 13.5 MB - Last synced: 16 days ago - Pushed: 3 months ago - Stars: 5,493 - Forks: 638

kyegomez/Simba

A simpler Pytorch + Zeta Implementation of the paper: "SiMBA: Simplified Mamba-based Architecture for Vision and Multivariate Time series"

Language: Python - Size: 2.48 MB - Last synced: 8 days ago - Pushed: about 2 months ago - Stars: 17 - Forks: 2

kyegomez/MC-ViT

Implementation of the model: "(MC-ViT)" from the paper: "Memory Consolidation Enables Long-Context Video Understanding"

Language: Python - Size: 2.17 MB - Last synced: 7 days ago - Pushed: 2 months ago - Stars: 12 - Forks: 0

EndlessSora/TSIT

[ECCV 2020 Spotlight] A Simple and Versatile Framework for Image-to-Image Translation

Language: Python - Size: 6.24 MB - Last synced: 15 days ago - Pushed: 16 days ago - Stars: 272 - Forks: 33

kyegomez/HSSS

Implementation of a Hierarchical Mamba as described in the paper: "Hierarchical State Space Models for Continuous Sequence-to-Sequence Modeling"

Language: Python - Size: 2.19 MB - Last synced: 1 day ago - Pushed: 2 days ago - Stars: 11 - Forks: 1

kyegomez/MM1

PyTorch Implementation of the paper "MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training"

Language: Python - Size: 2.2 MB - Last synced: 22 days ago - Pushed: 23 days ago - Stars: 18 - Forks: 1

v-iashin/Synchformer

Efficient synchronization from sparse cues

Language: Python - Size: 92.9 MB - Last synced: 24 days ago - Pushed: 24 days ago - Stars: 13 - Forks: 3

vercel/modelfusion

The TypeScript library for building AI applications.

Language: TypeScript - Size: 16 MB - Last synced: 25 days ago - Pushed: 26 days ago - Stars: 889 - Forks: 65

garethjns/MSIModels

Exploring multi-sensory integration and decision making in biologically inspired deep neural networks.

Language: Python - Size: 46 MB - Last synced: 24 days ago - Pushed: over 1 year ago - Stars: 3 - Forks: 2

JuliaRobotics/Caesar.jl

Robust robotic localization and mapping, together with NavAbility(TM). Reach out to [email protected] for help.

Language: Julia - Size: 40.1 MB - Last synced: 17 days ago - Pushed: 24 days ago - Stars: 179 - Forks: 31

wangxiao5791509/MultiModal_BigModels_Survey

[MIR-2023-Survey] A continuously updated paper list for multi-modal pre-trained big models

Size: 12.3 MB - Last synced: 25 days ago - Pushed: 25 days ago - Stars: 250 - Forks: 16

Dco-ai/php-jina

A PHP Client for the Jina-AI Framework

Language: PHP - Size: 52.7 KB - Last synced: 26 days ago - Pushed: about 1 year ago - Stars: 2 - Forks: 1

wangsuzhen/Audio2Head

Code for the paper "Audio2Head: Audio-driven One-shot Talking-head Generation with Natural Head Motion" (IJCAI 2021)

Language: Python - Size: 1.02 MB - Last synced: 25 days ago - Pushed: 3 months ago - Stars: 291 - Forks: 54

mlvlab/Flipped-VQA

Large Language Models are Temporal and Causal Reasoners for Video Question Answering (EMNLP 2023)

Language: Python - Size: 1.23 MB - Last synced: 26 days ago - Pushed: 26 days ago - Stars: 54 - Forks: 7

mlvlab/OVQA

Open-Vocabulary Video Question Answering: A New Benchmark for Evaluating the Generalizability of Video Question Answering Models (ICCV 2023)

Language: Python - Size: 619 KB - Last synced: 26 days ago - Pushed: 26 days ago - Stars: 13 - Forks: 0

mlvlab/MELTR

MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models (CVPR 2023)

Language: Python - Size: 1.13 MB - Last synced: 26 days ago - Pushed: 26 days ago - Stars: 30 - Forks: 6

kyegomez/zeta

Build high-performance AI models with modular building blocks

Language: Python - Size: 27 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 230 - Forks: 22

alawryaguila/multi-view-AE

Multi-view-AE: An extensive collection of multi-modal autoencoders implemented in a modular, scikit-learn style framework.

Language: Python - Size: 3.14 MB - Last synced: 3 days ago - Pushed: 3 months ago - Stars: 39 - Forks: 3

gangula-karthik/AICU-BIKE-SEARCH

Find Your Stolen Bike Lah! With AICU, We Kena Spot Your Bicycle on Carousell One Shot 🚲🔍💨

Language: Jupyter Notebook - Size: 27 MB - Last synced: 28 days ago - Pushed: 28 days ago - Stars: 3 - Forks: 0

valhalla/valhalla

Open Source Routing Engine for OpenStreetMap

Language: C++ - Size: 112 MB - Last synced: 30 days ago - Pushed: 30 days ago - Stars: 4,174 - Forks: 648

ailab-kyunghee/CM2_DVC

[CVPR 2024] Do you remember? Dense Video Captioning with Cross-Modal Memory Retrieval

Language: Python - Size: 119 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 2 - Forks: 0

kyegomez/MultiModal-ToT

Multi-Modal Tree of thoughts for DALLE-3 like auto self improvement

Language: Python - Size: 81.2 MB - Last synced: 8 days ago - Pushed: 2 months ago - Stars: 10 - Forks: 2

kyegomez/forest-of-thoughts

A forest of autonomous agents.

Language: Python - Size: 249 KB - Last synced: 7 days ago - Pushed: 2 months ago - Stars: 14 - Forks: 1

kyegomez/HRTX

Multi-Modal Multi-Embodied Hivemind-like Iteration of RTX-2

Language: Python - Size: 2.19 MB - Last synced: 8 days ago - Pushed: 2 months ago - Stars: 15 - Forks: 2

clin1223/VLDet

[ICLR 2023] PyTorch implementation of VLDet (https://arxiv.org/abs/2211.14843)

Language: Python - Size: 1.56 MB - Last synced: 27 days ago - Pushed: about 2 months ago - Stars: 169 - Forks: 10

kyegomez/HLT

Implementation of the transformer from the paper: "Real-World Humanoid Locomotion with Reinforcement Learning"

Language: Python - Size: 2.17 MB - Last synced: 15 days ago - Pushed: 2 months ago - Stars: 16 - Forks: 3

RUCAIBox/PLMPapers Fork of wxl1999/PLMPapers

A paper list of pre-trained language models (PLMs).

Size: 24.4 KB - Last synced: 26 days ago - Pushed: over 2 years ago - Stars: 134 - Forks: 18

Ji-eun-Kim/Translate-phrases-in-images-and-apply-original-styles

Translate phrases in images and apply the original styles | [AI Society] X:AI | 📕 Toy project

Language: Python - Size: 25.1 MB - Last synced: about 1 month ago - Pushed: 4 months ago - Stars: 0 - Forks: 0

iflytek/VLE

VLE: Vision-Language Encoder (VLE: a vision-language multimodal pre-trained model)

Language: Python - Size: 10.2 MB - Last synced: about 1 month ago - Pushed: about 1 year ago - Stars: 171 - Forks: 11

modelscope/agentscope

Start building LLM-empowered multi-agent applications in an easier way.

Language: Python - Size: 44.4 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 427 - Forks: 61

bytedance/SALMONN

SALMONN: Speech Audio Language Music Open Neural Network

Language: Python - Size: 6.8 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 785 - Forks: 53

zjukg/NATIVE

[Paper][SIGIR 2024] NativE: Multi-modal Knowledge Graph Completion in the Wild

Language: Python - Size: 10.2 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 6 - Forks: 0

marqo-ai/marqo

Unified embedding generation and search engine. Also available on cloud - cloud.marqo.ai

Language: Python - Size: 68.7 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 4,085 - Forks: 171

zjunlp/DeepKE

[EMNLP 2022] An Open Toolkit for Knowledge Graph Extraction and Construction

Language: Python - Size: 110 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 2,876 - Forks: 625

parsa-ra/LatentPlayInterface

A practice to handle multi-modal datasets in a unified way.

Language: Python - Size: 3.11 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 0 - Forks: 0

kyegomez/awesome-robotic-foundation-models

A vast array of Multi-Modal Embodied Robotic Foundation Models!

Size: 22.5 KB - Last synced: about 1 month ago - Pushed: 2 months ago - Stars: 15 - Forks: 1

microsoft/farmvibes-ai

FarmVibes.AI: Multi-Modal GeoSpatial ML Models for Agriculture and Sustainability

Language: Jupyter Notebook - Size: 40.8 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 624 - Forks: 100

kyegomez/Kosmos-X

The Next Generation Multi-Modality Superintelligence

Language: Python - Size: 21.5 MB - Last synced: about 1 month ago - Pushed: 3 months ago - Stars: 65 - Forks: 10

Tebmer/Awesome-Knowledge-Distillation-of-LLMs

This repository collects papers for "A Survey on Knowledge Distillation of Large Language Models". We break down KD into Knowledge Elicitation and Distillation Algorithms, and explore the Skill & Vertical Distillation of LLMs.

Size: 18 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 205 - Forks: 15

kyegomez/Fuyu

Implementation of Adept's all-new Fuyu Multi-Modality model in PyTorch

Language: Python - Size: 403 KB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 19 - Forks: 3

MIMBCD-UI/prototype-multi-modality-assistant 📦

An assistant prototype for breast cancer diagnosis prepared with a multimodality strategy.

Language: JavaScript - Size: 1.4 MB - Last synced: about 1 month ago - Pushed: over 3 years ago - Stars: 0 - Forks: 1

kyegomez/RT-2

Democratization of RT-2 "RT-2: New model translates vision and language into action"

Language: Python - Size: 2.59 MB - Last synced: about 1 month ago - Pushed: 4 months ago - Stars: 262 - Forks: 37

SMIL-SPCRAS/DAVIS

Official repo for "Audio-Visual Speech Recognition In-the-Wild: Multi-Angle Vehicle Cabin Corpus and Attention-based Method" in ICASSP 2024

Language: JavaScript - Size: 5.82 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 6 - Forks: 0

JerryX1110/awesome-rvos

Referring Video Object Segmentation / Multi-Object Tracking Repo

Language: Python - Size: 79.1 KB - Last synced: 3 days ago - Pushed: 10 months ago - Stars: 80 - Forks: 4

dvlab-research/LISA

Project Page for "LISA: Reasoning Segmentation via Large Language Model"

Language: Python - Size: 27.8 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 1,408 - Forks: 93

nc-ai/MultimodalSum

[ACL-IJCNLP 2021] Self-Supervised Multimodal Opinion Summarization

Language: Python - Size: 2.69 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 23 - Forks: 4

higotenda/kurt

A framework for multi-modal media summarization.

Language: Python - Size: 84.8 MB - Last synced: about 1 month ago - Pushed: about 2 months ago - Stars: 2 - Forks: 2

lyyf2002/ASGEA

Code for ASGEA: Exploiting Logic Rules from Align-Subgraphs for Entity Alignment

Language: Python - Size: 3.11 MB - Last synced: 5 days ago - Pushed: 3 months ago - Stars: 10 - Forks: 1

zjukg/AdaMF-MAT

[Paper][LREC-COLING 2024] Unleashing the Power of Imbalanced Modality Information for Multi-modal Knowledge Graph Completion

Language: Python - Size: 1.91 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 11 - Forks: 1

wzongyu/LLM-and-Multimodal-Paper-List

A paper list about large language models and multimodal models (Diffusion, VLM). From foundations to applications. It is only used to record papers for my personal needs.

Size: 103 KB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 25 - Forks: 2

kyegomez/MLXTransformer

Simple Implementation of a Transformer in the new framework MLX by Apple

Language: Python - Size: 2.18 MB - Last synced: 13 days ago - Pushed: 13 days ago - Stars: 14 - Forks: 0

salesforce/UniControl

Unified Controllable Visual Generation Model

Language: Python - Size: 145 MB - Last synced: about 1 month ago - Pushed: 6 months ago - Stars: 574 - Forks: 31

deep-symbolic-mathematics/Multimodal-Math-Pretraining

[ICLR 2024 Spotlight] This is the official code for the paper "SNIP: Bridging Mathematical Symbolic and Numeric Realms with Unified Pre-training"

Language: Python - Size: 962 KB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 25 - Forks: 1

BoHuangLab/CELL-E_2

Encoder-only model for image-based protein predictions

Language: Python - Size: 12.9 MB - Last synced: 26 days ago - Pushed: 5 months ago - Stars: 8 - Forks: 0

seungheondoh/audio-lyrics-emotion-recognition 📦

(Unofficial) Pytorch Implementation of Music Mood Detection Based On Audio And Lyrics With Deep Neural Net

Language: Python - Size: 17.7 MB - Last synced: 5 days ago - Pushed: over 4 years ago - Stars: 88 - Forks: 22

PKU-YuanGroup/LanguageBind

【ICLR 2024🔥】 Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment

Language: Python - Size: 18.6 MB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 474 - Forks: 34

kyegomez/NeVA

The open source implementation of "NeVA: NeMo Vision and Language Assistant"

Language: Python - Size: 253 KB - Last synced: 20 days ago - Pushed: 9 months ago - Stars: 16 - Forks: 1

Related Keywords
multi-modal (248), deep-learning (44), pytorch (35), artificial-intelligence (30), machine-learning (30), ai (29), transformers (22), transformer (21), gpt4 (19), llm (18), computer-vision (17), ml (16), clip (15), multi-modality (15), large-language-models (14), attention (12), multi-modal-learning (11), nlp (11), attention-mechanism (11), chatbot (11), gpt (11), attention-is-all-you-need (9), multimodal (8), python (8), llama (8), robotics (8), tensorflow (8), knowledge-graph (8), gpt-4 (7), object-detection (7), audio (6), instruction-tuning (6), open-source (6), language-model (6), natural-language-processing (5), dataset (5), chatgpt (5), text-to-image (5), gan (5), semantic-search (5), image-generation (5), 3d (5), image-to-image-translation (5), point-cloud (5), neural-network (5), lidar (4), pretraining (4), contrastive-learning (4), multi-view (4), visual-question-answering (4), music (4), benchmark (4), cross-modal (4), video (4), segmentation (4), audio-visual (4), lstm (4), llama2 (4), gemini (4), llava (4), openai (4), cnn (4), stable-diffusion (4), vision-language (4), streamlit (4), vision-and-language (4), vqa (4), variational-autoencoder (3), zero-shot (3), robot-learning (3), python3 (3), vit (3), medical-imaging (3), information-retrieval (3), self-attention (3), deep-neural-networks (3), question-answering (3), image-synthesis (3), pre-training (3), cv (3), speech (3), alignment (3), distillation (3), embeddings (3), image-registration (3), multi-modal-fusion (3), knowledge-graph-completion (3), semantic-image-synthesis (3), generative-ai (3), llamacpp (3), vision (3), vision-language-model (3), video-classification (3), video-understanding (3), image-classification (3), representation-learning (3), vector-database (3), image (3), inference (3), scene-understanding (3)