Ecosyste.ms: Repos
An open API service providing repository metadata for many open source software ecosystems.
GitHub topics: multi-modal
iSiddharth20/Chat-With-OPD Fork of GargiBhise/Chat-With-OPD
Offline Multi-Modal RAG. Execution scripts optimized for Intel and CUDA.
Language: Python - Size: 11.7 KB - Last synced: about 13 hours ago - Pushed: about 14 hours ago - Stars: 0 - Forks: 0
OpenGVLab/InternVL
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4V. A commercially usable, open-source multimodal dialogue model with performance approaching GPT-4V.
Language: Jupyter Notebook - Size: 21.2 MB - Last synced: about 15 hours ago - Pushed: about 15 hours ago - Stars: 2,349 - Forks: 155
SciSharp/LLamaSharp
A C#/.NET library to run LLM models (🦙LLaMA/LLaVA) on your local device efficiently.
Language: C# - Size: 256 MB - Last synced: about 24 hours ago - Pushed: 1 day ago - Stars: 2,027 - Forks: 273
awslabs/rhubarb
A Python framework for multi-modal document understanding with Amazon Bedrock
Language: Python - Size: 16 MB - Last synced: 1 day ago - Pushed: 1 day ago - Stars: 22 - Forks: 2
Kav-K/GPTDiscord
A robust, all-in-one GPT interface for Discord. ChatGPT-style conversations, image generation, AI moderation, custom indexes/knowledge base, YouTube summarizer, and more!
Language: Python - Size: 1.76 MB - Last synced: 2 days ago - Pushed: 2 days ago - Stars: 1,795 - Forks: 298
THUDM/CogVLM
A state-of-the-art open visual language model | Multimodal pre-trained model
Language: Python - Size: 25.8 MB - Last synced: 2 days ago - Pushed: 2 days ago - Stars: 5,221 - Forks: 368
Expl0dingCat/Ame
State-of-the-art, multi-modal virtual assistant framework powered by LLaMA. Ame is under active development.
Language: Python - Size: 193 KB - Last synced: 1 day ago - Pushed: 2 days ago - Stars: 39 - Forks: 6
docarray/docarray
Represent, send, store and search multimodal data
Language: Python - Size: 242 MB - Last synced: 2 days ago - Pushed: 13 days ago - Stars: 2,783 - Forks: 221
RasmussenLab/MOVE
MOVE (Multi-Omics Variational autoEncoder) for integrating multi-omics data and identifying cross modal associations
Language: Jupyter Notebook - Size: 540 MB - Last synced: 2 days ago - Pushed: 3 days ago - Stars: 59 - Forks: 22
IntelLabs/fastRAG
Efficient Retrieval Augmentation and Generation Framework
Language: Python - Size: 20.4 MB - Last synced: 2 days ago - Pushed: 4 days ago - Stars: 936 - Forks: 75
PKU-YuanGroup/MoE-LLaVA
Mixture-of-Experts for Large Vision-Language Models
Language: Python - Size: 16.5 MB - Last synced: 3 days ago - Pushed: 4 days ago - Stars: 1,716 - Forks: 98
modelscope/modelscope
ModelScope: bring the notion of Model-as-a-Service to life.
Language: Python - Size: 52.5 MB - Last synced: 5 days ago - Pushed: 6 days ago - Stars: 6,146 - Forks: 649
PKU-YuanGroup/Video-LLaVA
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
Language: Python - Size: 112 MB - Last synced: 3 days ago - Pushed: 4 days ago - Stars: 2,469 - Forks: 183
jokieleung/awesome-visual-question-answering
A curated list of Visual Question Answering (VQA) (Image/Video Question Answering), Visual Question Generation, Visual Dialog, Visual Commonsense Reasoning, and related areas.
Size: 179 KB - Last synced: about 7 hours ago - Pushed: 11 months ago - Stars: 644 - Forks: 95
xieyuquanxx/awesome-Large-MultiModal-Hallucination
😎 An up-to-date, curated list of awesome LMM hallucination papers, methods & resources.
Size: 66.4 KB - Last synced: 1 day ago - Pushed: about 2 months ago - Stars: 116 - Forks: 12
ika-rwth-aachen/MultiCorrupt
MultiCorrupt: A benchmark for robust multi-modal 3D object detection, evaluating LiDAR-Camera fusion models in autonomous driving. Includes diverse corruption types (e.g., misalignment, miscalibration, weather) and severity levels. Assess model performance under challenging conditions.
Language: Python - Size: 134 MB - Last synced: 3 days ago - Pushed: 4 days ago - Stars: 25 - Forks: 4
DirtyHarryLYL/Transformer-in-Vision
Recent Transformer-based CV and related works.
Size: 1.84 MB - Last synced: 4 days ago - Pushed: 9 months ago - Stars: 1,295 - Forks: 141
liuyang-ict/awesome-visual-transformers
[TNNLS] A Comprehensive Survey of Awesome Visual Transformer Literatures.
Size: 570 KB - Last synced: 2 days ago - Pushed: about 1 year ago - Stars: 232 - Forks: 21
MedMNIST/MedMNIST
[pip install medmnist] 18x Standardized Datasets for 2D and 3D Biomedical Image Classification
Language: Python - Size: 13.6 MB - Last synced: 5 days ago - Pushed: 5 days ago - Stars: 977 - Forks: 154
open-compass/VLMEvalKit
Open-source evaluation toolkit for large vision-language models (LVLMs), supporting GPT-4V, Gemini, QwenVLPlus, 40+ HF models, and 20+ benchmarks
Language: Python - Size: 1.48 MB - Last synced: 8 days ago - Pushed: 8 days ago - Stars: 424 - Forks: 46
QIN2DIM/hcaptcha-challenger
🥂 Gracefully face hCaptcha challenges with an embedded MoE (ONNX) solution.
Language: Python - Size: 67.9 MB - Last synced: 5 days ago - Pushed: 29 days ago - Stars: 1,420 - Forks: 255
OFA-Sys/Chinese-CLIP
Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
Language: Python - Size: 2.49 MB - Last synced: 5 days ago - Pushed: 6 months ago - Stars: 3,701 - Forks: 402
924973292/TOP-ReID
【AAAI2024】TOP-ReID: Multi-spectral Object Re-Identification with Token Permutation
Language: Python - Size: 12.4 MB - Last synced: 7 days ago - Pushed: 7 days ago - Stars: 31 - Forks: 2
kyegomez/M2PT
Implementation of M2PT in PyTorch from the paper: "Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities"
Language: Python - Size: 2.66 MB - Last synced: 8 days ago - Pushed: 2 months ago - Stars: 11 - Forks: 1
THUDM/VisualGLM-6B
A multimodal Chinese-English bilingual conversational language model
Language: Python - Size: 18.1 MB - Last synced: 7 days ago - Pushed: 5 months ago - Stars: 3,965 - Forks: 409
kyegomez/TinyGPTV
A simple implementation of TinyGPTV using Zeta's Lego-style building blocks
Language: Python - Size: 2.17 MB - Last synced: 8 days ago - Pushed: 2 months ago - Stars: 15 - Forks: 0
colurw/temporal_CNN
Time-series prediction using a multi-modal 1D Convolutional Neural Network
Language: Jupyter Notebook - Size: 22.7 MB - Last synced: 8 days ago - Pushed: 8 days ago - Stars: 0 - Forks: 0
kyegomez/SwitchTransformers
Implementation of Switch Transformers from the paper: "Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity"
Language: Python - Size: 2.43 MB - Last synced: 8 days ago - Pushed: 2 months ago - Stars: 19 - Forks: 2
924973292/EDITOR
【CVPR2024】Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification
Language: Python - Size: 10.6 MB - Last synced: 9 days ago - Pushed: 9 days ago - Stars: 41 - Forks: 4
lanl/EPBD-BERT
Transcription factor binding site prediction for novel DNA sequence data aiding in mutation identification and drug discovery
Language: Jupyter Notebook - Size: 3.81 MB - Last synced: 10 days ago - Pushed: 11 days ago - Stars: 0 - Forks: 0
kyegomez/AutoRT
Implementation of AutoRT: "AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents"
Language: Python - Size: 2.5 MB - Last synced: 6 days ago - Pushed: 2 months ago - Stars: 28 - Forks: 2
kyegomez/LUMIERE
Implementation of the text to video model LUMIERE from the paper: "A Space-Time Diffusion Model for Video Generation" by Google Research
Language: Python - Size: 2.18 MB - Last synced: 9 days ago - Pushed: 2 months ago - Stars: 46 - Forks: 2
quic/cloud-ai-sdk
Qualcomm Cloud AI SDK (Platform and Apps) enables high-performance deep learning inference on Qualcomm Cloud AI platforms, delivering high throughput and low latency across Computer Vision, Object Detection, Natural Language Processing, and Generative AI models.
Language: Jupyter Notebook - Size: 8.24 MB - Last synced: 11 days ago - Pushed: 11 days ago - Stars: 36 - Forks: 4
kyegomez/VisionLLaMA
Implementation of VisionLLaMA from the paper: "VisionLLaMA: A Unified LLaMA Interface for Vision Tasks" in PyTorch and Zeta
Language: Python - Size: 2.19 MB - Last synced: 12 days ago - Pushed: 2 months ago - Stars: 13 - Forks: 0
Event-AHU/COESOT
A large-scale benchmark dataset for color-event based visual tracking
Language: Python - Size: 39.1 MB - Last synced: 7 days ago - Pushed: 21 days ago - Stars: 39 - Forks: 2
asvegah/robopilot
Live Dense Multi-Modal 3D Mapping: a robotic and autonomous system designed for real-time 3D reconstruction that fuses multiple depth and camera sensors simultaneously
Language: Python - Size: 4.57 MB - Last synced: 13 days ago - Pushed: 13 days ago - Stars: 7 - Forks: 2
kyegomez/GATS
Implementation of GATS from the paper: "GATS: Gather-Attend-Scatter" in PyTorch and Zeta
Language: Python - Size: 2.17 MB - Last synced: 13 days ago - Pushed: 2 months ago - Stars: 8 - Forks: 0
yisun98/SOLC
Remote sensing SAR-optical land-use classification in PyTorch | High-resolution remote sensing semantic segmentation / land-cover segmentation / land-cover classification in PyTorch
Language: Python - Size: 6.79 MB - Last synced: 13 days ago - Pushed: 13 days ago - Stars: 137 - Forks: 19
kyegomez/qformer
Implementation of Qformer from BLIP2 in Zeta Lego blocks.
Language: Python - Size: 2.19 MB - Last synced: 6 days ago - Pushed: 2 months ago - Stars: 17 - Forks: 0
ashvardanian/TenPack
Fast Tensors Packaging library for text, image, video, and audio data compatible with PyTorch, TensorFlow, & NumPy 🖼️🎵🎥 ➡️ 🧠
Language: C++ - Size: 108 KB - Last synced: 13 days ago - Pushed: 13 days ago - Stars: 6 - Forks: 0
ThuCCSLab/FigStep
Jailbreaking Large Vision-language Models via Typographic Visual Prompts
Language: Python - Size: 43.2 MB - Last synced: 14 days ago - Pushed: 15 days ago - Stars: 52 - Forks: 4
kyegomez/CELESTIAL-1
Omni-Modality Processing, Understanding, and Generation
Language: Python - Size: 2.49 MB - Last synced: 15 days ago - Pushed: 15 days ago - Stars: 6 - Forks: 0
WisconsinAIVision/ViP-LLaVA
[CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
Language: Python - Size: 17.4 MB - Last synced: 15 days ago - Pushed: 18 days ago - Stars: 165 - Forks: 8
lucidrains/DALLE-pytorch
Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in PyTorch
Language: Python - Size: 13.5 MB - Last synced: 16 days ago - Pushed: 3 months ago - Stars: 5,493 - Forks: 638
kyegomez/Simba
A simpler PyTorch + Zeta implementation of the paper: "SiMBA: Simplified Mamba-based Architecture for Vision and Multivariate Time series"
Language: Python - Size: 2.48 MB - Last synced: 8 days ago - Pushed: about 2 months ago - Stars: 17 - Forks: 2
kyegomez/MC-ViT
Implementation of the MC-ViT model from the paper: "Memory Consolidation Enables Long-Context Video Understanding"
Language: Python - Size: 2.17 MB - Last synced: 7 days ago - Pushed: 2 months ago - Stars: 12 - Forks: 0
EndlessSora/TSIT
[ECCV 2020 Spotlight] A Simple and Versatile Framework for Image-to-Image Translation
Language: Python - Size: 6.24 MB - Last synced: 15 days ago - Pushed: 16 days ago - Stars: 272 - Forks: 33
kyegomez/HSSS
Implementation of a Hierarchical Mamba as described in the paper: "Hierarchical State Space Models for Continuous Sequence-to-Sequence Modeling"
Language: Python - Size: 2.19 MB - Last synced: 1 day ago - Pushed: 2 days ago - Stars: 11 - Forks: 1
kyegomez/MM1
PyTorch Implementation of the paper "MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training"
Language: Python - Size: 2.2 MB - Last synced: 22 days ago - Pushed: 23 days ago - Stars: 18 - Forks: 1
v-iashin/Synchformer
Efficient synchronization from sparse cues
Language: Python - Size: 92.9 MB - Last synced: 24 days ago - Pushed: 24 days ago - Stars: 13 - Forks: 3
vercel/modelfusion
The TypeScript library for building AI applications.
Language: TypeScript - Size: 16 MB - Last synced: 25 days ago - Pushed: 26 days ago - Stars: 889 - Forks: 65
garethjns/MSIModels
Exploring multi-sensory integration and decision making in biologically inspired deep neural networks.
Language: Python - Size: 46 MB - Last synced: 24 days ago - Pushed: over 1 year ago - Stars: 3 - Forks: 2
JuliaRobotics/Caesar.jl
Robust robotic localization and mapping, together with NavAbility(TM). Reach out to [email protected] for help.
Language: Julia - Size: 40.1 MB - Last synced: 17 days ago - Pushed: 24 days ago - Stars: 179 - Forks: 31
wangxiao5791509/MultiModal_BigModels_Survey
[MIR-2023-Survey] A continuously updated paper list for multi-modal pre-trained big models
Size: 12.3 MB - Last synced: 25 days ago - Pushed: 25 days ago - Stars: 250 - Forks: 16
Dco-ai/php-jina
A PHP Client for the Jina-AI Framework
Language: PHP - Size: 52.7 KB - Last synced: 26 days ago - Pushed: about 1 year ago - Stars: 2 - Forks: 1
wangsuzhen/Audio2Head
Code for the paper "Audio2Head: Audio-driven One-shot Talking-head Generation with Natural Head Motion" (IJCAI 2021)
Language: Python - Size: 1.02 MB - Last synced: 25 days ago - Pushed: 3 months ago - Stars: 291 - Forks: 54
mlvlab/Flipped-VQA
Large Language Models are Temporal and Causal Reasoners for Video Question Answering (EMNLP 2023)
Language: Python - Size: 1.23 MB - Last synced: 26 days ago - Pushed: 26 days ago - Stars: 54 - Forks: 7
mlvlab/OVQA
Open-Vocabulary Video Question Answering: A New Benchmark for Evaluating the Generalizability of Video Question Answering Models (ICCV 2023)
Language: Python - Size: 619 KB - Last synced: 26 days ago - Pushed: 26 days ago - Stars: 13 - Forks: 0
mlvlab/MELTR
MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models (CVPR 2023)
Language: Python - Size: 1.13 MB - Last synced: 26 days ago - Pushed: 26 days ago - Stars: 30 - Forks: 6
kyegomez/zeta
Build high-performance AI models with modular building blocks
Language: Python - Size: 27 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 230 - Forks: 22
alawryaguila/multi-view-AE
Multi-view-AE: An extensive collection of multi-modal autoencoders implemented in a modular, scikit-learn style framework.
Language: Python - Size: 3.14 MB - Last synced: 3 days ago - Pushed: 3 months ago - Stars: 39 - Forks: 3
gangula-karthik/AICU-BIKE-SEARCH
Find your stolen bike! With AICU, we can spot your bicycle on Carousell in one shot 🚲🔍💨
Language: Jupyter Notebook - Size: 27 MB - Last synced: 28 days ago - Pushed: 28 days ago - Stars: 3 - Forks: 0
valhalla/valhalla
Open Source Routing Engine for OpenStreetMap
Language: C++ - Size: 112 MB - Last synced: 30 days ago - Pushed: 30 days ago - Stars: 4,174 - Forks: 648
ailab-kyunghee/CM2_DVC
[CVPR 2024] Do you remember? Dense Video Captioning with Cross-Modal Memory Retrieval
Language: Python - Size: 119 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 2 - Forks: 0
kyegomez/MultiModal-ToT
Multi-modal Tree of Thoughts for DALL-E 3-like automatic self-improvement
Language: Python - Size: 81.2 MB - Last synced: 8 days ago - Pushed: 2 months ago - Stars: 10 - Forks: 2
kyegomez/forest-of-thoughts
A forest of autonomous agents.
Language: Python - Size: 249 KB - Last synced: 7 days ago - Pushed: 2 months ago - Stars: 14 - Forks: 1
kyegomez/HRTX
Multi-Modal Multi-Embodied Hivemind-like Iteration of RTX-2
Language: Python - Size: 2.19 MB - Last synced: 8 days ago - Pushed: 2 months ago - Stars: 15 - Forks: 2
clin1223/VLDet
[ICLR 2023] PyTorch implementation of VLDet (https://arxiv.org/abs/2211.14843)
Language: Python - Size: 1.56 MB - Last synced: 27 days ago - Pushed: about 2 months ago - Stars: 169 - Forks: 10
kyegomez/HLT
Implementation of the transformer from the paper: "Real-World Humanoid Locomotion with Reinforcement Learning"
Language: Python - Size: 2.17 MB - Last synced: 15 days ago - Pushed: 2 months ago - Stars: 16 - Forks: 3
RUCAIBox/PLMPapers Fork of wxl1999/PLMPapers
A paper list of pre-trained language models (PLMs).
Size: 24.4 KB - Last synced: 26 days ago - Pushed: over 2 years ago - Stars: 134 - Forks: 18
Ji-eun-Kim/Translate-phrases-in-images-and-apply-original-styles
Translating phrases in images and applying the original styles | [AI Society] X:AI | 📕 Toy project
Language: Python - Size: 25.1 MB - Last synced: about 1 month ago - Pushed: 4 months ago - Stars: 0 - Forks: 0
iflytek/VLE
VLE: Vision-Language Encoder (VLE: a vision-language multimodal pre-trained model)
Language: Python - Size: 10.2 MB - Last synced: about 1 month ago - Pushed: about 1 year ago - Stars: 171 - Forks: 11
modelscope/agentscope
Start building LLM-empowered multi-agent applications in an easier way.
Language: Python - Size: 44.4 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 427 - Forks: 61
bytedance/SALMONN
SALMONN: Speech Audio Language Music Open Neural Network
Language: Python - Size: 6.8 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 785 - Forks: 53
zjukg/NATIVE
[Paper][SIGIR 2024] NativE: Multi-modal Knowledge Graph Completion in the Wild
Language: Python - Size: 10.2 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 6 - Forks: 0
marqo-ai/marqo
Unified embedding generation and search engine. Also available in the cloud at cloud.marqo.ai
Language: Python - Size: 68.7 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 4,085 - Forks: 171
zjunlp/DeepKE
[EMNLP 2022] An Open Toolkit for Knowledge Graph Extraction and Construction
Language: Python - Size: 110 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 2,876 - Forks: 625
parsa-ra/LatentPlayInterface
A practice to handle multi-modal datasets in a unified way.
Language: Python - Size: 3.11 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 0 - Forks: 0
kyegomez/awesome-robotic-foundation-models
A vast array of Multi-Modal Embodied Robotic Foundation Models!
Size: 22.5 KB - Last synced: about 1 month ago - Pushed: 2 months ago - Stars: 15 - Forks: 1
microsoft/farmvibes-ai
FarmVibes.AI: Multi-Modal GeoSpatial ML Models for Agriculture and Sustainability
Language: Jupyter Notebook - Size: 40.8 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 624 - Forks: 100
kyegomez/Kosmos-X
The Next Generation Multi-Modality Superintelligence
Language: Python - Size: 21.5 MB - Last synced: about 1 month ago - Pushed: 3 months ago - Stars: 65 - Forks: 10
Tebmer/Awesome-Knowledge-Distillation-of-LLMs
This repository collects papers for "A Survey on Knowledge Distillation of Large Language Models". We break down KD into Knowledge Elicitation and Distillation Algorithms, and explore the Skill & Vertical Distillation of LLMs.
Size: 18 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 205 - Forks: 15
kyegomez/Fuyu
Implementation of Adept's all-new multi-modal model Fuyu in PyTorch
Language: Python - Size: 403 KB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 19 - Forks: 3
MIMBCD-UI/prototype-multi-modality-assistant 📦
An assistant prototype for breast cancer diagnosis prepared with a multimodality strategy.
Language: JavaScript - Size: 1.4 MB - Last synced: about 1 month ago - Pushed: over 3 years ago - Stars: 0 - Forks: 1
kyegomez/RT-2
Democratization of RT-2 from "RT-2: New model translates vision and language into action"
Language: Python - Size: 2.59 MB - Last synced: about 1 month ago - Pushed: 4 months ago - Stars: 262 - Forks: 37
SMIL-SPCRAS/DAVIS
Official repo for "Audio-Visual Speech Recognition In-the-Wild: Multi-Angle Vehicle Cabin Corpus and Attention-based Method" in ICASSP 2024
Language: JavaScript - Size: 5.82 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 6 - Forks: 0
JerryX1110/awesome-rvos
Referring Video Object Segmentation / Multi-Object Tracking Repo
Language: Python - Size: 79.1 KB - Last synced: 3 days ago - Pushed: 10 months ago - Stars: 80 - Forks: 4
dvlab-research/LISA
Project Page for "LISA: Reasoning Segmentation via Large Language Model"
Language: Python - Size: 27.8 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 1,408 - Forks: 93
nc-ai/MultimodalSum
[ACL-IJCNLP 2021] Self-Supervised Multimodal Opinion Summarization
Language: Python - Size: 2.69 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 23 - Forks: 4
higotenda/kurt
A framework for multi-modal media summarization.
Language: Python - Size: 84.8 MB - Last synced: about 1 month ago - Pushed: about 2 months ago - Stars: 2 - Forks: 2
lyyf2002/ASGEA
Code for ASGEA: Exploiting Logic Rules from Align-Subgraphs for Entity Alignment
Language: Python - Size: 3.11 MB - Last synced: 5 days ago - Pushed: 3 months ago - Stars: 10 - Forks: 1
zjukg/AdaMF-MAT
[Paper][LREC-COLING 2024] Unleashing the Power of Imbalanced Modality Information for Multi-modal Knowledge Graph Completion
Language: Python - Size: 1.91 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 11 - Forks: 1
wzongyu/LLM-and-Multimodal-Paper-List
A paper list about large language models and multimodal models (Diffusion, VLM). From foundations to applications. It is only used to record papers for my personal needs.
Size: 103 KB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 25 - Forks: 2
kyegomez/MLXTransformer
A simple implementation of a Transformer in Apple's new MLX framework
Language: Python - Size: 2.18 MB - Last synced: 13 days ago - Pushed: 13 days ago - Stars: 14 - Forks: 0
salesforce/UniControl
Unified Controllable Visual Generation Model
Language: Python - Size: 145 MB - Last synced: about 1 month ago - Pushed: 6 months ago - Stars: 574 - Forks: 31
deep-symbolic-mathematics/Multimodal-Math-Pretraining
[ICLR 2024 Spotlight] This is the official code for the paper "SNIP: Bridging Mathematical Symbolic and Numeric Realms with Unified Pre-training"
Language: Python - Size: 962 KB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 25 - Forks: 1
BoHuangLab/CELL-E_2
Encoder-only model for image-based protein predictions
Language: Python - Size: 12.9 MB - Last synced: 26 days ago - Pushed: 5 months ago - Stars: 8 - Forks: 0
seungheondoh/audio-lyrics-emotion-recognition 📦
(Unofficial) PyTorch implementation of "Music Mood Detection Based on Audio and Lyrics with Deep Neural Net"
Language: Python - Size: 17.7 MB - Last synced: 5 days ago - Pushed: over 4 years ago - Stars: 88 - Forks: 22
PKU-YuanGroup/LanguageBind
【ICLR 2024🔥】 Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
Language: Python - Size: 18.6 MB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 474 - Forks: 34
kyegomez/NeVA
The open source implementation of "NeVA: NeMo Vision and Language Assistant"
Language: Python - Size: 253 KB - Last synced: 20 days ago - Pushed: 9 months ago - Stars: 16 - Forks: 1