GitHub topics: multi-modality

Repositories

kyegomez/swarms

The Enterprise-Grade Production-Ready Multi-Agent Orchestration Framework. Website: https://swarms.ai

Language: Python - Size: 104 MB - Last synced at: 1 day ago - Pushed at: 2 days ago - Stars: 4,945 - Forks: 579

haotian-liu/LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Language: Python - Size: 13.4 MB - Last synced at: 1 day ago - Pushed at: 11 months ago - Stars: 22,861 - Forks: 2,526

RLHF-V/RLHF-V

[CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback

Language: Python - Size: 70.6 MB - Last synced at: 2 days ago - Pushed at: 10 months ago - Stars: 281 - Forks: 8

jina-ai/rungpt

An open-source, cloud-native serving framework for large multi-modal models (LMMs).

Language: Python - Size: 5.29 MB - Last synced at: 3 days ago - Pushed at: almost 2 years ago - Stars: 166 - Forks: 22

jina-ai/clip-as-service

🏄 Scalable embedding, reasoning, ranking for images and sentences with CLIP

Language: Python - Size: 27.4 MB - Last synced at: 3 days ago - Pushed at: over 1 year ago - Stars: 12,689 - Forks: 2,076

BradyFU/Awesome-Multimodal-Large-Language-Models

✨✨ Latest Advances on Multimodal Large Language Models

Size: 83 MB - Last synced at: 4 days ago - Pushed at: 6 days ago - Stars: 15,585 - Forks: 1,012

EvolvingLMMs-Lab/Otter

🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.

Language: Python - Size: 7.39 MB - Last synced at: 5 days ago - Pushed at: over 1 year ago - Stars: 3,256 - Forks: 212

GLUS-video/GLUS

[CVPR 2025] Official PyTorch Implementation of GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmentation

Language: Jupyter Notebook - Size: 66.4 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 43 - Forks: 4

lucidrains/deep-daze

Simple command line tool for text to image generation using OpenAI's CLIP and Siren (Implicit neural representation network). Technique was originally created by https://twitter.com/advadnoun

Language: Python - Size: 6.68 MB - Last synced at: about 19 hours ago - Pushed at: over 3 years ago - Stars: 4,359 - Forks: 318

OpenBMB/VisRAG

Parsing-free RAG supported by VLMs

Language: Python - Size: 14.7 MB - Last synced at: 17 days ago - Pushed at: 4 months ago - Stars: 725 - Forks: 57

DLR-RM/3DObjectTracking

Algorithms and Publications on 3D Object Tracking

Language: C++ - Size: 201 MB - Last synced at: 19 days ago - Pushed at: about 1 month ago - Stars: 890 - Forks: 157

StaRainJ/MINIMA Fork of LSXI7/MINIMA

[CVPR 2025] MINIMA: Modality Invariant Image Matching

Language: Python - Size: 44.3 MB - Last synced at: 19 days ago - Pushed at: 20 days ago - Stars: 5 - Forks: 0

kyegomez/SwarmOS

An all-new OS that orchestrates autonomous agents as workers to execute tasks.

Language: Shell - Size: 2.21 MB - Last synced at: 17 days ago - Pushed at: 8 months ago - Stars: 18 - Forks: 2

kyegomez/forest-of-thoughts

A forest of autonomous agents.

Language: Python - Size: 224 KB - Last synced at: 15 days ago - Pushed at: 5 months ago - Stars: 19 - Forks: 1

kyegomez/GATS

Implementation of GATS from the paper: "GATS: Gather-Attend-Scatter" in PyTorch and Zeta

Language: Python - Size: 2.17 MB - Last synced at: 2 days ago - Pushed at: 8 months ago - Stars: 8 - Forks: 0

kyegomez/qformer

Implementation of Qformer from BLIP2 in Zeta Lego blocks.

Language: Python - Size: 2.19 MB - Last synced at: 14 days ago - Pushed at: 8 months ago - Stars: 39 - Forks: 0

kyegomez/TinyGPTV

Simple Implementation of TinyGPTV in super simple Zeta lego blocks

Language: Python - Size: 2.17 MB - Last synced at: 16 days ago - Pushed at: 8 months ago - Stars: 16 - Forks: 0

kyegomez/HRTX

Multi-Modal Multi-Embodied Hivemind-like Iteration of RTX-2

Language: Python - Size: 2.19 MB - Last synced at: 6 days ago - Pushed at: 8 months ago - Stars: 16 - Forks: 3

kyegomez/MultiModal-ToT

Multi-Modal Tree of Thoughts for DALL-E 3-like automatic self-improvement

Language: Python - Size: 81.2 MB - Last synced at: 17 days ago - Pushed at: 8 months ago - Stars: 17 - Forks: 2

OpenGVLab/Multi-Modality-Arena

Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!

Language: Python - Size: 21.5 MB - Last synced at: 19 days ago - Pushed at: about 1 year ago - Stars: 529 - Forks: 38

OpenGVLab/LORIS

Long-Term Rhythmic Video Soundtracker, ICML2023

Language: Python - Size: 862 KB - Last synced at: 19 days ago - Pushed at: 12 months ago - Stars: 59 - Forks: 1

dvlab-research/VisionZip

Official repository for VisionZip (CVPR 2025)

Language: Python - Size: 18.2 MB - Last synced at: 22 days ago - Pushed at: 30 days ago - Stars: 284 - Forks: 12

yuanze-lin/Olympus

[CVPR 2025 Highlight] Official code for "Olympus: A Universal Task Router for Computer Vision Tasks"

Language: Python - Size: 3.5 MB - Last synced at: 26 days ago - Pushed at: 26 days ago - Stars: 425 - Forks: 71

LSXI7/MINIMA

[CVPR 2025] MINIMA: Modality Invariant Image Matching

Language: Python - Size: 44.3 MB - Last synced at: 27 days ago - Pushed at: 28 days ago - Stars: 377 - Forks: 26

vbdi/divprune

[CVPR 2025] DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models

Language: Python - Size: 11 MB - Last synced at: 29 days ago - Pushed at: 29 days ago - Stars: 18 - Forks: 0

han-liu/awesome-missing-modality-for-medical-images

A comprehensive review of techniques to address the missing-modality problem for medical images

Size: 82 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 54 - Forks: 3

ChenHongruixuan/BRIGHT

[IEEE GRSS DFC 2025 Track II] BRIGHT: A globally distributed multimodal VHR dataset for all-weather disaster response

Language: Python - Size: 197 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 137 - Forks: 18

kyegomez/MoE-Mamba

Implementation of MoE Mamba from the paper: "MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts" in PyTorch and Zeta

Language: Python - Size: 2.17 MB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 104 - Forks: 5

InternLM/InternLM-XComposer

InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions

Language: Python - Size: 200 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 2,834 - Forks: 172

kyegomez/HSSS

Implementation of a Hierarchical Mamba as described in the paper: "Hierarchical State Space Models for Continuous Sequence-to-Sequence Modeling"

Language: Python - Size: 2.19 MB - Last synced at: 22 days ago - Pushed at: 8 months ago - Stars: 12 - Forks: 2

kyegomez/MC-ViT

Implementation of the model: "(MC-ViT)" from the paper: "Memory Consolidation Enables Long-Context Video Understanding"

Language: Python - Size: 2.17 MB - Last synced at: 17 days ago - Pushed at: 3 months ago - Stars: 20 - Forks: 1

kyegomez/MambaByte

Implementation of MambaByte from the paper: "MambaByte: Token-free Selective State Space Model" in PyTorch and Zeta

Language: Python - Size: 2.16 MB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 116 - Forks: 6

kyegomez/AoA-torch

Implementation of Attention on Attention in Zeta

Language: Python - Size: 2.19 MB - Last synced at: 26 days ago - Pushed at: about 1 month ago - Stars: 3 - Forks: 0

kyegomez/Kosmos2.5

My implementation of Kosmos2.5 from the paper: "KOSMOS-2.5: A Multimodal Literate Model"

Language: Python - Size: 231 KB - Last synced at: 23 days ago - Pushed at: 3 months ago - Stars: 72 - Forks: 6

kyegomez/Gemini

The open-source implementation of Gemini, the model by Google that will "eclipse ChatGPT"

Language: Python - Size: 653 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 455 - Forks: 59

kyegomez/Sophia

Effortless plug-and-play optimizer to cut model training costs by 50%. New optimizer that is 2x faster than Adam on LLMs.

Language: Python - Size: 542 KB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 380 - Forks: 26

kyegomez/the-compiler

Seed, Code, Harvest: Grow Your Own App with Tree of Thoughts!

Language: Python - Size: 10.4 MB - Last synced at: about 1 month ago - Pushed at: almost 2 years ago - Stars: 144 - Forks: 16

kyegomez/Andromeda

An all-new Language Model That Processes Ultra-Long Sequences of 100,000+ Ultra-Fast

Language: Python - Size: 66 MB - Last synced at: about 1 month ago - Pushed at: 10 months ago - Stars: 146 - Forks: 23

mit-acl/deep_panther

Language: C++ - Size: 174 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 65 - Forks: 10

anondo1969/FedSemiCovidDetector

Repository for the journal article, 'Federated Semi-Supervised Multi-Task Learning to Detect COVID-19 and Lungs Segmentation Marking Using Chest Radiography Images and Raspberry Pi Devices: An Internet of Medical Things Application', Mahbub Ul Alam, Rahim Rahmani. Sensors 21, no. 15: 5025, https://doi.org/10.3390/s21155025.

Language: Python - Size: 4.24 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

anondo1969/FedSepsis

Repository for the journal article, 'FedSepsis: A Federated Multi-Modal Deep Learning-Based Internet of Medical Things Application for Early Detection of Sepsis from Electronic Health Records Using Raspberry Pi and Jetson Nano Devices', Mahbub Ul Alam, Rahim Rahmani. Sensors 23, no. 2: 970, https://doi.org/10.3390/s23020970.

Language: Jupyter Notebook - Size: 14.5 MB - Last synced at: 23 days ago - Pushed at: about 2 months ago - Stars: 7 - Forks: 1

The-Martyr/CausalMM

[ICLR 2025] Mitigating Modality Prior-Induced Hallucinations in Multimodal Large Language Models via Deciphering Attention Causality

Language: Python - Size: 7.1 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 25 - Forks: 2

yangcaoai/CoDA_NeurIPS2023

Official code for NeurIPS2023 paper: CoDA: Collaborative Novel Box Discovery and Cross-modal Alignment for Open-vocabulary 3D Object Detection

Language: Jupyter Notebook - Size: 71.2 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 198 - Forks: 16

trendscenter/fit

Fusion ICA Toolbox (MATLAB)

Language: MATLAB - Size: 17.7 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 31 - Forks: 6

amazon-science/crossmodal-contrastive-learning

CrossCLR: Cross-modal Contrastive Learning For Multi-modal Video Representations, ICCV 2021

Language: Python - Size: 766 KB - Last synced at: about 2 months ago - Pushed at: over 3 years ago - Stars: 63 - Forks: 11

kyegomez/VortexFusion

Transformers + Mambas + LSTMS All in One Model

Language: Python - Size: 2.16 MB - Last synced at: about 2 months ago - Pushed at: 2 months ago - Stars: 8 - Forks: 1

sshh12/multi_token

Embed arbitrary modalities (images, audio, documents, etc) into large language models.

Language: Python - Size: 1.22 MB - Last synced at: about 2 months ago - Pushed at: about 1 year ago - Stars: 184 - Forks: 15

Orlando-CS/Awesome-VLA

✨✨ Latest advancements in VLA (Vision-Language-Action) models

Size: 0 Bytes - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

skit-ai/SpeechLLM

This repository contains the training, inference, and evaluation code for SpeechLLM models, plus details about the model releases on Hugging Face.

Language: Python - Size: 3.88 MB - Last synced at: 3 months ago - Pushed at: 12 months ago - Stars: 98 - Forks: 9

ziqihuangg/Collaborative-Diffusion

[CVPR 2023] Collaborative Diffusion

Language: Python - Size: 4.25 MB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 424 - Forks: 34

dvlab-research/UVTR

Unifying Voxel-based Representation with Transformer for 3D Object Detection (NeurIPS 2022)

Language: Python - Size: 621 KB - Last synced at: 3 months ago - Pushed at: over 2 years ago - Stars: 233 - Forks: 17

ZwwWayne/mmMOT

[ICCV2019] Robust Multi-Modality Multi-Object Tracking

Language: Python - Size: 2.53 MB - Last synced at: about 2 months ago - Pushed at: over 5 years ago - Stars: 254 - Forks: 23

jonathanjsjsc/Swarm

🦟 Interactive swarm simulation where pointer swarms follow your cursor - WebGL / threejs

Language: HTML - Size: 37.1 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

Skyline-9/Shotluck-Holmes

[ACM MMGR '24] 🔍 Shotluck Holmes: A family of small-scale LLVMs for shot-level video understanding

Language: Python - Size: 26.3 MB - Last synced at: 3 months ago - Pushed at: 8 months ago - Stars: 11 - Forks: 0

kyegomez/Athena-for-Search

The World's First AI-Enabled Multi-Modality Native Search Engine

Language: TypeScript - Size: 5.58 MB - Last synced at: about 2 months ago - Pushed at: over 1 year ago - Stars: 24 - Forks: 6

kyegomez/MLXTransformer

Simple Implementation of a Transformer in the new framework MLX by Apple

Language: Python - Size: 2.18 MB - Last synced at: about 2 months ago - Pushed at: 7 months ago - Stars: 20 - Forks: 1

voidful/MMLM

Toward Multi Modality Language Model - implementation of GPT-4o/Project Astra

Language: Python - Size: 688 KB - Last synced at: 14 days ago - Pushed at: 7 months ago - Stars: 14 - Forks: 4

chenshuang-zhang/imagenet_d

[CVPR 2024 Highlight] ImageNet-D

Language: Python - Size: 49.3 MB - Last synced at: 3 months ago - Pushed at: 8 months ago - Stars: 41 - Forks: 5

kyegomez/Fuyu

Implementation of Adept's all-new Fuyu multi-modality model in PyTorch

Language: Python - Size: 393 KB - Last synced at: about 2 months ago - Pushed at: 8 months ago - Stars: 25 - Forks: 3

aws-samples/multi-modal-examples-for-amazon-sagemaker

A workshop for collections of multi-modal LLM examples, samples, reference architecture and demos on Amazon SageMaker.

Language: Jupyter Notebook - Size: 35 MB - Last synced at: 20 days ago - Pushed at: 3 months ago - Stars: 4 - Forks: 2

ChennyDeng/MM-APE

Towards Multi-Modal Animal Pose Estimation: An In-Depth Analysis

Size: 135 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 23 - Forks: 0

BubbleWang-wly/EIEA

Explicit-Implicit Entity Alignment Method in Multi-modal Knowledge Graphs

Language: Python - Size: 755 KB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 1 - Forks: 0

rentainhe/TRAR-VQA

[ICCV 2021] Official implementation of the paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering"

Language: Python - Size: 927 KB - Last synced at: 2 months ago - Pushed at: over 3 years ago - Stars: 66 - Forks: 18

manyaafonso/ultrasound_denoising_GAN

Demo to use GANs for denoising and synthesising ultrasound images

Language: Jupyter Notebook - Size: 4.58 MB - Last synced at: 3 months ago - Pushed at: 9 months ago - Stars: 2 - Forks: 1

kyegomez/MM1

PyTorch Implementation of the paper "MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training"

Language: Python - Size: 2.17 MB - Last synced at: 23 days ago - Pushed at: 30 days ago - Stars: 23 - Forks: 1

rsy6318/CorrI2P

[TCSVT] CorrI2P: Deep Image-to-Point Cloud Registration via Dense Correspondence. The code of CorrI2P.

Language: Python - Size: 1.27 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 68 - Forks: 9

mida-project/prototype-multi-modality-assistant

[IJHCS] An assistant prototype for breast cancer diagnosis prepared with a multimodality strategy. The work was published in the International Journal of Human-Computer Studies.

Language: JavaScript - Size: 4.04 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 2 - Forks: 1

dvlab-research/Prompt-Highlighter

[CVPR 2024] Prompt Highlighter: Interactive Control for Multi-Modal LLMs

Language: Python - Size: 17.4 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 110 - Forks: 2

kyegomez/VisionDatasets

Open source scripts to create large scale datasets with rich detail for multi-modal models

Language: Python - Size: 34.9 MB - Last synced at: about 2 months ago - Pushed at: over 1 year ago - Stars: 11 - Forks: 0

nagababumo/Open-Source-Models-with-Hugging-Face

Language: Jupyter Notebook - Size: 17 MB - Last synced at: 4 months ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

DerrickWang005/CRIS.pytorch

An official PyTorch implementation of the CRIS paper

Language: Python - Size: 23.1 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 232 - Forks: 35

researchmm/MM-Diffusion

[CVPR'23] MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation

Language: Python - Size: 4.18 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 346 - Forks: 23

Oztobuzz/Vista

This is the official repository for the Vista dataset, a Vietnamese multimodal dataset containing more than 700,000 samples of conversations and images

Language: Python - Size: 1.79 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 13 - Forks: 0

SsGood/MMGL

Multi-modal Graph learning for Disease Prediction (IEEE Trans. on Medical imaging, TMI2022)

Language: Jupyter Notebook - Size: 10.7 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 81 - Forks: 13

zjukg/AdaMF-MAT

[Paper][LREC-COLING 2024] Unleashing the Power of Imbalanced Modality Information for Multi-modal Knowledge Graph Completion

Language: Python - Size: 1.91 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 11 - Forks: 1

Lee-Gihun/MEDIAR

(NeurIPS 2022 CellSeg Challenge - 1st Winner) Open source code for "MEDIAR: Harmony of Data-Centric and Model-Centric for Multi-Modality Microscopy"

Language: Python - Size: 15 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 112 - Forks: 21

Ravi-Teja-konda/TunedLlavaDelights

Explore the rich flavors of Indian desserts with TunedLlavaDelights. Using LLaVA fine-tuning, our project unveils detailed nutritional profiles, taste notes, and optimal consumption times for beloved sweets. Dive into a fusion of AI innovation and culinary tradition.

Language: Python - Size: 43.3 MB - Last synced at: 4 months ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

Messi-Q/Cross-Modality-Bug-Detection

Cross-Modality Mutual Learning for Smart Contract Vulnerability Detection

Language: Python - Size: 54.8 MB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 18 - Forks: 2

anondo1969/thremaltimodal-covidetector

Repository for the conference paper 'COVID-19 detection from thermal image and tabular medical data utilizing multi-modal machine learning', Mahbub Ul Alam, Jaakko Hollmén and Rahim Rahmani. IEEE 36th International Symposium on Computer-Based Medical Systems (CBMS), 2023, pp. 646-653.

Language: Python - Size: 12.8 MB - Last synced at: about 2 months ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

xufangzhi/MoCA

[Pattern Recognition] The implementation of MoCA

Language: Python - Size: 31.3 KB - Last synced at: 3 months ago - Pushed at: about 2 years ago - Stars: 12 - Forks: 1

wwweiwei/Pre-CoFactv2-AAAI-2023

Official Implementation for Pre-CoFactv2 (AAAI-23 DeFactify2.0 Workshop 1st Place)

Language: Python - Size: 10.8 MB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 5 - Forks: 1

jackyjsy/CVPR21Chal-SLR

This repo contains the official code of our work SAM-SLR which won the CVPR 2021 Challenge on Large Scale Signer Independent Isolated Sign Language Recognition.

Language: Python - Size: 51.9 MB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 174 - Forks: 45

ecom-research/ComposeAE 📦

Official code for WACV 2021 paper - Compositional Learning of Image-Text Query for Image Retrieval

Language: Python - Size: 672 KB - Last synced at: over 2 years ago - Pushed at: over 3 years ago - Stars: 49 - Forks: 13

sagahansson/aics-project

Final project for the course LT2318 Artificial Intelligence: Cognitive Systems. The project concerns multimodal hate speech detection in memes.

Language: TeX - Size: 9.6 MB - Last synced at: over 2 years ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 0

Related Keywords

multi-modality (84), artificial-intelligence (21), machine-learning (18), ai (16), multi-modal (16), deep-learning (16), gpt4 (12), multimodal (12), vision-language-model (12), ml (11), chatgpt (10), pytorch (8), large-language-models (7), llm (7), transformers (7), gpt-4 (6), llava (5), visual-language-learning (5), instruction-tuning (5), open-source (5), transformer (5), attention-mechanism (5), chatbot (4), swarms (4), multimodal-large-language-models (4), llama (4), diffusion-models (4), attention (4), stable-diffusion (4), multi-modal-fusion (4), attention-is-all-you-need (3), llms (3), agora (3), computer-vision (3), gpt4v (3), foundation-models (3), large-vision-language-model (3), pytorch-implementation (3), internet-of-medical-things (3), neural-networks (2), clinical-decision-support-system (2), covid-19-detection (2), electronic-health-records (2), federated-learning (2), health-informatics (2), multimodal-deep-learning (2), gpt3 (2), image-matching (2), dataset (2), tensorflow (2), ssms (2), ensemble (2), vision-transformer (2), multi-modality-data (2), synthetic-data (2), large-language-model (2), aigc (2), language-model (2), visual-question-answering (2), knowledge-graph (2), contrastive-learning (2), huggingface (2), video-captioning (2), multi-modal-imaging (2), prompt-engineering (2), prompt-toolkit (2), tree-of-thoughts (2), llama2 (2), 3d-detection (2), nlp (2), natural-language-processing (2), large-vision-language-models (2), chain-of-thought (2), raspberry-pi (2), smart-healthcare (2), iomt (2), video-text-retrieval (1), ai-research (1), lstms (1), robustness (1), mybatis (1), mambas (1), recognition (1), pso (1), large-context (1), out-of-distribution (1), imagenet (1), large-multimodal-models (1), conversational-ai (1), text-to-image-synthesis (1), gpt5 (1), java (1), internvl2 (1), qwen2-vl (1), sagemaker (1), sagemaker-example (1), video (1), sagemaker-studio (1), video-llava (1), vllm (1)