An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: multimodal-alignment

codefuse-ai/GALLa

[ACL 2025] Graph Aligned Large Language Models for Improved Source Code Understanding

Language: Python - Size: 43 KB - Last synced at: 3 days ago - Pushed at: about 1 month ago - Stars: 17 - Forks: 0

AstraZeneca/vlm

Official implementation for "Diffusion Instruction Tuning"

Language: Python - Size: 25.5 MB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 23 - Forks: 2

Eurus-Holmes/MulT

[Reproduce] Code for the ACL 2019 paper "Multimodal Transformer for Unaligned Multimodal Language Sequences".

Language: Python - Size: 21 MB - Last synced at: 6 days ago - Pushed at: over 5 years ago - Stars: 26 - Forks: 5

GradientSpaces/CrossOver

[CVPR 2025, Highlight] CrossOver: 3D Scene Cross-Modal Alignment

Language: Python - Size: 203 KB - Last synced at: 21 days ago - Pushed at: 22 days ago - Stars: 81 - Forks: 6

Jiamian-Wang/DITS-text-video-retrieval

Official implementation of "Diffusion-Inspired Truncated Sampler for Text-Video Retrieval" (NeurIPS 2024)

Size: 2.93 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

vijay-jaisankar/multimodal-alignment

Multimodal alignment of images and point clouds on the ModelNet40-C dataset

Language: TeX - Size: 2.33 GB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

marcomoldovan/multimodal-self-distillation

A generalized self-supervised training paradigm for unimodal and multimodal alignment and fusion.

Language: Python - Size: 526 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 4 - Forks: 2

marcomoldovan/3d-attention-video-understanding

Using a 3D Nearby Self-Attention Transformer to leverage the spatiotemporal nature of video for representation learning.

Language: Python - Size: 33.2 KB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0