Topic: "cross-modal-pretraining"
DAMO-NLP-SG/Video-LLaMA
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
Language: Python - Size: 19.6 MB - Last synced at: 8 days ago - Pushed at: about 1 year ago - Stars: 3,006 - Forks: 272

JacobYuan7/RLIP
[NeurIPS 2022 Spotlight] RLIP: Relational Language-Image Pre-training, along with a series of related methods for HOI detection and Scene Graph Generation.
Language: Python - Size: 15.4 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 67 - Forks: 3
