GitHub topics: video-understanding
GLUS-video/GLUS
[CVPR 2025] Official PyTorch Implementation of GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmentation
Language: Jupyter Notebook - Size: 66.4 MB - Last synced at: about 13 hours ago - Pushed at: 1 day ago - Stars: 31 - Forks: 2

Zhuo-Cao/FlashVTG
FlashVTG: Feature Layering and Adaptive Score Handling Network for Video Temporal Grounding. (WACV2025)
Language: Python - Size: 2.68 MB - Last synced at: 1 day ago - Pushed at: 2 days ago - Stars: 18 - Forks: 2

pritamqu/RRPO
Self-alignment of Large Video Language Models with Refined Regularized Preference Optimization
Language: Python - Size: 4.35 MB - Last synced at: 1 day ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0

SoccerNet/sn-gamestate
[CVPRW'24] SoccerNet Game State Reconstruction: End-to-End Athlete Tracking and Identification on a Minimap (CVPR24 - CVSports workshop)
Language: Python - Size: 95.8 MB - Last synced at: about 4 hours ago - Pushed at: about 17 hours ago - Stars: 286 - Forks: 61

The-Martyr/Awesome-Multimodal-Reasoning
Latest Advances on (RL based) Multimodal Reasoning and Generation in Multimodal Large Language Models
Size: 60.5 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 19 - Forks: 0

E-sar456/GEN-AI-CAPSTONE-PROJECT
This project demonstrates a Generative AI-powered assistant that streamlines the job application process using Google Gemini Pro. It analyzes a user’s resume against a job description, calculates a match score, suggests tailored bullet points, and generates a personalized cover letter — all formatted in structured JSON for automation.
Language: Jupyter Notebook - Size: 12.7 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

cuixing158/Awesome-CV-MasterHub
:fire: :fire: :fire: A paper list of some recent Computer Vision (CV) works
Size: 14.1 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 214 - Forks: 12

PKU-YuanGroup/Chat-UniVi
[CVPR 2024 Highlight🔥] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
Language: Python - Size: 38.2 MB - Last synced at: 4 days ago - Pushed at: 6 months ago - Stars: 931 - Forks: 46

jqtangust/hawk
🔥 🔥 🔥 [NeurIPS 2024] Hawk: Learning to Understand Open-World Video Anomalies
Language: Python - Size: 5.35 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 193 - Forks: 2

yjxiong/temporal-segment-networks
Code & Models for Temporal Segment Networks (TSN) in ECCV 2016
Language: Python - Size: 2.01 MB - Last synced at: about 4 hours ago - Pushed at: over 4 years ago - Stars: 1,554 - Forks: 475

open-mmlab/mmaction2
OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark
Language: Python - Size: 68.2 MB - Last synced at: 6 days ago - Pushed at: 8 months ago - Stars: 4,547 - Forks: 1,268

PaddlePaddle/PaddleVideo
Awesome video understanding toolkits based on PaddlePaddle. It supports video data annotation tools, lightweight RGB and skeleton-based action recognition models, and practical applications for video tagging and sport action detection.
Language: Python - Size: 106 MB - Last synced at: 6 days ago - Pushed at: 2 months ago - Stars: 1,608 - Forks: 383

aws-samples/swift-chat
A lightning-fast, cross-platform AI chat application built with React Native.
Language: TypeScript - Size: 6.47 MB - Last synced at: 6 days ago - Pushed at: 13 days ago - Stars: 365 - Forks: 38

nirmit27/genai-capstone-project-2025
This repository is dedicated to the development and exploration of generative AI technologies as a part of the capstone project for the 5-day Gen AI Intensive Course with Google.
Language: Jupyter Notebook - Size: 29.3 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0

aiden200/Aha-
A Model for Human-Like Reflection in Video Understanding
Language: Python - Size: 19.1 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 1 - Forks: 0

jinwchoi/awesome-action-recognition
A curated list of action recognition and related area resources
Size: 270 KB - Last synced at: 7 days ago - Pushed at: almost 2 years ago - Stars: 3,880 - Forks: 726

rese1f/aurora
[ICLR 2025] AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
Language: Python - Size: 22 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 89 - Forks: 4

pipixin321/Awesome-Video-MLLMs
:fire: :fire: :fire: Awesome MLLMs/Benchmarks for Short/Long/Streaming Video Understanding :video_camera:
Size: 6.84 KB - Last synced at: 8 days ago - Pushed at: 3 months ago - Stars: 14 - Forks: 1

OpenGVLab/Ask-Anything
[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.
Language: Python - Size: 20.7 MB - Last synced at: 9 days ago - Pushed at: 3 months ago - Stars: 3,213 - Forks: 261

aistairc/VDAct
A Video-grounded Dialogue Dataset and Metric for Event-driven Activities
Language: Python - Size: 18.8 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 2 - Forks: 0

Vision-CAIR/MiniGPT4-video
Official code for Goldfish model for long video understanding and MiniGPT4-video for short video understanding
Language: Python - Size: 38.7 MB - Last synced at: 9 days ago - Pushed at: 4 months ago - Stars: 609 - Forks: 67

kiyoon/channel_sampling
Official implementation of "Capturing Temporal Information in a Single Frame: Channel Sampling Strategies for Action Recognition", BMVC 2022
Language: Python - Size: 1.25 MB - Last synced at: 10 days ago - Pushed at: over 2 years ago - Stars: 4 - Forks: 2

katha-ai/VELOCITI
VELOCITI Benchmark Evaluation and Visualisation Code
Language: Python - Size: 184 KB - Last synced at: 3 days ago - Pushed at: 11 days ago - Stars: 5 - Forks: 0

mit-han-lab/temporal-shift-module
[ICCV 2019] TSM: Temporal Shift Module for Efficient Video Understanding
Language: Python - Size: 245 KB - Last synced at: 8 days ago - Pushed at: 9 months ago - Stars: 2,104 - Forks: 420

yjxiong/action-detection
Temporal action detection with SSN
Language: Python - Size: 7.78 MB - Last synced at: about 4 hours ago - Pushed at: almost 6 years ago - Stars: 644 - Forks: 177

v-iashin/SpecVQGAN
Source code for "Taming Visually Guided Sound Generation" (Oral at the BMVC 2021)
Language: Jupyter Notebook - Size: 163 MB - Last synced at: 10 days ago - Pushed at: 9 months ago - Stars: 360 - Forks: 39

yjxiong/tsn-pytorch
Temporal Segment Networks (TSN) in PyTorch
Language: Python - Size: 30.3 KB - Last synced at: about 4 hours ago - Pushed at: almost 6 years ago - Stars: 1,070 - Forks: 310

movienet/movienet-tools
Tools for movie and video research
Language: C++ - Size: 6.56 MB - Last synced at: 8 days ago - Pushed at: almost 3 years ago - Stars: 289 - Forks: 34

henghuiding/MeViS
[ICCV 2023] MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions
Language: Python - Size: 52.2 MB - Last synced at: 14 days ago - Pushed at: 10 months ago - Stars: 521 - Forks: 22

ParitoshParmar/MTL-AQA
What and How Well You Performed? A Multitask Learning Approach to Action Quality Assessment [CVPR 2019]
Language: Python - Size: 27.7 MB - Last synced at: 17 days ago - Pushed at: 5 months ago - Stars: 67 - Forks: 15

OpenGVLab/InternVideo
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
Language: Python - Size: 53.3 MB - Last synced at: 18 days ago - Pushed at: about 2 months ago - Stars: 1,771 - Forks: 106

dvlab-research/LSDBench
A benchmark that focuses on the sampling dilemma in long-video tasks. Through well-designed tasks, it evaluates the sampling efficiency of long-video VLMs.
Language: Python - Size: 2.57 MB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 3 - Forks: 0

TencentARC/ST-LLM
[ECCV 2024🔥] Official implementation of the paper "ST-LLM: Large Language Models Are Effective Temporal Learners"
Language: Python - Size: 19 MB - Last synced at: 19 days ago - Pushed at: 7 months ago - Stars: 142 - Forks: 5

zhaoyue-zephyrus/AVION
[arXiv:2309.16669] Code release for "Training a Large Video Model on a Single Machine in a Day"
Language: Python - Size: 1.33 MB - Last synced at: 15 days ago - Pushed at: 9 months ago - Stars: 127 - Forks: 9

DAMO-NLP-SG/VideoRefer
[CVPR 2025] The code for "VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM"
Language: Python - Size: 130 MB - Last synced at: 24 days ago - Pushed at: 24 days ago - Stars: 176 - Forks: 9

IntelLabs/GraVi-T
Graph learning framework for long-term video understanding
Language: Python - Size: 529 KB - Last synced at: 18 days ago - Pushed at: 3 months ago - Stars: 60 - Forks: 9

ZJCV/X3D
[CVPR 2020] X3D: Expanding Architectures for Efficient Video Recognition
Language: Python - Size: 146 KB - Last synced at: 18 days ago - Pushed at: over 4 years ago - Stars: 20 - Forks: 4

fepegar/gestures-miccai-2021 📦
Code for "Pérez-García et al. 2021, Transfer Learning of Deep Spatiotemporal Networks to Model Arbitrarily Long Videos of Seizures, MICCAI 2021".
Language: Python - Size: 788 KB - Last synced at: 5 days ago - Pushed at: about 1 year ago - Stars: 7 - Forks: 4

open-mmlab/mmaction
An open-source toolbox for action understanding based on PyTorch
Language: Python - Size: 3.95 MB - Last synced at: 11 days ago - Pushed at: about 3 years ago - Stars: 1,870 - Forks: 350

ParitoshParmar/C3D-LSTM--PyTorch
C3D-LSTM implementation in PyTorch [WACV 2019]
Language: Python - Size: 91.8 KB - Last synced at: 17 days ago - Pushed at: almost 5 years ago - Stars: 39 - Forks: 10

chihyaoma/Activity-Recognition-with-CNN-and-RNN
Temporal Segments LSTM and Temporal-Inception for Activity Recognition
Language: Lua - Size: 118 MB - Last synced at: 10 days ago - Pushed at: almost 5 years ago - Stars: 442 - Forks: 147

JunweiLiang/Multiverse
Dataset, code and model for the CVPR'20 paper "The Garden of Forking Paths: Towards Multi-Future Trajectory Prediction". And for the ECCV'20 SimAug paper.
Language: Python - Size: 82.1 MB - Last synced at: 10 days ago - Pushed at: about 2 years ago - Stars: 257 - Forks: 64

OpenGVLab/VideoMAEv2
[CVPR 2023] VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking
Language: Python - Size: 935 KB - Last synced at: about 1 month ago - Pushed at: 6 months ago - Stars: 594 - Forks: 68

MCG-NJU/VideoMAE
[NeurIPS 2022 Spotlight] VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
Language: Python - Size: 547 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 1,451 - Forks: 142

vt-vl-lab/SDN
[NeurIPS 2019] Why Can't I Dance in the Mall? Learning to Mitigate Scene Bias in Action Recognition
Language: Python - Size: 40 KB - Last synced at: 6 days ago - Pushed at: about 1 year ago - Stars: 83 - Forks: 13

ZijiaLewisLu/CVPR2024-FACT
Official Repo for CVPR 2024 Paper "FACT: Frame-Action Cross-Attention Temporal Modeling for Efficient Fully-Supervised Action Segmentation"
Language: Python - Size: 401 KB - Last synced at: 21 days ago - Pushed at: 3 months ago - Stars: 59 - Forks: 6

sming256/TimeLoc
TimeLoc: A Unified End-to-End Framework for Precise Timestamp Localization in Long Videos
Size: 1.95 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 2 - Forks: 0

amazon-science/video-contrastive-learning
Video Contrastive Learning with Global Context, ICCVW 2021
Language: Python - Size: 198 KB - Last synced at: 12 days ago - Pushed at: almost 3 years ago - Stars: 158 - Forks: 16

Leon1207/Video-RAG-master
This is the official implementation of our paper "Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension"
Language: Python - Size: 436 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 116 - Forks: 12

sitamgithub-MSIT/videollama3-litserve
Leverage VideoLLaMA 3's capabilities using LitServe.
Language: Python - Size: 5.12 MB - Last synced at: 29 days ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

showlab/Awesome-Long-Context
A curated list of resources about long-context in large-language models and video understanding.
Size: 8.79 KB - Last synced at: 8 days ago - Pushed at: over 1 year ago - Stars: 30 - Forks: 2

declare-lab/Sealing
[NAACL 2024] Official Implementation of paper "Self-Adaptive Sampling for Efficient Video Question Answering on Image-Text Models"
Language: Python - Size: 8.92 MB - Last synced at: 5 days ago - Pushed at: 9 months ago - Stars: 11 - Forks: 3

kiyoon/verb_ambiguity
Official implementation of "An Action Is Worth Multiple Words: Handling Ambiguity in Action Recognition", BMVC 2022
Language: Python - Size: 1.07 MB - Last synced at: 11 days ago - Pushed at: over 2 years ago - Stars: 12 - Forks: 0

v-iashin/Synchformer
Source code for "Synchformer: Efficient Synchronization from Sparse Cues" (ICASSP 2024)
Language: Python - Size: 92.9 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 42 - Forks: 5

Huangmr0719/Easy-Video-Feature-Extraction
A Simple Video Visual Feature Extraction with CLIP Implementation in PyTorch
Language: Python - Size: 10.7 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

cmhungsteve/SSTDA
[CVPR 2020] Action Segmentation with Joint Self-Supervised Temporal Domain Adaptation (PyTorch)
Language: Python - Size: 1.15 MB - Last synced at: 16 days ago - Pushed at: about 4 years ago - Stars: 156 - Forks: 23

westlake-repl/MicroLens
A Large Short-video Recommendation Dataset with Raw Text/Audio/Image/Videos (Talk Invited by DeepMind).
Language: Python - Size: 62.5 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 154 - Forks: 10

sarthak268/nesca-pytorch
PyTorch Implementation for the paper "Let Me Help You! Neuro-Symbolic Short-Context Action Anticipation" accepted to RA-L'24.
Language: Python - Size: 405 KB - Last synced at: 4 days ago - Pushed at: 5 months ago - Stars: 10 - Forks: 1

Video-Bench/Video-Bench
Video Generation Benchmark
Language: Python - Size: 8.92 MB - Last synced at: 27 days ago - Pushed at: 3 months ago - Stars: 7 - Forks: 2

The-Martyr/OccludeNet-Dataset
OccludeNet: A Causal Journey into Mixed-View Actor-Centric Video Action Recognition amidst Occlusions
Size: 7.34 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 5 - Forks: 0

engindeniz/vitis
[ICCV 2023 CLVL Workshop] Zero-Shot and Few-Shot Video Question Answering with Multi-Modal Prompts
Language: Python - Size: 270 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 11 - Forks: 0

google/graph_distillation
Graph Distillation for Action Detection
Language: Python - Size: 192 KB - Last synced at: 10 days ago - Pushed at: almost 6 years ago - Stars: 66 - Forks: 19

YangLiu9208/SAKDN
[IEEE T-IP 2021] Semantics-aware Adaptive Knowledge Distillation for Cross-modal Action Recognition
Language: Python - Size: 86.8 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 23 - Forks: 3

agentic-learning-ai-lab/lifelong-memory
Code for LifelongMemory: Leveraging LLMs for Answering Queries in Long-form Egocentric Videos
Language: Python - Size: 79.5 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 17 - Forks: 0

sming256/OpenTAD
OpenTAD is an open-source temporal action detection (TAD) toolbox based on PyTorch.
Language: Python - Size: 352 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 199 - Forks: 14

DoranLyong/Awesome-Video-Action-Recognition-Paper-Review-and-Practice
Video Action Recognition Study
Size: 50.8 KB - Last synced at: about 23 hours ago - Pushed at: almost 3 years ago - Stars: 4 - Forks: 1

CuriousDima/mirk
Mirk seamlessly integrates classical CV models with large visual models, enabling richly contextualized and detailed video analysis and understanding.
Language: Python - Size: 6.45 MB - Last synced at: 9 days ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

Alirezarahnamaa/Adaptive-Frame-Selection-Algorithm
Adaptive frame selection that finds informative frames for action recognition and avoids redundancy.
Language: Python - Size: 3.7 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 2 - Forks: 0

hustvl/TeViT
Temporally Efficient Vision Transformer for Video Instance Segmentation, CVPR 2022, Oral
Language: Python - Size: 56.5 MB - Last synced at: 6 days ago - Pushed at: about 2 years ago - Stars: 239 - Forks: 18

lucidrains/LVMAE-pytorch
Implementation of the proposed LVMAE, from the paper, Extending Video Masked Autoencoders to 128 frames, in Pytorch
Language: Python - Size: 740 KB - Last synced at: 4 months ago - Pushed at: 5 months ago - Stars: 45 - Forks: 1

SCZwangxiao/TSGVs-MM2023
ACM Multimedia 2023 - Temporal Sentence in Streaming Videos
Language: Python - Size: 195 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 7 - Forks: 0

Alirezarahnamaa/Feature_Extraction
This repository contains feature extraction code that uses ResNet50 to extract spatial features from video frames.
Language: Python - Size: 1.46 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

whwu95/Cap4Video
【CVPR'2023 Highlight & TPAMI】Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval?
Language: Python - Size: 8.58 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 240 - Forks: 20

MCG-NJU/AWT
[NeurIPS 2024] AWT: Transferring Vision-Language Models via Augmentation, Weighting, and Transportation
Language: Python - Size: 12.3 MB - Last synced at: 5 months ago - Pushed at: 7 months ago - Stars: 79 - Forks: 1

xyzforever/BEVT
PyTorch implementation of BEVT (CVPR 2022) https://arxiv.org/abs/2112.01529
Language: Python - Size: 19.2 MB - Last synced at: 5 months ago - Pushed at: almost 3 years ago - Stars: 158 - Forks: 19

mu-cai/TemporalBench
TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models
Language: Python - Size: 9.77 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 11 - Forks: 1

unitaryai/VTC
VTC: Improving Video-Text Retrieval with User Comments
Language: Python - Size: 5.45 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 11 - Forks: 0

alibaba-mmai-research/TAdaConv
[ICLR 2022] TAda! Temporally-Adaptive Convolutions for Video Understanding. This codebase provides solutions for video classification, video representation learning and temporal detection.
Language: Python - Size: 1.64 MB - Last synced at: 5 months ago - Pushed at: over 1 year ago - Stars: 226 - Forks: 31

chinancheng/awesome-activity-prediction
Paper list of activity prediction and related areas
Size: 30.3 KB - Last synced at: about 12 hours ago - Pushed at: about 5 years ago - Stars: 167 - Forks: 34

aav-antonov/TSAM
Video recognition: Temporal Shift Module With Audio Modality
Language: Python - Size: 12.9 MB - Last synced at: 4 months ago - Pushed at: about 2 years ago - Stars: 3 - Forks: 1

ZJCV/TSM
[ICCV 2019] TSM: Temporal Shift Module for Efficient Video Understanding
Language: Python - Size: 174 KB - Last synced at: 18 days ago - Pushed at: over 4 years ago - Stars: 21 - Forks: 2

eric-ai-lab/MMWorld
Official repo of the paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos"
Language: Python - Size: 1.47 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 18 - Forks: 1

davidhaas6/digest
Streamlined video understanding with the help of language models
Language: Python - Size: 52.7 KB - Last synced at: about 1 month ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

whwu95/BIKE
【CVPR'2023】Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models
Language: Python - Size: 9.01 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 155 - Forks: 18

twelvelabs-io/video-embeddings-evaluation-framework
PyTorch implementation of Twelve Labs' Video Foundation Model evaluation framework & open embeddings
Language: Python - Size: 21.2 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 10 - Forks: 0

sakibreza/ECCV24-HAT
Official repository of ECCV 2024 paper - "HAT: History-Augmented Anchor Transformer for Online Temporal Action Localization"
Language: Python - Size: 972 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 5 - Forks: 0

wjun0830/CGDETR
Official PyTorch repository for CG-DETR "Correlation-guided Query-Dependency Calibration in Video Representation Learning for Temporal Grounding"
Language: Python - Size: 23.3 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 105 - Forks: 11

HKUST-LongGroup/Awesome-Open-Vocabulary-Detection-and-Segmentation
Awesome OVD-OVS - A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future
Size: 1.06 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 81 - Forks: 5

McJackTang/MMPD_rPPG_dataset
MMPD: Multi-Domain Mobile Video Physiology Dataset(EMBC2023 Oral)
Language: Python - Size: 301 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 104 - Forks: 13

rohitgirdhar/ActionVLAD
ActionVLAD for video action classification (CVPR 2017)
Language: Python - Size: 13 MB - Last synced at: 5 months ago - Pushed at: about 6 years ago - Stars: 216 - Forks: 61

doc-doc/NExT-QA
NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions (CVPR'21)
Language: Python - Size: 6.67 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 114 - Forks: 11

pha123661/SA-DVAE
[ECCV 2024] The official repo for "SA-DVAE: Improving Zero-Shot Skeleton-Based Action Recognition by Disentangled Variational Autoencoders"
Language: Python - Size: 738 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 6 - Forks: 0

boheumd/MA-LMM
(2024CVPR) MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding
Language: Python - Size: 29.2 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 188 - Forks: 24

ZJCV/TSN
[ECCV 2016] Temporal Segment Networks: Towards Good Practices for Deep Action Recognition
Language: Python - Size: 519 KB - Last synced at: 18 days ago - Pushed at: about 4 years ago - Stars: 8 - Forks: 3

jinxiang-liu/UFE-AVS
Official code for CVPR 2024 paper, "Audio-Visual Segmentation via Unlabeled Frame Exploitation"
Language: Python - Size: 21.9 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 8 - Forks: 0

ddz16/TSASPC
[2023 IJCAI] The PyTorch implementation of the paper "Timestamp-Supervised Action Segmentation from the Perspective of Clustering".
Language: Python - Size: 219 KB - Last synced at: 9 days ago - Pushed at: almost 2 years ago - Stars: 8 - Forks: 1

LTContext/LTContext
[ICCV 2023] How Much Temporal Long-Term Context is Needed for Action Segmentation?
Language: Python - Size: 334 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 37 - Forks: 3

thswodnjs3/CSTA
The official code of "CSTA: CNN-based Spatiotemporal Attention for Video Summarization"
Language: Python - Size: 322 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 16 - Forks: 0

sangminwoo/Temporal-Span-Proposal-Network-VidVRD
What and When to look?: Temporal Span Proposal Network for Video Relation Detection
Language: Python - Size: 4.87 MB - Last synced at: 10 months ago - Pushed at: over 3 years ago - Stars: 14 - Forks: 5

whwu95/FreeVA
FreeVA: Offline MLLM as Training-Free Video Assistant
Language: Python - Size: 3.22 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 34 - Forks: 0
