GitHub topics: video-understanding

Repositories

GLUS-video/GLUS

[CVPR 2025] Official PyTorch Implementation of GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmentation

Language: Jupyter Notebook - Size: 66.4 MB - Last synced at: about 13 hours ago - Pushed at: 1 day ago - Stars: 31 - Forks: 2

Zhuo-Cao/FlashVTG

FlashVTG: Feature Layering and Adaptive Score Handling Network for Video Temporal Grounding (WACV 2025)

Language: Python - Size: 2.68 MB - Last synced at: 1 day ago - Pushed at: 2 days ago - Stars: 18 - Forks: 2

pritamqu/RRPO

Self-alignment of Large Video Language Models with Refined Regularized Preference Optimization

Language: Python - Size: 4.35 MB - Last synced at: 1 day ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0

SoccerNet/sn-gamestate

[CVPRW'24] SoccerNet Game State Reconstruction: End-to-End Athlete Tracking and Identification on a Minimap (CVPR24 - CVSports workshop)

Language: Python - Size: 95.8 MB - Last synced at: about 4 hours ago - Pushed at: about 17 hours ago - Stars: 286 - Forks: 61

The-Martyr/Awesome-Multimodal-Reasoning

Latest Advances on (RL based) Multimodal Reasoning and Generation in Multimodal Large Language Models

Size: 60.5 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 19 - Forks: 0

E-sar456/GEN-AI-CAPSTONE-PROJECT

This project demonstrates a Generative AI-powered assistant that streamlines the job application process using Google Gemini Pro. It analyzes a user’s resume against a job description, calculates a match score, suggests tailored bullet points, and generates a personalized cover letter — all formatted in structured JSON for automation.

Language: Jupyter Notebook - Size: 12.7 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

cuixing158/Awesome-CV-MasterHub

🔥 🔥 🔥 A paper list of some recent Computer Vision (CV) works

Size: 14.1 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 214 - Forks: 12

PKU-YuanGroup/Chat-UniVi

[CVPR 2024 Highlight🔥] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding

Language: Python - Size: 38.2 MB - Last synced at: 4 days ago - Pushed at: 6 months ago - Stars: 931 - Forks: 46

jqtangust/hawk

🔥 🔥 🔥 [NeurIPS 2024] Hawk: Learning to Understand Open-World Video Anomalies

Language: Python - Size: 5.35 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 193 - Forks: 2

yjxiong/temporal-segment-networks

Code & Models for Temporal Segment Networks (TSN) in ECCV 2016

Language: Python - Size: 2.01 MB - Last synced at: about 4 hours ago - Pushed at: over 4 years ago - Stars: 1,554 - Forks: 475

open-mmlab/mmaction2

OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark

Language: Python - Size: 68.2 MB - Last synced at: 6 days ago - Pushed at: 8 months ago - Stars: 4,547 - Forks: 1,268

PaddlePaddle/PaddleVideo

Awesome video understanding toolkits based on PaddlePaddle. It supports video data annotation tools, lightweight RGB and skeleton-based action recognition models, and practical applications for video tagging and sports action detection.

Language: Python - Size: 106 MB - Last synced at: 6 days ago - Pushed at: 2 months ago - Stars: 1,608 - Forks: 383

aws-samples/swift-chat

A lightning-fast, cross-platform AI chat application built with React Native.

Language: TypeScript - Size: 6.47 MB - Last synced at: 6 days ago - Pushed at: 13 days ago - Stars: 365 - Forks: 38

nirmit27/genai-capstone-project-2025

This repository is dedicated to the development and exploration of generative AI technologies as a part of the capstone project for the 5-day Gen AI Intensive Course with Google.

Language: Jupyter Notebook - Size: 29.3 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0

aiden200/Aha-

A Model for Human-Like Reflection in Video Understanding

Language: Python - Size: 19.1 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 1 - Forks: 0

jinwchoi/awesome-action-recognition

A curated list of action recognition and related area resources

Size: 270 KB - Last synced at: 7 days ago - Pushed at: almost 2 years ago - Stars: 3,880 - Forks: 726

rese1f/aurora

[ICLR 2025] AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark

Language: Python - Size: 22 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 89 - Forks: 4

pipixin321/Awesome-Video-MLLMs

🔥 🔥 🔥 Awesome MLLMs/Benchmarks for Short/Long/Streaming Video Understanding 📹

Size: 6.84 KB - Last synced at: 8 days ago - Pushed at: 3 months ago - Stars: 14 - Forks: 1

OpenGVLab/Ask-Anything

[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.

Language: Python - Size: 20.7 MB - Last synced at: 9 days ago - Pushed at: 3 months ago - Stars: 3,213 - Forks: 261

aistairc/VDAct

A Video-grounded Dialogue Dataset and Metric for Event-driven Activities

Language: Python - Size: 18.8 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 2 - Forks: 0

Vision-CAIR/MiniGPT4-video

Official code for Goldfish model for long video understanding and MiniGPT4-video for short video understanding

Language: Python - Size: 38.7 MB - Last synced at: 9 days ago - Pushed at: 4 months ago - Stars: 609 - Forks: 67

kiyoon/channel_sampling

Official implementation of "Capturing Temporal Information in a Single Frame: Channel Sampling Strategies for Action Recognition", BMVC 2022

Language: Python - Size: 1.25 MB - Last synced at: 10 days ago - Pushed at: over 2 years ago - Stars: 4 - Forks: 2

katha-ai/VELOCITI

VELOCITI Benchmark Evaluation and Visualisation Code

Language: Python - Size: 184 KB - Last synced at: 3 days ago - Pushed at: 11 days ago - Stars: 5 - Forks: 0

mit-han-lab/temporal-shift-module

[ICCV 2019] TSM: Temporal Shift Module for Efficient Video Understanding

Language: Python - Size: 245 KB - Last synced at: 8 days ago - Pushed at: 9 months ago - Stars: 2,104 - Forks: 420

yjxiong/action-detection

Temporal action detection with SSN

Language: Python - Size: 7.78 MB - Last synced at: about 4 hours ago - Pushed at: almost 6 years ago - Stars: 644 - Forks: 177

v-iashin/SpecVQGAN

Source code for "Taming Visually Guided Sound Generation" (Oral at BMVC 2021)

Language: Jupyter Notebook - Size: 163 MB - Last synced at: 10 days ago - Pushed at: 9 months ago - Stars: 360 - Forks: 39

yjxiong/tsn-pytorch

Temporal Segment Networks (TSN) in PyTorch

Language: Python - Size: 30.3 KB - Last synced at: about 4 hours ago - Pushed at: almost 6 years ago - Stars: 1,070 - Forks: 310

movienet/movienet-tools

Tools for movie and video research

Language: C++ - Size: 6.56 MB - Last synced at: 8 days ago - Pushed at: almost 3 years ago - Stars: 289 - Forks: 34

henghuiding/MeViS

[ICCV 2023] MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions

Language: Python - Size: 52.2 MB - Last synced at: 14 days ago - Pushed at: 10 months ago - Stars: 521 - Forks: 22

ParitoshParmar/MTL-AQA

What and How Well You Performed? A Multitask Learning Approach to Action Quality Assessment [CVPR 2019]

Language: Python - Size: 27.7 MB - Last synced at: 17 days ago - Pushed at: 5 months ago - Stars: 67 - Forks: 15

OpenGVLab/InternVideo

[ECCV2024] Video Foundation Models & Data for Multimodal Understanding

Language: Python - Size: 53.3 MB - Last synced at: 18 days ago - Pushed at: about 2 months ago - Stars: 1,771 - Forks: 106

dvlab-research/LSDBench

A benchmark that focuses on the sampling dilemma in long-video tasks. Through well-designed tasks, it evaluates the sampling efficiency of long-video VLMs.

Language: Python - Size: 2.57 MB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 3 - Forks: 0

TencentARC/ST-LLM

[ECCV 2024🔥] Official implementation of the paper "ST-LLM: Large Language Models Are Effective Temporal Learners"

Language: Python - Size: 19 MB - Last synced at: 19 days ago - Pushed at: 7 months ago - Stars: 142 - Forks: 5

zhaoyue-zephyrus/AVION

[arXiv:2309.16669] Code release for "Training a Large Video Model on a Single Machine in a Day"

Language: Python - Size: 1.33 MB - Last synced at: 15 days ago - Pushed at: 9 months ago - Stars: 127 - Forks: 9

DAMO-NLP-SG/VideoRefer

[CVPR 2025] The code for "VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM"

Language: Python - Size: 130 MB - Last synced at: 24 days ago - Pushed at: 24 days ago - Stars: 176 - Forks: 9

IntelLabs/GraVi-T

Graph learning framework for long-term video understanding

Language: Python - Size: 529 KB - Last synced at: 18 days ago - Pushed at: 3 months ago - Stars: 60 - Forks: 9

ZJCV/X3D

[CVPR 2020] X3D: Expanding Architectures for Efficient Video Recognition

Language: Python - Size: 146 KB - Last synced at: 18 days ago - Pushed at: over 4 years ago - Stars: 20 - Forks: 4

fepegar/gestures-miccai-2021 📦

Code for "Pérez-García et al. 2021, Transfer Learning of Deep Spatiotemporal Networks to Model Arbitrarily Long Videos of Seizures, MICCAI 2021".

Language: Python - Size: 788 KB - Last synced at: 5 days ago - Pushed at: about 1 year ago - Stars: 7 - Forks: 4

open-mmlab/mmaction

An open-source toolbox for action understanding based on PyTorch

Language: Python - Size: 3.95 MB - Last synced at: 11 days ago - Pushed at: about 3 years ago - Stars: 1,870 - Forks: 350

ParitoshParmar/C3D-LSTM--PyTorch

C3D-LSTM implementation in PyTorch [WACV 2019]

Language: Python - Size: 91.8 KB - Last synced at: 17 days ago - Pushed at: almost 5 years ago - Stars: 39 - Forks: 10

chihyaoma/Activity-Recognition-with-CNN-and-RNN

Temporal Segments LSTM and Temporal-Inception for Activity Recognition

Language: Lua - Size: 118 MB - Last synced at: 10 days ago - Pushed at: almost 5 years ago - Stars: 442 - Forks: 147

JunweiLiang/Multiverse

Dataset, code and model for the CVPR'20 paper "The Garden of Forking Paths: Towards Multi-Future Trajectory Prediction" and for the ECCV'20 SimAug paper.

Language: Python - Size: 82.1 MB - Last synced at: 10 days ago - Pushed at: about 2 years ago - Stars: 257 - Forks: 64

OpenGVLab/VideoMAEv2

[CVPR 2023] VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking

Language: Python - Size: 935 KB - Last synced at: about 1 month ago - Pushed at: 6 months ago - Stars: 594 - Forks: 68

MCG-NJU/VideoMAE

[NeurIPS 2022 Spotlight] VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training

Language: Python - Size: 547 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 1,451 - Forks: 142

vt-vl-lab/SDN

[NeurIPS 2019] Why Can't I Dance in the Mall? Learning to Mitigate Scene Bias in Action Recognition

Language: Python - Size: 40 KB - Last synced at: 6 days ago - Pushed at: about 1 year ago - Stars: 83 - Forks: 13

ZijiaLewisLu/CVPR2024-FACT

Official Repo for CVPR 2024 Paper "FACT: Frame-Action Cross-Attention Temporal Modeling for Efficient Fully-Supervised Action Segmentation"

Language: Python - Size: 401 KB - Last synced at: 21 days ago - Pushed at: 3 months ago - Stars: 59 - Forks: 6

sming256/TimeLoc

TimeLoc: A Unified End-to-End Framework for Precise Timestamp Localization in Long Videos

Size: 1.95 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 2 - Forks: 0

amazon-science/video-contrastive-learning

Video Contrastive Learning with Global Context, ICCVW 2021

Language: Python - Size: 198 KB - Last synced at: 12 days ago - Pushed at: almost 3 years ago - Stars: 158 - Forks: 16

Leon1207/Video-RAG-master

This is the official implementation of our paper "Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension"

Language: Python - Size: 436 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 116 - Forks: 12

sitamgithub-MSIT/videollama3-litserve

Leverage VideoLLaMA 3's capabilities using LitServe.

Language: Python - Size: 5.12 MB - Last synced at: 29 days ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

showlab/Awesome-Long-Context

A curated list of resources about long-context in large-language models and video understanding.

Size: 8.79 KB - Last synced at: 8 days ago - Pushed at: over 1 year ago - Stars: 30 - Forks: 2

declare-lab/Sealing

[NAACL 2024] Official Implementation of paper "Self-Adaptive Sampling for Efficient Video Question Answering on Image-Text Models"

Language: Python - Size: 8.92 MB - Last synced at: 5 days ago - Pushed at: 9 months ago - Stars: 11 - Forks: 3

kiyoon/verb_ambiguity

Official implementation of "An Action Is Worth Multiple Words: Handling Ambiguity in Action Recognition", BMVC 2022

Language: Python - Size: 1.07 MB - Last synced at: 11 days ago - Pushed at: over 2 years ago - Stars: 12 - Forks: 0

v-iashin/Synchformer

Source code for "Synchformer: Efficient Synchronization from Sparse Cues" (ICASSP 2024)

Language: Python - Size: 92.9 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 42 - Forks: 5

Huangmr0719/Easy-Video-Feature-Extraction

A Simple Video Visual Feature Extraction with CLIP Implementation in PyTorch

Language: Python - Size: 10.7 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

cmhungsteve/SSTDA

[CVPR 2020] Action Segmentation with Joint Self-Supervised Temporal Domain Adaptation (PyTorch)

Language: Python - Size: 1.15 MB - Last synced at: 16 days ago - Pushed at: about 4 years ago - Stars: 156 - Forks: 23

westlake-repl/MicroLens

A Large Short-video Recommendation Dataset with Raw Text/Audio/Image/Videos (Talk Invited by DeepMind).

Language: Python - Size: 62.5 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 154 - Forks: 10

sarthak268/nesca-pytorch

PyTorch Implementation for the paper "Let Me Help You! Neuro-Symbolic Short-Context Action Anticipation" accepted to RA-L'24.

Language: Python - Size: 405 KB - Last synced at: 4 days ago - Pushed at: 5 months ago - Stars: 10 - Forks: 1

Video-Bench/Video-Bench

Video Generation Benchmark

Language: Python - Size: 8.92 MB - Last synced at: 27 days ago - Pushed at: 3 months ago - Stars: 7 - Forks: 2

The-Martyr/OccludeNet-Dataset

OccludeNet: A Causal Journey into Mixed-View Actor-Centric Video Action Recognition amidst Occlusions

Size: 7.34 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 5 - Forks: 0

engindeniz/vitis

[ICCV 2023 CLVL Workshop] Zero-Shot and Few-Shot Video Question Answering with Multi-Modal Prompts

Language: Python - Size: 270 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 11 - Forks: 0

google/graph_distillation

Graph Distillation for Action Detection

Language: Python - Size: 192 KB - Last synced at: 10 days ago - Pushed at: almost 6 years ago - Stars: 66 - Forks: 19

YangLiu9208/SAKDN

[IEEE T-IP 2021] Semantics-aware Adaptive Knowledge Distillation for Cross-modal Action Recognition

Language: Python - Size: 86.8 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 23 - Forks: 3

agentic-learning-ai-lab/lifelong-memory

Code for LifelongMemory: Leveraging LLMs for Answering Queries in Long-form Egocentric Videos

Language: Python - Size: 79.5 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 17 - Forks: 0

sming256/OpenTAD

OpenTAD is an open-source temporal action detection (TAD) toolbox based on PyTorch.

Language: Python - Size: 352 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 199 - Forks: 14

DoranLyong/Awesome-Video-Action-Recognition-Paper-Review-and-Practice

Video Action Recognition Study

Size: 50.8 KB - Last synced at: about 23 hours ago - Pushed at: almost 3 years ago - Stars: 4 - Forks: 1

CuriousDima/mirk

Mirk seamlessly integrates classical CV models with large visual models, enabling richly contextualized and detailed video analysis and understanding.

Language: Python - Size: 6.45 MB - Last synced at: 9 days ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

Alirezarahnamaa/Adaptive-Frame-Selection-Algorithm

Adaptive frame selection that finds informative frames for action recognition and avoids redundancy.

Language: Python - Size: 3.7 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 2 - Forks: 0

hustvl/TeViT

Temporally Efficient Vision Transformer for Video Instance Segmentation, CVPR 2022, Oral

Language: Python - Size: 56.5 MB - Last synced at: 6 days ago - Pushed at: about 2 years ago - Stars: 239 - Forks: 18

lucidrains/LVMAE-pytorch

Implementation of LVMAE, proposed in the paper "Extending Video Masked Autoencoders to 128 frames", in PyTorch

Language: Python - Size: 740 KB - Last synced at: 4 months ago - Pushed at: 5 months ago - Stars: 45 - Forks: 1

SCZwangxiao/TSGVs-MM2023

ACM Multimedia 2023 - Temporal Sentence Grounding in Streaming Videos

Language: Python - Size: 195 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 7 - Forks: 0

Alirezarahnamaa/Feature_Extraction

This repository contains feature extraction code that uses ResNet50 to extract spatial features from video frames.

Language: Python - Size: 1.46 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

whwu95/Cap4Video

【CVPR'2023 Highlight & TPAMI】Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval?

Language: Python - Size: 8.58 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 240 - Forks: 20

MCG-NJU/AWT

[NeurIPS 2024] AWT: Transferring Vision-Language Models via Augmentation, Weighting, and Transportation

Language: Python - Size: 12.3 MB - Last synced at: 5 months ago - Pushed at: 7 months ago - Stars: 79 - Forks: 1

xyzforever/BEVT

PyTorch implementation of BEVT (CVPR 2022) https://arxiv.org/abs/2112.01529

Language: Python - Size: 19.2 MB - Last synced at: 5 months ago - Pushed at: almost 3 years ago - Stars: 158 - Forks: 19

mu-cai/TemporalBench

TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models

Language: Python - Size: 9.77 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 11 - Forks: 1

unitaryai/VTC

VTC: Improving Video-Text Retrieval with User Comments

Language: Python - Size: 5.45 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 11 - Forks: 0

alibaba-mmai-research/TAdaConv

[ICLR 2022] TAda! Temporally-Adaptive Convolutions for Video Understanding. This codebase provides solutions for video classification, video representation learning and temporal detection.

Language: Python - Size: 1.64 MB - Last synced at: 5 months ago - Pushed at: over 1 year ago - Stars: 226 - Forks: 31

chinancheng/awesome-activity-prediction

Paper list of activity prediction and related area

Size: 30.3 KB - Last synced at: about 12 hours ago - Pushed at: about 5 years ago - Stars: 167 - Forks: 34

aav-antonov/TSAM

Video recognition: Temporal Shift Module With Audio Modality

Language: Python - Size: 12.9 MB - Last synced at: 4 months ago - Pushed at: about 2 years ago - Stars: 3 - Forks: 1

ZJCV/TSM

[ICCV 2019] TSM: Temporal Shift Module for Efficient Video Understanding

Language: Python - Size: 174 KB - Last synced at: 18 days ago - Pushed at: over 4 years ago - Stars: 21 - Forks: 2

eric-ai-lab/MMWorld

Official repo of the paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos"

Language: Python - Size: 1.47 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 18 - Forks: 1

davidhaas6/digest

Streamlined video understanding with the help of language models

Language: Python - Size: 52.7 KB - Last synced at: about 1 month ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

whwu95/BIKE

【CVPR'2023】Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models

Language: Python - Size: 9.01 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 155 - Forks: 18

twelvelabs-io/video-embeddings-evaluation-framework

PyTorch implementation of Twelve Labs' Video Foundation Model evaluation framework & open embeddings

Language: Python - Size: 21.2 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 10 - Forks: 0

sakibreza/ECCV24-HAT

Official repository of ECCV 2024 paper - "HAT: History-Augmented Anchor Transformer for Online Temporal Action Localization"

Language: Python - Size: 972 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 5 - Forks: 0

wjun0830/CGDETR

Official PyTorch repository for CG-DETR "Correlation-guided Query-Dependency Calibration in Video Representation Learning for Temporal Grounding"

Language: Python - Size: 23.3 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 105 - Forks: 11

HKUST-LongGroup/Awesome-Open-Vocabulary-Detection-and-Segmentation

Awesome OVD-OVS - A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future

Size: 1.06 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 81 - Forks: 5

McJackTang/MMPD_rPPG_dataset

MMPD: Multi-Domain Mobile Video Physiology Dataset (EMBC 2023 Oral)

Language: Python - Size: 301 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 104 - Forks: 13

rohitgirdhar/ActionVLAD

ActionVLAD for video action classification (CVPR 2017)

Language: Python - Size: 13 MB - Last synced at: 5 months ago - Pushed at: about 6 years ago - Stars: 216 - Forks: 61

doc-doc/NExT-QA

NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions (CVPR'21)

Language: Python - Size: 6.67 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 114 - Forks: 11

pha123661/SA-DVAE

[ECCV 2024] The official repo for "SA-DVAE: Improving Zero-Shot Skeleton-Based Action Recognition by Disentangled Variational Autoencoders"

Language: Python - Size: 738 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 6 - Forks: 0

boheumd/MA-LMM

[CVPR 2024] MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding

Language: Python - Size: 29.2 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 188 - Forks: 24

ZJCV/TSN

[ECCV 2016] Temporal Segment Networks: Towards Good Practices for Deep Action Recognition

Language: Python - Size: 519 KB - Last synced at: 18 days ago - Pushed at: about 4 years ago - Stars: 8 - Forks: 3

jinxiang-liu/UFE-AVS

Official code for CVPR 2024 paper, "Audio-Visual Segmentation via Unlabeled Frame Exploitation"

Language: Python - Size: 21.9 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 8 - Forks: 0

ddz16/TSASPC

[IJCAI 2023] The PyTorch implementation of the paper "Timestamp-Supervised Action Segmentation from the Perspective of Clustering".

Language: Python - Size: 219 KB - Last synced at: 9 days ago - Pushed at: almost 2 years ago - Stars: 8 - Forks: 1

LTContext/LTContext

[ICCV 2023] How Much Temporal Long-Term Context is Needed for Action Segmentation?

Language: Python - Size: 334 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 37 - Forks: 3

thswodnjs3/CSTA

The official code of "CSTA: CNN-based Spatiotemporal Attention for Video Summarization"

Language: Python - Size: 322 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 16 - Forks: 0

sangminwoo/Temporal-Span-Proposal-Network-VidVRD

What and When to look?: Temporal Span Proposal Network for Video Relation Detection

Language: Python - Size: 4.87 MB - Last synced at: 10 months ago - Pushed at: over 3 years ago - Stars: 14 - Forks: 5

whwu95/FreeVA

FreeVA: Offline MLLM as Training-Free Video Assistant

Language: Python - Size: 3.22 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 34 - Forks: 0

Related Keywords

video-understanding (206), action-recognition (61), computer-vision (36), pytorch (35), deep-learning (29), action-detection (18), temporal-action-localization (13), video-recognition (12), self-supervised-learning (12), video-processing (11), video (11), large-language-models (11), temporal-action-detection (11), video-question-answering (10), transfer-learning (9), dataset (9), multimodal-learning (9), video-classification (8), activity-recognition (8), llm (8), masked-autoencoder (6), vision-language (6), awesome-list (6), vision-and-language (6), c3d (6), action-quality-assessment (6), multimodal-large-language-models (6), action-anticipation (5), foundation-models (5), artificial-intelligence (5), representation-learning (5), machine-learning (5), biometrics (5), hand-gesture-authentication (5), benchmark (5), vision-language-model (5), video-analysis (5), transformer (5), tensorflow (4), clip (4), zero-shot-learning (4), video-text-retrieval (4), convolutional-neural-networks (4), object-detection (4), visual-question-answering (4), long-video-understanding (4), natural-language-processing (4), vision-transformer (4), videoqa (4), lstm (4), video-dataset (4), transformers (4), video-representation-learning (4), cnn (4), action-classification (4), human-computer-interaction (4), video-generation (4), temporal-activity-localization (4), temporal-modeling (4), contrastive-learning (4), weakly-supervised-learning (4), multimodal (4), tsm (4), non-local (3), tsn (3), charades (3), action-localization (3), python (3), language-grounding (3), ava (3), temporal-segment-networks (3), slowfast (3), cross-modal-learning (3), multimodal-deep-learning (3), attention-mechanism (3), cvpr (3), image-generation (3), multi-modal (3), action-segmentation (3), cvpr2024 (3), mllm (3), video-grounding (3), deep-neural-networks (3), video-summarization (3), neurips-2022 (3), temporal-language-grounding (3), pre-training (3), cvpr2023 (3), anomaly-detection (3), awesome (3), domain-adaptation (3), pose-estimation (3), c3d-lstm (3), video-retrieval (3), fine-grained-classification (2), scene-graph-generation (2), cross-modality (2), activity-detection (2), activity-understanding (2), something-something (2)