Topic: "captioning"
facebookresearch/mmf
A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
Language: Python - Size: 17.4 MB - Last synced at: 6 days ago - Pushed at: 20 days ago - Stars: 5,558 - Forks: 939

roboflow/maestro
Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL
Language: Python - Size: 10.6 MB - Last synced at: about 3 hours ago - Pushed at: 6 days ago - Stars: 2,551 - Forks: 203

ltguo19/VSUA-Captioning
Code for "Aligning Linguistic Words and Visual Semantic Units for Image Captioning", ACM MM 2019
Language: Python - Size: 205 KB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 264 - Forks: 24

DavidHuji/CapDec
CapDec: SOTA Zero Shot Image Captioning Using CLIP and GPT2, EMNLP 2022 (findings)
Language: Python - Size: 35.7 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 158 - Forks: 17

fpgaminer/joycaption
JoyCaption is an image captioning Visual Language Model (VLM) being built from the ground up as a free, open, and uncensored model for the community to use in training Diffusion models.
Language: Python - Size: 290 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 153 - Forks: 2

Labbeti/aac-datasets
Audio Captioning datasets for PyTorch.
Language: Python - Size: 2.68 MB - Last synced at: 17 days ago - Pushed at: about 1 month ago - Stars: 115 - Forks: 6

drethage/fully-convolutional-point-network
Fully-Convolutional Point Networks for Large-Scale Point Clouds
Language: Python - Size: 1.53 MB - Last synced at: over 1 year ago - Pushed at: about 6 years ago - Stars: 87 - Forks: 22

audio-captioning/clotho-dataset
Python code for handling the Clotho dataset.
Language: Python - Size: 99.6 KB - Last synced at: 9 months ago - Pushed at: over 4 years ago - Stars: 74 - Forks: 15

wangleihitcs/MedicalReportGeneration
A Base TensorFlow Project for Medical Report Generation
Language: Python - Size: 69.7 MB - Last synced at: 21 days ago - Pushed at: almost 6 years ago - Stars: 71 - Forks: 18

ParitoshParmar/MTL-AQA
What and How Well You Performed? A Multitask Learning Approach to Action Quality Assessment [CVPR 2019]
Language: Python - Size: 27.7 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 68 - Forks: 15

mitvis/vistext
VisText is a benchmark dataset for semantically rich chart captioning.
Language: Jupyter Notebook - Size: 2.77 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 66 - Forks: 3

aimagelab/pacscore
Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation. (CVPR 2023)
Language: Python - Size: 7.15 MB - Last synced at: 22 days ago - Pushed at: about 2 months ago - Stars: 61 - Forks: 5

TheShadow29/VidSitu
[CVPR21] Visual Semantic Role Labeling for Video Understanding (https://arxiv.org/abs/2104.00990)
Language: Python - Size: 928 KB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 50 - Forks: 7

Labbeti/aac-metrics
Metrics for evaluating Automated Audio Captioning systems, designed for PyTorch.
Language: Python - Size: 856 KB - Last synced at: 17 days ago - Pushed at: 3 months ago - Stars: 45 - Forks: 3

DavidMChan/caption-by-committee
Using LLMs and pre-trained caption models for super-human performance on image captioning.
Language: Python - Size: 7.43 MB - Last synced at: 7 days ago - Pushed at: over 1 year ago - Stars: 41 - Forks: 4

lucidrains/AoA-pytorch
A PyTorch implementation of Attention on Attention module (both self and guided variants), for Visual Question Answering
Language: Python - Size: 39.1 KB - Last synced at: 17 days ago - Pushed at: over 4 years ago - Stars: 41 - Forks: 5

deepgram-devs/video-chat
Sample app to display live captioning to a WebRTC video session with the Deepgram API.
Language: JavaScript - Size: 392 KB - Last synced at: 7 days ago - Pushed at: over 3 years ago - Stars: 37 - Forks: 14

audio-captioning/dcase-2020-baseline
Audio captioning baseline system for DCASE 2020 challenge.
Language: Python - Size: 92.5 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 36 - Forks: 11

HaydenFaulkner/Tennis
A Tennis dataset and models for event detection & commentary generation
Language: Python - Size: 30.3 MB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 35 - Forks: 11

aimagelab/camel
CaMEL: Mean Teacher Learning for Image Captioning. ICPR 2022
Language: Python - Size: 8.46 MB - Last synced at: 17 days ago - Pushed at: over 2 years ago - Stars: 29 - Forks: 12

CurryYuan/X-Trans2Cap
[CVPR 2022] X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning
Language: Python - Size: 64 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 29 - Forks: 3

RyanLiut/awesome-diverse-captioning
Some papers about *diverse* image (and a few video) captioning
Size: 124 KB - Last synced at: about 16 hours ago - Pushed at: about 2 years ago - Stars: 26 - Forks: 3

ebu/ebu-tt-live-toolkit
Toolkit for supporting the EBU-TT Live specification
Language: Python - Size: 112 MB - Last synced at: 10 months ago - Pushed at: over 1 year ago - Stars: 25 - Forks: 10

elbayadm/PaperNotes
My notes on some Deep Learning papers
Language: HTML - Size: 1.57 MB - Last synced at: about 1 year ago - Pushed at: over 6 years ago - Stars: 25 - Forks: 4

alecwangcq/show-attend-and-tell
Language: Jupyter Notebook - Size: 3.06 MB - Last synced at: over 1 year ago - Pushed at: over 7 years ago - Stars: 25 - Forks: 11

FeiElysia/awesome-zero-shot-captioning
A curated list of zero-shot captioning papers
Size: 15.6 KB - Last synced at: about 5 hours ago - Pushed at: over 1 year ago - Stars: 22 - Forks: 1

AdrianHsu/S2VT-seq2seq-video-captioning-attention
S2VT (seq2seq) video captioning with Bahdanau & Luong attention implementation in TensorFlow
Language: Python - Size: 52.6 MB - Last synced at: over 1 year ago - Pushed at: about 7 years ago - Stars: 19 - Forks: 10

aimagelab/PMA-Net
With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning. ICCV 2023
Language: Python - Size: 5.34 MB - Last synced at: 17 days ago - Pushed at: 11 months ago - Stars: 17 - Forks: 2

hassanhub/R3Transformer
Official python implementation of R3-Transformer
Language: Python - Size: 54.7 KB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 16 - Forks: 1

audio-captioning/caption-evaluation-tools
Tools for the evaluation of audio captioning.
Language: Jupyter Notebook - Size: 98.7 MB - Last synced at: 4 months ago - Pushed at: almost 5 years ago - Stars: 15 - Forks: 2

rayandrew/indonesian-image-captioning
Indonesian Image Captioning using Attention-based Semantic Compositional Networks
Language: Jupyter Notebook - Size: 13.2 MB - Last synced at: 15 days ago - Pushed at: over 5 years ago - Stars: 14 - Forks: 5

ZhaoPeiduo/BLIP2-Japanese
Modifying LAVIS' BLIP2 Q-former with models pretrained on Japanese datasets.
Language: Python - Size: 75.9 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 12 - Forks: 1

nssharmaofficial/reddit-hole
Automated Reddit scraper and video creator
Language: Python - Size: 384 KB - Last synced at: 22 days ago - Pushed at: 7 months ago - Stars: 12 - Forks: 2

ImKeTT/ZeroGen
[NLPCC'23] ZeroGen: Zero-shot Multimodal Controllable Text Generation with Multiple Oracles PyTorch Implementation
Language: Python - Size: 2.94 MB - Last synced at: 15 days ago - Pushed at: over 1 year ago - Stars: 12 - Forks: 0

2dameneko/ide-cap-chan
ide-cap-chan is a utility for batch image captioning with natural language using various VL models
Language: Python - Size: 1.82 MB - Last synced at: 10 days ago - Pushed at: 11 days ago - Stars: 11 - Forks: 0

naiveHobo/Smart-I
Smart-I is an Android application aimed at helping the visually impaired using artificial intelligence and cloud computing.
Language: Python - Size: 2.05 MB - Last synced at: 28 days ago - Pushed at: about 3 years ago - Stars: 10 - Forks: 0

jamesruan/SimpleSubtitleEditor
SimpleSubtitleEditor for Blender
Language: Python - Size: 20.5 KB - Last synced at: about 1 year ago - Pushed at: over 7 years ago - Stars: 10 - Forks: 2

fofr/cog-batch-image-captioning
Caption images for LoRA training
Language: Python - Size: 17.6 KB - Last synced at: 18 days ago - Pushed at: 9 months ago - Stars: 8 - Forks: 5

nikhilkumarsingh/MemeGenerator
Python program to generate memes.
Language: Jupyter Notebook - Size: 274 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 7 - Forks: 6

Mauville/MedCLIP
Medical image captioning using OpenAI's CLIP
Language: Jupyter Notebook - Size: 3.11 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 7 - Forks: 4

oshtz/tagmeister-pc
Efficient image captioning using OpenAI API
Language: TypeScript - Size: 14.3 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 6 - Forks: 0

ArchAngelAries/TagScribeR
A tool to streamline AI image captioning
Language: Python - Size: 190 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 6 - Forks: 0

deepgram-devs/twilio-live-captions
Sample app demonstrating adding live captions to Twilio Video rooms
Language: JavaScript - Size: 23.4 KB - Last synced at: 7 days ago - Pushed at: over 3 years ago - Stars: 6 - Forks: 0

congphase/img-captioning-in-vietnamese
An attempt to solve image captioning (in Vietnamese) for ball sports contexts.
Language: Python - Size: 8.75 MB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 5 - Forks: 1

Andrew-Ng-s-number-one-fan/Readings
Size: 322 MB - Last synced at: 5 months ago - Pushed at: over 4 years ago - Stars: 5 - Forks: 0

cd2bit/awesome-list-of-captioned-courses
Online professional courses that are captioned and/or subtitled
Size: 10.7 KB - Last synced at: 5 days ago - Pushed at: about 6 years ago - Stars: 5 - Forks: 0

Hyeongkeun/LAVCap
Official PyTorch Implementation of 'LAVCap: LLM-based Audio-Visual Captioning using Optimal Transport' (ICASSP 2025)
Language: Python - Size: 3.58 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 3 - Forks: 0

mrazhou/SEN
Single-stream Extractor Network with Contrastive Pre-training for Remote Sensing Change Captioning
Language: Python - Size: 64.2 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 3 - Forks: 1

ebu/ebu-tt
A public repository with key information about the EBU Timed Text (EBU-TT) format.
Size: 7.81 KB - Last synced at: 5 months ago - Pushed at: about 4 years ago - Stars: 3 - Forks: 0

brayevalerien/ReCap
An image (re)captioning GUI for image generation models dataset preparation, made for easy caption editing.
Language: Python - Size: 2.45 MB - Last synced at: 23 days ago - Pushed at: 3 months ago - Stars: 2 - Forks: 0

KennethWussmann/caption.now
Quickly and efficiently caption your image dataset for AI training
Language: TypeScript - Size: 3.76 MB - Last synced at: 19 days ago - Pushed at: 5 months ago - Stars: 2 - Forks: 1

J0SAL/Aide
An App with Voice Assisted Image Captioning and VQA For Visually Challenged Individuals
Language: Dart - Size: 18.2 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 2 - Forks: 1

wangleihitcs/ImageCaptions
A base model for image captions.
Language: Python - Size: 96.3 MB - Last synced at: 2 months ago - Pushed at: about 6 years ago - Stars: 2 - Forks: 1

dragonfruit-ai/launchpad
🚀 Documentation and component library for Dragonfruit AI's Launchpad
Size: 91.8 KB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 1 - Forks: 0

Wylgrif/Captioninghelper
A small tool to help caption a dataset, coded in Python
Language: Python - Size: 650 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 1

Aavtic/parashu
A video subtitle editor program in Rust.
Language: Rust - Size: 10.7 KB - Last synced at: about 2 months ago - Pushed at: 8 months ago - Stars: 1 - Forks: 0

Anshler/ICG_sd_extension
Image caption extension for A1111 Webui 👁️📜🖋️
Language: Python - Size: 181 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

stevennyman/yt-transcript
JavaScript bookmarklet for viewing YouTube video transcripts in a popout window.
Language: JavaScript - Size: 16.6 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

Sharif-SLPL/image-captioning
Automatically describing the content of an image in Persian
Language: Jupyter Notebook - Size: 1.21 MB - Last synced at: about 2 years ago - Pushed at: about 3 years ago - Stars: 1 - Forks: 0

Dong-JinKim/DRCaptioning
Language: Jupyter Notebook - Size: 8.41 MB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 1

elbayadm/captioning
Captioning code in PyTorch
Language: Jupyter Notebook - Size: 763 MB - Last synced at: about 1 year ago - Pushed at: almost 7 years ago - Stars: 1 - Forks: 0

jyotishp/neural-captioning (fork of Saiteja-Reddy/Show-and-Tell)
A neural network consisting of CNN and LSTM for generating captions of an image thrown at it.
Language: Jupyter Notebook - Size: 139 MB - Last synced at: about 2 years ago - Pushed at: about 7 years ago - Stars: 1 - Forks: 0

kozhemyak/joycaption-alpha2-runpod-captioner (fork of brendanmckeag/gemma-captioner-images)
This project provides a serverless image captioning service using RunPod and Hugging Face's JoyCaption Alpha Two model. The service processes images/photos and generates descriptive captions or tags based on a customizable prompt.
Language: Python - Size: 76.2 KB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 0 - Forks: 0

trucaption/trucaption
A real-time captioning system with support for large and small screen display.
Language: JavaScript - Size: 2.68 MB - Last synced at: 3 days ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

AMfeta99/NLP_LLM
This repository is dedicated to small projects and some theoretical material that I used to get into NLP and LLM in a practical and efficient way.
Language: Jupyter Notebook - Size: 77.6 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

git-khandelwal/CNN-to-GPT2
Image Captioning using CNNs and Transformers
Language: Python - Size: 15.6 KB - Last synced at: 21 days ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

ssube/label-prompt-caption
Language: Python - Size: 127 KB - Last synced at: 19 days ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

basedrhys/text-od-robustness
Evaluating the robustness of text-conditioned OD models such as MDETR
Language: Jupyter Notebook - Size: 20.3 MB - Last synced at: 18 days ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 1

petercorke/vtt-clean
Python script to clean VTT files generated by Microsoft Stream
Language: Python - Size: 1.95 KB - Last synced at: about 1 month ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 1

AmbiTyga/GifCaptioner
A Deep Neural Network for gif captioning
Language: Python - Size: 9.55 MB - Last synced at: over 1 year ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 0
