An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: captioning

roboflow/maestro

Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL

Language: Python - Size: 10.6 MB - Last synced at: 6 days ago - Pushed at: 9 days ago - Stars: 2,568 - Forks: 206

drethage/fully-convolutional-point-network

Fully-Convolutional Point Networks for Large-Scale Point Clouds

Language: Python - Size: 1.53 MB - Last synced at: 3 days ago - Pushed at: about 6 years ago - Stars: 85 - Forks: 22

facebookresearch/mmf

A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)

Language: Python - Size: 17.4 MB - Last synced at: 9 days ago - Pushed at: about 1 month ago - Stars: 5,564 - Forks: 938

AMfeta99/NLP_LLM

This repository is dedicated to small projects and some theoretical material that I used to get into NLP and LLM in a practical and efficient way.

Language: Jupyter Notebook - Size: 85.2 MB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 0 - Forks: 0

fpgaminer/joycaption

JoyCaption is an image captioning Visual Language Model (VLM) being built from the ground up as a free, open, and uncensored model for the community to use in training Diffusion models.

Language: Python - Size: 285 KB - Last synced at: 26 days ago - Pushed at: 26 days ago - Stars: 455 - Forks: 22

Labbeti/aac-datasets

Audio Captioning datasets for PyTorch.

Language: Python - Size: 2.68 MB - Last synced at: 14 days ago - Pushed at: 3 months ago - Stars: 117 - Forks: 8

ParitoshParmar/MTL-AQA

What and How Well You Performed? A Multitask Learning Approach to Action Quality Assessment [CVPR 2019]

Language: Python - Size: 27.7 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 68 - Forks: 15

deepgram-devs/video-chat

Sample app to display live captioning to a WebRTC video session with the Deepgram API.

Language: JavaScript - Size: 392 KB - Last synced at: 1 day ago - Pushed at: over 3 years ago - Stars: 38 - Forks: 14

2dameneko/ide-cap-chan

ide-cap-chan is a utility for batch image captioning with natural language using various VL models

Language: Python - Size: 1.82 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 11 - Forks: 0

Labbeti/aac-metrics

Metrics for evaluating Automated Audio Captioning systems, designed for PyTorch.

Language: Python - Size: 859 KB - Last synced at: 18 days ago - Pushed at: about 2 months ago - Stars: 48 - Forks: 4

oshtz/tagmeister-pc

Efficient image captioning using OpenAI API

Language: TypeScript - Size: 14.3 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 6 - Forks: 0

lucidrains/AoA-pytorch

A PyTorch implementation of the Attention on Attention module (both self and guided variants), for Visual Question Answering

Language: Python - Size: 39.1 KB - Last synced at: 25 days ago - Pushed at: over 4 years ago - Stars: 42 - Forks: 5

kozhemyak/joycaption-alpha2-runpod-captioner (fork of brendanmckeag/gemma-captioner-images)

This project provides a serverless image captioning service using RunPod and Hugging Face's JoyCaption Alpha Two model. The service processes images/photos and generates descriptive captions or tags based on a customizable prompt.

Language: Python - Size: 76.2 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

Hyeongkeun/LAVCap

Official PyTorch Implementation of 'LAVCap: LLM-based Audio-Visual Captioning using Optimal Transport' (ICASSP 2025)

Language: Python - Size: 3.58 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 3 - Forks: 0

DavidMChan/caption-by-committee

Using LLMs and pre-trained caption models for super-human performance on image captioning.

Language: Python - Size: 7.43 MB - Last synced at: about 2 months ago - Pushed at: over 1 year ago - Stars: 41 - Forks: 4

wangleihitcs/MedicalReportGeneration

A Base TensorFlow Project for Medical Report Generation

Language: Python - Size: 69.7 MB - Last synced at: about 1 month ago - Pushed at: almost 6 years ago - Stars: 72 - Forks: 18

aimagelab/pacscore

[CVPR 2023] Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation

Language: Python - Size: 7.15 MB - Last synced at: 12 days ago - Pushed at: 3 months ago - Stars: 61 - Forks: 8

dragonfruit-ai/launchpad

🚀 Documentation and component library for Dragonfruit AI's Launchpad

Size: 91.8 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

FeiElysia/awesome-zero-shot-captioning

A curated list of zero-shot captioning papers

Size: 15.6 KB - Last synced at: 28 days ago - Pushed at: almost 2 years ago - Stars: 22 - Forks: 1

aimagelab/PMA-Net

With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning. ICCV 2023

Language: Python - Size: 5.34 MB - Last synced at: about 2 months ago - Pushed at: about 1 year ago - Stars: 17 - Forks: 2

cd2bit/awesome-list-of-captioned-courses

Online professional courses that are captioned and/or subtitled

Size: 10.7 KB - Last synced at: about 1 month ago - Pushed at: over 6 years ago - Stars: 5 - Forks: 0

Wylgrif/Captioninghelper

A small tool to help caption a dataset | coded in Python

Language: Python - Size: 650 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 1

trucaption/trucaption

A real-time captioning system with support for large and small screen display.

Language: JavaScript - Size: 2.68 MB - Last synced at: 6 days ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

brayevalerien/ReCap

An image (re)captioning GUI for image generation models dataset preparation, made for easy caption editing.

Language: Python - Size: 2.45 MB - Last synced at: 2 months ago - Pushed at: 5 months ago - Stars: 2 - Forks: 0

ZhaoPeiduo/BLIP2-Japanese

Modifying LAVIS' BLIP2 Q-former with models pretrained on Japanese datasets.

Language: Python - Size: 75.9 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 12 - Forks: 1

fofr/cog-batch-image-captioning

Caption images for LoRA training

Language: Python - Size: 17.6 KB - Last synced at: about 2 months ago - Pushed at: 10 months ago - Stars: 8 - Forks: 5

ImKeTT/ZeroGen

[NLPCC'23] ZeroGen: Zero-shot Multimodal Controllable Text Generation with Multiple Oracles PyTorch Implementation

Language: Python - Size: 2.94 MB - Last synced at: about 2 months ago - Pushed at: over 1 year ago - Stars: 12 - Forks: 0

rayandrew/indonesian-image-captioning

Indonesian Image Captioning using Attention-based Semantic Compositional Networks

Language: Jupyter Notebook - Size: 13.2 MB - Last synced at: about 2 months ago - Pushed at: almost 6 years ago - Stars: 14 - Forks: 5

audio-captioning/caption-evaluation-tools

Tools for the evaluation of audio captioning.

Language: Jupyter Notebook - Size: 98.7 MB - Last synced at: 6 months ago - Pushed at: about 5 years ago - Stars: 15 - Forks: 2

RyanLiut/awesome-diverse-captioning

Some papers about *diverse* image (and a few video) captioning

Size: 124 KB - Last synced at: 28 days ago - Pushed at: about 2 years ago - Stars: 26 - Forks: 3

KennethWussmann/caption.now

Quickly and efficiently caption your image dataset for AI training

Language: TypeScript - Size: 3.76 MB - Last synced at: about 2 months ago - Pushed at: 7 months ago - Stars: 2 - Forks: 1

nssharmaofficial/reddit-hole

Automated Reddit scraper and video creator

Language: Python - Size: 384 KB - Last synced at: 2 months ago - Pushed at: 9 months ago - Stars: 12 - Forks: 2

git-khandelwal/CNN-to-GPT2

Image Captioning using CNNs and Transformers

Language: Python - Size: 15.6 KB - Last synced at: 2 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

aimagelab/camel

CaMEL: Mean Teacher Learning for Image Captioning. ICPR 2022

Language: Python - Size: 8.46 MB - Last synced at: about 2 months ago - Pushed at: over 2 years ago - Stars: 29 - Forks: 12

Aavtic/parashu

A video subtitle editor program in Rust.

Language: Rust - Size: 10.7 KB - Last synced at: 3 months ago - Pushed at: 10 months ago - Stars: 1 - Forks: 0

naiveHobo/Smart-I

Smart-I is an Android application aimed at helping the visually impaired using artificial intelligence and cloud computing.

Language: Python - Size: 2.05 MB - Last synced at: about 1 month ago - Pushed at: about 3 years ago - Stars: 10 - Forks: 0

ssube/label-prompt-caption

Language: Python - Size: 127 KB - Last synced at: about 2 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

Andrew-Ng-s-number-one-fan/Readings

Size: 322 MB - Last synced at: 6 months ago - Pushed at: over 4 years ago - Stars: 5 - Forks: 0

mrazhou/SEN

Single-stream Extractor Network with Contrastive Pre-training for Remote Sensing Change Captioning

Language: Python - Size: 64.2 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 3 - Forks: 1

ArchAngelAries/TagScribeR

A tool to streamline AI image captioning

Language: Python - Size: 190 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 6 - Forks: 0

audio-captioning/clotho-dataset

Python code for handling the Clotho dataset.

Language: Python - Size: 99.6 KB - Last synced at: 11 months ago - Pushed at: over 4 years ago - Stars: 74 - Forks: 15

elbayadm/PaperNotes

My notes on some Deep Learning papers

Language: HTML - Size: 1.57 MB - Last synced at: about 1 year ago - Pushed at: over 6 years ago - Stars: 25 - Forks: 4

elbayadm/captioning

Captioning code in PyTorch

Language: Jupyter Notebook - Size: 763 MB - Last synced at: about 1 year ago - Pushed at: about 7 years ago - Stars: 1 - Forks: 0

jamesruan/SimpleSubtitleEditor

SimpleSubtitleEditor for Blender

Language: Python - Size: 20.5 KB - Last synced at: about 1 year ago - Pushed at: over 7 years ago - Stars: 10 - Forks: 2

DavidHuji/CapDec

CapDec: SOTA Zero Shot Image Captioning Using CLIP and GPT2, EMNLP 2022 (findings)

Language: Python - Size: 35.7 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 158 - Forks: 17

ebu/ebu-tt-live-toolkit

Toolkit for supporting the EBU-TT Live specification

Language: Python - Size: 112 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 25 - Forks: 10

mitvis/vistext

VisText is a benchmark dataset for semantically rich chart captioning.

Language: Jupyter Notebook - Size: 2.77 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 66 - Forks: 3

audio-captioning/dcase-2020-baseline

Audio captioning baseline system for DCASE 2020 challenge.

Language: Python - Size: 92.5 MB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 36 - Forks: 11

Anshler/ICG_sd_extension

Image caption extension for A1111 Webui 👁️📜🖋️

Language: Python - Size: 181 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

ltguo19/VSUA-Captioning

Code for "Aligning Linguistic Words and Visual Semantic Units for Image Captioning", ACM MM 2019

Language: Python - Size: 205 KB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 264 - Forks: 24

deepgram-devs/twilio-live-captions

Sample app demonstrating adding live captions to Twilio Video rooms

Language: JavaScript - Size: 23.4 KB - Last synced at: 1 day ago - Pushed at: over 3 years ago - Stars: 6 - Forks: 0

AmbiTyga/GifCaptioner

A Deep Neural Network for GIF captioning

Language: Python - Size: 9.55 MB - Last synced at: almost 2 years ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 0

wangleihitcs/ImageCaptions

A base model for image captions.

Language: Python - Size: 96.3 MB - Last synced at: 3 months ago - Pushed at: about 6 years ago - Stars: 2 - Forks: 1

alecwangcq/show-attend-and-tell

Language: Jupyter Notebook - Size: 3.06 MB - Last synced at: almost 2 years ago - Pushed at: over 7 years ago - Stars: 25 - Forks: 11

TheShadow29/VidSitu

[CVPR21] Visual Semantic Role Labeling for Video Understanding (https://arxiv.org/abs/2104.00990)

Language: Python - Size: 928 KB - Last synced at: almost 2 years ago - Pushed at: almost 4 years ago - Stars: 50 - Forks: 7

AdrianHsu/S2VT-seq2seq-video-captioning-attention

S2VT (seq2seq) video captioning with Bahdanau & Luong attention implementation in TensorFlow

Language: Python - Size: 52.6 MB - Last synced at: over 1 year ago - Pushed at: about 7 years ago - Stars: 19 - Forks: 10

HaydenFaulkner/Tennis

A Tennis dataset and models for event detection & commentary generation

Language: Python - Size: 30.3 MB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 35 - Forks: 11

Sharif-SLPL/image-captioning

Automatically describing the content of an image in Persian

Language: Jupyter Notebook - Size: 1.21 MB - Last synced at: over 2 years ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

Mauville/MedCLIP

Medical image captioning using OpenAI's CLIP

Language: Jupyter Notebook - Size: 3.11 MB - Last synced at: over 2 years ago - Pushed at: over 3 years ago - Stars: 7 - Forks: 4

CurryYuan/X-Trans2Cap

[CVPR 2022] X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning

Language: Python - Size: 64 MB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 29 - Forks: 3

J0SAL/Aide

An App with Voice Assisted Image Captioning and VQA For Visually Challenged Individuals

Language: Dart - Size: 18.2 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 1

petercorke/vtt-clean

Python script to clean VTT files generated by Microsoft Stream

Language: Python - Size: 1.95 KB - Last synced at: 2 months ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 1

nikhilkumarsingh/MemeGenerator

Python program to generate memes.

Language: Jupyter Notebook - Size: 274 KB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 7 - Forks: 6

congphase/img-captioning-in-vietnamese

An attempt at image captioning in Vietnamese for ball-sports contexts.

Language: Python - Size: 8.75 MB - Last synced at: over 2 years ago - Pushed at: about 3 years ago - Stars: 5 - Forks: 1

stevennyman/yt-transcript

JavaScript bookmarklet for viewing YouTube video transcripts in a popout window.

Language: JavaScript - Size: 16.6 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

hassanhub/R3Transformer

Official Python implementation of R3-Transformer

Language: Python - Size: 54.7 KB - Last synced at: over 2 years ago - Pushed at: over 4 years ago - Stars: 16 - Forks: 1

basedrhys/text-od-robustness

Evaluating the robustness of text-conditioned OD models such as MDETR

Language: Jupyter Notebook - Size: 20.3 MB - Last synced at: about 2 months ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 1

Dong-JinKim/DRCaptioning

Language: Jupyter Notebook - Size: 8.41 MB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 1

ebu/ebu-tt

A public repository with key information about the EBU Timed Text (EBU-TT) format.

Size: 7.81 KB - Last synced at: about 1 month ago - Pushed at: over 4 years ago - Stars: 3 - Forks: 0

jyotishp/neural-captioning (fork of Saiteja-Reddy/Show-and-Tell)

A neural network consisting of CNN and LSTM for generating captions of an image thrown at it.

Language: Jupyter Notebook - Size: 139 MB - Last synced at: over 2 years ago - Pushed at: about 7 years ago - Stars: 1 - Forks: 0