GitHub topics: image-captioning
PRITHIVSAKTHIUR/Image-Captioning-Florence2
This application uses Microsoft's Florence-2 vision-language model to generate detailed captions for images. The model interprets visual content and describes it in natural language.
Language: Python - Size: 8.79 KB - Last synced at: about 6 hours ago - Pushed at: about 19 hours ago - Stars: 0 - Forks: 0

fano2458/Zhadiger-Kazakh-Language-AI
AI services project "Zhadiger" for the Kazakh language, developed with NVIDIA Triton Inference Server. Includes an LLM, OCR, image captioning, NER, TTS, STT, a translator, and more.
Language: Python - Size: 47.3 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 2 - Forks: 0

cuixing158/Awesome-CV-MasterHub
:fire: :fire: :fire: A list of recent Computer Vision (CV) papers
Size: 14.1 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 214 - Forks: 12

sgrvinod/a-PyTorch-Tutorial-to-Image-Captioning
Show, Attend, and Tell | a PyTorch Tutorial to Image Captioning
Language: Python - Size: 12.6 MB - Last synced at: 5 days ago - Pushed at: over 2 years ago - Stars: 2,830 - Forks: 722

OFA-Sys/OFA
Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
Language: Python - Size: 120 MB - Last synced at: 5 days ago - Pushed at: 12 months ago - Stars: 2,491 - Forks: 249

claudaff/automatic-map-storytelling
An Efficient System for Automatic Map Storytelling using Generative Pre-trained Transformer (GPT) Models – A Case Study on Historical Maps
Language: Python - Size: 1.98 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 4 - Forks: 1

salesforce/LAVIS
LAVIS - A One-stop Library for Language-Vision Intelligence
Language: Jupyter Notebook - Size: 79.3 MB - Last synced at: 6 days ago - Pushed at: 5 months ago - Stars: 10,440 - Forks: 1,019

SkalskiP/awesome-foundation-and-multimodal-models
👁️ + 💬 + 🎧 = 🤖 Curated list of top foundation and multimodal models! [Paper + Code + Examples + Tutorials]
Language: Python - Size: 58.6 KB - Last synced at: 3 days ago - Pushed at: about 1 year ago - Stars: 611 - Forks: 45

hsp-iit/embodied-captioning
Official repository of the preprint "Embodied Image Captioning: Self-supervised Learning Agents for Spatially Coherent Image Descriptions"
Size: 669 KB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 0 - Forks: 0

iOPENCap/awesome-remote-image-captioning
A list of awesome remote sensing image captioning resources
Language: Python - Size: 414 KB - Last synced at: 6 days ago - Pushed at: 9 days ago - Stars: 104 - Forks: 1

AtheerAlzhrani/BlipCaptioner
Interactive web application that generates descriptive captions for images
Language: Jupyter Notebook - Size: 132 KB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 0 - Forks: 0

HanXinzi-AI/awesome-computer-vision-resources
A collection of computer vision projects and tools.
Size: 49.8 MB - Last synced at: 8 days ago - Pushed at: 11 months ago - Stars: 238 - Forks: 33

aimagelab/DiCO
[BMVC 2024 Oral ✨] Revisiting Image Captioning Training Paradigm via Direct CLIP-based Optimization
Language: Python - Size: 6.76 MB - Last synced at: 8 days ago - Pushed at: 7 months ago - Stars: 17 - Forks: 0

OpenGVLab/InternGPT
InternGPT (iGPT) is an open-source demo platform for showcasing AI models. It currently supports DragGAN, ChatGPT, ImageBind, multimodal chat like GPT-4, SAM, interactive image editing, and more. Try it at igpt.opengvlab.com (an online demo system supporting DragGAN, ChatGPT, ImageBind, and SAM).
Language: Python - Size: 41.9 MB - Last synced at: 9 days ago - Pushed at: 8 months ago - Stars: 3,218 - Forks: 230

salesforce/BLIP
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Language: Jupyter Notebook - Size: 6.34 MB - Last synced at: 10 days ago - Pushed at: 9 months ago - Stars: 5,165 - Forks: 681

microsoft/Oscar 📦
Oscar and VinVL
Language: Python - Size: 715 KB - Last synced at: 11 days ago - Pushed at: over 1 year ago - Stars: 1,048 - Forks: 252

veydantkatyal/image-caption-recommender
Recommends the most relevant image captions using OpenAI's CLIP model and machine learning, for intelligent content generation.
Language: Jupyter Notebook - Size: 260 KB - Last synced at: 37 minutes ago - Pushed at: 12 days ago - Stars: 0 - Forks: 0

imaginary-cloud/CameraManager
A simple Swift class that provides all the configuration you need to create a custom camera view in your app
Language: Swift - Size: 4.7 MB - Last synced at: 9 days ago - Pushed at: 9 months ago - Stars: 1,382 - Forks: 327

aehrc/cvt2distilgpt2
Improving Chest X-Ray Report Generation by Leveraging Warm-Starting
Language: Python - Size: 93.5 MB - Last synced at: 7 days ago - Pushed at: 11 months ago - Stars: 67 - Forks: 7

ttengwang/Caption-Anything
Caption-Anything is a versatile tool combining image segmentation, visual captioning, and ChatGPT, generating tailored captions with diverse controls for user preferences. https://huggingface.co/spaces/TencentARC/Caption-Anything https://huggingface.co/spaces/VIPLab/Caption-Anything
Language: Python - Size: 51.9 MB - Last synced at: 11 days ago - Pushed at: over 1 year ago - Stars: 1,732 - Forks: 104

jmisilo/clip-gpt-captioning
CLIPxGPT Captioner is an image captioning model based on OpenAI's CLIP and GPT-2.
Language: Python - Size: 873 KB - Last synced at: 10 days ago - Pushed at: 2 months ago - Stars: 117 - Forks: 32

anuragmishracse/caption_generator
A modular library built on top of Keras and TensorFlow to generate a caption in natural language for any input image.
Language: Python - Size: 902 KB - Last synced at: 10 days ago - Pushed at: almost 7 years ago - Stars: 265 - Forks: 119

AkagawaTsurunaki/ZerolanLiveRobot
An AI VTuber that uses LLM, ASR, TTS, OCR, CV, and other technologies to live stream or play Minecraft with you.
Language: Python - Size: 2.3 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 27 - Forks: 3

Markin-Wang/awesome_radiology_report_generation
Awesome radiology report generation and image captioning papers.
Size: 59.6 KB - Last synced at: 4 days ago - Pushed at: 6 months ago - Stars: 72 - Forks: 6

aiishwarrya/VisualLanguageModel
A custom Vision-Language Model (VLM) built from scratch, using SigLIP for contrastive learning and a ViT-based encoder to generate meaningful image captions and semantic descriptions.
Size: 2.49 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 0 - Forks: 0

GT-RIPL/Xmodal-Ctx
Official PyTorch implementation of our CVPR 2022 paper: Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for Image Captioning
Language: Python - Size: 93.6 MB - Last synced at: 7 days ago - Pushed at: over 2 years ago - Stars: 60 - Forks: 10

j-min/CLIP-Caption-Reward
PyTorch code for "Fine-grained Image Captioning with CLIP Reward" (Findings of NAACL 2022)
Language: Python - Size: 2.64 MB - Last synced at: 9 days ago - Pushed at: over 2 years ago - Stars: 241 - Forks: 26

german-zarate/image-captioning-app
An image captioning ML model deployed with Flask and accessed through a Flutter app
Language: Python - Size: 7.54 MB - Last synced at: 11 days ago - Pushed at: 11 months ago - Stars: 5 - Forks: 0

Mohammadimh76/image-caption-generator-pytorch
Image Caption Generation using Deep Learning (CNN + LSTM Architecture)
Language: Python - Size: 1.04 GB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 1 - Forks: 0

aehrc/cxrmate
CXRMate: Longitudinal Data and a Semantic Similarity Reward for Chest X-Ray Report Generation
Language: Python - Size: 4.03 MB - Last synced at: 7 days ago - Pushed at: about 2 months ago - Stars: 15 - Forks: 3

TheoCoombes/ClipCap
Using pretrained encoder and language models to generate captions from multimedia inputs.
Language: Python - Size: 92.7 MB - Last synced at: 6 days ago - Pushed at: about 2 years ago - Stars: 96 - Forks: 13

krasserm/fairseq-image-captioning
Transformer-based image captioning extension for pytorch/fairseq
Language: Python - Size: 3.09 MB - Last synced at: 11 days ago - Pushed at: over 4 years ago - Stars: 315 - Forks: 57

Zuellni/Image-Tools
Various image processing scripts.
Language: Python - Size: 16.6 KB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 0 - Forks: 0

PhilemonTJ/ImageCaptioningSystem
ImageCaptioningSystem is a Python application that generates descriptive captions for images using deep learning models, providing an automated interpretation of visual content.
Language: Python - Size: 20.5 KB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 0 - Forks: 0

AdirthaBorgohain/art-critiq
A multimodal pipeline that generates reviews in three tones (harsh, constructive, kind) for a given artwork using fine-tuned Flan-T5 models.
Language: Jupyter Notebook - Size: 175 KB - Last synced at: 20 days ago - Pushed at: about 2 years ago - Stars: 3 - Forks: 0

aimagelab/PMA-Net
With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning. ICCV 2023
Language: Python - Size: 5.34 MB - Last synced at: 8 days ago - Pushed at: 11 months ago - Stars: 17 - Forks: 2

dayyass/image-captioning
My solution to the Image Captioning Final Project of the Coursera "Introduction to Deep Learning" course, with the trained model deployed as a Telegram bot.
Language: Jupyter Notebook - Size: 35.1 MB - Last synced at: 6 days ago - Pushed at: almost 4 years ago - Stars: 6 - Forks: 1

aimagelab/meshed-memory-transformer
Meshed-Memory Transformer for Image Captioning. CVPR 2020
Language: Python - Size: 7.07 MB - Last synced at: 14 days ago - Pushed at: over 2 years ago - Stars: 531 - Forks: 135

Deiwulf/AI-image-auto-tagger (fork of Ketengan-Diffusion/wdv3-batch-vit-tagger)
An open-source AI tagging tool for image galleries that works with image metadata or with .txt files for AI training. Uses the newest wd-vit-tagger-v3 model by SmilingWolf.
Language: Python - Size: 273 KB - Last synced at: 27 days ago - Pushed at: 27 days ago - Stars: 0 - Forks: 0

SocAIty/socaity
SDK for generative AI.
Language: Python - Size: 24.2 MB - Last synced at: 4 days ago - Pushed at: 27 days ago - Stars: 2 - Forks: 0

reshalfahsi/image-captioning-mobilenet-llama3
Image Captioning With MobileNet-LLaMA 3
Language: Jupyter Notebook - Size: 3.56 MB - Last synced at: 6 days ago - Pushed at: 10 months ago - Stars: 5 - Forks: 0

peteanderson80/Up-Down-Captioner
Automatic image captioning model based on Caffe, using features from bottom-up attention.
Language: Jupyter Notebook - Size: 2.6 MB - Last synced at: 9 days ago - Pushed at: about 2 years ago - Stars: 245 - Forks: 69

yashk2810/Image-Captioning
Image Captioning using InceptionV3 and beam search
Language: Jupyter Notebook - Size: 74.6 MB - Last synced at: 12 days ago - Pushed at: over 4 years ago - Stars: 327 - Forks: 122

leftsl/ENIMNet
Implementation code for the ENIMNet network.
Language: Python - Size: 4.75 MB - Last synced at: 24 days ago - Pushed at: 29 days ago - Stars: 1 - Forks: 0

kdexd/virtex
[CVPR 2021] VirTex: Learning Visual Representations from Textual Annotations
Language: Python - Size: 3.65 MB - Last synced at: 14 days ago - Pushed at: over 1 year ago - Stars: 561 - Forks: 61

dp-ops/Image_captioning
Image captioning model using ResNet34 and an attention LSTM, implemented from scratch. Uses pretrained ImageNet weights for ResNet34 and fine-tunes the model on the Flickr8k and Flickr30k datasets.
Language: Jupyter Notebook - Size: 49.3 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

angeligareta/image-captioning
Image Caption Generator implemented using TensorFlow and Keras in a Python Jupyter Notebook. The goal is to describe the content of an image by using a CNN and RNN.
Language: Jupyter Notebook - Size: 393 KB - Last synced at: 12 days ago - Pushed at: about 4 years ago - Stars: 31 - Forks: 12

scopeInfinity/Video2Description
Video to Text: a natural language description generator for a given video. [Video Captioning]
Language: Python - Size: 33 MB - Last synced at: 12 days ago - Pushed at: almost 3 years ago - Stars: 343 - Forks: 70

nssharmaofficial/image-caption-generator
Image captioning model with a ResNet50 encoder and an LSTM decoder
Language: Python - Size: 745 MB - Last synced at: 14 days ago - Pushed at: 7 months ago - Stars: 17 - Forks: 4

NVlabs/prismer
The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".
Language: Python - Size: 4.25 MB - Last synced at: 11 days ago - Pushed at: over 1 year ago - Stars: 1,310 - Forks: 73

Aldenhovel/bleu-rouge-meteor-cider-spice-eval4imagecaption
Evaluation tools for image captioning. Including BLEU, ROUGE-L, CIDEr, METEOR, SPICE scores.
Language: Python - Size: 86.8 MB - Last synced at: 7 days ago - Pushed at: about 2 years ago - Stars: 28 - Forks: 2
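For readers new to these metrics, the core ingredient of BLEU is clipped (modified) n-gram precision: each candidate n-gram is credited at most as many times as it appears in any single reference. The sketch below illustrates only that component, as a minimal standalone example; it is not code from the repository above, and it omits BLEU's brevity penalty and the geometric mean over n-gram orders.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision, the building block of BLEU.

    Each candidate n-gram counts at most as often as it appears
    in any single reference sentence.
    """
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0  # candidate shorter than n tokens
    # Highest count of each n-gram across all references
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    clipped = sum(min(count, max_ref_counts[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


candidate = "the the the the the the the".split()
references = ["the cat is on the mat".split(), "there is a cat on the mat".split()]
print(modified_precision(candidate, references, 1))  # 2/7: "the" is clipped to 2
```

Clipping is what stops a degenerate caption that repeats a common word from scoring a perfect unigram precision.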

google/imageinwords
Data release for the ImageInWords (IIW) paper.
Language: JavaScript - Size: 21.4 MB - Last synced at: 7 days ago - Pushed at: 5 months ago - Stars: 209 - Forks: 9

hasnainroopawalla/Image-Captioning-Scene-Descriptor
A CNN-LSTM model to generate a sentence/caption that describes the contents/scene of an image.
Language: Jupyter Notebook - Size: 259 MB - Last synced at: 13 days ago - Pushed at: over 4 years ago - Stars: 3 - Forks: 2

IEEE-NITK/Image_Captioning
Image Captioning is the process of generating a textual description of an image. It uses both Natural Language Processing and Computer Vision to generate the captions.
Language: Jupyter Notebook - Size: 9.46 MB - Last synced at: 17 days ago - Pushed at: over 2 years ago - Stars: 6 - Forks: 8

aimagelab/show-control-and-tell
Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions. CVPR 2019
Language: Python - Size: 1.71 MB - Last synced at: 12 days ago - Pushed at: over 2 years ago - Stars: 281 - Forks: 61

milaan9/Deep_Learning_Algorithms_from_Scratch
This repository explores a variety of techniques and algorithms commonly used in deep learning and their implementation in MATLAB and Python
Language: Jupyter Notebook - Size: 9.85 MB - Last synced at: 10 days ago - Pushed at: over 2 years ago - Stars: 173 - Forks: 171

YehLi/xmodaler
X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense reasoning, and cross-modal retrieval).
Language: Python - Size: 12.2 MB - Last synced at: 6 days ago - Pushed at: about 2 years ago - Stars: 970 - Forks: 105

ProGamerGov/VLM-Captioning-Tools
Python scripts to use for captioning images with VLMs
Language: Python - Size: 21.5 KB - Last synced at: 7 days ago - Pushed at: 9 months ago - Stars: 39 - Forks: 0

chunhuizng/mllm-video-captioner
We use RL to train a SOTA MLLM captioner.
Language: Python - Size: 1.1 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 2 - Forks: 0

shaadclt/Fine-tune-PaliGemma-Image-Captioning
This project demonstrates how to fine-tune the PaliGemma model for image captioning. The PaliGemma model, developed by Google Research, is designed to take images as input and generate corresponding captions.
Language: Jupyter Notebook - Size: 408 KB - Last synced at: 9 days ago - Pushed at: 5 months ago - Stars: 6 - Forks: 0

Pavansomisetty21/Image-Caption-Generation-using-LLMs-GEMINI-
Generates captions for user-provided images using prompt engineering and generative AI
Language: Jupyter Notebook - Size: 366 KB - Last synced at: 20 days ago - Pushed at: 8 months ago - Stars: 7 - Forks: 1

chisngooo/Multimodal-Video-Retrieval-Engine-with-Vision-and-Text-by-NaiveNotNaice (fork of Zhennor/Multimodal-Video-Retrieval-Engine-with-Vision-and-Text)
The video search engine, created by Team NaiveNotNice for HCM AI Challenge 2024, combines OCR, ASR, CLIP, Image Captioning, and Object & Color Detection for accurate video retrieval based on text, speech, images, objects, and colors.
Size: 20.9 GB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

jiasenlu/AdaptiveAttention
Implementation of "Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning"
Language: Jupyter Notebook - Size: 3.75 MB - Last synced at: 10 days ago - Pushed at: over 7 years ago - Stars: 335 - Forks: 74

FirstLanguage/streamlit-firstlanguage
Streamlit components for FirstLanguage API
Language: Python - Size: 809 KB - Last synced at: 6 days ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 4

Bjarten/computer-vision-ND
Projects and exercises for the Udacity Computer Vision Nanodegree
Language: Jupyter Notebook - Size: 690 MB - Last synced at: 13 days ago - Pushed at: about 6 years ago - Stars: 99 - Forks: 44

sitamgithub-MSIT/paligemma2-docci-litserve
Serves the DOCCI fine-tuned variant of PaliGemma 2 using LitServe.
Language: Python - Size: 468 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

rahul-vinay/ShowAttendTell
This project implements an adaptive attention mechanism for image captioning, inspired by the "Show, Attend and Tell" paper. It combines ResNet50 and an LSTM with a sentinel gate that dynamically balances attention between visual features and language context.
Language: Jupyter Notebook - Size: 6.61 MB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0
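A sentinel gate like the one described above can be sketched as follows. This follows the formulation of "Knowing When to Look" (Lu et al.), the paper implemented by jiasenlu/AdaptiveAttention earlier in this list. It is not code from this repository, and all function and parameter names are hypothetical. A softmax runs over K spatial attention scores plus one score for a "visual sentinel" vector, which summarizes the decoder's language context. The sentinel's weight beta is the gate: the context vector becomes beta times the sentinel plus the remaining weights applied to the visual features. The learned projections that produce the scores are assumed to have already run.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_context(visual_feats, visual_scores, sentinel, sentinel_score):
    """Blend attended visual features with a visual sentinel (hypothetical helper).

    visual_feats:   K feature vectors (e.g. spatial ResNet features)
    visual_scores:  K unnormalized attention scores, one per feature
    sentinel:       vector summarizing language context from the LSTM
    sentinel_score: unnormalized attention score for the sentinel
    Returns (context_vector, beta), where beta is the sentinel gate weight.
    """
    weights = softmax(visual_scores + [sentinel_score])  # softmax over K + 1 slots
    beta = weights[-1]  # share of attention given to language context
    dim = len(sentinel)
    context = [beta * sentinel[d] for d in range(dim)]
    # Remaining weights (summing to 1 - beta) attend over the visual features
    for w, feat in zip(weights[:-1], visual_feats):
        for d in range(dim):
            context[d] += w * feat[d]
    return context, beta


ctx, beta = adaptive_context([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], [0.0, 0.0], 0.0)
print(beta, ctx)  # equal scores: beta = 1/3 and each context entry = 1/3
```

When beta approaches 1, the decoder effectively ignores the image for that word. This is the intended behaviour for function words such as "of" or "the".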

jhc13/taggui
Tag manager and captioner for image datasets
Language: Python - Size: 22.1 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 891 - Forks: 41

brayevalerien/ReCap
An image (re)captioning GUI for preparing datasets for image generation models, designed for easy caption editing.
Language: Python - Size: 2.45 MB - Last synced at: 14 days ago - Pushed at: 3 months ago - Stars: 2 - Forks: 0

X-PLUG/mPLUG
mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections. (EMNLP 2022)
Language: Python - Size: 1.56 MB - Last synced at: about 1 month ago - Pushed at: almost 2 years ago - Stars: 89 - Forks: 7

hk-kumawat/Insight-Lens
📸 An AI-powered tool for intelligent image analysis with captioning, summaries, and Q&A capabilities!
Language: Python - Size: 29.3 KB - Last synced at: 26 days ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

ChaitanyaC22/Udacity-CVND-Project2-Automated-Image-Captioning
This project aims at training a CNN-RNN model to predict captions for a given image. The main task is to implement an effective RNN decoder for a CNN encoder.
Language: HTML - Size: 223 MB - Last synced at: 23 days ago - Pushed at: almost 4 years ago - Stars: 3 - Forks: 0

bushraqurban/Captionator
AI-powered image scraper and captioning tool.
Language: Python - Size: 508 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

LavanyaAN21/Depiction-of-image-features-with-audio-to-aid-visually-impaired-person
This project uses AI models to generate captions for images and translate them into regional languages (Kannada and Hindi). It also offers text-to-speech conversion, making it accessible to a wider audience, especially people with visual impairments.
Language: Python - Size: 9.77 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 2 - Forks: 0

dinhanhx/vcc
Vietnamese Conceptual Caption
Language: Python - Size: 26.4 KB - Last synced at: 7 days ago - Pushed at: about 2 years ago - Stars: 2 - Forks: 0

li-xirong/coco-cn
Enriching MS-COCO with Chinese sentences and tags for cross-lingual multimedia tasks
Language: OpenEdge ABL - Size: 195 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 187 - Forks: 21

sreyash1mohanty/Image-captioning
Image captioning model using Keras
Language: Jupyter Notebook - Size: 19.8 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

Roni7128/NTU-2024Fall-DLCV
CommE5052: Deep Learning for Computer Vision (Prof. Frank Wang)
Size: 1.95 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

Zuellni/Qt-Caption
Language: Python - Size: 2.18 MB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

ammarlodhi255/image-captioning-system-to-assist-the-blind
An image captioning system that predicts and speaks aloud a caption for an image taken by a visually impaired person.
Language: Jupyter Notebook - Size: 10.7 MB - Last synced at: 20 days ago - Pushed at: 8 months ago - Stars: 9 - Forks: 7

cstsunfu/dlk
A PyTorch-based framework for rapid deep learning development, offering a one-stop solution for training, prediction, serving, and demos.
Language: Python - Size: 9.42 MB - Last synced at: 5 days ago - Pushed at: 2 months ago - Stars: 24 - Forks: 0

MiteshPuthran/Image-Caption-Generator
An LSTM model generates captions for input images after features are extracted with a pre-trained VGG-16 model. (Computer Vision, NLP, Deep Learning, Python)
Language: Jupyter Notebook - Size: 69.8 MB - Last synced at: 5 days ago - Pushed at: over 5 years ago - Stars: 86 - Forks: 32

john-fante/john-fante
In my code portfolio, I generally try new techniques and methods in machine learning. I don't like only copying and pasting.
Size: 318 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

wocns1457/CCTV-based-clothing-analysis-and-search-system
Language: Python - Size: 23.1 MB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

sitamgithub-MSIT/paligemma-docci
Image Captioning with PaliGemma 2 Vision Language Model.
Language: Python - Size: 1.26 MB - Last synced at: 29 days ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

aehrc/imageclefmedical_caption_23
MedICap: Code for the participation of team CSIRO at the ImageCLEFmedical Caption task of 2023.
Language: Jupyter Notebook - Size: 643 KB - Last synced at: 7 days ago - Pushed at: 6 months ago - Stars: 3 - Forks: 0

raj-tyagi/4CLIP-Image-Captioning
This repository presents 4CLIP, a novel approach to image captioning that enhances traditional models by dividing images into four quadrants and processing them individually. By leveraging a pretrained ViT-GPT2 model from Hugging Face, 4CLIP generates more detailed and comprehensive captions, making it suitable for fine-grained visual tasks.
Language: Python - Size: 288 KB - Last synced at: 24 days ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0
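The quadrant idea described above can be sketched as follows: split an image into four quadrants and caption each one independently. This is a minimal illustration under stated assumptions, not 4CLIP's code. The image is represented as a plain list of pixel rows rather than a real image array. `caption_fn` is a hypothetical stand-in for a model such as ViT-GPT2. The description does not say how the four captions are merged, so the sketch stops at per-quadrant captions.

```python
from typing import Callable, List

Image = List[List[int]]  # rows of pixel values; stand-in for a real image array


def split_quadrants(image: Image) -> List[Image]:
    """Split an image into top-left, top-right, bottom-left, bottom-right quadrants.

    With odd heights or widths, the extra row/column goes to the bottom/right quadrants.
    """
    h, w = len(image), len(image[0])
    mh, mw = h // 2, w // 2
    top, bottom = image[:mh], image[mh:]
    return [
        [row[:mw] for row in top],
        [row[mw:] for row in top],
        [row[:mw] for row in bottom],
        [row[mw:] for row in bottom],
    ]


def caption_by_quadrant(image: Image, caption_fn: Callable[[Image], str]) -> List[str]:
    """Caption each quadrant independently with a hypothetical caption_fn."""
    return [caption_fn(quadrant) for quadrant in split_quadrants(image)]


img = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
# A dummy caption_fn that just reports each quadrant's size
print(caption_by_quadrant(img, lambda q: f"{len(q)}x{len(q[0])} region"))
```

Captioning each quadrant separately trades global context for detail: each caption sees only a quarter of the scene, so a final step that combines or summarizes the four captions is usually needed.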

terry-r123/Awesome-Captioning
A curated list of research related to multimodal captioning (including image captioning, video captioning, and text captioning)
Size: 56.6 KB - Last synced at: 9 days ago - Pushed at: almost 3 years ago - Stars: 110 - Forks: 10

peteanderson80/coco-caption
Adds SPICE metric to coco-caption evaluation server codes
Language: Jupyter Notebook - Size: 121 MB - Last synced at: 4 days ago - Pushed at: about 2 years ago - Stars: 49 - Forks: 42

markdtw/soft-attention-image-captioning
TensorFlow implementation of Show, Attend and Tell (ICML'15)
Language: Python - Size: 639 KB - Last synced at: 10 days ago - Pushed at: almost 8 years ago - Stars: 19 - Forks: 11

Nexdata-AI/300-million-pairs-of-high-quality-image-caption-dataset
Size: 1000 Bytes - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

Mahmood-Anaam/arabic-visual-question-answering
Modular and extensible framework for Arabic Visual Question Answering (VQA) using state-of-the-art pretrained models for image captioning and question answering.
Language: Jupyter Notebook - Size: 3.75 MB - Last synced at: 16 days ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

Snigdho8869/AI-Generative-Models-Notebooks-DCGAN-VAE-Autoencoder
This repository contains notebooks showcasing various generative models, including DCGAN and VAE for anime face generation, an Autoencoder for converting photos to sketches, a captioning model using an attention mechanism for an image caption generator, and more.
Language: Jupyter Notebook - Size: 21.7 MB - Last synced at: 22 days ago - Pushed at: over 1 year ago - Stars: 3 - Forks: 1

berlin0308/NTU-2024Fall-DLCV
CommE5052: Deep Learning for Computer Vision (Prof. Frank Wang)
Language: Jupyter Notebook - Size: 41.6 MB - Last synced at: 28 days ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

MuhammadHadiofficial/urdu_caption_generator
This repository contains the implementation of a Transformer-based model for Urdu Image Caption Generation, presented in the study "A Transformer-based Urdu Image Caption Generation." The project aims to generate syntactically, contextually, and semantically correct captions in Urdu for given images. It addresses the challenges of working with low-
Size: 4.88 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

santoshlite/ByteDetective
The easiest way to search for images on your desktop 🔎
Language: Rust - Size: 3.87 MB - Last synced at: 10 days ago - Pushed at: almost 2 years ago - Stars: 30 - Forks: 2

kacky24/stylenet
A PyTorch implementation of "StyleNet: Generating Attractive Visual Captions with Styles"
Language: Python - Size: 13.2 MB - Last synced at: 6 days ago - Pushed at: over 4 years ago - Stars: 62 - Forks: 10

adityajn105/image-caption-bot
Implementation of 'merge' architecture for generating image captions from paper "What is the Role of Recurrent Neural Networks (RNNs) in an Image Caption Generator?" using Keras. Dataset used is Flickr8k available on Kaggle.
Language: Jupyter Notebook - Size: 10.6 MB - Last synced at: 6 days ago - Pushed at: over 5 years ago - Stars: 16 - Forks: 14

mmahdin/CI_CNNProject_Fall2024
Image classification on CIFAR-10 with ResNet, medical image analysis on breast histopathology images using CNNs, and image captioning on the Flickr8k, Flickr30k, and MSCOCO datasets with advanced architectures such as LSTMs and attention mechanisms.
Language: Jupyter Notebook - Size: 333 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

gauthiii/fineTunedBLIP
Fine-tuned the BLIP model to caption images of Tom and Jerry accurately.
Language: Jupyter Notebook - Size: 8.79 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0
