Topic: "image-text"
salesforce/ALBEF
Code for ALBEF: a new vision-language pre-training method
Language: Python - Size: 69.9 MB - Last synced at: 17 days ago - Pushed at: over 2 years ago - Stars: 1,627 - Forks: 205

Sense-GVT/DeCLIP
Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm
Language: Python - Size: 970 KB - Last synced at: 21 days ago - Pushed at: over 2 years ago - Stars: 649 - Forks: 32

google/imageinwords
Data release for the ImageInWords (IIW) paper.
Language: JavaScript - Size: 21.4 MB - Last synced at: 5 days ago - Pushed at: 5 months ago - Stars: 209 - Forks: 9

X-PLUG/mPLUG
mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections. (EMNLP 2022)
Language: Python - Size: 1.56 MB - Last synced at: about 2 months ago - Pushed at: almost 2 years ago - Stars: 89 - Forks: 7

labyrinth7x/Deep-Cross-Modal-Projection-Learning-for-Image-Text-Matching
Deep Cross-Modal Projection Learning for Image-Text Matching
Language: Python - Size: 39.2 MB - Last synced at: 8 months ago - Pushed at: over 4 years ago - Stars: 72 - Forks: 21

glami/glami-1m
The largest multilingual image-text classification dataset. It contains fashion products.
Language: Jupyter Notebook - Size: 5.43 MB - Last synced at: 4 days ago - Pushed at: almost 2 years ago - Stars: 71 - Forks: 7

miccunifi/QualiCLIP
Quality-Aware Image-Text Alignment for Real-World Image Quality Assessment
Language: Python - Size: 5.37 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 60 - Forks: 1

TheoCoombes/crawlingathome
A client library for LAION's effort to filter CommonCrawl with CLIP, building a large scale image-text dataset.
Language: Python - Size: 138 KB - Last synced at: 21 days ago - Pushed at: about 2 years ago - Stars: 33 - Forks: 7

antonlukin/poster-editor
Wrapper for PHP's GD library for easy image manipulation. Supports scaling multi-line text, shapes, filters and smart resize.
Language: PHP - Size: 12.2 MB - Last synced at: 13 days ago - Pushed at: 3 months ago - Stars: 20 - Forks: 2

zhangming8/ocr_algo_server
OCR text recognition algorithm service
Language: C++ - Size: 5.89 MB - Last synced at: about 2 years ago - Pushed at: about 4 years ago - Stars: 17 - Forks: 5

HuangRunHua/LiveTextWithImage
WWDC22: Enabling Live Text interactions with images in SwiftUI
Language: Swift - Size: 2.71 MB - Last synced at: 4 days ago - Pushed at: almost 3 years ago - Stars: 14 - Forks: 1

zabir-nabil/imagebert-keras
Keras implementation of ImageBERT from Microsoft
Size: 2.93 KB - Last synced at: 22 days ago - Pushed at: about 5 years ago - Stars: 14 - Forks: 2

TheoCoombes/crawlingathome-server
A server powering LAION's effort to filter CommonCrawl with CLIP, building a large scale image-text dataset.
Language: Python - Size: 1.32 MB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 9 - Forks: 4

Thisisus7/ING-VP
An Interactive Game-based Vision Planning benchmark
Language: Python - Size: 2.6 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 8 - Forks: 0

waysup/Img2Txt
Img2Txt - A Dropzone 3 action that recognizes text in images (JPG/PNG) using the Baidu OCR API
Language: Python - Size: 23.4 KB - Last synced at: about 1 year ago - Pushed at: over 5 years ago - Stars: 6 - Forks: 0

reshalfahsi/image-captioning-mobilenet-llama3
Image Captioning With MobileNet-LLaMA 3
Language: Jupyter Notebook - Size: 3.56 MB - Last synced at: 12 days ago - Pushed at: 10 months ago - Stars: 5 - Forks: 0

leeyunjai/image2text
Caption generator using lavis and argostranslate
Language: Python - Size: 128 KB - Last synced at: 14 days ago - Pushed at: about 2 years ago - Stars: 4 - Forks: 1

dinhanhx/VisualRoBERTa
The first public Vietnamese visual linguistic foundation model(s)
Language: Python - Size: 98.6 KB - Last synced at: 13 days ago - Pushed at: over 1 year ago - Stars: 3 - Forks: 2

dngo-io/cover-creator
Write text on images with PHP
Language: PHP - Size: 313 KB - Last synced at: about 2 months ago - Pushed at: almost 7 years ago - Stars: 3 - Forks: 4

fatemeh-mohseni-AI/most-repeated-vocabulary-IELTS
This project is a FastAPI-based web application designed to analyze Cambridge IELTS PDFs (Books 1-18) for the most and least repeated words. It can handle both regular text-based PDFs and scanned image-based PDFs by converting them to images and extracting text using OCR (Optical Character Recognition).
Language: Python - Size: 1.28 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 2 - Forks: 1

awsaf49/flickr-dataset
Download the Flickr8k and Flickr30k image caption datasets
Size: 4.88 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 2 - Forks: 0

jianzhnie/MultimodalTransformers
lmmtoolkit is a toolkit for Multi-Modal Learning
Language: Python - Size: 22.5 KB - Last synced at: 17 days ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

formulae-org/package-graphic-raster-js
Raster graphics package for Fōrmulæ, in JavaScript
Language: JavaScript - Size: 82 KB - Last synced at: 22 days ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

dvlab-research/TagCLIP
Language: Python - Size: 2.43 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 1 - Forks: 0

CharlesYang030/MTA
MTA: A Lightweight Multilingual Text Alignment Model for Cross-language Visual Word Sense Disambiguation
Language: Jupyter Notebook - Size: 10.2 MB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 0

CharlesYang030/FCLL
FCLL: A Fine-grained Contrastive Language-Image Learning Model
Language: Jupyter Notebook - Size: 5.51 MB - Last synced at: almost 2 years ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 0

dinhanhx/VL-datasets
Some Python scripts to load Vietnamese visual linguistic data
Language: Python - Size: 1.95 KB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 1

waittim/ConVIRT-Colab Fork of edreisMD/ConVIRT-pytorch
Contrastive Learning Representations for Images and Text Pairs. Colab implementation of ConVIRT for transfer learning with insufficient data volume.
Language: Jupyter Notebook - Size: 13.6 MB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

Nexdata-AI/10000-Image-caption-data-of-diverse-scenes
10000-Image-caption-data-of-diverse-scenes
Size: 2.44 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

Nexdata-AI/10000-Image-caption-data-of-vehicles
10000-Image-caption-data-of-vehicles
Size: 1.12 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

Nexdata-AI/10100-Image-caption-data-of-human-face
10100-Image-caption-data-of-human-face
Size: 1.57 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

Nexdata-AI/10000-Image-caption-data-of-gestures
10000-Image-caption-data-of-gestures
Size: 1.35 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

Nexdata-AI/20011--Image-Caption-Data-Of-OCR-In-Natural-Scenes
20011--Image-Caption-Data-Of-OCR-In-Natural-Scenes
Size: 1.38 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

Nexdata-AI/11000-Image-Video-caption-data-of-human-action
11000-Image-Video-caption-data-of-human-action
Size: 2.5 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

CharlesYang030/PolCLIP
PolCLIP: A Unified Image-Text Word Sense Disambiguation Model via Generating Multimodal Complementary Representations
Language: Jupyter Notebook - Size: 16.9 MB - Last synced at: 11 months ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

AkshayBura/Character-Recognition
Character Recognition system using CNN and Streamlit
Language: Jupyter Notebook - Size: 765 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

ask0ne/ocrator
Scan text from an image and convert it into speech/audio in the desired language.
Language: Python - Size: 52.7 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

ppraneeth270/img2text
Language: Python - Size: 58.6 KB - Last synced at: about 1 year ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 0

xiongshufeng/MTFN-RR-PyTorch-Code Fork of Wangt-CN/MTFN-RR-PyTorch-Code
The official code for the paper "Matching Images and Text with Multi-modal Tensor Fusion and Re-ranking", ACM Multimedia 2019 Oral
Size: 4.54 MB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 0

makefile/text_extraction Fork of lluisgomez/text_extraction
Windows version of text_extraction (VS2013). This code is the implementation of the method proposed in the paper “Multi-script text extraction from natural scenes” (Gomez & Karatzas), to appear at the ICDAR2013 conference.
Language: C++ - Size: 1.38 MB - Last synced at: 12 months ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 0
