Multimodal-Video-Retrieval-Engine-with-Vision-and-Text

A video search engine that combines OCR, ASR, CLIP, image captioning, and object and color detection. It retrieves video content by text, speech, images, objects, and colors.

JSON API: http://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Zhennor%2FMultimodal-Video-Retrieval-Engine-with-Vision-and-Text

Stars: 3
Forks: 1
Open issues: 0

License: None
Language:
Size: 20.9 GB
Dependencies parsed at: Pending

Created at: 10 months ago
Updated at: 4 months ago
Pushed at: 4 months ago
Last synced at: 4 months ago

Topics: asr, captioning-images, clip, color, faiss, fastapi, multimodal, object-detection, ocr, text-search
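
The topics above (clip, faiss, text-search) suggest embedding-based retrieval, but this listing does not show the repository's actual code. The following is only a minimal sketch of nearest-neighbor search by cosine similarity, assuming frames and queries have already been embedded into a shared vector space. The toy vectors, the frame identifiers, and the `cosine` and `search` helpers are all hypothetical. Plain Python stands in for CLIP and FAISS so the example runs without dependencies.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query_vec, index, k=2):
    """Return the k (frame_id, score) pairs most similar to query_vec, best first."""
    scored = [(cosine(query_vec, vec), frame_id) for frame_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(frame_id, score) for score, frame_id in scored[:k]]


# Hypothetical toy index: frame identifiers mapped to made-up 3-d embeddings.
# A real system would store model-produced embeddings in a vector index instead.
TOY_INDEX = {
    "video1_frame010": [1.0, 0.0, 0.0],
    "video1_frame020": [0.0, 1.0, 0.0],
    "video2_frame005": [0.7, 0.7, 0.0],
}

if __name__ == "__main__":
    for frame_id, score in search([1.0, 0.0, 0.0], TOY_INDEX, k=2):
        print(frame_id, round(score, 3))
```

Brute-force scoring like this is linear in the number of frames. A dedicated vector index becomes worthwhile once the collection grows beyond what a full scan can serve interactively.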
