Multimodal-Video-Retrieval-Engine-with-Vision-and-Text-by-NaiveNotNaice

A video search engine created by Team NaiveNotNice for the HCM AI Challenge 2024. It combines OCR, ASR, CLIP, image captioning, and object and color detection to retrieve videos by text, speech, images, objects, and colors.

JSON API: http://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chisngooo%2FMultimodal-Video-Retrieval-Engine-with-Vision-and-Text-by-NaiveNotNaice
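
The URL above appears to follow the pattern of a host name followed by the percent-encoded `owner/repository` name. The following is a minimal Python sketch under that assumption. The helper names `build_repo_api_url` and `fetch_repo_metadata` are hypothetical, and the response is assumed to be a single JSON object.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base path copied from the JSON API URL listed above.
API_BASE = "http://repos.ecosyste.ms/api/v1/hosts"


def build_repo_api_url(host: str, full_name: str) -> str:
    """Build the JSON API URL for a repository (hypothetical helper).

    The "/" in "owner/repo" is percent-encoded as %2F, as in the URL above.
    """
    return f"{API_BASE}/{quote(host, safe='')}/repositories/{quote(full_name, safe='')}"


def fetch_repo_metadata(host: str, full_name: str) -> dict:
    """Fetch and decode the repository record (assumes a JSON object response)."""
    with urlopen(build_repo_api_url(host, full_name), timeout=10) as resp:
        return json.load(resp)


url = build_repo_api_url(
    "GitHub",
    "chisngooo/Multimodal-Video-Retrieval-Engine-with-Vision-and-Text-by-NaiveNotNaice",
)
print(url)
```

Calling `fetch_repo_metadata` with the same two arguments would request the record over the network; it is left uncalled here so the sketch runs offline.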

Fork of Zhennor/Multimodal-Video-Retrieval-Engine-with-Vision-and-Text
Stars: 0
Forks: 0
Open issues: 0

License: apache-2.0
Language:
Size: 20.9 GB
Dependencies parsed at: Pending

Created at: 3 months ago
Updated at: 3 months ago
Pushed at: 3 months ago
Last synced at: 3 months ago

Topics: asr, clip, image-captioning, object-detection, ocr, text-search
