GitHub topics: vision-transformer
qubvel/transformers-notebooks
Inference and fine-tuning examples for vision models from 🤗 Transformers
Language: Jupyter Notebook - Size: 33.1 MB - Last synced at: about 5 hours ago - Pushed at: about 18 hours ago - Stars: 76 - Forks: 13

MaxwellYaoNi/PACE
[NeurIPS 2024 Spotlight] Official implementation for "PACE: marrying generalization in PArameter-efficient fine-tuning with Consistency rEgularization"
Language: Python - Size: 172 KB - Last synced at: about 6 hours ago - Pushed at: about 19 hours ago - Stars: 10 - Forks: 0

tue-mps/eomt
[CVPR 2025 Highlight] Official code and models for Encoder-only Mask Transformer (EoMT).
Language: Jupyter Notebook - Size: 5.29 MB - Last synced at: about 7 hours ago - Pushed at: about 21 hours ago - Stars: 111 - Forks: 5

huawei-noah/Efficient-AI-Backbones
Efficient AI Backbones including GhostNet, TNT and MLP, developed by Huawei Noah's Ark Lab.
Language: Python - Size: 98.4 MB - Last synced at: about 22 hours ago - Pushed at: about 1 month ago - Stars: 4,184 - Forks: 718

PRITHIVSAKTHIUR/Fashion-Product-Usage
Fashion-Product-Usage is a vision-language model fine-tuned from google/siglip2-base-patch16-224 using the SiglipForImageClassification architecture. It classifies fashion product images based on their intended usage context.
Language: Python - Size: 11.7 KB - Last synced at: about 14 hours ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

Anne-Andresen/3D-Vision-transformer
Self-configuring and adaptive vision transformer for segmentation of 3D images
Language: Python - Size: 4.09 MB - Last synced at: about 23 hours ago - Pushed at: 1 day ago - Stars: 1 - Forks: 0

yutingshih/vit-quant
Quantization for vision transformers
Language: Python - Size: 17.6 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

cmhungsteve/Awesome-Transformer-Attention
A comprehensive paper list of Vision Transformer/Attention, including papers, code, and related websites
Size: 5.65 MB - Last synced at: about 1 hour ago - Pushed at: 9 months ago - Stars: 4,836 - Forks: 492

mahmoodlab/HIPT
Hierarchical Image Pyramid Transformer - CVPR 2022 (Oral)
Language: Jupyter Notebook - Size: 740 MB - Last synced at: about 22 hours ago - Pushed at: about 1 year ago - Stars: 555 - Forks: 96

deepglint/unicom
Large-Scale Visual Representation Model
Language: Python - Size: 22.9 MB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 659 - Forks: 27

PRITHIVSAKTHIUR/Multilabel-Portrait-SigLIP2
Multilabel-Portrait-SigLIP2 is a vision-language model fine-tuned from google/siglip2-base-patch16-224 using the SiglipForImageClassification architecture. It classifies portrait-style images into one of the following visual portrait categories:
Language: Python - Size: 10.7 KB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

zhongkaifu/Seq2SeqSharp
Seq2SeqSharp is a tensor-based, fast and flexible deep neural network framework written in .NET (C#). Highlighted features include automatic differentiation, multiple network types (Transformer, LSTM, BiLSTM, and more), multi-GPU support, cross-platform support (Windows, Linux, x86, x64, ARM), and multimodal models for text and images.
Language: C# - Size: 432 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 203 - Forks: 42

lukas-blecher/LaTeX-OCR
pix2tex: Using a ViT to convert images of equations into LaTeX code.
Language: Python - Size: 9.06 MB - Last synced at: 4 days ago - Pushed at: 3 months ago - Stars: 14,100 - Forks: 1,119

Rajadhopiya/Gender-Classifier-Mini
Gender-Classifier-Mini is an image classification vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for a single-label classification task. It is designed to classify images based on gender using the SiglipForImageClassification architecture.
Language: Python - Size: 12.7 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

open-mmlab/mmdetection
OpenMMLab Detection Toolbox and Benchmark
Language: Python - Size: 61.8 MB - Last synced at: 4 days ago - Pushed at: 8 months ago - Stars: 30,795 - Forks: 9,627

MrAlonso9/Hand-Gesture-2-Robot
Hand-Gesture-2-Robot is an image classification vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for a single-label classification task. It is designed to recognize hand gestures and map them to specific robot commands using the SiglipForImageClassification architecture.
Language: Python - Size: 12.7 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

Sanyi54/Clipart-126-DomainNet
Clipart-126-DomainNet is an image classification vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for a single-label classification task. It is designed to classify clipart images into 126 domain categories using the SiglipForImageClassification architecture.
Language: Python - Size: 12.7 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

hyuki875/Transformers
The Transformers repository provides a comprehensive implementation of the Transformer architecture, a groundbreaking model that has revolutionized both Natural Language Processing (NLP) and Computer Vision tasks. The architecture was introduced in the seminal paper "Attention is All You Need" by Vaswani et al.
Size: 1.95 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 1 - Forks: 0

microsoft/Cream
This is a collection of our NAS and Vision Transformer work.
Language: Python - Size: 8.52 MB - Last synced at: 4 days ago - Pushed at: 9 months ago - Stars: 1,741 - Forks: 237

cuixing158/Awesome-CV-MasterHub
🔥🔥🔥 A paper list of recent Computer Vision (CV) works
Size: 14.1 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 214 - Forks: 12

ViTAE-Transformer/Remote-Sensing-RVSA
The official repo for [TGRS'22] "Advancing Plain Vision Transformer Towards Remote Sensing Foundation Model"
Language: Python - Size: 4.56 MB - Last synced at: 3 days ago - Pushed at: about 2 months ago - Stars: 441 - Forks: 38

HMUNACHI/VisionArchitectures
Comparative Analysis of SOTA Vision Architectures: VGG, GoogLeNet, ResNet & Vision Transformers.
Language: Jupyter Notebook - Size: 2.29 MB - Last synced at: 3 days ago - Pushed at: about 2 years ago - Stars: 4 - Forks: 0

OpenGVLab/PIIP
[NeurIPS 2024 Spotlight ⭐️] Parameter-Inverted Image Pyramid Networks (PIIP)
Language: Python - Size: 11 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 87 - Forks: 2

PRITHIVSAKTHIUR/Geometric-Shapes-Classification
Geometric-Shapes-Classification is an image classification vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for a multi-class shape recognition task. It classifies various geometric shapes using the SiglipForImageClassification architecture.
Language: Python - Size: 0 Bytes - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0

IrohXu/Awesome-Multimodal-LLM-Autonomous-Driving
[WACV 2024 Survey Paper] Multimodal Large Language Models for Autonomous Driving
Size: 15 MB - Last synced at: 5 days ago - Pushed at: about 1 year ago - Stars: 282 - Forks: 12

NVlabs/FasterViT
[ICLR 2024] Official PyTorch implementation of FasterViT: Fast Vision Transformers with Hierarchical Attention
Language: Python - Size: 1.21 MB - Last synced at: 1 day ago - Pushed at: 29 days ago - Stars: 843 - Forks: 68

NielsRogge/Transformers-Tutorials
This repository contains demos I made with the Transformers library by HuggingFace.
Language: Jupyter Notebook - Size: 224 MB - Last synced at: 6 days ago - Pushed at: 3 months ago - Stars: 10,576 - Forks: 1,578

thu-ml/SpargeAttn
SpargeAttention: A training-free sparse attention method that can accelerate inference for any model.
Language: Cuda - Size: 55.4 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 435 - Forks: 27

pprp/awesome-attention-mechanism-in-cv
Awesome List of Attention Modules and Plug&Play Modules in Computer Vision
Language: Python - Size: 3.25 MB - Last synced at: 6 days ago - Pushed at: almost 2 years ago - Stars: 1,168 - Forks: 170

SHI-Labs/VMFormer
[Preprint] VMFormer: End-to-End Video Matting with Transformer
Language: Python - Size: 2.69 MB - Last synced at: 5 days ago - Pushed at: over 2 years ago - Stars: 114 - Forks: 9

JingyunLiang/SwinIR
SwinIR: Image Restoration Using Swin Transformer (official repository)
Language: Python - Size: 29.1 MB - Last synced at: 7 days ago - Pushed at: 11 months ago - Stars: 4,772 - Forks: 574

ViTAE-Transformer/ViTPose
The official repo for [NeurIPS'22] "ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation" and [TPAMI'23] "ViTPose++: Vision Transformer for Generic Body Pose Estimation"
Language: Python - Size: 10.5 MB - Last synced at: 8 days ago - Pushed at: 9 months ago - Stars: 1,569 - Forks: 205

Blaizzy/mlx-vlm
MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX.
Language: Python - Size: 31.3 MB - Last synced at: 7 days ago - Pushed at: 8 days ago - Stars: 1,155 - Forks: 109

Haiyang-W/GiT
[ECCV2024 Oral🔥] Official Implementation of "GiT: Towards Generalist Vision Transformer through Universal Language Interface"
Language: Python - Size: 12.5 MB - Last synced at: 6 days ago - Pushed at: 3 months ago - Stars: 345 - Forks: 15

open-mmlab/mmpretrain
OpenMMLab Pre-training Toolbox and Benchmark
Language: Python - Size: 13.5 MB - Last synced at: 7 days ago - Pushed at: 6 months ago - Stars: 3,620 - Forks: 1,081

AImageLab-zip/MONKEY_challenge_ziplab
AImageLab Zip UNIMORE solution for the MONKEY Challenge of Radboud University Medical Center
Language: Jupyter Notebook - Size: 295 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 1 - Forks: 0

Adlith/MoE-Jetpack
[NeurIPS 24] MoE Jetpack: From Dense Checkpoints to Adaptive Mixture of Experts for Vision Tasks
Language: Python - Size: 32.3 MB - Last synced at: 7 days ago - Pushed at: 5 months ago - Stars: 115 - Forks: 1

zer0int/CLIP-XAI-GUI
CLIP GUI - XAI app ~ explainable (and guessable) AI with ViT & ResNet models
Language: Python - Size: 3.46 MB - Last synced at: about 11 hours ago - Pushed at: 7 months ago - Stars: 20 - Forks: 1

mashaan14/YouTube-channel
Code I used for my YouTube videos
Language: Jupyter Notebook - Size: 1.36 GB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 6 - Forks: 2

lxa9867/ImageFolder
High-performance Image Tokenizers for VAR and AR
Language: Python - Size: 3.97 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 239 - Forks: 5

baaivision/EVA
EVA Series: Visual Representation Fantasies from BAAI
Language: Python - Size: 8.61 MB - Last synced at: 8 days ago - Pushed at: 9 months ago - Stars: 2,467 - Forks: 184

InternLM/InternLM-XComposer
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
Language: Python - Size: 199 MB - Last synced at: 8 days ago - Pushed at: 3 months ago - Stars: 2,805 - Forks: 171

hustvl/YOLOS
[NeurIPS 2021] You Only Look at One Sequence
Language: Jupyter Notebook - Size: 13.3 MB - Last synced at: 6 days ago - Pushed at: almost 3 years ago - Stars: 862 - Forks: 122

SunzeY/AlphaCLIP
[CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
Language: Jupyter Notebook - Size: 173 MB - Last synced at: 6 days ago - Pushed at: 9 months ago - Stars: 802 - Forks: 55

ViTAE-Transformer/P3M-Net
The official repo for [IJCV'23] "Rethinking Portrait Matting with Privacy Preserving"
Language: Python - Size: 56.2 MB - Last synced at: 8 days ago - Pushed at: about 1 year ago - Stars: 104 - Forks: 9

SalvatoreRa/tutorial
Tutorials on machine learning, artificial intelligence, and data science with mathematical explanations and reusable code (in Python and R)
Language: Jupyter Notebook - Size: 180 MB - Last synced at: 6 days ago - Pushed at: 18 days ago - Stars: 184 - Forks: 29

FoundationVision/VAR
[NeurIPS 2024 Best Paper][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!
Language: Jupyter Notebook - Size: 620 KB - Last synced at: 9 days ago - Pushed at: 28 days ago - Stars: 7,408 - Forks: 463

JersonGB22/ComputerVision
Repository of Computer Vision models based on CNNs, Vision Transformers, and YOLO11, implemented with TensorFlow, PyTorch, Hugging Face, and Ultralytics.
Size: 6.56 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 3 - Forks: 0

google-research/scenic
Scenic: A Jax Library for Computer Vision Research and Beyond
Language: Python - Size: 63.7 MB - Last synced at: 9 days ago - Pushed at: 10 days ago - Stars: 3,495 - Forks: 453

Westlake-AI/openmixup
CAIRI Supervised, Semi- and Self-Supervised Visual Representation Learning Toolbox and Benchmark
Language: Python - Size: 3.68 MB - Last synced at: 5 days ago - Pushed at: 9 days ago - Stars: 646 - Forks: 59

kyegomez/Vit-RGTS
Open source implementation of "Vision Transformers Need Registers"
Language: Python - Size: 211 KB - Last synced at: 6 days ago - Pushed at: 13 days ago - Stars: 172 - Forks: 15

topazape/ViT-Pytorch
Vision Transformer in PyTorch
Language: Python - Size: 4.62 MB - Last synced at: 9 days ago - Pushed at: over 2 years ago - Stars: 8 - Forks: 1

NVlabs/VoxFormer
Official PyTorch implementation of VoxFormer [CVPR 2023 Highlight]
Language: Python - Size: 53.4 MB - Last synced at: 7 days ago - Pushed at: over 1 year ago - Stars: 1,106 - Forks: 90

uncbiag/Awesome-Foundation-Models
A curated list of foundation models for vision and language tasks
Size: 296 KB - Last synced at: 8 days ago - Pushed at: 11 days ago - Stars: 974 - Forks: 46

ziqipang/LM4VisualEncoding
[ICLR 2024 (Spotlight)] "Frozen Transformers in Language Models are Effective Visual Encoder Layers"
Language: Python - Size: 538 KB - Last synced at: 9 days ago - Pushed at: over 1 year ago - Stars: 234 - Forks: 8

ShoufaChen/AdaptFormer
[NeurIPS 2022] Implementation of "AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition"
Language: Python - Size: 2.67 MB - Last synced at: 10 days ago - Pushed at: over 2 years ago - Stars: 352 - Forks: 21

srinadh99/AstroFormer
Photometry Guided Cross Attention Transformers for Astronomical Image Processing
Language: Jupyter Notebook - Size: 4.88 KB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 0 - Forks: 0

adithya-s-k/omniparse
Ingest, parse, and optimize any data format ➡️ from documents to multimedia ➡️ for enhanced compatibility with GenAI frameworks
Language: Python - Size: 592 KB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 6,463 - Forks: 524

Polymath-Saksh/deepfake
Deepfake detection tool, hosted on the Django framework, with a self-trained Vision Transformer model.
Language: Python - Size: 85.6 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 2 - Forks: 1

mit-han-lab/efficientvit
Efficient vision foundation models for high-resolution generation and perception.
Language: Python - Size: 207 MB - Last synced at: 9 days ago - Pushed at: 16 days ago - Stars: 2,790 - Forks: 215

naver/unic
PyTorch code and pretrained weights for the UNIC models.
Language: Python - Size: 367 KB - Last synced at: 4 days ago - Pushed at: 8 months ago - Stars: 29 - Forks: 1

czczup/ViT-Adapter
[ICLR 2023 Spotlight] Vision Transformer Adapter for Dense Predictions
Language: Python - Size: 1.78 MB - Last synced at: 11 days ago - Pushed at: about 1 year ago - Stars: 1,349 - Forks: 143

towhee-io/towhee
Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.
Language: Python - Size: 37.2 MB - Last synced at: 10 days ago - Pushed at: 6 months ago - Stars: 3,346 - Forks: 258

marqo-ai/marqo-FashionCLIP
State-of-the-art CLIP/SigLIP embedding models finetuned for the fashion domain. +57% increase in evaluation metrics vs FashionCLIP 2.0.
Language: Python - Size: 11.2 MB - Last synced at: 7 days ago - Pushed at: 7 months ago - Stars: 84 - Forks: 8

ViTAE-Transformer/ViTAE-Transformer
The official repo for [NeurIPS'21] "ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias" and [IJCV'22] "ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond"
Language: Python - Size: 22.2 MB - Last synced at: 10 days ago - Pushed at: about 2 years ago - Stars: 269 - Forks: 29

sftwre/TrafficTransformer
ViT model that predicts vehicle-collision time on a video stream from a Nexar dashcam
Language: Python - Size: 189 KB - Last synced at: 9 days ago - Pushed at: 10 days ago - Stars: 0 - Forks: 0

miniHuiHui/awesome-high-order-neural-network
Size: 43.9 KB - Last synced at: 9 days ago - Pushed at: 7 months ago - Stars: 46 - Forks: 4

Oneflow-Inc/libai
LiBai(李白): A Toolbox for Large-Scale Distributed Parallel Training
Language: Python - Size: 34.6 MB - Last synced at: 1 day ago - Pushed at: 2 days ago - Stars: 401 - Forks: 56

JingyunLiang/RVRT
Recurrent Video Restoration Transformer with Guided Deformable Attention (NeurIPS 2022, official repository)
Language: Python - Size: 2.81 MB - Last synced at: 9 days ago - Pushed at: over 2 years ago - Stars: 374 - Forks: 36

hila-chefer/Transformer-Explainability
[CVPR 2021] Official PyTorch implementation for Transformer Interpretability Beyond Attention Visualization, a novel method to visualize classifications by Transformer based networks.
Language: Jupyter Notebook - Size: 3.76 MB - Last synced at: 11 days ago - Pushed at: about 1 year ago - Stars: 1,867 - Forks: 248

PrivateGER/clipsight
Search through your image collection using natural language descriptions or similar image queries.
Language: Python - Size: 145 KB - Last synced at: 6 days ago - Pushed at: 12 days ago - Stars: 1 - Forks: 0

PRITHIVSAKTHIUR/Food-101-93M
Food-101-93M is a fine-tuned image classification model built on top of google/siglip2-base-patch16-224 using the SiglipForImageClassification architecture. It is trained to classify food images into one of 101 popular dishes, derived from the Food-101 dataset.
Language: Python - Size: 0 Bytes - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 0 - Forks: 0

aehrc/cvt2distilgpt2
Improving Chest X-Ray Report Generation by Leveraging Warm-Starting
Language: Python - Size: 93.5 MB - Last synced at: 7 days ago - Pushed at: 11 months ago - Stars: 67 - Forks: 7

alohays/awesome-visual-representation-learning-with-transformers
Awesome Transformers (self-attention) in Computer Vision
Size: 73.2 KB - Last synced at: 9 days ago - Pushed at: over 3 years ago - Stars: 270 - Forks: 38

JingyunLiang/VRT
VRT: A Video Restoration Transformer (official repository)
Language: Python - Size: 12.7 MB - Last synced at: 11 days ago - Pushed at: almost 2 years ago - Stars: 1,429 - Forks: 135

ViTAE-Transformer/ViTDet
Unofficial implementation for [ECCV'22] "Exploring Plain Vision Transformer Backbones for Object Detection"
Language: Python - Size: 8.29 MB - Last synced at: 6 days ago - Pushed at: almost 3 years ago - Stars: 559 - Forks: 46

ViTAE-Transformer/MTP
The official repo for [JSTARS'24] "MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining"
Language: Python - Size: 18 MB - Last synced at: 8 days ago - Pushed at: 3 months ago - Stars: 215 - Forks: 11

Little-Podi/GRM
[CVPR'23] The official PyTorch implementation of our CVPR 2023 paper: "Generalized Relation Modeling for Transformer Tracking".
Language: Python - Size: 660 KB - Last synced at: 5 days ago - Pushed at: over 1 year ago - Stars: 75 - Forks: 8

PRITHIVSAKTHIUR/Mirage-Photo-Classifier
Mirage-Photo-Classifier is an image classification vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for a binary image authenticity classification task. It is designed to determine whether an image is real or AI-generated (fake) using the SiglipForImageClassification architecture.
Language: Python - Size: 11.7 KB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 1 - Forks: 0

sovit-123/vision_transformers
Vision Transformers for image classification, image segmentation, and object detection.
Language: Python - Size: 44.1 MB - Last synced at: 12 days ago - Pushed at: 6 months ago - Stars: 49 - Forks: 9

OSU-MLB/ViT_PEFT_Vision
[CVPR'25] Lessons and Insights from a Unifying Study of Parameter-Efficient Fine-Tuning (PEFT) in Visual Recognition
Language: Jupyter Notebook - Size: 3.54 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 31 - Forks: 0

omerbt/Splice
Official PyTorch implementation for "Splicing ViT Features for Semantic Appearance Transfer" presenting "Splice" (CVPR 2022 Oral)
Language: Jupyter Notebook - Size: 206 MB - Last synced at: 13 days ago - Pushed at: over 1 year ago - Stars: 383 - Forks: 32

hustvl/MIMDet
[ICCV 2023] You Only Look at One Partial Sequence
Language: Python - Size: 551 KB - Last synced at: 13 days ago - Pushed at: over 1 year ago - Stars: 339 - Forks: 30

junchen14/Multi-Modal-Transformer
This repository collects various multi-modal transformer architectures, including image transformers, video transformers, image-language transformers, video-language transformers, and self-supervised learning models. It also collects useful tutorials and tools in these related domains.
Size: 354 KB - Last synced at: 6 days ago - Pushed at: over 2 years ago - Stars: 225 - Forks: 31

emcf/thepipe
Extract clean data from anywhere, powered by vision-language models ⚡
Language: Python - Size: 4.12 MB - Last synced at: 8 days ago - Pushed at: 4 months ago - Stars: 1,248 - Forks: 79

mist-medical/MIST
MIST: A simple and scalable end-to-end framework for 3D medical imaging segmentation.
Language: Python - Size: 1.14 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 43 - Forks: 12

RuoyuChen10/SMDL-Attribution
[ICLR 2024 Oral] Less is More: Fewer Interpretable Region via Submodular Subset Selection
Language: Jupyter Notebook - Size: 28.8 MB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 78 - Forks: 4

ViTAE-Transformer/APTv2
The official repo for the extension of [NeurIPS'22] "APT-36K: A Large-scale Benchmark for Animal Pose Estimation and Tracking": https://github.com/pandorgan/APT-36K
Language: Python - Size: 9.89 MB - Last synced at: 7 days ago - Pushed at: 11 months ago - Stars: 19 - Forks: 0

StefanHeng/ECG-Representation-Learning
Self-supervised pre-training for ECG representation with inspiration from transformers & computer vision
Language: Python - Size: 37.7 MB - Last synced at: 6 days ago - Pushed at: almost 3 years ago - Stars: 24 - Forks: 6

XPixelGroup/RethinkVSRAlignment
(NeurIPS 2022) Rethinking Alignment in Video Super-Resolution Transformers
Language: Python - Size: 6.23 MB - Last synced at: 14 days ago - Pushed at: over 2 years ago - Stars: 125 - Forks: 9

baudm/parseq
Scene Text Recognition with Permuted Autoregressive Sequence Models (ECCV 2022)
Language: Python - Size: 1.27 MB - Last synced at: 15 days ago - Pushed at: 11 months ago - Stars: 631 - Forks: 135

staghado/vit.cpp
Vision Transformer (ViT) inference in plain C/C++ with ggml
Language: C++ - Size: 2.17 MB - Last synced at: 12 days ago - Pushed at: about 1 year ago - Stars: 265 - Forks: 21

rentainhe/visualization
A collection of visualization functions
Language: Python - Size: 2.82 MB - Last synced at: 14 days ago - Pushed at: over 3 years ago - Stars: 416 - Forks: 40

mv-lab/swin2sr
[ECCV] Swin2SR: SwinV2 Transformer for Compressed Image Super-Resolution and Restoration. Advances in Image Manipulation (AIM) workshop, ECCV 2022. Try it out (over 3.3M runs): https://replicate.com/mv-lab/swin2sr
Language: Python - Size: 20 MB - Last synced at: 12 days ago - Pushed at: 8 months ago - Stars: 613 - Forks: 72

chanjoong-kim/mobilevit-pytorch-cifar10
A PyTorch implementation of "MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer" (arXiv, 2021), with custom modifications and trained on the CIFAR-10 dataset.
Language: Python - Size: 15.6 KB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 0 - Forks: 0

AnanthaPadmanaban-KrishnaKumar/CriticalHeads
Identifying essential attention mechanisms in SAM through systematic ablation studies to enable efficient model compression
Size: 0 Bytes - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 0 - Forks: 0

PRITHIVSAKTHIUR/Trash-Net
Trash-Net is an image classification vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for a single-label classification task. It is designed to classify images of waste materials into different categories using the SiglipForImageClassification architecture.
Language: Python - Size: 12.7 KB - Last synced at: 10 days ago - Pushed at: 16 days ago - Stars: 1 - Forks: 0

PRITHIVSAKTHIUR/Hand-Gesture-2-Robot
Hand-Gesture-2-Robot is an image classification vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for a single-label classification task. It is designed to recognize hand gestures and map them to specific robot commands using the SiglipForImageClassification architecture.
Language: Python - Size: 12.7 KB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 1 - Forks: 0

veb-101/Attention-and-Transformers
Transformers goes brrr... Attention and Transformers from scratch in TensorFlow. Currently contains Vision transformers, MobileViT-v1, MobileViT-v2, MobileViT-v3
Language: Python - Size: 250 KB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 13 - Forks: 2

chou141253/FGVC-PIM
PyTorch implementation of "A Novel Plug-in Module for Fine-Grained Visual Classification" for the fine-grained visual classification task.
Language: Python - Size: 1.41 MB - Last synced at: 14 days ago - Pushed at: about 2 years ago - Stars: 197 - Forks: 40
