GitHub topics: vision-transformers

Repositories

fahadshamshad/awesome-transformers-in-medical-imaging

A collection of resources on applications of Transformers in Medical Imaging.

Size: 3.92 MB - Last synced at: about 19 hours ago - Pushed at: over 1 year ago - Stars: 1,261 - Forks: 194

BaaaanN/Unsupervised-Domain-Adaptation-and-ViTs

🌍 Enhance land cover classification with our Unsupervised Domain Adaptation framework using Vision Transformers for multimodal satellite imagery.

Language: Python - Size: 267 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

lucidrains/metnet3-pytorch

Implementation of MetNet-3, a SOTA neural weather model from Google DeepMind, in PyTorch

Language: Python - Size: 1.06 MB - Last synced at: 6 days ago - Pushed at: almost 2 years ago - Stars: 221 - Forks: 28

vishal-n2403/Unsupervised-Domain-Adaptation-and-ViTs

ViT + MAE for UDA on Sentinel-1/2 (SAR/optical) land-cover classification with CORAL & DANN. PyTorch.

Language: Python - Size: 265 KB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 1 - Forks: 0

microsoft/esvit

EsViT: Efficient self-supervised Vision Transformers

Language: Python - Size: 1.88 MB - Last synced at: 4 days ago - Pushed at: about 2 years ago - Stars: 412 - Forks: 41

aim-uofa/Poseur

[ECCV 2022] The official repo for the paper "Poseur: Direct Human Pose Regression with Transformers".

Language: Python - Size: 11.7 MB - Last synced at: 1 day ago - Pushed at: almost 2 years ago - Stars: 181 - Forks: 14

xiaojieli0903/CKPD-FSCIL

Official code of "Continuous Knowledge-Preserving Decomposition with Adaptive Layer Selection for Few-Shot Class-Incremental Learning"

Language: Python - Size: 1.99 MB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 32 - Forks: 1

udihermawan/EmpathAI-Your-Emotional-Well-being-Companion

EmpathAI: Emotional Well-being Companion "Where AI Meets Heart: Healing Isolation, One Conversation at a Time." EmpathAI uses Generative AI, Computer Vision, and NLP to provide real-time emotion detection, personalized conversations, and mental health support—empowering users with empathy, privacy, and cultural inclusion.

Language: Python - Size: 15.9 MB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 1 - Forks: 0

imagine-laboratory/squeeze_every_bit

This is the official code for 'Squeeze Every Bit of Insight: Leveraging Few-shot Models with a Compact Support Set for Domain Transfer in Object Detection from Pineapple Fields' and 'Simple Object Detection Framework without Training' project.

Language: Python - Size: 13.5 MB - Last synced at: 28 days ago - Pushed at: 28 days ago - Stars: 2 - Forks: 0

sniperbroco/bookfusion-classification-app

A classification app using fine-tuned deep learning models

Language: Python - Size: 74.3 MB - Last synced at: 29 days ago - Pushed at: 29 days ago - Stars: 0 - Forks: 0

UdbhavPrasad072300/Transformer-Implementations

Library - Vanilla, ViT, DeiT, BERT, GPT

Language: Jupyter Notebook - Size: 3.29 MB - Last synced at: 13 days ago - Pushed at: almost 4 years ago - Stars: 67 - Forks: 18

uncbiag/SegNext

Rethinking Interactive Image Segmentation with Low Latency, High Quality, and Diverse Prompts (CVPR 2024)

Language: Python - Size: 88.1 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 95 - Forks: 13

ian-chuang/gaze-av-aloha

Code for paper: "Look, Focus, Act: Efficient and Robust Robot Learning via Human Gaze and Foveated Vision Transformers"

Language: Jupyter Notebook - Size: 72.4 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 12 - Forks: 0

Imageomics/Finer-CAM

This is an official implementation for Finer-CAM: Spotting the Difference Reveals Finer Details for Visual Explanation. [CVPR'25]

Language: Jupyter Notebook - Size: 5.52 MB - Last synced at: 1 day ago - Pushed at: 6 months ago - Stars: 37 - Forks: 4

mehmetkahya0/RealVision-ObjectUnderstandingAI

RealVision: A powerful, real-time object detection and understanding application using Python, OpenCV, and state-of-the-art AI models. Features dual model support (YOLO v8 + MobileNet-SSD), object tracking, performance monitoring, and modern GUI interface.

Language: HTML - Size: 115 MB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 2 - Forks: 0

theSohamTUmbare/CLIP-model

Reimplementation of the CLIP model

Language: Jupyter Notebook - Size: 1.29 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 1

hamidhosen42/Enhancing-Glaucoma-Diagnosis-with-Explainable-AI-Using-Vision-Transformers-Deep-Learning-Techniques

This project presents an explainable AI-based glaucoma diagnosis system using deep learning and Vision Transformers (ViTs). Retinal fundus images are preprocessed with techniques like CLAHE and edge detection to enhance feature extraction. Multiple models, including CNN, VGG16/19, InceptionResNetV2, Xception, and ViTs, were evaluated, with ViTs ach

Language: Jupyter Notebook - Size: 25.2 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

najmulmowla1/Earthquake-Building-Damage-Detection

Earthquake building damage detection using UAV-based image datasets.

Language: Python - Size: 14.6 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

anas-zafar/LLM-Survey

The official GitHub page for the survey paper "A Survey on Large Language Models: Applications, Challenges, Limitations, and Practical Usage"

Size: 29.4 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 33 - Forks: 6

uncbiag/SimpleClick

SimpleClick: Interactive Image Segmentation with Simple Vision Transformers (ICCV 2023)

Language: Python - Size: 40.2 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 239 - Forks: 40

kyegomez/SSM-As-VLM-Bridge

An exploration into leveraging SSMs as bridge/adapter layers for VLMs

Language: Python - Size: 2.19 MB - Last synced at: 11 days ago - Pushed at: 25 days ago - Stars: 2 - Forks: 1

billpsomas/efficient-probing

This repo contains the official implementation of the paper "Attention, Please! Revisiting Attentive Probing for Masked Image Modeling"

Language: Python - Size: 123 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 5 - Forks: 0

nateraw/huggingpics

🤗🖼️ HuggingPics: Fine-tune Vision Transformers for anything using images found on the web.

Language: Jupyter Notebook - Size: 972 KB - Last synced at: 2 months ago - Pushed at: over 1 year ago - Stars: 303 - Forks: 28

itsDaiton/masters-thesis

Exploration and Comparison of Transformers for Image Classification.

Language: Jupyter Notebook - Size: 41.7 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

chikap421/videosam

[IEEE SSD 2025] This repository accompanies the paper "VideoSAM: A Large Vision Foundation Model for High-Speed Video Segmentation"

Language: Jupyter Notebook - Size: 163 MB - Last synced at: about 1 month ago - Pushed at: 6 months ago - Stars: 6 - Forks: 1

Hadi-M-Ibrahim/Beyond-Conventional-Transformers

Beyond Conventional Transformers: The Medical X-ray Attention (MXA) Block for Improved Multi-Label Diagnosis Using Knowledge Distillation

Language: Python - Size: 3.1 GB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 2 - Forks: 0

NVlabs/FAN

Official PyTorch implementation of Fully Attentional Networks

Language: Python - Size: 8.6 MB - Last synced at: 3 months ago - Pushed at: over 2 years ago - Stars: 478 - Forks: 28

baaivision/Uni3D

[ICLR'24 Spotlight] Uni3D: 3D Visual Representation from BAAI

Language: Python - Size: 6.05 MB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 569 - Forks: 37

jacobgil/pytorch-grad-cam

Advanced AI Explainability for computer vision. Support for CNNs, Vision Transformers, Classification, Object detection, Segmentation, Image similarity and more.

Language: Python - Size: 134 MB - Last synced at: 4 months ago - Pushed at: 5 months ago - Stars: 11,641 - Forks: 1,636

NERSC/sc23-dl-tutorial

SC23 Deep Learning at Scale Tutorial Material

Language: Python - Size: 15.7 MB - Last synced at: 2 months ago - Pushed at: 12 months ago - Stars: 45 - Forks: 10

cosmoimd/feature-selection-gates

Feature Selection Gates with Gradient Routing

Language: Python - Size: 25.5 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 4 - Forks: 0

zubair-irshad/NeRF-MAE

[ECCV 2024] PyTorch code for the paper "NeRF-MAE: Masked AutoEncoders for Self-Supervised 3D Representation Learning for Neural Radiance Fields"

Language: Python - Size: 4.47 MB - Last synced at: 4 months ago - Pushed at: 6 months ago - Stars: 101 - Forks: 4

raoyongming/DynamicViT

[NeurIPS 2021] [T-PAMI] DynamicViT: Efficient Vision Transformers with Dynamic Token Sparsification

Language: Jupyter Notebook - Size: 8.78 MB - Last synced at: 4 months ago - Pushed at: about 2 years ago - Stars: 603 - Forks: 75

yessasvini23/EmpathAI-Your-Emotional-Well-being-Companion

Language: Python - Size: 15.9 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

sayakpaul/deit-tf

Includes PyTorch -> Keras model porting code for DeiT models with fine-tuning and inference notebooks.

Language: Jupyter Notebook - Size: 40.4 MB - Last synced at: 2 months ago - Pushed at: over 3 years ago - Stars: 41 - Forks: 7

autodistill/autodistill-owl-vit

OWL-ViT module for Autodistill.

Language: Python - Size: 13.7 KB - Last synced at: 7 days ago - Pushed at: 10 months ago - Stars: 7 - Forks: 3

murufeng/Awesome_vision_transformer

Implementation of vision transformer. ⭐⭐⭐

Language: Python - Size: 198 KB - Last synced at: 8 days ago - Pushed at: almost 4 years ago - Stars: 33 - Forks: 7

wangkai930418/attndistill

Code for our paper "Attention Distillation: self-supervised vision transformer students need more guidance" (BMVC 2022)

Language: Python - Size: 29.3 KB - Last synced at: 3 months ago - Pushed at: almost 3 years ago - Stars: 17 - Forks: 0

davide-coccomini/Adversarial-Magnification-to-Deceive-Deepfake-Detection-through-Super-Resolution

Official code for the paper "Adversarial Magnification to Deceive Deepfake Detection through Super Resolution"

Language: Python - Size: 20.5 KB - Last synced at: 3 months ago - Pushed at: about 2 years ago - Stars: 11 - Forks: 3

marziehoghbaie/VLFAT

"Transformer-based end-to-end classification of variable-length volumetric data" that will appear in MICCAI 2023.

Language: Python - Size: 375 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 5 - Forks: 0

ShirAmir/dino-vit-features

Official implementation for the paper "Deep ViT Features as Dense Visual Descriptors".

Language: Python - Size: 5.85 MB - Last synced at: 5 months ago - Pushed at: almost 3 years ago - Stars: 422 - Forks: 51

Picsart-AI-Research/SeMask-Segmentation

[NIVT Workshop @ ICCV 2023] SeMask: Semantically Masked Transformers for Semantic Segmentation

Language: Python - Size: 2.17 MB - Last synced at: 4 months ago - Pushed at: almost 2 years ago - Stars: 253 - Forks: 37

georgosgeorgos/few-shot-diffusion-models

Few-Shot Diffusion Models

Language: Python - Size: 1.51 MB - Last synced at: 5 months ago - Pushed at: over 2 years ago - Stars: 105 - Forks: 5

yuxumin/PoinTr

[ICCV 2021 Oral] PoinTr: Diverse Point Cloud Completion with Geometry-Aware Transformers

Language: Python - Size: 25.6 MB - Last synced at: 6 months ago - Pushed at: 11 months ago - Stars: 666 - Forks: 113

DirtyHarryLYL/Transformer-in-Vision

Recent Transformer-based CV and related works.

Size: 1.84 MB - Last synced at: 6 months ago - Pushed at: about 2 years ago - Stars: 1,332 - Forks: 143

emnzn/DINO

Self-distillation with no labels

Language: Jupyter Notebook - Size: 15.8 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 1 - Forks: 0

chinmaynehate/DFSpot-Deepfake-Recognition

Determine whether a given video sequence has been manipulated or synthetically generated

Language: Python - Size: 25.9 MB - Last synced at: 3 months ago - Pushed at: almost 3 years ago - Stars: 98 - Forks: 19

sayakpaul/deploy-hf-tf-vision-models

This repository shows various ways of deploying a vision model (TensorFlow) from 🤗 Transformers.

Language: Jupyter Notebook - Size: 867 KB - Last synced at: 2 months ago - Pushed at: about 3 years ago - Stars: 30 - Forks: 2

struggling-student/Tiny-ViT

🤖 Tiny Vision Transformer (Tiny-ViT): Transformers for Image Recognition

Language: Python - Size: 35.4 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

VITA-Group/SViTE

[NeurIPS'21] "Chasing Sparsity in Vision Transformers: An End-to-End Exploration" by Tianlong Chen, Yu Cheng, Zhe Gan, Lu Yuan, Lei Zhang, Zhangyang Wang

Language: Python - Size: 615 KB - Last synced at: 5 months ago - Pushed at: almost 2 years ago - Stars: 89 - Forks: 12

raj-tyagi/4CLIP-Image-Captioning

This repository presents 4CLIP, a novel approach to image captioning that enhances traditional models by dividing images into four quadrants and processing them individually. By leveraging a pretrained ViT-GPT2 model from Hugging Face, 4CLIP generates more detailed and comprehensive captions, making it suitable for fine-grained visual tasks.

Language: Python - Size: 288 KB - Last synced at: about 23 hours ago - Pushed at: 7 months ago - Stars: 0 - Forks: 1

sayakpaul/ViT-jax2tf

This repository hosts code for converting the original Vision Transformer models (JAX) to TensorFlow.

Language: Jupyter Notebook - Size: 651 KB - Last synced at: 2 months ago - Pushed at: over 3 years ago - Stars: 33 - Forks: 6

YifanXu74/Evo-ViT

Official implementation of Evo-ViT: Slow-Fast Token Evolution for Dynamic Vision Transformer

Language: Python - Size: 1.86 MB - Last synced at: 8 months ago - Pushed at: about 3 years ago - Stars: 72 - Forks: 5

Faiga91/ViT-FlexibleHeads

Vision Transformers with Flexible Heads

Language: Python - Size: 135 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

xapaxca/swiftdepth

SwiftDepth: An Efficient Hybrid CNN-Transformer Model for Self-Supervised Monocular Depth Estimation on Mobile Devices

Language: Python - Size: 122 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 4 - Forks: 0

Mr-TalhaIlyas/Segmentation-Transformer-Object-Contextual-Representations-for-Semantic-Segmentation-OCR

PyTorch Implementation of OCR (Object-Contextual Representations)

Language: Python - Size: 5.86 KB - Last synced at: 5 months ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 0

kyegomez/VisionLLaMA

Implementation of VisionLLaMA from the paper: "VisionLLaMA: A Unified LLaMA Interface for Vision Tasks" in PyTorch and Zeta

Language: Python - Size: 2.19 MB - Last synced at: 14 days ago - Pushed at: 10 months ago - Stars: 16 - Forks: 0

protyayofficial/Vision-Architectures

A repository containing implementations of famous Vision Architectures over the years

Language: Python - Size: 191 KB - Last synced at: about 1 month ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

nachiket273/VisTrans

Implementations of transformers based models for different vision tasks

Language: Python - Size: 112 KB - Last synced at: 2 months ago - Pushed at: about 4 years ago - Stars: 1 - Forks: 0

rprkh/Gravitational-Lensing

Streamlit app that performs binary and multiclass classification of gravitational lensing images along with dark matter halo mass prediction.

Language: Jupyter Notebook - Size: 3.9 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

suryansh-sinha/ViT-From-Scratch

Implementation of the Vision Transformer from the paper 'An Image is Worth 16x16 Words', with the attention and multi-head attention mechanisms written from scratch in PyTorch.

Language: Python - Size: 6.84 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

Marklong7/cats-and-dogs-classification (fork of kayyyywu/cats-and-dogs-classification)

Deep learning pet breed recognition app

Language: Jupyter Notebook - Size: 16.9 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

shizhouxing/ViT_vnncomp2023

Benchmark for formally verifying ViTs

Language: Python - Size: 19.8 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

Seeker38/image_abstract_generating

Image captioning using a ViT-PhoBERT model

Language: Jupyter Notebook - Size: 1.76 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

kahnchana/svt

Official repository for "Self-Supervised Video Transformer" (CVPR'22)

Language: Python - Size: 682 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 99 - Forks: 21

uncbiag/iSegFormer

iSegFormer: Interactive Image/Volume Segmentation using Vision Transformers (MICCAI 2022)

Language: Python - Size: 40.4 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 27 - Forks: 3

antonio-f/Moondream

Testing the Moondream tiny vision model

Language: Jupyter Notebook - Size: 19.5 KB - Last synced at: 5 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 1

rishikksh20/CrossViT-pytorch

Implementation of CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification

Language: Python - Size: 229 KB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 171 - Forks: 18

antocad/FocusOnDepth

Monocular depth estimation for an in-the-wild autofocus application.

Language: Python - Size: 7.91 MB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 130 - Forks: 32

hadar-hai/vit-vs-cnn-on-elephants

This project focuses on evaluating Convolutional Neural Networks (CNN) and Vision Transformers (ViT) for image classification tasks, specifically distinguishing between Asian elephants and African elephants.

Language: Jupyter Notebook - Size: 259 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

evachi27/Automated_Oral_Cancer_Classification

This repository accompanies the article entitled "Automated Classification of Oral Cancer Lesions: Vision Transformer vs Radiomics."

Language: Jupyter Notebook - Size: 3.91 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

guglielmocamporese/relvit

Official code of "Where are my Neighbors? Exploiting Patches Relations in Self-Supervised Vision Transformer", Guglielmo Camporese, Elena Izzo, Lamberto Ballan. BMVC, 2022.

Language: Python - Size: 55.7 KB - Last synced at: 5 days ago - Pushed at: over 2 years ago - Stars: 21 - Forks: 2

JayaswalVivek/Transformer_For_Image_Classification

Vision Transformer for classifying images

Language: Jupyter Notebook - Size: 41 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

dedeswim/vits-robustness-torch

Code for the paper "A Light Recipe to Train Robust Vision Transformers" [SaTML 2023]

Language: Jupyter Notebook - Size: 348 MB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 49 - Forks: 2

kayyyywu/cats-and-dogs-classification

Language: Jupyter Notebook - Size: 17 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 1

all-things-vits/code-samples

Holds code for our CVPR'23 tutorial: All Things ViTs: Understanding and Interpreting Attention in Vision.

Language: Jupyter Notebook - Size: 14.7 MB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 145 - Forks: 9

andreped/INF1600-ai-workshop

🔥 Workshop in AI Deployment (INF-1600, UiT)

Language: Python - Size: 45.9 KB - Last synced at: 6 days ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 7

sayakpaul/cait-tf

Implementation of CaiT models in TensorFlow and ImageNet-1k checkpoints. Includes code for inference and fine-tuning.

Language: Jupyter Notebook - Size: 2.49 MB - Last synced at: 2 months ago - Pushed at: about 2 years ago - Stars: 12 - Forks: 3

nicholas-dinicola/nanoViT

Implementation of ViT with PyTorch

Language: Python - Size: 229 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

PieroRendina/multidisciplinary-project-2023-INDYcs

Final project of the Multidisciplinary course offered at Politecnico di Milano A.Y. 2022/2023

Language: Jupyter Notebook - Size: 30.5 MB - Last synced at: almost 2 years ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 0

Osamah-ElRadaideh/okr

Simple python package containing backbone architectures used in various computer vision tasks

Language: Python - Size: 51.8 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

MohsenAmiri79/PASTormer

An image restoration framework that uses the Restormer model as its backbone (image deraining code has been implemented). This is an early idea in my "Attending to the past" research project. With roughly the same number of learnable parameters, this model shows better performance under the same training methods.

Language: Python - Size: 27.3 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0