GitHub topics: vision-transformer

Repositories

qubvel/transformers-notebooks

Inference and fine-tuning examples for vision models from 🤗 Transformers

Language: Jupyter Notebook - Size: 33.1 MB - Last synced at: about 5 hours ago - Pushed at: about 18 hours ago - Stars: 76 - Forks: 13

MaxwellYaoNi/PACE

[NeurIPS 2024 Spotlight] Official implementation for "PACE: marrying generalization in PArameter-efficient fine-tuning with Consistency rEgularization"

Language: Python - Size: 172 KB - Last synced at: about 6 hours ago - Pushed at: about 19 hours ago - Stars: 10 - Forks: 0

tue-mps/eomt

[CVPR 2025 Highlight] Official code and models for Encoder-only Mask Transformer (EoMT).

Language: Jupyter Notebook - Size: 5.29 MB - Last synced at: about 7 hours ago - Pushed at: about 21 hours ago - Stars: 111 - Forks: 5

huawei-noah/Efficient-AI-Backbones

Efficient AI Backbones including GhostNet, TNT and MLP, developed by Huawei Noah's Ark Lab.

Language: Python - Size: 98.4 MB - Last synced at: about 22 hours ago - Pushed at: about 1 month ago - Stars: 4,184 - Forks: 718

PRITHIVSAKTHIUR/Fashion-Product-Usage

Fashion-Product-Usage is a vision-language model fine-tuned from google/siglip2-base-patch16-224 using the SiglipForImageClassification architecture. It classifies fashion product images based on their intended usage context.

Language: Python - Size: 11.7 KB - Last synced at: about 14 hours ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

Anne-Andresen/3D-Vision-transformer

Self configuring and adapting vision transformer for segmentation of 3d images

Language: Python - Size: 4.09 MB - Last synced at: about 23 hours ago - Pushed at: 1 day ago - Stars: 1 - Forks: 0

yutingshih/vit-quant

Quantization for vision transformers

Language: Python - Size: 17.6 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

cmhungsteve/Awesome-Transformer-Attention

An ultimately comprehensive paper list of Vision Transformer/Attention, including papers, codes, and related websites

Size: 5.65 MB - Last synced at: about 1 hour ago - Pushed at: 9 months ago - Stars: 4,836 - Forks: 492

mahmoodlab/HIPT

Hierarchical Image Pyramid Transformer - CVPR 2022 (Oral)

Language: Jupyter Notebook - Size: 740 MB - Last synced at: about 22 hours ago - Pushed at: about 1 year ago - Stars: 555 - Forks: 96

deepglint/unicom

Large-Scale Visual Representation Model

Language: Python - Size: 22.9 MB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 659 - Forks: 27

PRITHIVSAKTHIUR/Multilabel-Portrait-SigLIP2

Multilabel-Portrait-SigLIP2 is a vision-language model fine-tuned from google/siglip2-base-patch16-224 using the SiglipForImageClassification architecture. It classifies portrait-style images into one of a set of visual portrait categories.

Language: Python - Size: 10.7 KB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

zhongkaifu/Seq2SeqSharp

Seq2SeqSharp is a tensor-based, fast and flexible deep neural network framework written in .NET (C#). Its features include automatic differentiation, several network types (Transformer, LSTM, BiLSTM, and so on), multi-GPU support, cross-platform operation (Windows, Linux, x86, x64, ARM), and multimodal models for text and images.

Language: C# - Size: 432 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 203 - Forks: 42

lukas-blecher/LaTeX-OCR

pix2tex: Using a ViT to convert images of equations into LaTeX code.

Language: Python - Size: 9.06 MB - Last synced at: 4 days ago - Pushed at: 3 months ago - Stars: 14,100 - Forks: 1,119

Rajadhopiya/Gender-Classifier-Mini

Gender-Classifier-Mini is an image classification vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for a single-label classification task. It is designed to classify images based on gender using the SiglipForImageClassification architecture.

Language: Python - Size: 12.7 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

open-mmlab/mmdetection

OpenMMLab Detection Toolbox and Benchmark

Language: Python - Size: 61.8 MB - Last synced at: 4 days ago - Pushed at: 8 months ago - Stars: 30,795 - Forks: 9,627

MrAlonso9/Hand-Gesture-2-Robot

Hand-Gesture-2-Robot is an image classification vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for a single-label classification task. It is designed to recognize hand gestures and map them to specific robot commands using the SiglipForImageClassification architecture.

Language: Python - Size: 12.7 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

Sanyi54/Clipart-126-DomainNet

Clipart-126-DomainNet is an image classification vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for a single-label classification task. It is designed to classify clipart images into 126 domain categories using the SiglipForImageClassification architecture.

Language: Python - Size: 12.7 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

hyuki875/Transformers

The Transformers repository provides an implementation of the Transformer architecture, introduced in the paper "Attention Is All You Need" by Vaswani et al. and since widely adopted for both Natural Language Processing (NLP) and Computer Vision tasks.

Size: 1.95 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 1 - Forks: 0

microsoft/Cream

This is a collection of our NAS and Vision Transformer work.

Language: Python - Size: 8.52 MB - Last synced at: 4 days ago - Pushed at: 9 months ago - Stars: 1,741 - Forks: 237

cuixing158/Awesome-CV-MasterHub

A paper list of some recent Computer Vision (CV) works

Size: 14.1 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 214 - Forks: 12

ViTAE-Transformer/Remote-Sensing-RVSA

The official repo for [TGRS'22] "Advancing Plain Vision Transformer Towards Remote Sensing Foundation Model"

Language: Python - Size: 4.56 MB - Last synced at: 3 days ago - Pushed at: about 2 months ago - Stars: 441 - Forks: 38

HMUNACHI/VisionArchitectures

Comparative Analysis of SOTA Vision Architectures; VGG, GoogleNet, ResNet & Vision Transformers.

Language: Jupyter Notebook - Size: 2.29 MB - Last synced at: 3 days ago - Pushed at: about 2 years ago - Stars: 4 - Forks: 0

OpenGVLab/PIIP

[NeurIPS 2024 Spotlight ⭐️] Parameter-Inverted Image Pyramid Networks (PIIP)

Language: Python - Size: 11 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 87 - Forks: 2

PRITHIVSAKTHIUR/Geometric-Shapes-Classification

Geometric-Shapes-Classification is an image classification vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for a multi-class shape recognition task. It classifies various geometric shapes using the SiglipForImageClassification architecture.

Language: Python - Size: 0 Bytes - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0

IrohXu/Awesome-Multimodal-LLM-Autonomous-Driving

[WACV 2024 Survey Paper] Multimodal Large Language Models for Autonomous Driving

Size: 15 MB - Last synced at: 5 days ago - Pushed at: about 1 year ago - Stars: 282 - Forks: 12

NVlabs/FasterViT

[ICLR 2024] Official PyTorch implementation of FasterViT: Fast Vision Transformers with Hierarchical Attention

Language: Python - Size: 1.21 MB - Last synced at: 1 day ago - Pushed at: 29 days ago - Stars: 843 - Forks: 68

NielsRogge/Transformers-Tutorials

This repository contains demos I made with the Transformers library by HuggingFace.

Language: Jupyter Notebook - Size: 224 MB - Last synced at: 6 days ago - Pushed at: 3 months ago - Stars: 10,576 - Forks: 1,578

thu-ml/SpargeAttn

SpargeAttention: A training-free sparse attention that can accelerate any model inference.

Language: Cuda - Size: 55.4 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 435 - Forks: 27

pprp/awesome-attention-mechanism-in-cv

Awesome List of Attention Modules and Plug&Play Modules in Computer Vision

Language: Python - Size: 3.25 MB - Last synced at: 6 days ago - Pushed at: almost 2 years ago - Stars: 1,168 - Forks: 170

SHI-Labs/VMFormer

[Preprint] VMFormer: End-to-End Video Matting with Transformer

Language: Python - Size: 2.69 MB - Last synced at: 5 days ago - Pushed at: over 2 years ago - Stars: 114 - Forks: 9

JingyunLiang/SwinIR

SwinIR: Image Restoration Using Swin Transformer (official repository)

Language: Python - Size: 29.1 MB - Last synced at: 7 days ago - Pushed at: 11 months ago - Stars: 4,772 - Forks: 574

ViTAE-Transformer/ViTPose

The official repo for [NeurIPS'22] "ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation" and [TPAMI'23] "ViTPose++: Vision Transformer for Generic Body Pose Estimation"

Language: Python - Size: 10.5 MB - Last synced at: 8 days ago - Pushed at: 9 months ago - Stars: 1,569 - Forks: 205

Blaizzy/mlx-vlm

MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX.

Language: Python - Size: 31.3 MB - Last synced at: 7 days ago - Pushed at: 8 days ago - Stars: 1,155 - Forks: 109

Haiyang-W/GiT

[ECCV2024 Oral🔥] Official Implementation of "GiT: Towards Generalist Vision Transformer through Universal Language Interface"

Language: Python - Size: 12.5 MB - Last synced at: 6 days ago - Pushed at: 3 months ago - Stars: 345 - Forks: 15

open-mmlab/mmpretrain

OpenMMLab Pre-training Toolbox and Benchmark

Language: Python - Size: 13.5 MB - Last synced at: 7 days ago - Pushed at: 6 months ago - Stars: 3,620 - Forks: 1,081

AImageLab-zip/MONKEY_challenge_ziplab

AImageLab Zip UNIMORE solution for the MONKEY Challenge of Radboud University Medical Center

Language: Jupyter Notebook - Size: 295 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 1 - Forks: 0

Adlith/MoE-Jetpack

[NeurIPS 24] MoE Jetpack: From Dense Checkpoints to Adaptive Mixture of Experts for Vision Tasks

Language: Python - Size: 32.3 MB - Last synced at: 7 days ago - Pushed at: 5 months ago - Stars: 115 - Forks: 1

zer0int/CLIP-XAI-GUI

CLIP GUI - XAI app ~ explainable (and guessable) AI with ViT & ResNet models

Language: Python - Size: 3.46 MB - Last synced at: about 11 hours ago - Pushed at: 7 months ago - Stars: 20 - Forks: 1

mashaan14/YouTube-channel

Code I used for my YouTube videos

Language: Jupyter Notebook - Size: 1.36 GB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 6 - Forks: 2

lxa9867/ImageFolder

High-performance Image Tokenizers for VAR and AR

Language: Python - Size: 3.97 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 239 - Forks: 5

baaivision/EVA

EVA Series: Visual Representation Fantasies from BAAI

Language: Python - Size: 8.61 MB - Last synced at: 8 days ago - Pushed at: 9 months ago - Stars: 2,467 - Forks: 184

InternLM/InternLM-XComposer

InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions

Language: Python - Size: 199 MB - Last synced at: 8 days ago - Pushed at: 3 months ago - Stars: 2,805 - Forks: 171

hustvl/YOLOS

[NeurIPS 2021] You Only Look at One Sequence

Language: Jupyter Notebook - Size: 13.3 MB - Last synced at: 6 days ago - Pushed at: almost 3 years ago - Stars: 862 - Forks: 122

SunzeY/AlphaCLIP

[CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want

Language: Jupyter Notebook - Size: 173 MB - Last synced at: 6 days ago - Pushed at: 9 months ago - Stars: 802 - Forks: 55

ViTAE-Transformer/P3M-Net

The official repo for [IJCV'23] "Rethinking Portrait Matting with Privacy Preserving"

Language: Python - Size: 56.2 MB - Last synced at: 8 days ago - Pushed at: about 1 year ago - Stars: 104 - Forks: 9

SalvatoreRa/tutorial

Tutorials on machine learning, artificial intelligence, data science with math explanation and reusable code (in python and R)

Language: Jupyter Notebook - Size: 180 MB - Last synced at: 6 days ago - Pushed at: 18 days ago - Stars: 184 - Forks: 29

FoundationVision/VAR

[NeurIPS 2024 Best Paper][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!

Language: Jupyter Notebook - Size: 620 KB - Last synced at: 9 days ago - Pushed at: 28 days ago - Stars: 7,408 - Forks: 463

JersonGB22/ComputerVision

Repository of Computer Vision models based on CNNs, Vision Transformers, and YOLO11, implemented with TensorFlow, PyTorch, Hugging Face, and Ultralytics.

Size: 6.56 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 3 - Forks: 0

google-research/scenic

Scenic: A Jax Library for Computer Vision Research and Beyond

Language: Python - Size: 63.7 MB - Last synced at: 9 days ago - Pushed at: 10 days ago - Stars: 3,495 - Forks: 453

Westlake-AI/openmixup

CAIRI Supervised, Semi- and Self-Supervised Visual Representation Learning Toolbox and Benchmark

Language: Python - Size: 3.68 MB - Last synced at: 5 days ago - Pushed at: 9 days ago - Stars: 646 - Forks: 59

kyegomez/Vit-RGTS

Open source implementation of "Vision Transformers Need Registers"

Language: Python - Size: 211 KB - Last synced at: 6 days ago - Pushed at: 13 days ago - Stars: 172 - Forks: 15

topazape/ViT-Pytorch

Vision Transformer in PyTorch

Language: Python - Size: 4.62 MB - Last synced at: 9 days ago - Pushed at: over 2 years ago - Stars: 8 - Forks: 1

NVlabs/VoxFormer

Official PyTorch implementation of VoxFormer [CVPR 2023 Highlight]

Language: Python - Size: 53.4 MB - Last synced at: 7 days ago - Pushed at: over 1 year ago - Stars: 1,106 - Forks: 90

uncbiag/Awesome-Foundation-Models

A curated list of foundation models for vision and language tasks

Size: 296 KB - Last synced at: 8 days ago - Pushed at: 11 days ago - Stars: 974 - Forks: 46

ziqipang/LM4VisualEncoding

[ICLR 2024 (Spotlight)] "Frozen Transformers in Language Models are Effective Visual Encoder Layers"

Language: Python - Size: 538 KB - Last synced at: 9 days ago - Pushed at: over 1 year ago - Stars: 234 - Forks: 8

ShoufaChen/AdaptFormer

[NeurIPS 2022] Implementation of "AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition"

Language: Python - Size: 2.67 MB - Last synced at: 10 days ago - Pushed at: over 2 years ago - Stars: 352 - Forks: 21

srinadh99/AstroFormer

Photometry Guided Cross Attention Transformers for Astronomical Image Processing

Language: Jupyter Notebook - Size: 4.88 KB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 0 - Forks: 0

adithya-s-k/omniparse

Ingest, parse, and optimize any data format ➡️ from documents to multimedia ➡️ for enhanced compatibility with GenAI frameworks

Language: Python - Size: 592 KB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 6,463 - Forks: 524

Polymath-Saksh/deepfake

DeepFake Detection Tool, hosted on a Django Framework with a Self-Trained Vision Transformers model.

Language: Python - Size: 85.6 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 2 - Forks: 1

mit-han-lab/efficientvit

Efficient vision foundation models for high-resolution generation and perception.

Language: Python - Size: 207 MB - Last synced at: 9 days ago - Pushed at: 16 days ago - Stars: 2,790 - Forks: 215

naver/unic

PyTorch code and pretrained weights for the UNIC models.

Language: Python - Size: 367 KB - Last synced at: 4 days ago - Pushed at: 8 months ago - Stars: 29 - Forks: 1

czczup/ViT-Adapter

[ICLR 2023 Spotlight] Vision Transformer Adapter for Dense Predictions

Language: Python - Size: 1.78 MB - Last synced at: 11 days ago - Pushed at: about 1 year ago - Stars: 1,349 - Forks: 143

towhee-io/towhee

Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.

Language: Python - Size: 37.2 MB - Last synced at: 10 days ago - Pushed at: 6 months ago - Stars: 3,346 - Forks: 258

marqo-ai/marqo-FashionCLIP

State-of-the-art CLIP/SigLIP embedding models finetuned for the fashion domain. +57% increase in evaluation metrics vs FashionCLIP 2.0.

Language: Python - Size: 11.2 MB - Last synced at: 7 days ago - Pushed at: 7 months ago - Stars: 84 - Forks: 8

ViTAE-Transformer/ViTAE-Transformer

The official repo for [NeurIPS'21] "ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias" and [IJCV'22] "ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond"

Language: Python - Size: 22.2 MB - Last synced at: 10 days ago - Pushed at: about 2 years ago - Stars: 269 - Forks: 29

sftwre/TrafficTransformer

ViT model that predicts vehicle-collision time on a video stream from a Nexar dashcam

Language: Python - Size: 189 KB - Last synced at: 9 days ago - Pushed at: 10 days ago - Stars: 0 - Forks: 0

miniHuiHui/awesome-high-order-neural-network

Size: 43.9 KB - Last synced at: 9 days ago - Pushed at: 7 months ago - Stars: 46 - Forks: 4

Oneflow-Inc/libai

LiBai(李白): A Toolbox for Large-Scale Distributed Parallel Training

Language: Python - Size: 34.6 MB - Last synced at: 1 day ago - Pushed at: 2 days ago - Stars: 401 - Forks: 56

JingyunLiang/RVRT

Recurrent Video Restoration Transformer with Guided Deformable Attention (NeurIPS 2022, official repository)

Language: Python - Size: 2.81 MB - Last synced at: 9 days ago - Pushed at: over 2 years ago - Stars: 374 - Forks: 36

hila-chefer/Transformer-Explainability

[CVPR 2021] Official PyTorch implementation for Transformer Interpretability Beyond Attention Visualization, a novel method to visualize classifications by Transformer based networks.

Language: Jupyter Notebook - Size: 3.76 MB - Last synced at: 11 days ago - Pushed at: about 1 year ago - Stars: 1,867 - Forks: 248

PrivateGER/clipsight

Search through your image collection using natural language descriptions or similar image queries.

Language: Python - Size: 145 KB - Last synced at: 6 days ago - Pushed at: 12 days ago - Stars: 1 - Forks: 0

PRITHIVSAKTHIUR/Food-101-93M

Food-101-93M is a fine-tuned image classification model built on top of google/siglip2-base-patch16-224 using the SiglipForImageClassification architecture. It is trained to classify food images into one of 101 popular dishes, derived from the Food-101 dataset.

Language: Python - Size: 0 Bytes - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 0 - Forks: 0

aehrc/cvt2distilgpt2

Improving Chest X-Ray Report Generation by Leveraging Warm-Starting

Language: Python - Size: 93.5 MB - Last synced at: 7 days ago - Pushed at: 11 months ago - Stars: 67 - Forks: 7

alohays/awesome-visual-representation-learning-with-transformers

Awesome Transformers (self-attention) in Computer Vision

Size: 73.2 KB - Last synced at: 9 days ago - Pushed at: over 3 years ago - Stars: 270 - Forks: 38

JingyunLiang/VRT

VRT: A Video Restoration Transformer (official repository)

Language: Python - Size: 12.7 MB - Last synced at: 11 days ago - Pushed at: almost 2 years ago - Stars: 1,429 - Forks: 135

ViTAE-Transformer/ViTDet

Unofficial implementation for [ECCV'22] "Exploring Plain Vision Transformer Backbones for Object Detection"

Language: Python - Size: 8.29 MB - Last synced at: 6 days ago - Pushed at: almost 3 years ago - Stars: 559 - Forks: 46

ViTAE-Transformer/MTP

The official repo for [JSTARS'24] "MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining"

Language: Python - Size: 18 MB - Last synced at: 8 days ago - Pushed at: 3 months ago - Stars: 215 - Forks: 11

Little-Podi/GRM

[CVPR'23] The official PyTorch implementation of our CVPR 2023 paper: "Generalized Relation Modeling for Transformer Tracking".

Language: Python - Size: 660 KB - Last synced at: 5 days ago - Pushed at: over 1 year ago - Stars: 75 - Forks: 8

PRITHIVSAKTHIUR/Mirage-Photo-Classifier

Mirage-Photo-Classifier is an image classification vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for a binary image authenticity classification task. It is designed to determine whether an image is real or AI-generated (fake) using the SiglipForImageClassification architecture.

Language: Python - Size: 11.7 KB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 1 - Forks: 0

sovit-123/vision_transformers

Vision Transformers for image classification, image segmentation, and object detection.

Language: Python - Size: 44.1 MB - Last synced at: 12 days ago - Pushed at: 6 months ago - Stars: 49 - Forks: 9

OSU-MLB/ViT_PEFT_Vision

[CVPR'25] Lessons and Insights from a Unifying Study of Parameter-Efficient Fine-Tuning (PEFT) in Visual Recognition

Language: Jupyter Notebook - Size: 3.54 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 31 - Forks: 0

omerbt/Splice

Official PyTorch Implementation for "Splicing ViT Features for Semantic Appearance Transfer" presenting "Splice" (CVPR 2022 Oral)

Language: Jupyter Notebook - Size: 206 MB - Last synced at: 13 days ago - Pushed at: over 1 year ago - Stars: 383 - Forks: 32

hustvl/MIMDet

[ICCV 2023] You Only Look at One Partial Sequence

Language: Python - Size: 551 KB - Last synced at: 13 days ago - Pushed at: over 1 year ago - Stars: 339 - Forks: 30

junchen14/Multi-Modal-Transformer

The repository collects many various multi-modal transformer architectures, including image transformer, video transformer, image-language transformer, video-language transformer and self-supervised learning models. Additionally, it also collects many useful tutorials and tools in these related domains.

Size: 354 KB - Last synced at: 6 days ago - Pushed at: over 2 years ago - Stars: 225 - Forks: 31

emcf/thepipe

Extract clean data from anywhere, powered by vision-language models ⚡

Language: Python - Size: 4.12 MB - Last synced at: 8 days ago - Pushed at: 4 months ago - Stars: 1,248 - Forks: 79

mist-medical/MIST

MIST: A simple and scalable end-to-end framework for 3D medical imaging segmentation.

Language: Python - Size: 1.14 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 43 - Forks: 12

RuoyuChen10/SMDL-Attribution

[ICLR 2024 Oral] Less is More: Fewer Interpretable Region via Submodular Subset Selection

Language: Jupyter Notebook - Size: 28.8 MB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 78 - Forks: 4

ViTAE-Transformer/APTv2

The official repo for the extension of [NeurIPS'22] "APT-36K: A Large-scale Benchmark for Animal Pose Estimation and Tracking": https://github.com/pandorgan/APT-36K

Language: Python - Size: 9.89 MB - Last synced at: 7 days ago - Pushed at: 11 months ago - Stars: 19 - Forks: 0

StefanHeng/ECG-Representation-Learning

Self-supervised pre-training for ECG representation with inspiration from transformers & computer vision

Language: Python - Size: 37.7 MB - Last synced at: 6 days ago - Pushed at: almost 3 years ago - Stars: 24 - Forks: 6

XPixelGroup/RethinkVSRAlignment

(NIPS 2022) Rethinking Alignment in Video Super-Resolution Transformers

Language: Python - Size: 6.23 MB - Last synced at: 14 days ago - Pushed at: over 2 years ago - Stars: 125 - Forks: 9

baudm/parseq

Scene Text Recognition with Permuted Autoregressive Sequence Models (ECCV 2022)

Language: Python - Size: 1.27 MB - Last synced at: 15 days ago - Pushed at: 11 months ago - Stars: 631 - Forks: 135

staghado/vit.cpp

Inference Vision Transformer (ViT) in plain C/C++ with ggml

Language: C++ - Size: 2.17 MB - Last synced at: 12 days ago - Pushed at: about 1 year ago - Stars: 265 - Forks: 21

rentainhe/visualization

A collection of visualization functions

Language: Python - Size: 2.82 MB - Last synced at: 14 days ago - Pushed at: over 3 years ago - Stars: 416 - Forks: 40

mv-lab/swin2sr

[ECCV] Swin2SR: SwinV2 Transformer for Compressed Image Super-Resolution and Restoration. Advances in Image Manipulation (AIM) workshop ECCV 2022. Try it out! over 3.3M runs https://replicate.com/mv-lab/swin2sr

Language: Python - Size: 20 MB - Last synced at: 12 days ago - Pushed at: 8 months ago - Stars: 613 - Forks: 72

chanjoong-kim/mobilevit-pytorch-cifar10

A PyTorch implementation of "MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer" (arXiv, 2021), with custom modifications and trained on the CIFAR-10 dataset.

Language: Python - Size: 15.6 KB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 0 - Forks: 0

AnanthaPadmanaban-KrishnaKumar/CriticalHeads

Identifying essential attention mechanisms in SAM through systematic ablation studies to enable efficient model compression

Size: 0 Bytes - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 0 - Forks: 0

PRITHIVSAKTHIUR/Trash-Net

Trash-Net is an image classification vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for a single-label classification task. It is designed to classify images of waste materials into different categories using the SiglipForImageClassification architecture.

Language: Python - Size: 12.7 KB - Last synced at: 10 days ago - Pushed at: 16 days ago - Stars: 1 - Forks: 0

Related Keywords

vision-transformer (962), deep-learning (289), pytorch (283), computer-vision (206), transformer (173), image-classification (148), machine-learning (95), vit (84), transformers (75), python (68), cnn (59), self-supervised-learning (55), object-detection (55), attention-mechanism (46), tensorflow (44), classification (44), semantic-segmentation (40), convolutional-neural-networks (35), huggingface-transformers (34), deep-neural-networks (33), transfer-learning (31), imagenet (31), resnet (30), clip (28), attention (28), pytorch-lightning (28), artificial-intelligence (27), vision (24), pytorch-implementation (24), segmentation (24), ai (23), multimodal (22), self-attention (22), swin-transformer (22), huggingface (22), cifar10 (20), fine-tuning (19), llm (18), image-processing (18), keras (18), contrastive-learning (17), bert (17), instance-segmentation (17), resnet-50 (16), foundation-models (16), efficientnet (16), siglip2 (15), vision-language-model (15), remote-sensing (15), neural-network (14), transformer-architecture (14), image-recognition (14), medical-imaging (14), nlp (13), gradio (13), representation-learning (13), large-language-models (13), mae (12), transformer-models (12), masked-autoencoder (12), pretrained-models (12), medical-image-analysis (12), dataset (11), jax (11), image-captioning (11), explainable-ai (11), cnn-classification (11), tensorflow2 (11), masked-image-modeling (10), adversarial-attacks (10), google (10), ocr (10), python3 (10), neural-networks (10), image-segmentation (9), detection (9), cifar100 (9), action-recognition (9), deit (9), generative-adversarial-network (9), detr (9), natural-language-processing (9), streamlit (9), autoencoder (9), xai (9), federated-learning (8), opencv (8), visualization (8), vgg16 (8), pose-estimation (8), knowledge-distillation (8), unet (8), deeplearning (8), continual-learning (8), ade20k (8), unsupervised-learning (8), diffusion-models (8), vision-transformer-image-classification (8), machine-learning-algorithms (8), kaggle (8)