GitHub topics: sparse-autoencoder

Repositories

ruizheliUOA/Awesome-Interpretability-in-Large-Language-Models

This repository collects all relevant resources about interpretability in LLMs

Size: 63.5 KB - Last synced at: about 19 hours ago - Pushed at: 8 months ago - Stars: 363 - Forks: 25

PaulPauls/llama3_interpretability_sae 📦

A complete end-to-end pipeline for LLM interpretability with sparse autoencoders (SAEs) using Llama 3.2, written in pure PyTorch and fully reproducible.

Language: Python - Size: 61.3 MB - Last synced at: 1 day ago - Pushed at: 4 months ago - Stars: 618 - Forks: 38

codelion/pts

Pivotal Token Search

Language: Python - Size: 141 KB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 108 - Forks: 7

Ki-Seki/Awesome-Transformer-Visualization

Explore visualization tools for understanding Transformer-based large language models (LLMs)

Size: 23.4 MB - Last synced at: 3 days ago - Pushed at: 7 months ago - Stars: 13 - Forks: 2

vgel/repeng

A library for making RepE control vectors

Language: Jupyter Notebook - Size: 315 KB - Last synced at: 11 days ago - Pushed at: 6 months ago - Stars: 612 - Forks: 48

chrisliu298/awesome-sparse-autoencoders

A resource repository of sparse autoencoders for large language models

Size: 8.79 KB - Last synced at: 2 days ago - Pushed at: 10 months ago - Stars: 6 - Forks: 0

maxdreyer/attributing-clip

Repository for "From What to How: Attributing CLIP's Latent Components Reveals Unexpected Semantic Reliance"

Language: Python - Size: 3.35 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 2 - Forks: 0

aarnphm/morph

exploration WYSIWYG editor

Language: TypeScript - Size: 63.6 MB - Last synced at: 9 days ago - Pushed at: about 1 month ago - Stars: 6 - Forks: 0

glami/sansa

SANSA - sparse EASE for millions of items

Language: Python - Size: 1.71 MB - Last synced at: 3 days ago - Pushed at: 6 months ago - Stars: 42 - Forks: 6

explanare/ravel

Evaluate interpretability methods on localizing and disentangling concepts in LLMs.

Language: Python - Size: 661 KB - Last synced at: about 1 month ago - Pushed at: 9 months ago - Stars: 47 - Forks: 7

pmcurtin/model-crosscoders

Final project for cs2222

Language: Jupyter Notebook - Size: 33 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

zer0int/CLIP-SAE-finetune

Sparse Autoencoders (SAE) vs CLIP fine-tuning fun.

Language: Python - Size: 10.8 MB - Last synced at: 10 days ago - Pushed at: 7 months ago - Stars: 15 - Forks: 4

Butanium/tiny-activation-dashboard

A tiny, easily hackable implementation of a feature dashboard.

Language: Jupyter Notebook - Size: 109 KB - Last synced at: 29 days ago - Pushed at: 5 months ago - Stars: 10 - Forks: 2

DrejcPesjak/scaling-monosemanticity-llama

Reproducing "Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet" using LLaMA. This project explores monosemantic neurons in large language models, implementing and extending methods to scale and analyze interpretability in LLaMA-based architectures.

Language: Jupyter Notebook - Size: 14.7 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 4 - Forks: 0

seonglae/emgsd-hermes

Steering GPT2-EMGSD to be less biased and generating stereotyped text with vanilla GPT2, without fine-tuning or prompt engineering

Language: Jupyter Notebook - Size: 505 KB - Last synced at: 2 months ago - Pushed at: 4 months ago - Stars: 2 - Forks: 0

graphixxxx/Denoising_AutoEncoder

This project integrates Autoencoders, PCA, and CNNs for efficient image processing, combining dimensionality reduction, denoising, and enhanced feature extraction for image analysis and compression.

Size: 1.95 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

cxcscmu/embedding-scope

Interpret and control dense embedding via sparse autoencoder.

Language: Python - Size: 3.52 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 3 - Forks: 0

tim-lawson/mlsae

Multi-Layer Sparse Autoencoders (ICLR 2025)

Language: Python - Size: 642 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 17 - Forks: 0

MaheepChaudhary/SAE-Ravel

Answers the question "How to do patching on all available SAEs on GPT-2?". Official implementation of the paper "Evaluating Open-Source Sparse Autoencoders on Disentangling Factual Knowledge in GPT-2 Small"

Language: Python - Size: 11.9 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 10 - Forks: 1

snooky23/K-Sparse-AutoEncoder

Sparse autoencoder and regular MNIST classification with mini-batches

Language: Jupyter Notebook - Size: 4.62 MB - Last synced at: 12 months ago - Pushed at: over 7 years ago - Stars: 19 - Forks: 9

mrquincle/keras-adversarial-autoencoders

Experiments with Adversarial Autoencoders using Keras

Language: Jupyter Notebook - Size: 6.84 MB - Last synced at: 26 days ago - Pushed at: over 5 years ago - Stars: 22 - Forks: 13

SayanChakraborty126/ML-CODES

This repository contains Python code for Autoencoder, Sparse-autoencoder, HMM, Expectation-Maximization, Sum-product Algorithm, ANN, Disparity map, and PCA.

Language: Jupyter Notebook - Size: 146 MB - Last synced at: about 1 year ago - Pushed at: almost 5 years ago - Stars: 1 - Forks: 0

wblgers/tensorflow_stacked_denoising_autoencoder

Implementation of the stacked denoising autoencoder in TensorFlow

Language: Python - Size: 11 MB - Last synced at: over 1 year ago - Pushed at: almost 7 years ago - Stars: 191 - Forks: 85

khoink94/tensorflow-Deep-learning

Tensorflow Examples

Language: Python - Size: 27.2 MB - Last synced at: almost 2 years ago - Pushed at: about 8 years ago - Stars: 27 - Forks: 11

Specoptor/bot-iot

Implements a sparse autoencoder on the bot-iot dataset for dimensionality reduction, followed by computation of reconstruction error, F1 score, recall, accuracy, weights, and threshold, among other metrics

Language: Jupyter Notebook - Size: 62.3 MB - Last synced at: almost 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0