GitHub topics: dedup
markusressel/py-image-dedup
CLI utility to find near duplicate images and remove all but the best copy.
Language: Python - Size: 17.9 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 162 - Forks: 18

laktak/chkbit
Check your files for data corruption and run quick file deduplication
Language: Go - Size: 4.4 MB - Last synced at: 5 days ago - Pushed at: 2 months ago - Stars: 136 - Forks: 8

glehmann/hld
Hard Link Deduplicator
Language: Rust - Size: 309 KB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 8 - Forks: 1

harshasrisri/dedup
Remove local files that are duplicates of files in another path
Language: Rust - Size: 101 KB - Last synced at: 7 days ago - Pushed at: 24 days ago - Stars: 1 - Forks: 0

JumperBot/whitespace-sifter
Sift duplicate whitespaces away!
Language: Rust - Size: 1.85 MB - Last synced at: 5 days ago - Pushed at: about 1 month ago - Stars: 4 - Forks: 0

veqryn/slog-dedup
Golang structured logging (slog) deduplication and sorting for use with JSON logging
Language: Go - Size: 106 KB - Last synced at: 12 days ago - Pushed at: about 2 months ago - Stars: 16 - Forks: 0

Zygo/bees
Best-Effort Extent-Same, a btrfs dedupe agent
Language: C++ - Size: 1.34 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 739 - Forks: 57

xyb/chunksum
Print FastCDC rolling hash chunks and checksums.
Language: Python - Size: 50.8 KB - Last synced at: about 2 months ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

lkarlslund/stringdedup
String deduplication package for Go
Language: Go - Size: 28.3 KB - Last synced at: 29 days ago - Pushed at: over 1 year ago - Stars: 18 - Forks: 0

xyb/chunkdup
Find (partial content) duplicate files.
Language: Python - Size: 89.8 KB - Last synced at: about 15 hours ago - Pushed at: over 2 years ago - Stars: 3 - Forks: 1

chucheng92/HadoopDedup
:watermelon: Large-scale deduplication of massive data sets based on Hadoop and HBase
Language: Java - Size: 12 MB - Last synced at: about 1 month ago - Pushed at: about 7 years ago - Stars: 29 - Forks: 16

hekmon/deduper
Analyse two paths to find identical files and hard-link them to save space
Language: Go - Size: 151 KB - Last synced at: 3 months ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

jamjamjon/ilytix
A CLI tool for image analysis: image integrity checking, image deduplication, and image retrieval.
Language: Rust - Size: 49.8 KB - Last synced at: 20 days ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

helloall1900/vhash
A C++ reimplementation of Near Duplicate Video Detection - Get a 64-bit comparable hash-value for any video (Video Hash).
Language: C++ - Size: 3.23 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 3 - Forks: 2

horgh/dupefile
Detect and optionally delete duplicate files in a directory tree
Language: Go - Size: 21.5 KB - Last synced at: 2 months ago - Pushed at: almost 4 years ago - Stars: 1 - Forks: 0

adlibre/adlibre-backup
High performance rsync backup utilising BTRFS / ZFS filesystem features
Language: Shell - Size: 127 KB - Last synced at: about 1 year ago - Pushed at: almost 6 years ago - Stars: 36 - Forks: 10

carlinhosfranco/BenSP-Suite
BenSP is a suite of parameterizable benchmarks for stream parallelism which is used to evaluate stream processing characteristics.
Language: C - Size: 31.4 MB - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 1 - Forks: 1

EastTower16/LLMDataDistill
Distill large-scale web page text
Language: C++ - Size: 1.5 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 12 - Forks: 1

go-utils/dedupe
Easy Deduplication
Language: Go - Size: 29.3 KB - Last synced at: 6 months ago - Pushed at: about 3 years ago - Stars: 2 - Forks: 0

eminence/deduprs
Hardlink deduplication tool for Linux
Language: Rust - Size: 14.6 KB - Last synced at: about 2 months ago - Pushed at: almost 3 years ago - Stars: 2 - Forks: 0

prebuilder/rdfind.py
A Python wrapper for rdfind
Language: Python - Size: 3.91 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

ParaGroup/p3arsec
Parallel Patterns Implementation of PARSEC Benchmark Applications
Language: C++ - Size: 1.61 MB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 15 - Forks: 7

yugn/yadupe
Yet another tool to find and remove duplicate files.
Language: Python - Size: 735 KB - Last synced at: 1 day ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 1

rongrimes/zipfile-dedup
Project to take two similar zip files and dedupe files that have the same timestamp in the older file.
Language: Python - Size: 20.5 KB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 2 - Forks: 1

uicoolcn/UiCoolVisualWebSpider
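📄 [UiCool Visual Website Data Scraping System] Uses visual scraping technology that automatically identifies web page element types such as images, text, links, HTML, and files. It supports running JavaScript scripts, applying regular expressions, auto-scrolling, auto-paging, and opening pop-up windows to collect data. Scraping settings include automatic data deduplication, human-like intermittent pauses to avoid IP blocking, and auto-save. Browser settings include cookies and cache. It also supports proxy rotation for reaching blocked sites, "category/keyword" options, and image renaming. Advanced options include multi-threaded scraping, and the VIP edition additionally supports scheduled scraping.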
Size: 6.08 MB - Last synced at: about 2 years ago - Pushed at: almost 6 years ago - Stars: 3 - Forks: 0

dim-geo/analyze-dedup
Python script to analyze dedup usage in Btrfs
Language: Python - Size: 18.6 KB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 0
