An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: dedup

markusressel/py-image-dedup

CLI utility to find near duplicate images and remove all but the best copy.

Language: Python - Size: 17.9 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 162 - Forks: 18

laktak/chkbit

Check your files for data corruption and run quick file deduplication

Language: Go - Size: 4.4 MB - Last synced at: 5 days ago - Pushed at: 2 months ago - Stars: 136 - Forks: 8

glehmann/hld

Hard Link Deduplicator

Language: Rust - Size: 309 KB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 8 - Forks: 1

harshasrisri/dedup

Remove local files that are duplicates of files in another path

Language: Rust - Size: 101 KB - Last synced at: 7 days ago - Pushed at: 24 days ago - Stars: 1 - Forks: 0

JumperBot/whitespace-sifter

Sift duplicate whitespaces away!

Language: Rust - Size: 1.85 MB - Last synced at: 5 days ago - Pushed at: about 1 month ago - Stars: 4 - Forks: 0

veqryn/slog-dedup

Golang structured logging (slog) deduplication and sorting for use with json logging

Language: Go - Size: 106 KB - Last synced at: 12 days ago - Pushed at: about 2 months ago - Stars: 16 - Forks: 0

Zygo/bees

Best-Effort Extent-Same, a btrfs dedupe agent

Language: C++ - Size: 1.34 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 739 - Forks: 57

xyb/chunksum

Print FastCDC rolling hash chunks and checksums.

Language: Python - Size: 50.8 KB - Last synced at: about 2 months ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

lkarlslund/stringdedup

String deduplication package for Go

Language: Go - Size: 28.3 KB - Last synced at: 29 days ago - Pushed at: over 1 year ago - Stars: 18 - Forks: 0

xyb/chunkdup

Find (partial content) duplicate files.

Language: Python - Size: 89.8 KB - Last synced at: about 15 hours ago - Pushed at: over 2 years ago - Stars: 3 - Forks: 1

chucheng92/HadoopDedup

:watermelon: Large-scale deduplication of massive data sets, based on Hadoop and HBase

Language: Java - Size: 12 MB - Last synced at: about 1 month ago - Pushed at: about 7 years ago - Stars: 29 - Forks: 16

hekmon/deduper

Analyse 2 paths to find identical files and hard link them to save space

Language: Go - Size: 151 KB - Last synced at: 3 months ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

jamjamjon/ilytix

A CLI tool for image analysis: checking image integrity, image deduplication, and image retrieval.

Language: Rust - Size: 49.8 KB - Last synced at: 20 days ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

helloall1900/vhash

A C++ reimplementation of Near Duplicate Video Detection - Get a 64-bit comparable hash-value for any video (Video Hash).

Language: C++ - Size: 3.23 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 3 - Forks: 2

horgh/dupefile

Detect and optionally delete duplicate files in a directory tree

Language: Go - Size: 21.5 KB - Last synced at: 2 months ago - Pushed at: almost 4 years ago - Stars: 1 - Forks: 0

adlibre/adlibre-backup

High performance rsync backup utilising BTRFS / ZFS filesystem features

Language: Shell - Size: 127 KB - Last synced at: about 1 year ago - Pushed at: almost 6 years ago - Stars: 36 - Forks: 10

carlinhosfranco/BenSP-Suite

BenSP is a suite of parameterizable benchmarks for stream parallelism which is used to evaluate stream processing characteristics.

Language: C - Size: 31.4 MB - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 1 - Forks: 1

EastTower16/LLMDataDistill

distill large scale web page text

Language: C++ - Size: 1.5 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 12 - Forks: 1

go-utils/dedupe

Easy Deduplication

Language: Go - Size: 29.3 KB - Last synced at: 6 months ago - Pushed at: about 3 years ago - Stars: 2 - Forks: 0

eminence/deduprs

Hardlink deduplication tool for Linux

Language: Rust - Size: 14.6 KB - Last synced at: about 2 months ago - Pushed at: almost 3 years ago - Stars: 2 - Forks: 0

prebuilder/rdfind.py

A Python wrapper for rdfind

Language: Python - Size: 3.91 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

ParaGroup/p3arsec

Parallel Patterns Implementation of PARSEC Benchmark Applications

Language: C++ - Size: 1.61 MB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 15 - Forks: 7

yugn/yadupe

Yet another tool to find and remove duplicate files.

Language: Python - Size: 735 KB - Last synced at: 1 day ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 1

rongrimes/zipfile-dedup

Project to take two similar zipfiles and dedupe files that have the same timestamp in the older file.

Language: Python - Size: 20.5 KB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 2 - Forks: 1

uicoolcn/UiCoolVisualWebSpider

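📄 [UiCool (优爱酷) visual website and web page data collection system] Uses visual scraping technology to identify web page element types such as images, text, links, HTML, and files; supports running JavaScript, applying regular expressions, automatic scrolling, automatic pagination, and opening pop-up windows to collect data; supports collection settings such as automatic data deduplication, human-like intermittent pauses to avoid IP blocking, and automatic saving; supports browser settings such as cookies and cache; supports collection through rotating proxies, including circumvention proxies; supports "category/keyword"; supports image renaming; also supports advanced options such as multi-threaded collection, and the VIP version additionally supports scheduled collection.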

Size: 6.08 MB - Last synced at: about 2 years ago - Pushed at: almost 6 years ago - Stars: 3 - Forks: 0

dim-geo/analyze-dedup

python script to analyze dedup usage in btrfs

Language: Python - Size: 18.6 KB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 0