An open API service providing repository metadata for many open source software ecosystems.

Topic: "data-deduplication"

dpc/rdedup

Data deduplication engine, supporting optional compression and public key encryption.

Language: Rust - Size: 1010 KB - Last synced at: 4 days ago - Pushed at: over 2 years ago - Stars: 836 - Forks: 45

sail-sg/sailcraft

🚒 Data Toolkit for Sailor Language Models

Language: Python - Size: 219 KB - Last synced at: 16 days ago - Pushed at: 2 months ago - Stars: 88 - Forks: 10

jchristn/WatsonDedupe

Self-contained C# library for data deduplication using SQLite

Language: C# - Size: 3.37 MB - Last synced at: 5 days ago - Pushed at: about 2 years ago - Stars: 36 - Forks: 5

Zabuzard/FastCDC4J

Fast and efficient content-defined chunking for data deduplication. Java implementation of FastCDC as library.

Language: Java - Size: 542 KB - Last synced at: 25 days ago - Pushed at: over 1 year ago - Stars: 21 - Forks: 4

david-siqi-liu/sparklyclean

Optimal distributed data deduplication and supervised learning pipeline using Apache Spark

Language: Scala - Size: 10.1 MB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 10 - Forks: 0

bmiller1009/deduper

General deduping engine for JDBC sources with output to JDBC/CSV targets

Language: Kotlin - Size: 1.23 MB - Last synced at: 5 months ago - Pushed at: over 4 years ago - Stars: 5 - Forks: 0

gagan3012/PolyDeDupe

PolyDeDupe: Multi-Lingual Data Deduplication

Language: Python - Size: 161 KB - Last synced at: about 11 hours ago - Pushed at: about 12 hours ago - Stars: 2 - Forks: 1

dffdgdg/FindDuplicates

This project is a powerful tool for finding and analyzing duplicate files in a specified directory. The program efficiently identifies identical files based on their content, using the SHA-256 hashing algorithm. It supports configurable parameters, such as the minimum file size to check and ignoring certain…

Language: Python - Size: 0 Bytes - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

bevry/fellow

Fellow is a package for creating people that can be unified by their shared values via a singleton list on the class

Language: TypeScript - Size: 2.63 MB - Last synced at: 1 day ago - Pushed at: 7 months ago - Stars: 1 - Forks: 0

Anveshika06/VIT-VTAS-TY-2022 (fork of Arunav07/VIT-VTAS-TY-2022)

Size: 17.1 MB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

shubham-thakare/data-deduplication

A Java project that splits data using hashing techniques and removes duplicate blocks to save cloud storage. This project also uses the CloudSim framework for cloud storage simulation.

Language: Java - Size: 640 KB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 1

anirudh-69/Financial-Data-ETL-Workflow

ETL workflow for stock data processing using Mage and PostgreSQL

Language: Python - Size: 86.9 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

Jim-JMCD/TestFilesMake

A test file creator for testing data storage, compression and transfer. It is a small, portable Linux executable that creates test data in files filled with random selectable printable characters or random binary data. There is a sparse file option. There is no limit on file size or number of files. Files are created in a single directory.

Size: 42 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

fabriziosalmi/text-boundaries

A Python-based tool for preprocessing, cleaning, and analyzing text datasets, designed to filter, deduplicate, sort data, and generate statistical insights.

Language: Python - Size: 6.94 MB - Last synced at: 22 days ago - Pushed at: 7 months ago - Stars: 0 - Forks: 1

KeerthanaPalanikumar/Data-Cleaning-on-SQL

This repository contains SQL scripts and documentation for cleaning and standardizing data in the NashvilleHousing table within the sqlproject2 database. The project aims to prepare the dataset for analysis by addressing inconsistencies, filling missing values, standardizing formats, and removing duplicates.

Size: 5.64 MB - Last synced at: about 2 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

Jim-JMCD/Data_storage_network_deduplication_calculator

A calculator for the storage and transmission of deduplicated data, with results presented in charts and tables

Size: 176 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

baraverkstad/mixtape

Practical backups. The Unix toolkit way.

Language: Shell - Size: 678 KB - Last synced at: about 2 years ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 0

therealfun/stcas

Simplest content-addressable storage set of tools to keep space-efficient backups using data deduplication

Last synced at: over 2 years ago - Stars: 0 - Forks: 0