An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: data-matching

moj-analytical-services/splink

Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends

Language: Python - Size: 102 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 1,618 - Forks: 182

kefilweditse/awesome-matchem-datasets

Awesome-matchem-datasets is a curated collection of high-quality datasets for machine learning and data analysis in the field of chemistry. This repository includes various datasets, ranging from molecular structures to experimental results, suitable for both research and educational purposes.

Size: 1.72 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

Senzing/awesome

Curated list of awesome software and resources for Senzing, The First Real-Time AI for Entity Resolution.

Language: Python - Size: 249 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 59 - Forks: 2

maxharlow/textmatch

🔎 Finds fuzzy matches between datasets

Language: Python - Size: 120 KB - Last synced at: 20 days ago - Pushed at: 5 months ago - Stars: 13 - Forks: 0

lewinfox/levitate

Fuzzy string matching in R. Inspired by Python's thefuzz (but without the Python).

Language: R - Size: 510 KB - Last synced at: 22 days ago - Pushed at: about 1 year ago - Stars: 35 - Forks: 2

J535D165/recordlinkage

A powerful and modular toolkit for record linkage and duplicate detection in Python

Language: Python - Size: 70 MB - Last synced at: 21 days ago - Pushed at: over 1 year ago - Stars: 1,007 - Forks: 156

maxharlow/csvmatch

🔎 Finds fuzzy matches between CSV files

Language: Python - Size: 158 KB - Last synced at: 27 days ago - Pushed at: 3 months ago - Stars: 188 - Forks: 21

AI-team-UoA/pyJedAI

An open-source library that leverages Python’s data science ecosystem to build powerful end-to-end Entity Resolution workflows.

Language: Python - Size: 139 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 76 - Forks: 12

vaneseltine/nominally

A maximum-strength name parser for record linkage.

Language: Python - Size: 1.09 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 37 - Forks: 1

J535D165/data-matching-software

A list of free data matching and record linkage software.

Size: 93.8 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 382 - Forks: 42

vintasoftware/entity-embed

PyTorch library for transforming entities like companies, products, etc. into vectors to support scalable Record Linkage / Entity Resolution using Approximate Nearest Neighbors.

Language: Jupyter Notebook - Size: 11.4 MB - Last synced at: 23 days ago - Pushed at: over 2 years ago - Stars: 153 - Forks: 16

RobinL/fuzzymatcher

Record linking package that fuzzy matches two Python pandas dataframes using sqlite3 fts4

Language: Python - Size: 848 KB - Last synced at: 25 days ago - Pushed at: almost 3 years ago - Stars: 283 - Forks: 59

J535D165/recordlinkage-annotator

A browser user interface for manual labeling of record pairs.

Language: JavaScript - Size: 3.49 MB - Last synced at: 2 months ago - Pushed at: almost 2 years ago - Stars: 46 - Forks: 8

HPI-Information-Systems/snowman

Welcome to Snowman App – a Data Matching Benchmark Platform.

Language: TypeScript - Size: 85.8 MB - Last synced at: 2 months ago - Pushed at: over 2 years ago - Stars: 38 - Forks: 2

abcsys/libem

Compound AI toolchain for fast and accurate entity matching, powered by LLMs.

Language: Python - Size: 3.54 MB - Last synced at: 6 days ago - Pushed at: 3 months ago - Stars: 22 - Forks: 4

ihmeuw/person_linkage_case_study

Emulates the methods the US Census Bureau uses to link people across multiple data sources, using open-source software (Splink) and simulated data (from pseudopeople).

Language: HTML - Size: 4.43 MB - Last synced at: 21 days ago - Pushed at: 8 months ago - Stars: 3 - Forks: 0

Wikidata/soweego

Link Wikidata items to large catalogs

Language: Python - Size: 7.87 MB - Last synced at: 30 days ago - Pushed at: 4 months ago - Stars: 96 - Forks: 9

Evnsn/awsome-entity-resolution

A collection of awesome resources regarding Record Linkage.

Size: 13.7 KB - Last synced at: 5 days ago - Pushed at: 10 months ago - Stars: 7 - Forks: 0

boscoj2008/AdapterEM

AdapterEM: Pre-trained Language Model Adaptation for Generalized Entity Matching using Adapter-tuning

Language: Python - Size: 163 MB - Last synced at: about 1 year ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

pkhaan/AutoCuratedMovieLists

This project aims to provide users with lists containing only great movies, based on only a few filters and search parameters.

Language: Dart - Size: 19 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 1

ropeladder/record-linkage-resources

Resources for tackling record linkage / deduplication / data matching problems

Size: 26.4 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 103 - Forks: 16

AvinashSingh786/WekaComparator

Weka Comparator to match rules to test data with filtering abilities

Language: Java - Size: 682 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

greyhub/job_center

Crawl, match, and explore data about jobs in Viet Nam.

Language: Jupyter Notebook - Size: 117 MB - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 1

Gust4voSales/proxcluster-deduplicator

ProxCluster is a framework for Incremental Entity Resolution that leverages concepts similar to K-Means for clustering duplicates. This work was developed as the final paper for my Bachelor's degree in Computer Science.

Language: Jupyter Notebook - Size: 9.16 MB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

rohitgarud/asreview-preprocess

An extension for ASReview Lab to preprocess the dataset before importing in ASReview

Language: Python - Size: 393 KB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

wbsg-uni-mannheim/winter Fork of olehmberg/winter

WInte.r is a Java framework for end-to-end data integration. The WInte.r framework implements well-known methods for data pre-processing, schema matching, identity resolution, data fusion, and result evaluation.

Language: Java - Size: 18.6 MB - Last synced at: over 2 years ago - Pushed at: almost 3 years ago - Stars: 4 - Forks: 1

KNehe/musical

A Single View application aggregates and reconciles data from multiple sources to create a single view of an entity.

Language: Python - Size: 13.7 KB - Last synced at: over 2 years ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

carlosraphael/specification-pattern

https://medium.com/@carlosraphael/specification-design-pattern-in-java-8-bac6f5f943bc

Language: Java - Size: 36.1 KB - Last synced at: over 2 years ago - Pushed at: over 6 years ago - Stars: 28 - Forks: 4

lokhande-vishnu/cs838-data-science Fork of saketj/cs838-data-science

Repository for CS 838 (Spring 2017) Data Science project

Language: Jupyter Notebook - Size: 58.8 MB - Last synced at: about 2 years ago - Pushed at: about 8 years ago - Stars: 0 - Forks: 0

sevetseh28/data-integration-extensible-framework

Undergraduate Final Project (needs README up to date!!) - Scientific paper soon to be included

Language: HTML - Size: 20.1 MB - Last synced at: over 1 year ago - Pushed at: about 4 years ago - Stars: 1 - Forks: 0