GitHub topics: data-matching
moj-analytical-services/splink
Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends
Language: Python - Size: 102 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 1,618 - Forks: 182

kefilweditse/awesome-matchem-datasets
Awesome-matchem-datasets is a curated collection of high-quality datasets for machine learning and data analysis in the field of chemistry. This repository includes various datasets, ranging from molecular structures to experimental results, suitable for both research and educational purposes.
Size: 1.72 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

Senzing/awesome
Curated list of awesome software and resources for Senzing, The First Real-Time AI for Entity Resolution.
Language: Python - Size: 249 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 59 - Forks: 2

maxharlow/textmatch
Finds fuzzy matches between datasets
Language: Python - Size: 120 KB - Last synced at: 20 days ago - Pushed at: 5 months ago - Stars: 13 - Forks: 0

lewinfox/levitate
Fuzzy string matching in R. Inspired by Python's thefuzz (but without the Python).
Language: R - Size: 510 KB - Last synced at: 22 days ago - Pushed at: about 1 year ago - Stars: 35 - Forks: 2

J535D165/recordlinkage
A powerful and modular toolkit for record linkage and duplicate detection in Python
Language: Python - Size: 70 MB - Last synced at: 21 days ago - Pushed at: over 1 year ago - Stars: 1,007 - Forks: 156

maxharlow/csvmatch
Finds fuzzy matches between CSV files
Language: Python - Size: 158 KB - Last synced at: 27 days ago - Pushed at: 3 months ago - Stars: 188 - Forks: 21

AI-team-UoA/pyJedAI
An open-source library that leverages Python's data science ecosystem to build powerful end-to-end Entity Resolution workflows.
Language: Python - Size: 139 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 76 - Forks: 12

vaneseltine/nominally
A maximum-strength name parser for record linkage.
Language: Python - Size: 1.09 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 37 - Forks: 1

J535D165/data-matching-software
A list of free data matching and record linkage software.
Size: 93.8 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 382 - Forks: 42

vintasoftware/entity-embed
PyTorch library for transforming entities like companies, products, etc. into vectors to support scalable Record Linkage / Entity Resolution using Approximate Nearest Neighbors.
Language: Jupyter Notebook - Size: 11.4 MB - Last synced at: 23 days ago - Pushed at: over 2 years ago - Stars: 153 - Forks: 16

RobinL/fuzzymatcher
Record linking package that fuzzy matches two Python pandas dataframes using sqlite3 fts4
Language: Python - Size: 848 KB - Last synced at: 25 days ago - Pushed at: almost 3 years ago - Stars: 283 - Forks: 59

J535D165/recordlinkage-annotator
A browser user interface for manual labeling of record pairs.
Language: JavaScript - Size: 3.49 MB - Last synced at: 2 months ago - Pushed at: almost 2 years ago - Stars: 46 - Forks: 8

HPI-Information-Systems/snowman
Welcome to Snowman App, a Data Matching Benchmark Platform.
Language: TypeScript - Size: 85.8 MB - Last synced at: 2 months ago - Pushed at: over 2 years ago - Stars: 38 - Forks: 2

abcsys/libem
Compound AI toolchain for fast and accurate entity matching, powered by LLMs.
Language: Python - Size: 3.54 MB - Last synced at: 6 days ago - Pushed at: 3 months ago - Stars: 22 - Forks: 4

ihmeuw/person_linkage_case_study
Emulates the methods the US Census Bureau uses to link people across multiple data sources, using open-source software (Splink) and simulated data (from pseudopeople).
Language: HTML - Size: 4.43 MB - Last synced at: 21 days ago - Pushed at: 8 months ago - Stars: 3 - Forks: 0

Wikidata/soweego
Link Wikidata items to large catalogs
Language: Python - Size: 7.87 MB - Last synced at: 30 days ago - Pushed at: 4 months ago - Stars: 96 - Forks: 9

Evnsn/awsome-entity-resolution
A collection of awesome resources regarding Record Linkage.
Size: 13.7 KB - Last synced at: 5 days ago - Pushed at: 10 months ago - Stars: 7 - Forks: 0

boscoj2008/AdapterEM
AdapterEM: Pre-trained Language Model Adaptation for Generalized Entity Matching using Adapter-tuning
Language: Python - Size: 163 MB - Last synced at: about 1 year ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

pkhaan/AutoCuratedMovieLists
This project aims to provide users with lists containing only great movies, based on only a few filters and search parameters.
Language: Dart - Size: 19 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 1

ropeladder/record-linkage-resources
Resources for tackling record linkage / deduplication / data matching problems
Size: 26.4 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 103 - Forks: 16

AvinashSingh786/WekaComparator
Weka Comparator to match rules to test data with filtering abilities
Language: Java - Size: 682 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

greyhub/job_center
Crawl, match, and explore data about jobs in Viet Nam.
Language: Jupyter Notebook - Size: 117 MB - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 1

Gust4voSales/proxcluster-deduplicator
ProxCluster is a framework for Incremental Entity Resolution that leverages concepts similar to K-Means for clustering duplicates. This work was developed as the final paper for my Bachelor's degree in Computer Science.
Language: Jupyter Notebook - Size: 9.16 MB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

rohitgarud/asreview-preprocess
An extension for ASReview Lab to preprocess the dataset before importing it into ASReview
Language: Python - Size: 393 KB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

wbsg-uni-mannheim/winter Fork of olehmberg/winter
WInte.r is a Java framework for end-to-end data integration. The WInte.r framework implements well-known methods for data pre-processing, schema matching, identity resolution, data fusion, and result evaluation.
Language: Java - Size: 18.6 MB - Last synced at: over 2 years ago - Pushed at: almost 3 years ago - Stars: 4 - Forks: 1

KNehe/musical
A Single View application aggregates and reconciles data from multiple sources to create a single view of an entity.
Language: Python - Size: 13.7 KB - Last synced at: over 2 years ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

carlosraphael/specification-pattern
https://medium.com/@carlosraphael/specification-design-pattern-in-java-8-bac6f5f943bc
Language: Java - Size: 36.1 KB - Last synced at: over 2 years ago - Pushed at: over 6 years ago - Stars: 28 - Forks: 4

lokhande-vishnu/cs838-data-science Fork of saketj/cs838-data-science
Repository for CS 838 (Spring 2017) Data Science project
Language: Jupyter Notebook - Size: 58.8 MB - Last synced at: about 2 years ago - Pushed at: about 8 years ago - Stars: 0 - Forks: 0

sevetseh28/data-integration-extensible-framework
Undergraduate Final Project (needs README up to date!!) - Scientific paper soon to be included
Language: HTML - Size: 20.1 MB - Last synced at: over 1 year ago - Pushed at: about 4 years ago - Stars: 1 - Forks: 0
