An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: record-linkage

maxharlow/csvmatch

🔎 Finds fuzzy matches between CSV files

Language: Python - Size: 158 KB - Last synced at: 2 days ago - Pushed at: about 2 months ago - Stars: 188 - Forks: 21

J535D165/recordlinkage

A powerful and modular toolkit for record linkage and duplicate detection in Python

Language: Python - Size: 70 MB - Last synced at: 3 days ago - Pushed at: over 1 year ago - Stars: 1,005 - Forks: 156

moj-analytical-services/splink

Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends

Language: Python - Size: 101 MB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 1,592 - Forks: 180

dedupeio/dedupe

🆔 A Python library for accurate and scalable fuzzy matching, record deduplication and entity-resolution.

Language: Python - Size: 5.98 MB - Last synced at: 4 days ago - Pushed at: 6 months ago - Stars: 4,295 - Forks: 560

openvenues/libpostal

A C library for parsing/normalizing street addresses around the world. Powered by statistical NLP and open geo data.

Language: C - Size: 36.3 MB - Last synced at: 5 days ago - Pushed at: 27 days ago - Stars: 4,251 - Forks: 433

OlivierBinette/Awesome-Entity-Resolution

List of entity resolution software and resources.

Size: 28.3 KB - Last synced at: 5 days ago - Pushed at: 3 months ago - Stars: 67 - Forks: 8

spindle-health/carduus

PySpark implementation of the Open Privacy Preserving Record Linkage (OPPRL) specification.

Language: Python - Size: 1.75 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 14 - Forks: 1

ajl2718/whereabouts

Fast, accurate, open-source geocoding in Python

Language: Python - Size: 7.59 MB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 35 - Forks: 6

dedupeio/dedupe-examples

🆔 Examples for using the dedupe library

Language: Python - Size: 5.12 MB - Last synced at: 1 day ago - Pushed at: 10 months ago - Stars: 412 - Forks: 214

Yomguithereal/talisman

Straightforward fuzzy matching, information retrieval and NLP building blocks for JavaScript.

Language: JavaScript - Size: 3.39 MB - Last synced at: 3 days ago - Pushed at: 11 months ago - Stars: 715 - Forks: 47

matchID-project/backend

Backend (Docker & API) for matchID project

Language: Python - Size: 9.99 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 11 - Forks: 14

k3jph/phonics-in-r

Phonetic Spelling Algorithms in R

Language: R - Size: 443 KB - Last synced at: 11 days ago - Pushed at: about 1 year ago - Stars: 31 - Forks: 8

OlivierBinette/er-evaluation

An End-to-End Evaluation Framework for Entity Resolution Systems

Language: Python - Size: 62.4 MB - Last synced at: 11 days ago - Pushed at: over 1 year ago - Stars: 28 - Forks: 9

vaneseltine/nominally

A maximum-strength name parser for record linkage.

Language: Python - Size: 1.09 MB - Last synced at: 11 days ago - Pushed at: 21 days ago - Stars: 37 - Forks: 1

ncn-foreigners/blocking

An R package for blocking records for record linkage / data deduplication based on approximate nearest neighbours algorithms.

Language: R - Size: 131 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 11 - Forks: 0

fritshermans/deduplipy

Python package for deduplication/entity resolution using active learning

Language: Python - Size: 521 KB - Last synced at: 11 days ago - Pushed at: 9 months ago - Stars: 79 - Forks: 9

J535D165/data-matching-software

A list of free data matching and record linkage software.

Size: 93.8 KB - Last synced at: 18 days ago - Pushed at: over 1 year ago - Stars: 382 - Forks: 42

ipums/hlink

Hierarchical record linkage at scale

Language: Python - Size: 13.3 MB - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 12 - Forks: 2

vintasoftware/entity-embed

PyTorch library for transforming entities like companies, products, etc. into vectors to support scalable Record Linkage / Entity Resolution using Approximate Nearest Neighbors.

Language: Jupyter Notebook - Size: 11.4 MB - Last synced at: 7 days ago - Pushed at: over 2 years ago - Stars: 153 - Forks: 16

Bergvca/string_grouper

Super Fast String Matching in Python

Language: Python - Size: 2.59 MB - Last synced at: 27 days ago - Pushed at: 2 months ago - Stars: 367 - Forks: 76

NickCrews/mismo

The SQL/Ibis powered sklearn of record linkage

Language: Python - Size: 9.72 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 15 - Forks: 3

Senzing/awesome

Curated list of awesome software and resources for Senzing, The First Real-Time AI for Entity Resolution.

Language: Python - Size: 244 KB - Last synced at: 18 days ago - Pushed at: about 1 month ago - Stars: 57 - Forks: 2

PatentsView/PatentsView-Evaluation 📦

Evaluation and benchmarking of PatentsView disambiguation algorithms

Language: Python - Size: 156 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 13 - Forks: 8

ADBond/splinkclickhouse

Allows Clickhouse to be used as the execution engine for Splink

Language: Python - Size: 959 KB - Last synced at: 10 days ago - Pushed at: 3 months ago - Stars: 5 - Forks: 0

dell-research-harvard/linktransformer

A convenient way to link, deduplicate, aggregate and cluster data(frames) in Python using deep learning

Language: Python - Size: 1.81 MB - Last synced at: 10 days ago - Pushed at: about 2 months ago - Stars: 118 - Forks: 11

sssairohit/enm

Excel Name Matching is a Python-based automation tool that standardizes names in an Excel file using fuzzy matching techniques. It ensures consistency for data processing, making it easier to use VLOOKUP and other operations.

Language: Python - Size: 35.2 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

ul-mds/gecko

Python library for the generation and mutation of realistic personal identification data at scale

Language: Python - Size: 5.51 MB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 6 - Forks: 1

J535D165/recordlinkage-annotator

A browser user interface for manual labeling of record pairs.

Language: JavaScript - Size: 3.49 MB - Last synced at: about 2 months ago - Pushed at: almost 2 years ago - Stars: 46 - Forks: 8

dedupeio/csvdedupe

🆔 Command line tool for deduplicating CSV files

Language: Python - Size: 1.12 MB - Last synced at: about 22 hours ago - Pushed at: about 5 years ago - Stars: 420 - Forks: 83

usc-isi-i2/rltk

Record Linkage ToolKit (Find and link entities)

Language: Python - Size: 9.59 MB - Last synced at: 5 days ago - Pushed at: almost 2 years ago - Stars: 110 - Forks: 23

zouzias/spark-lucenerdd

Spark RDD with Lucene's query and entity linkage capabilities

Language: Scala - Size: 11.3 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 125 - Forks: 36

dice-group/LIMES

Link Discovery Framework for Metric Spaces.

Language: JavaScript - Size: 38.4 MB - Last synced at: about 2 months ago - Pushed at: 9 months ago - Stars: 130 - Forks: 54

data61/blocklib

Python implementations of record linkage blocking techniques.

Language: Python - Size: 1.13 MB - Last synced at: 7 days ago - Pushed at: over 1 year ago - Stars: 20 - Forks: 4

maxharlow/textmatch

🔎 Finds fuzzy matches between datasets

Language: Python - Size: 120 KB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 12 - Forks: 0

ihmeuw/person_linkage_case_study

Emulates the methods the US Census Bureau uses to link people across multiple data sources, using open-source software (Splink) and simulated data (from pseudopeople).

Language: HTML - Size: 4.43 MB - Last synced at: about 1 month ago - Pushed at: 7 months ago - Stars: 3 - Forks: 0

krokane/movie_sites_entity_linking

Entity resolution project linking common movies between IMDb and Rotten Tomatoes using blocking, string similarity functions, and record linkage techniques. After finding common entities, created a knowledge graph to visualize the dataset using a schema ontology.

Language: Jupyter Notebook - Size: 26.2 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

Wikidata/soweego

Link Wikidata items to large catalogs

Language: Python - Size: 7.87 MB - Last synced at: 5 days ago - Pushed at: 3 months ago - Stars: 96 - Forks: 9

Xhst/data-engineering-projects

Projects for the course Data Engineering held by professor Paolo Merialdo at Roma Tre University.

Language: Python - Size: 114 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 1

data61/clkhash

CLK hash: hash PII for entity matching

Language: Python - Size: 3.49 MB - Last synced at: 7 days ago - Pushed at: 13 days ago - Stars: 47 - Forks: 9

ing-bank/spark-matcher

Record matching and entity resolution at scale in Spark

Language: Python - Size: 579 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 34 - Forks: 8

ErcinDedeoglu/Postalized

The ultimate address parsing tool. Effortlessly parse and expand postal data with our cutting-edge technology. Simplify your mailing, enhance accuracy, and embrace the future of postal efficiency. Get Postalized—where precision meets convenience.

Language: C - Size: 5.98 MB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 2 - Forks: 1

Evnsn/awsome-entity-resolution

A collection of awesome resources regarding Record Linkage.

Size: 13.7 KB - Last synced at: 18 days ago - Pushed at: 9 months ago - Stars: 7 - Forks: 0

iesl/stance

Learned string similarity for entity names using optimal transport.

Language: Python - Size: 71.3 KB - Last synced at: about 1 month ago - Pushed at: over 4 years ago - Stars: 35 - Forks: 3

ngmarchant/comparator

Similarity and distance measures for clustering and record linkage applications in R

Language: R - Size: 275 KB - Last synced at: 11 days ago - Pushed at: about 3 years ago - Stars: 18 - Forks: 0

cjerzak/LinkOrgs-software

LinkOrgs: An R package for linking records on organizations using half a billion open-collaborated records from LinkedIn

Language: R - Size: 90.8 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 11 - Forks: 1

dobraczka/klinker

🧱 blocking methods for entity resolution

Language: Python - Size: 1.19 MB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 6 - Forks: 0

NHSDigital/mps_diagnostics

Interpretable metadata for the results of NHS England record linkage

Language: Python - Size: 537 KB - Last synced at: about 2 months ago - Pushed at: about 1 year ago - Stars: 3 - Forks: 0

tteofili/certa

CERTA - Computing Entity Resolution explanations with TriAngles

Language: Python - Size: 26.8 MB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 5 - Forks: 3

dobraczka/eche

🕸️ Little helper for handling entity clusters

Language: Python - Size: 95.7 KB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 3 - Forks: 0

cleanzr/clevr

Clustering and Link Prediction Evaluation in R

Language: R - Size: 114 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 12 - Forks: 3

Felipecastanog/final_project_ENEL645

PRIVACY-PRESERVING RECORD LINKAGE METHODS FOR HOMELESSNESS DATA

Language: Jupyter Notebook - Size: 4.32 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

data61/anonlink

Python implementation of anonymous linkage using cryptographic linkage keys

Language: Python - Size: 3.19 MB - Last synced at: 7 days ago - Pushed at: about 1 year ago - Stars: 65 - Forks: 8

ul-mds/pprl

Collection of software packages for performing privacy-preserving record linkage based on Bloom filters

Language: Python - Size: 332 KB - Last synced at: 9 days ago - Pushed at: 6 months ago - Stars: 1 - Forks: 0

t2solve/recordlinkagenet

library for dataset comparison

Language: C# - Size: 279 KB - Last synced at: about 2 months ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

J535D165/FEBRL-fork-v0.4.2

Fork of the Freely Extensible Biomedical Record Linkage program

Language: Python - Size: 6.36 MB - Last synced at: about 2 months ago - Pushed at: over 8 years ago - Stars: 24 - Forks: 21

data61/anonlink-entity-service

Privacy Preserving Record Linkage Service

Language: Python - Size: 12.2 MB - Last synced at: 7 days ago - Pushed at: about 2 years ago - Stars: 26 - Forks: 8

ngmarchant/oasis

A Python package for efficient evaluation based on OASIS (Optimal Asymptotic Sequential Importance Sampling).

Language: Python - Size: 16.3 MB - Last synced at: 22 days ago - Pushed at: almost 4 years ago - Stars: 15 - Forks: 3

moj-analytical-services/splink_graph 📦

pyspark-parallelised functions producing graph-theoretical metrics in connected component clusters for use in record-linkage (or other domains)

Language: HTML - Size: 2.71 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 10 - Forks: 3

ul-mds/gecko-examples

Example scripts for generating data with Gecko

Language: Python - Size: 36.1 KB - Last synced at: about 2 months ago - Pushed at: about 1 year ago - Stars: 2 - Forks: 0

joshuacortez/data-matching-workflow

A workflow template for deduplication and record linkage using the Dedupe library

Language: Jupyter Notebook - Size: 3.47 MB - Last synced at: 10 months ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 0

chansooligans/oagdedupe

Developed for use by the NY Office of the Attorney General: a Python library for scalable entity resolution, using active learning to learn blocking configurations, generate comparison pairs, then classify matches

Language: Python - Size: 1.64 MB - Last synced at: 3 days ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 2

data61/anonlink-client

Language: Python - Size: 3.67 MB - Last synced at: 7 days ago - Pushed at: about 2 years ago - Stars: 5 - Forks: 2

jimbrig/lossrunAnalyzer 📦

R Package and Shiny App to Analyze Insurance Lossruns

Language: R - Size: 11.7 KB - Last synced at: 6 months ago - Pushed at: over 5 years ago - Stars: 4 - Forks: 0

cleanzr/dblink

Distributed Bayesian Entity Resolution in Apache Spark

Language: Scala - Size: 455 KB - Last synced at: 9 months ago - Pushed at: almost 4 years ago - Stars: 57 - Forks: 9

tteofili/cheapER

Low Cost Entity Resolution with Transformers

Language: Jupyter Notebook - Size: 10.6 MB - Last synced at: about 2 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

thomaswyrick/duplicate-data-generator

A Python script for generating duplicate data to test the performance of record linkage and master data management systems.

Language: Python - Size: 12.1 MB - Last synced at: 11 months ago - Pushed at: 12 months ago - Stars: 6 - Forks: 2

foxcroftjn/PAKDD-Class-Ratio

Supplementary code for "Class ratio and its implications for reproducibility and performance in record linkage" presented at The Pacific-Asia Conference on Knowledge Discovery and Data Mining 2024.

Language: Jupyter Notebook - Size: 34.4 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

catalyst-cooperative/ccai-entity-matching 📦

An exploration of generalizable approaches to unsupervised entity matching for use in linking tabular public energy data sources.

Language: Jupyter Notebook - Size: 12.2 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 1

ul-mds/gecko-data

Example data sources as a starting point for working with Gecko

Language: Jupyter Notebook - Size: 4.66 MB - Last synced at: about 2 months ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

ikatic/StringMetrics

The StringMetrics project implements 7 string metric algorithms: Hamming, Dice, Jaro, Jaro-Winkler, Soundex, Levenshtein, and Damerau-Levenshtein. The metrics compare strings through the IMetric interface, which provides an approximate similarity score from 0 (no match) to 1 (exact match), useful in data cleansing, record linkage, NLP, fraud detection, etc.

Language: C# - Size: 45.9 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

gpoulter/pydedupe 📦

(Archived) A Python library for record linkage and deduplication.

Language: Python - Size: 1.36 MB - Last synced at: 11 months ago - Pushed at: about 1 year ago - Stars: 19 - Forks: 2

zzachw/MedLink

KDD'23 | MedLink: De-Identified Patient Health Record Linkage

Language: Jupyter Notebook - Size: 321 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

andreac0/BeRTo-RecordLinkageTool

Python-based tool to link legal entity datasets when no common ID is available, using name and address information

Language: Jupyter Notebook - Size: 16.5 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 3 - Forks: 0

DecioXXIV/ID-hw6-DataIntegration Fork of AlessandroPesare/Progetto_Finale_IDD

Repository for HW6, Data Engineering course (Corso di Ingegneria dei Dati) 2023/24

Language: Jupyter Notebook - Size: 46.8 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

ropeladder/record-linkage-resources

Resources for tackling record linkage / deduplication / data matching problems

Size: 26.4 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 103 - Forks: 16

fgregg/smered

Mirror of https://bitbucket.org/resteorts/smered

Language: Java - Size: 4.48 MB - Last synced at: about 1 month ago - Pushed at: about 8 years ago - Stars: 5 - Forks: 0

UltraArceus3/AttributeSelectionAlgorithm

This project is an algorithm that helps users identify attributes that are well suited to record linkage.

Language: C++ - Size: 24.5 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 1

ufbmi/onefl-deduper

Tools for EHR patient de-duplication (aka entity resolution)

Language: Python - Size: 19.4 MB - Last synced at: 5 days ago - Pushed at: about 2 years ago - Stars: 12 - Forks: 4

john-thuo1/RecordLinkage

Brief Overview of record linkage implementation

Language: Jupyter Notebook - Size: 155 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

tteofili/er-utils

utilities for working with Entity Resolution models

Language: Python - Size: 35.2 KB - Last synced at: about 2 months ago - Pushed at: over 3 years ago - Stars: 2 - Forks: 0

ziqizhang/scholarlydata

Experimental code for author name and affiliation linking/disambiguation

Language: Java - Size: 148 MB - Last synced at: about 1 year ago - Pushed at: almost 4 years ago - Stars: 4 - Forks: 2

KirovVerst/qlink

Entity Resolution and Record Linkage library

Language: Python - Size: 4.84 MB - Last synced at: 4 days ago - Pushed at: about 2 years ago - Stars: 7 - Forks: 0

OlivierBinette/groupbyrule

Deduplicate data using fuzzy and deterministic matching rules.

Language: Python - Size: 11.9 MB - Last synced at: about 1 year ago - Pushed at: about 2 years ago - Stars: 7 - Forks: 0

cleanzr/dblinkR

An R interface for the dblink Spark application

Language: R - Size: 19 MB - Last synced at: 9 months ago - Pushed at: almost 4 years ago - Stars: 5 - Forks: 1

cleanzr/representr

Create representative records post-record linkage

Language: R - Size: 1.02 MB - Last synced at: 9 months ago - Pushed at: over 1 year ago - Stars: 7 - Forks: 0

ae3000/matchain

Record linkage - simple, flexible, efficient.

Language: Python - Size: 3.96 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

CangyuanLi/floof

Fuzzy matching made easy

Language: Rust - Size: 359 KB - Last synced at: 27 days ago - Pushed at: about 1 year ago - Stars: 5 - Forks: 0

mgranchelli/ingegneria-dei-dati-2022-23

Homework for the 2022-2023 Ingegneria dei Dati (Data Engineering) course at Roma Tre University.

Language: Jupyter Notebook - Size: 32.1 MB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

RecordLinkageIG/RecordLinkageIG.github.io

Blog of the American Statistical Association's Record Linkage Interest Group.

Language: HTML - Size: 6.24 MB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 4 - Forks: 2

adityarbhat/Data-Challenge-Projects

Contains solution notebooks of attempted data challenges

Language: Jupyter Notebook - Size: 28.1 MB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

a-wars/AGIW_DeepER

Implementation of DeepER system (record linkage with neural networks)

Language: Jupyter Notebook - Size: 30.8 MB - Last synced at: almost 2 years ago - Pushed at: almost 6 years ago - Stars: 3 - Forks: 0

cleanzr/fasthash

Performs unique entity estimation corresponding to Chen, Shrivastava, Steorts (2018).

Language: Python - Size: 1.6 MB - Last synced at: 9 months ago - Pushed at: over 6 years ago - Stars: 14 - Forks: 3

coletl/geocode

A short guide to approximate geocoding

Language: HTML - Size: 178 MB - Last synced at: almost 2 years ago - Pushed at: about 7 years ago - Stars: 0 - Forks: 0

dtmlinh/Food-Inspections-PostgreSQL

A database management system for restaurant inspection records, restaurant-related tweets, and other relevant data.

Language: Python - Size: 9.55 MB - Last synced at: almost 2 years ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 1

saifmahamood/sortableChallenge

My entry to a data analysis / record linkage coding challenge

Language: Python - Size: 427 KB - Last synced at: almost 2 years ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 1

gcgbarbosa/rl-accuracy

Simple software that generates features and assesses the accuracy of record linkage.

Language: Jupyter Notebook - Size: 59.6 KB - Last synced at: almost 2 years ago - Pushed at: almost 6 years ago - Stars: 0 - Forks: 0

purple29th/purpledproject

A META (FACEBOOK) PROJECT - Purpled allows artists to distribute content and monetize their artistry. Contribute to the success of both new and experienced artists. Every like, play, remark, and repost reverberates, establishing a creator's reputation, motivating them, and expanding their reach, so you always have great music at your fingertips.

Language: Java - Size: 1.59 MB - Last synced at: almost 2 years ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

DForshner/RecordLinkagePipelineDemo

Exploring linking records from disparate data sources

Language: C# - Size: 980 KB - Last synced at: almost 2 years ago - Pushed at: almost 3 years ago - Stars: 1 - Forks: 0

magabrielaa/computer-science-applications

Range of computer science applications using Python.

Language: Python - Size: 1.43 MB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

fsentin/anon-reclinkage

K-Anonymization & Record-linkage Attack

Language: Jupyter Notebook - Size: 1.55 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

Related Keywords
record-linkage (118), entity-resolution (58), deduplication (34), python (27), machine-learning (17), data-matching (14), fuzzy-matching (14), data-science (12), spark (9), entity-matching (8), dedupe (7), pandas (6), natural-language-processing (5), clustering (5), string-matching (5), r-package (5), data-integration (5), privacy-preserving-record-linkage (4), disambiguation (4), entity-linking (4), nlp (4), awesome (4), privacy-enhancing-technologies (4), blocking (3), awesome-list (3), pyspark (3), string-similarity (3), lucene (3), numpy (3), embeddings (3), data-linkage (3), duckdb (3), deep-learning (3), data-cleaning (3), string-distance (3), similarity (3), python-library (3), bibliometrics (2), master-data-management (2), covid-19 (2), covid19-data (2), covid19-tracker (2), data-management (2), science-research (2), scientometrics (2), privacy (2), igraph (2), sql (2), entities (2), distance-measures (2), linkage (2), schema-matching (2), postgresql (2), xpath (2), splink (2), sentence-transformers (2), hashing (2), knowledge-graph (2), wikidata (2), rust (2), csharp (2), cryptography (2), de-duplicating (2), address (2), address-parser (2), privacy-preserving (2), deduping (2), international (2), identity-resolution (2), data-analysis (2), geocoding (2), elasticsearch (2), link-discovery (2), matching (2), mcmc (2), linked-data (2), duplicate-detection (2), bayesian-inference (2), text-processing (2), phonics (2), phonetic-spelling-algorithms (2), linguistics (2), r (2), networkx (2), downstream-tasks (1), levenshtein-distance (1), ai (1), linking (1), levenshtein (1), indexing (1), post-linkage-analysis (1), faker (1), attribute-selection (1), instance-matching (1), edit-distance (1), schema-alignment (1), graph (1), graph-algorithms (1), pandas-udf (1), active-learning (1)