An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: entity-resolution

Graphlet-AI/eridu

Deep fuzzy matching of people and company names for multilingual entity resolution using representation learning

Language: Python - Size: 1010 KB - Last synced at: about 20 hours ago - Pushed at: about 21 hours ago - Stars: 17 - Forks: 1

DerwenAI/kleptosyn

Synthetic data generation for investigative graphs based on patterns of bad-actor tradecraft.

Language: Jupyter Notebook - Size: 3.87 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 6 - Forks: 0

dedupeio/dedupe

🆔 A Python library for accurate and scalable fuzzy matching, record deduplication and entity-resolution.

Language: Python - Size: 5.98 MB - Last synced at: about 8 hours ago - Pushed at: 7 months ago - Stars: 4,323 - Forks: 562

wcmc-its/ReCiter

ReCiter: an enterprise open source author disambiguation system for academic institutions

Language: Java - Size: 236 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 47 - Forks: 24

moj-analytical-services/splink

Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends

Language: Python - Size: 101 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 1,629 - Forks: 184

heathersherry/Knowledge-Graph-Tutorials-and-Papers

Insightful Tutorials and Papers about Knowledge Graphs

Size: 4.26 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 902 - Forks: 125

tshu-w/EMBer

Code and data for the paper "Bridging the Gap between Reality and Ideality of Entity Matching: A Revisiting and Benchmark Re-Construction" (IJCAI 2022)

Language: Python - Size: 29.8 MB - Last synced at: about 22 hours ago - Pushed at: 4 days ago - Stars: 6 - Forks: 2

zinggAI/zingg

Scalable identity resolution, entity resolution, data mastering and deduplication using ML

Language: Java - Size: 679 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 1,052 - Forks: 128

OlivierBinette/Awesome-Entity-Resolution

List of entity resolution software and resources.

Size: 28.3 KB - Last synced at: 4 days ago - Pushed at: 4 months ago - Stars: 75 - Forks: 9

mhmoslemi2338/pre-EM-bias

Official implementation of the IEEE Big Data 2024 paper "Evaluating Blocking Biases in Entity Matching"

Language: Jupyter Notebook - Size: 38.1 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 1 - Forks: 0

mhmoslemi2338/sigmod-FAIR-EM-post-process

Official implementation of the pre-print paper "Mitigating Matching Biases Through Score Calibration"

Language: Jupyter Notebook - Size: 32.2 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0

mhmoslemi2338/CaliFair-EM

Official implementation of GUIDE-AI @ SIGMOD paper "Threshold-Independent Fair Matching through Score Calibration"

Language: Python - Size: 19.3 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0

mhmoslemi2338/Heterogeneity_EM_Survey

Official implementation of the paper "Heterogeneity in Entity Matching: A Survey and Experimental Analysis"

Language: Jupyter Notebook - Size: 1.43 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0

sergiosolorzano/entity_resolution

Entity Resolution projects with Tabular Data: One combines learned representations generated by a siamese-like feed-forward neural network and a clustering algorithm; another combines meta-blocking with a clustering method.

Language: Python - Size: 78.3 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 1 - Forks: 0

cleanzr/clevr

Clustering and Link Prediction Evaluation in R

Language: R - Size: 114 KB - Last synced at: 1 day ago - Pushed at: almost 2 years ago - Stars: 13 - Forks: 3

ncn-foreigners/blocking

An R package for blocking records for record linkage / data deduplication based on approximate nearest neighbours algorithms.

Language: R - Size: 137 MB - Last synced at: 11 days ago - Pushed at: 12 days ago - Stars: 13 - Forks: 0

ThorstenDoherr/searchengine

heuristic matching of large databases by fuzzy criteria like addresses

Language: xBase - Size: 92 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 11 - Forks: 1

dedupeio/csvdedupe

🆔 Command line tool for deduplicating CSV files

Language: Python - Size: 1.12 MB - Last synced at: 5 days ago - Pushed at: about 5 years ago - Stars: 423 - Forks: 83

Senzing/awesome

Curated list of awesome software and resources for Senzing, The First Real-Time AI for Entity Resolution.

Language: Python - Size: 249 KB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 59 - Forks: 2

AI-team-UoA/pyJedAI

An open-source library that leverages Python’s data science ecosystem to build powerful end-to-end Entity Resolution workflows.

Language: Python - Size: 139 MB - Last synced at: 7 days ago - Pushed at: about 2 months ago - Stars: 78 - Forks: 12

NickCrews/mismo

The SQL/Ibis powered sklearn of record linkage

Language: Python - Size: 9.72 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 16 - Forks: 3

tshu-w/Uniblocker

Code and data for the paper: Towards Universal Dense Blocking for Entity Resolution

Language: Python - Size: 17.6 MB - Last synced at: about 22 hours ago - Pushed at: 8 months ago - Stars: 7 - Forks: 2

Picovoice/rhino

On-device Speech-to-Intent engine powered by deep learning

Language: Python - Size: 272 MB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 667 - Forks: 91

OlivierBinette/er-evaluation

An End-to-End Evaluation Framework for Entity Resolution Systems

Language: Python - Size: 62.4 MB - Last synced at: 3 days ago - Pushed at: over 1 year ago - Stars: 29 - Forks: 10

maxharlow/csvmatch

🔎 Finds fuzzy matches between CSV files

Language: Python - Size: 158 KB - Last synced at: 6 days ago - Pushed at: 3 months ago - Stars: 189 - Forks: 21

maxharlow/textmatch

🔎 Finds fuzzy matches between datasets

Language: Python - Size: 131 KB - Last synced at: 3 days ago - Pushed at: 28 days ago - Stars: 13 - Forks: 0

matchID-project/backend

Backend (Docker & API) for matchID project

Language: Python - Size: 10 MB - Last synced at: 14 days ago - Pushed at: 15 days ago - Stars: 11 - Forks: 14

fritshermans/deduplipy

Python package for deduplication/entity resolution using active learning

Language: Python - Size: 521 KB - Last synced at: 16 days ago - Pushed at: 10 months ago - Stars: 80 - Forks: 9

codeforkjeff/conciliator

OpenRefine reconciliation services for VIAF, ORCID, and Open Library + framework for creating more.

Language: Java - Size: 929 KB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 119 - Forks: 24

J535D165/recordlinkage

A powerful and modular toolkit for record linkage and duplicate detection in Python

Language: Python - Size: 70 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 1,007 - Forks: 156

dell-research-harvard/linktransformer

A convenient way to link, deduplicate, aggregate and cluster data(frames) in Python using deep learning

Language: Python - Size: 1.81 MB - Last synced at: 14 days ago - Pushed at: 3 months ago - Stars: 119 - Forks: 10

amazon-science/ReFinED

ReFinED is an efficient and accurate entity linking (EL) system.

Language: Python - Size: 433 KB - Last synced at: about 1 month ago - Pushed at: 7 months ago - Stars: 216 - Forks: 45

JohnSnowLabs/nlu

1 line for thousands of State of The Art NLP models in hundreds of languages. The fastest and most accurate way to solve text problems.

Language: Python - Size: 474 MB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 915 - Forks: 138

kuzudb/kgc-2025-workshop-high-quality-graphs

Workshop on high-quality knowledge graph creation with Kuzu and Senzing

Language: Python - Size: 15.9 MB - Last synced at: 3 days ago - Pushed at: about 2 months ago - Stars: 19 - Forks: 8

dedupeio/dedupe-examples

🆔 Examples for using the dedupe library

Language: Python - Size: 5.12 MB - Last synced at: about 1 month ago - Pushed at: 11 months ago - Stars: 412 - Forks: 214

vaneseltine/nominally

A maximum-strength name parser for record linkage.

Language: Python - Size: 1.09 MB - Last synced at: 3 days ago - Pushed at: 15 days ago - Stars: 37 - Forks: 1

J535D165/data-matching-software

A list of free data matching and record linkage software.

Size: 93.8 KB - Last synced at: about 2 months ago - Pushed at: over 1 year ago - Stars: 382 - Forks: 42

kuzudb/nobel-network

Data and code for Nobel Laureate academic genealogy network analysis and entity resolution

Language: Jupyter Notebook - Size: 385 KB - Last synced at: 3 days ago - Pushed at: 8 months ago - Stars: 6 - Forks: 0

Gaglia88/gsm_repro

Reproducibility experiments for Generalized Supervised Meta-blocking

Language: Python - Size: 60.8 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

vintasoftware/entity-embed

PyTorch library for transforming entities like companies, products, etc. into vectors to support scalable Record Linkage / Entity Resolution using Approximate Nearest Neighbors.

Language: Jupyter Notebook - Size: 11.4 MB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 153 - Forks: 16

microsoft/vert-papers

This repository contains code and datasets related to entity/knowledge papers from the VERT (Versatile Entity Recognition & disambiguation Toolkit) project, by the Knowledge Computing group at Microsoft Research Asia (MSRA).

Language: Python - Size: 22 MB - Last synced at: 3 days ago - Pushed at: over 1 year ago - Stars: 278 - Forks: 94

Graphlet-AI/graphlet

PyPi module for Graphlet AI Knowledge Graph Factory

Language: Python - Size: 20.4 MB - Last synced at: 9 days ago - Pushed at: about 2 years ago - Stars: 29 - Forks: 1

rajatasusual/information_extractor

information_extractor is a tool that leverages spaCy for coreference resolution and SpanBERT for relation extraction. This project integrates named entity recognition (NER) with relation extraction to identify and analyze relationships between entities in text.

Language: Python - Size: 169 KB - Last synced at: 12 days ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

tilotech/tilores-langchain

This repository provides the building blocks for integrating LangChain, LangGraph, and the Tilores entity resolution system.

Language: Python - Size: 35.2 KB - Last synced at: 21 days ago - Pushed at: 7 months ago - Stars: 6 - Forks: 1

PatentsView/PatentsView-Evaluation 📦

Evaluation and benchmarking of PatentsView disambiguation algorithms

Language: Python - Size: 156 MB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 13 - Forks: 8

neo4j-graph-examples/entity-resolution

Entity resolution, also known as data matching or record linkage, is the task of finding records that refer to the same or a similar real-world entity, within one data set or across several. Record linking is necessary when joining entities that are similar and may or may not share common identifiers. Neo4j offers various advantages for performing entity resolution / record linking. This repository covers one such use case: linking similar user accounts for analytics and better recommendations.

Language: Go - Size: 1.17 MB - Last synced at: about 2 months ago - Pushed at: 3 months ago - Stars: 23 - Forks: 5

ADBond/splinkclickhouse

Allows Clickhouse to be used as the execution engine for Splink

Language: Python - Size: 959 KB - Last synced at: 12 days ago - Pushed at: 4 months ago - Stars: 5 - Forks: 0

snipsco/snips-nlu-parsers

Rust crate for entity parsing

Language: Rust - Size: 128 KB - Last synced at: 2 months ago - Pushed at: over 2 years ago - Stars: 17 - Forks: 35

J535D165/recordlinkage-annotator

A browser user interface for manual labeling of record pairs.

Language: JavaScript - Size: 3.49 MB - Last synced at: 3 months ago - Pushed at: about 2 years ago - Stars: 46 - Forks: 8

HPI-Information-Systems/snowman

Welcome to Snowman App – a Data Matching Benchmark Platform.

Language: TypeScript - Size: 85.8 MB - Last synced at: 3 months ago - Pushed at: over 2 years ago - Stars: 38 - Forks: 2

abcsys/libem

Compound AI toolchain for fast and accurate entity matching, powered by LLMs.

Language: Python - Size: 3.54 MB - Last synced at: 6 days ago - Pushed at: 3 months ago - Stars: 22 - Forks: 4

usc-isi-i2/rltk

Record Linkage ToolKit (Find and link entities)

Language: Python - Size: 9.59 MB - Last synced at: 9 days ago - Pushed at: almost 2 years ago - Stars: 110 - Forks: 23

Gaglia88/sparker

SparkER: an Entity Resolution framework for Apache Spark

Language: Scala - Size: 48.4 MB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 64 - Forks: 19

scify/jedai-ui

UI for JedAI Toolkit

Language: Java - Size: 1.09 MB - Last synced at: 3 months ago - Pushed at: about 3 years ago - Stars: 17 - Forks: 5

scify/JedAIToolkit

An open source, high scalability toolkit in Java for Entity Resolution.

Language: Java - Size: 278 MB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 218 - Forks: 47

iesl/learned-string-alignments

Learning String Alignments for Entity Aliases

Language: Python - Size: 37.1 KB - Last synced at: 3 months ago - Pushed at: over 6 years ago - Stars: 37 - Forks: 6

ihmeuw/person_linkage_case_study

Emulates the methods the US Census Bureau uses to link people across multiple data sources, using open-source software (Splink) and simulated data (from pseudopeople).

Language: HTML - Size: 4.43 MB - Last synced at: about 1 month ago - Pushed at: 9 months ago - Stars: 3 - Forks: 0

krokane/movie_sites_entity_linking

Entity resolution project linking common movies between IMDb and Rotten Tomatoes using blocking, string similarity functions, and record linkage techniques. After finding common entities, created a knowledge graph to visualize the dataset using a schema ontology.

Language: Jupyter Notebook - Size: 26.2 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

dobraczka/kiez

🏘️ Hubness reduced nearest neighbor search for entity alignment with knowledge graph embeddings

Language: Python - Size: 903 KB - Last synced at: 8 days ago - Pushed at: about 1 year ago - Stars: 25 - Forks: 3

Wikidata/soweego

Link Wikidata items to large catalogs

Language: Python - Size: 7.87 MB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 96 - Forks: 9

izuna385/Entity-Linking-Recent-Trends

Recent trends of Entity Linking, Disambiguation, and Representation.

Size: 787 KB - Last synced at: 3 months ago - Pushed at: about 4 years ago - Stars: 345 - Forks: 18

tshu-w/ComEM

Code for the paper "Match, Compare, or Select? An Investigation of Large Language Models for Entity Matching" (COLING 2025)

Language: Python - Size: 158 KB - Last synced at: about 22 hours ago - Pushed at: 5 months ago - Stars: 11 - Forks: 2

Xhst/data-engineering-projects

Projects for the course Data Engineering held by professor Paolo Merialdo at Roma Tre University.

Language: Python - Size: 114 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 1

harpin-ai/toolkit-examples

Examples for trying out the harpin AI identity resolution and data quality toolkit

Language: Jupyter Notebook - Size: 599 KB - Last synced at: 2 months ago - Pushed at: 5 months ago - Stars: 4 - Forks: 0

ing-bank/spark-matcher

Record matching and entity resolution at scale in Spark

Language: Python - Size: 579 KB - Last synced at: 7 days ago - Pushed at: over 1 year ago - Stars: 34 - Forks: 8

vefthym/fairER

FairER: Entity Resolution with Fairness Constraints

Language: Python - Size: 113 MB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 8 - Forks: 5

Evnsn/awsome-entity-resolution

A collection of awesome resources regarding Record Linkage.

Size: 13.7 KB - Last synced at: 3 days ago - Pushed at: 11 months ago - Stars: 7 - Forks: 0

databricks-industry-solutions/auto-data-linkage

Low effort linking and easy de-duplication. Databricks ARC provides a simple, automated, lakehouse integrated entity resolution solution for intra and inter data linking.

Language: Python - Size: 6.12 MB - Last synced at: 30 days ago - Pushed at: 8 months ago - Stars: 47 - Forks: 22

iesl/stance

Learned string similarity for entity names using optimal transport.

Language: Python - Size: 71.3 KB - Last synced at: 3 months ago - Pushed at: over 4 years ago - Stars: 35 - Forks: 3

ngmarchant/comparator

Similarity and distance measures for clustering and record linkage applications in R

Language: R - Size: 275 KB - Last synced at: 12 days ago - Pushed at: over 3 years ago - Stars: 18 - Forks: 0

eZWALT/ADSDB-DS-EtE-Project

MDS-FIB Algorithms, Data Structures and Databases (ADSDB) Subject 2024-25 Q1, Data-Science End-to-End project path

Language: Jupyter Notebook - Size: 14.8 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

zentity-io/zentity

Entity resolution for Elasticsearch.

Language: Java - Size: 634 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 158 - Forks: 29

ScaDS/MovieGraphBenchmark

📽 Benchmark datasets for Entity Resolution on Knowledge Graphs

Language: Python - Size: 7.94 MB - Last synced at: 15 minutes ago - Pushed at: over 1 year ago - Stars: 9 - Forks: 1

dobraczka/klinker

🧱 blocking methods for entity resolution

Language: Python - Size: 1.19 MB - Last synced at: 2 months ago - Pushed at: 10 months ago - Stars: 6 - Forks: 0

dobraczka/sylloge

πŸ—ƒοΈ Small library to simplify collecting and loading of entity alignment benchmark datasets

Language: Python - Size: 281 KB - Last synced at: 2 days ago - Pushed at: about 1 year ago - Stars: 7 - Forks: 1

tteofili/certa

CERTA - Computing Entity Resolution explanations with TriAngles

Language: Python - Size: 26.8 MB - Last synced at: 2 months ago - Pushed at: 6 months ago - Stars: 5 - Forks: 3

dobraczka/eche

πŸ•ΈοΈ Little helper for handling entity clusters

Language: Python - Size: 95.7 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 3 - Forks: 0

DerwenAI/cdl2024_masterclass

Connected Data London 2024, ERKG masterclass: how to generate knowledge graphs from structured and unstructured data based on entity resolution (ER) to enhance data quality for the downstream AI applications

Size: 81.1 KB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

Graphlet-AI/graphlet-ai.github.io

Web page repository for Graphlet.AI

Language: SCSS - Size: 37.3 MB - Last synced at: about 1 month ago - Pushed at: 6 months ago - Stars: 1 - Forks: 0

data61/anonlink

Python implementation of anonymous linkage using cryptographic linkage keys

Language: Python - Size: 3.19 MB - Last synced at: 5 days ago - Pushed at: about 1 year ago - Stars: 65 - Forks: 8

angelo-casciani/Trace_similarity_LLM

Language: Python - Size: 1.51 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

tilotech/python-tilores-sdk

The tilores-sdk Python package is a small SDK to develop with the Tilores entity resolution system.

Language: Python - Size: 40 KB - Last synced at: 17 days ago - Pushed at: 7 months ago - Stars: 3 - Forks: 0

tilotech/langchain-tilores 📦

This repository provides the building blocks for integrating LangChain, LangGraph, and the Tilores entity resolution system.

Language: Python - Size: 29.3 KB - Last synced at: 20 days ago - Pushed at: 8 months ago - Stars: 17 - Forks: 1

J535D165/FEBRL-fork-v0.4.2

Fork of the Freely Extensible Biomedical Record Linkage program

Language: Python - Size: 6.36 MB - Last synced at: 3 months ago - Pushed at: over 8 years ago - Stars: 24 - Forks: 21

rosette-api/mock-data

Mock data that is used for unit testing of the Babel Street Analytics bindings

Size: 164 KB - Last synced at: 4 months ago - Pushed at: over 9 years ago - Stars: 0 - Forks: 1

rosette-api/ruby-script

Contains Ruby scripts for accessing Babel Street Analytics

Size: 16.6 KB - Last synced at: 4 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 1

DerwenAI/strwythura

How to construct knowledge graphs from unstructured data sources

Language: Jupyter Notebook - Size: 1.22 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 67 - Forks: 6

wbsg-uni-mannheim/MatchGPT

This repository contains code and extensive prompt examples to reproduce and extend the experiments in our papers "Using ChatGPT for Entity Matching" and "Entity Matching using Large Language Models".

Language: Jupyter Notebook - Size: 185 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 42 - Forks: 8

ngmarchant/oasis

A Python package for efficient evaluation based on OASIS (Optimal Asymptotic Sequential Importance Sampling).

Language: Python - Size: 16.3 MB - Last synced at: 26 days ago - Pushed at: about 4 years ago - Stars: 15 - Forks: 3

cleanzr/RLdata

Language: R - Size: 4.95 MB - Last synced at: 10 months ago - Pushed at: about 7 years ago - Stars: 1 - Forks: 1

cleanzr/dblink-experiments

Details for reproducing the experiments in our d-blink paper

Language: R - Size: 31.5 MB - Last synced at: 10 months ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 0

pmart123/cymbology

Identifies and validates financial security ids such as Sedol, Cusip, Isin numbers.

Language: Python - Size: 58.6 KB - Last synced at: about 1 month ago - Pushed at: almost 4 years ago - Stars: 14 - Forks: 1

abcsys/libem-sample-data

Libem sample datasets.

Language: Python - Size: 17.2 MB - Last synced at: 10 months ago - Pushed at: 11 months ago - Stars: 3 - Forks: 1

chansooligans/oagdedupe

Developed for Use by NY Office of the Attorney General: A Python library for scalable entity resolution, using active learning to learn blocking configurations, generate comparison pairs, then classify matches

Language: Python - Size: 1.64 MB - Last synced at: 4 days ago - Pushed at: almost 2 years ago - Stars: 2 - Forks: 2

abcsys/libem-notebook

Libem notebooks.

Language: Jupyter Notebook - Size: 2.32 MB - Last synced at: 10 months ago - Pushed at: 12 months ago - Stars: 1 - Forks: 0

cleanzr/dblink

Distributed Bayesian Entity Resolution in Apache Spark

Language: Scala - Size: 455 KB - Last synced at: 10 months ago - Pushed at: about 4 years ago - Stars: 57 - Forks: 9

tteofili/cheapER

Low Cost Entity Resolution with Transformers

Language: Jupyter Notebook - Size: 10.6 MB - Last synced at: 3 months ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

AdityaSetyadi/sepakat-integrasi

πŸ₯ˆπŸ† SEPAKAT - Modul Integrasi is a winning project in Regsosek Hackathon 2022 organized by The Ministry of National Development Planning/Bappenas Indonesia. This module provides a single individual identification model by integrating Regsosek data as basic information which is then linked with related data using the idea of entity resolution.

Language: Jupyter Notebook - Size: 22.1 MB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

aws-solutions-library-samples/guidance-for-patient-entity-resolution-with-aws-healthlake

AWS HealthLake patient matching with AWS Entity Resolution

Language: Python - Size: 543 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 8 - Forks: 1

Lefteris-Souflas/Entity-Resolution

Addressed Entity Resolution challenges. Tasks include schema-agnostic blocking, pairwise comparisons, Meta-Blocking graph construction, and Jaccard similarity computation. Deliverables include source code, reports, and reproducibility guidelines in Python

Language: Jupyter Notebook - Size: 4.54 MB - Last synced at: 11 days ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

Related Keywords
entity-resolution (172), record-linkage (62), deduplication (37), python (23), entity-matching (20), data-matching (19), machine-learning (19), fuzzy-matching (14), data-science (12), data-integration (12), blocking (11), knowledge-graph (11), entity-linking (10), nlp (9), dedupe (8), entity-extraction (7), named-entity-recognition (7), spark (7), clustering (7), llm (7), java (6), identity-resolution (6), deep-learning (6), elasticsearch (5), r-package (5), bert (5), duplicate-detection (5), awesome (5), string-matching (4), string-similarity (4), sentiment-analysis (4), awesome-list (4), meta-blocking (4), database (4), data-engineering (4), entity-alignment (4), pandas (4), sigmod-programming-contest (4), link-discovery (4), product-matching (4), python-library (4), nlu (4), relation-extraction (4), disambiguation (4), natural-language-processing (4), benchmark (3), pytorch (3), sentence-transformers (3), linked-data (3), ai (3), transformers (3), entity-relationship (3), fairness (3), ner (3), graph-algorithms (3), optimal-transport (3), string-distance (3), data-fusion (3), compound-ai-systems (3), data-management (3), graph (3), sql (3), large-language-models (3), python3 (3), matching (3), networkx (3), ml (2), entity (2), text-classification (2), streamlit (2), identity (2), language-detection (2), openrefine (2), resolution (2), slot-filling (2), reconciliation-service (2), natural-language-understanding (2), senzing (2), similarity (2), similarity-measures (2), logistic-regression (2), elasticsearch-plugin (2), dataset (2), textgraphs (2), unstructured-data (2), data (2), text-mining (2), tokenization (2), bayesian-inference (2), mcmc (2), wikipedia (2), active-learning (2), r (2), data-platform (2), named-entities (2), zentity (2), kuzu (2), network-analysis (2), reproducibility (2), embeddings (2)