An open API service providing repository metadata for many open source software ecosystems.

Topic: "entity-resolution"

dedupeio/dedupe

🆔 A Python library for accurate and scalable fuzzy matching, record deduplication and entity-resolution.

Language: Python - Size: 5.98 MB - Last synced at: 9 days ago - Pushed at: 5 months ago - Stars: 4,263 - Forks: 559

moj-analytical-services/splink

Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends

Language: Python - Size: 98.3 MB - Last synced at: 9 days ago - Pushed at: 13 days ago - Stars: 1,547 - Forks: 171

zinggAI/zingg

Scalable identity resolution, entity resolution, data mastering and deduplication using ML

Language: Java - Size: 679 MB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 1,015 - Forks: 125

J535D165/recordlinkage

A powerful and modular toolkit for record linkage and duplicate detection in Python

Language: Python - Size: 70 MB - Last synced at: 9 days ago - Pushed at: about 1 year ago - Stars: 997 - Forks: 156

JohnSnowLabs/nlu

1 line for thousands of State of The Art NLP models in hundreds of languages. The fastest and most accurate way to solve text problems.

Language: Python - Size: 474 MB - Last synced at: 11 days ago - Pushed at: 3 months ago - Stars: 909 - Forks: 138

heathersherry/Knowledge-Graph-Tutorials-and-Papers

Insightful Tutorials and Papers about Knowledge Graphs

Size: 4.2 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 855 - Forks: 121

Picovoice/rhino

On-device Speech-to-Intent engine powered by deep learning

Language: Python - Size: 272 MB - Last synced at: 42 minutes ago - Pushed at: about 2 hours ago - Stars: 657 - Forks: 91

dedupeio/csvdedupe

🆔 Command line tool for deduplicating CSV files

Language: Python - Size: 1.12 MB - Last synced at: 1 day ago - Pushed at: about 5 years ago - Stars: 420 - Forks: 83

dedupeio/dedupe-examples

🆔 Examples for using the dedupe library

Language: Python - Size: 5.12 MB - Last synced at: 13 days ago - Pushed at: 8 months ago - Stars: 410 - Forks: 215

J535D165/data-matching-software

A list of free data matching and record linkage software.

Size: 93.8 KB - Last synced at: 10 days ago - Pushed at: about 1 year ago - Stars: 378 - Forks: 42

izuna385/Entity-Linking-Recent-Trends

Recent trends of Entity Linking, Disambiguation, and Representation.

Size: 787 KB - Last synced at: 24 days ago - Pushed at: almost 4 years ago - Stars: 345 - Forks: 18

microsoft/vert-papers

This repository contains code and datasets related to entity/knowledge papers from the VERT (Versatile Entity Recognition & disambiguation Toolkit) project, by the Knowledge Computing group at Microsoft Research Asia (MSRA).

Language: Python - Size: 22 MB - Last synced at: 1 day ago - Pushed at: about 1 year ago - Stars: 277 - Forks: 94

scify/JedAIToolkit

An open source, high scalability toolkit in Java for Entity Resolution.

Language: Java - Size: 278 MB - Last synced at: 16 days ago - Pushed at: about 1 year ago - Stars: 218 - Forks: 47

amazon-science/ReFinED

ReFinED is an efficient and accurate entity linking (EL) system.

Language: Python - Size: 433 KB - Last synced at: 9 days ago - Pushed at: 4 months ago - Stars: 212 - Forks: 45

maxharlow/csvmatch

🔎 Finds fuzzy matches between CSV files

Language: Python - Size: 158 KB - Last synced at: 14 days ago - Pushed at: 26 days ago - Stars: 189 - Forks: 22

zentity-io/zentity

Entity resolution for Elasticsearch.

Language: Java - Size: 634 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 158 - Forks: 29

vintasoftware/entity-embed

PyTorch library for transforming entities like companies, products, etc. into vectors to support scalable Record Linkage / Entity Resolution using Approximate Nearest Neighbors.

Language: Jupyter Notebook - Size: 11.4 MB - Last synced at: 12 days ago - Pushed at: over 2 years ago - Stars: 151 - Forks: 16

dell-research-harvard/linktransformer

A convenient way to link, deduplicate, aggregate and cluster data(frames) in Python using deep learning

Language: Python - Size: 1.81 MB - Last synced at: 9 days ago - Pushed at: 17 days ago - Stars: 118 - Forks: 10

codeforkjeff/conciliator

OpenRefine reconciliation services for VIAF, ORCID, and Open Library + framework for creating more.

Language: Java - Size: 924 KB - Last synced at: 15 days ago - Pushed at: about 2 months ago - Stars: 117 - Forks: 23

usc-isi-i2/rltk

Record Linkage ToolKit (Find and link entities)

Language: Python - Size: 9.59 MB - Last synced at: 2 days ago - Pushed at: over 1 year ago - Stars: 110 - Forks: 23

ropeladder/record-linkage-resources

Resources for tackling record linkage / deduplication / data matching problems

Size: 26.4 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 103 - Forks: 16

Wikidata/soweego

Link Wikidata items to large catalogs

Language: Python - Size: 7.87 MB - Last synced at: 14 days ago - Pushed at: about 2 months ago - Stars: 96 - Forks: 9

fritshermans/deduplipy

Python package for deduplication/entity resolution using active learning

Language: Python - Size: 521 KB - Last synced at: 8 days ago - Pushed at: 8 months ago - Stars: 78 - Forks: 9

AI-team-UoA/pyJedAI

An open-source library that leverages Python’s data science ecosystem to build powerful end-to-end Entity Resolution workflows.

Language: Python - Size: 139 MB - Last synced at: 9 days ago - Pushed at: 20 days ago - Stars: 76 - Forks: 11

DerwenAI/strwythura

How to construct knowledge graphs from unstructured data sources

Language: Jupyter Notebook - Size: 1.22 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 67 - Forks: 6

data61/anonlink

Python implementation of anonymous linkage using cryptographic linkage keys

Language: Python - Size: 3.19 MB - Last synced at: 10 days ago - Pushed at: 11 months ago - Stars: 65 - Forks: 8

Gaglia88/sparker

SparkER: an Entity Resolution framework for Apache Spark

Language: Scala - Size: 48.4 MB - Last synced at: 12 days ago - Pushed at: about 1 year ago - Stars: 64 - Forks: 19

OlivierBinette/Awesome-Entity-Resolution

List of entity resolution software and resources.

Size: 28.3 KB - Last synced at: 6 days ago - Pushed at: about 2 months ago - Stars: 63 - Forks: 8

Senzing/awesome

Curated list of awesome software and resources for Senzing, The First Real-Time AI for Entity Resolution.

Language: Python - Size: 244 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 57 - Forks: 2

cleanzr/dblink

Distributed Bayesian Entity Resolution in Apache Spark

Language: Scala - Size: 455 KB - Last synced at: 8 months ago - Pushed at: almost 4 years ago - Stars: 57 - Forks: 9

wcmc-its/ReCiter

ReCiter: an enterprise open source author disambiguation system for academic institutions

Language: Java - Size: 234 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 47 - Forks: 24

databricks-industry-solutions/auto-data-linkage

Low effort linking and easy de-duplication. Databricks ARC provides a simple, automated, lakehouse integrated entity resolution solution for intra and inter data linking.

Language: Python - Size: 6.12 MB - Last synced at: 29 days ago - Pushed at: 6 months ago - Stars: 47 - Forks: 22

J535D165/recordlinkage-annotator

A browser user interface for manual labeling of record pairs.

Language: JavaScript - Size: 3.49 MB - Last synced at: 14 days ago - Pushed at: almost 2 years ago - Stars: 46 - Forks: 8

wbsg-uni-mannheim/MatchGPT

This repository contains code and extensive prompt examples to reproduce and extend the experiments in our papers "Using ChatGPT for Entity Matching" and "Entity Matching using Large Language Models".

Language: Jupyter Notebook - Size: 185 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 42 - Forks: 8

HPI-Information-Systems/snowman

Welcome to Snowman App – a Data Matching Benchmark Platform.

Language: TypeScript - Size: 85.8 MB - Last synced at: 8 days ago - Pushed at: about 2 years ago - Stars: 38 - Forks: 2

iesl/learned-string-alignments

Learning String Alignments for Entity Aliases

Language: Python - Size: 37.1 KB - Last synced at: 9 days ago - Pushed at: about 6 years ago - Stars: 37 - Forks: 6

vaneseltine/nominally

A maximum-strength name parser for record linkage.

Language: Python - Size: 1.09 MB - Last synced at: 7 days ago - Pushed at: 21 days ago - Stars: 36 - Forks: 1

entrepreneur-interet-general/Merge-Machine

Merge Dirty Data with Clean Reference Tables

Language: Python - Size: 1.27 MB - Last synced at: 24 days ago - Pushed at: over 3 years ago - Stars: 35 - Forks: 3

iesl/stance

Learned string similarity for entity names using optimal transport.

Language: Python - Size: 71.3 KB - Last synced at: 9 days ago - Pushed at: over 4 years ago - Stars: 35 - Forks: 3

ing-bank/spark-matcher

Record matching and entity resolution at scale in Spark

Language: Python - Size: 579 KB - Last synced at: 7 days ago - Pushed at: over 1 year ago - Stars: 34 - Forks: 8

Graphlet-AI/graphlet

PyPi module for Graphlet AI Knowledge Graph Factory

Language: Python - Size: 20.4 MB - Last synced at: 2 days ago - Pushed at: about 2 years ago - Stars: 29 - Forks: 1

wbsg-uni-mannheim/productbert-intermediate

This repository contains code and data download scripts for the paper "Intermediate Training of BERT for Product Matching" by Ralph Peeters, Christian Bizer and Goran Glavaš.

Language: Python - Size: 104 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 28 - Forks: 7

OlivierBinette/er-evaluation

An End-to-End Evaluation Framework for Entity Resolution Systems

Language: Python - Size: 62.4 MB - Last synced at: 7 days ago - Pushed at: over 1 year ago - Stars: 27 - Forks: 9

dobraczka/kiez

🏘️ Hubness reduced nearest neighbor search for entity alignment with knowledge graph embeddings

Language: Python - Size: 903 KB - Last synced at: 2 days ago - Pushed at: 12 months ago - Stars: 25 - Forks: 3

J535D165/FEBRL-fork-v0.4.2

Fork of the Freely Extensible Biomedical Record Linkage program

Language: Python - Size: 6.36 MB - Last synced at: 14 days ago - Pushed at: over 8 years ago - Stars: 24 - Forks: 21

neo4j-graph-examples/entity-resolution

Entity resolution, also known as data matching or record linkage, is the task of finding records that refer to the same or a similar real-world entity, within one data set or across several. Record linkage is necessary when joining entities that are similar and may or may not share common identifiers. Neo4j offers various advantages for performing entity resolution and record linkage. This repository covers one such use case: linking similar user accounts for analytics and better recommendations.

Language: Go - Size: 1.17 MB - Last synced at: 15 days ago - Pushed at: 7 months ago - Stars: 23 - Forks: 4

abcsys/libem

Compound AI toolchain for fast and accurate entity matching, powered by LLMs.

Language: Python - Size: 3.54 MB - Last synced at: 2 days ago - Pushed at: 27 days ago - Stars: 22 - Forks: 4

AI-team-UoA/JedAI-WebApp

JedAI-WebApp is a GUI that facilitates the execution of JedAI. JedAI is an open source, high scalability toolkit that offers out-of-the-box solutions for any data integration task. This web-app is developed using spring-boot and ReactJS.

Language: JavaScript - Size: 82.2 MB - Last synced at: 12 months ago - Pushed at: about 2 years ago - Stars: 21 - Forks: 6

molybdenum-99/whatis

WhatIs.this: simple entity resolution through Wikipedia

Language: Ruby - Size: 3.45 MB - Last synced at: 17 days ago - Pushed at: about 2 years ago - Stars: 18 - Forks: 2

ngmarchant/comparator

Similarity and distance measures for clustering and record linkage applications in R

Language: R - Size: 275 KB - Last synced at: 10 days ago - Pushed at: about 3 years ago - Stars: 18 - Forks: 0

tilotech/langchain-tilores 📦

This repository provides the building blocks for integrating LangChain, LangGraph, and the Tilores entity resolution system.

Language: Python - Size: 29.3 KB - Last synced at: 22 days ago - Pushed at: 5 months ago - Stars: 17 - Forks: 1

snipsco/snips-nlu-parsers

Rust crate for entity parsing

Language: Rust - Size: 128 KB - Last synced at: 5 days ago - Pushed at: over 2 years ago - Stars: 17 - Forks: 35

scify/jedai-ui

UI for JedAI Toolkit

Language: Java - Size: 1.09 MB - Last synced at: 10 days ago - Pushed at: almost 3 years ago - Stars: 17 - Forks: 5

vefthym/MinoanER

Minoan ER is an Entity Resolution (ER) framework, built by researchers in Crete (the land of the ancient Minoan civilization). Entity resolution aims to identify descriptions that refer to the same entity within or across knowledge bases.

Language: Java - Size: 221 KB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 17 - Forks: 6

databricks-industry-solutions/customer-er

Translating text attributes (like name, address, phone number) into quantifiable numerical representations; training ML models to determine if these numerical labels form a match; scoring the confidence of each match.

Language: Python - Size: 137 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 15 - Forks: 6

ngmarchant/oasis

A Python package for efficient evaluation based on OASIS (Optimal Asymptotic Sequential Importance Sampling).

Language: Python - Size: 16.3 MB - Last synced at: about 1 month ago - Pushed at: almost 4 years ago - Stars: 15 - Forks: 3

NickCrews/mismo

The SQL/Ibis powered sklearn of record linkage

Language: Python - Size: 9.15 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 14 - Forks: 3

pmart123/cymbology

Identifies and validates financial security ids such as Sedol, Cusip, Isin numbers.

Language: Python - Size: 58.6 KB - Last synced at: about 1 month ago - Pushed at: over 3 years ago - Stars: 14 - Forks: 1

wbsg-uni-mannheim/wdc-lspc-v2

This repository contains code and data download scripts for the paper "Using schema.org annotations for training and maintaining product matchers" by Ralph Peeters, Anna Primpeli, Benedikt Wichtlhuber and Christian Bizer.

Language: Jupyter Notebook - Size: 37.1 KB - Last synced at: almost 2 years ago - Pushed at: almost 4 years ago - Stars: 14 - Forks: 2

cleanzr/fasthash

Performs unique entity estimation corresponding to Chen, Shrivastava, Steorts (2018).

Language: Python - Size: 1.6 MB - Last synced at: 8 months ago - Pushed at: about 6 years ago - Stars: 14 - Forks: 3

PatentsView/PatentsView-Evaluation 📦

Evaluation and benchmarking of PatentsView disambiguation algorithms

Language: Python - Size: 156 MB - Last synced at: 8 days ago - Pushed at: over 1 year ago - Stars: 13 - Forks: 8

ArjitJ/DIAL

Implementation of the paper "Deep Indexed Active Learning for Matching Heterogeneous Entity Representations"

Language: Python - Size: 4.88 MB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 13 - Forks: 2

maxharlow/textmatch

🔎 Finds fuzzy matches between datasets

Language: Python - Size: 120 KB - Last synced at: 12 days ago - Pushed at: 3 months ago - Stars: 12 - Forks: 0

cleanzr/clevr

Clustering and Link Prediction Evaluation in R

Language: R - Size: 114 KB - Last synced at: 1 day ago - Pushed at: over 1 year ago - Stars: 12 - Forks: 3

ufbmi/onefl-deduper

Tools for EHR patient de-duplication (aka entity resolution)

Language: Python - Size: 19.4 MB - Last synced at: 4 days ago - Pushed at: about 2 years ago - Stars: 12 - Forks: 4

wbsg-uni-mannheim/jointbert

This repository contains the code and data download links to reproduce the experiments of the PVLDB paper "Dual-Objective Fine-Tuning of BERT for Entity Matching" by Ralph Peeters and Christian Bizer.

Language: Python - Size: 125 KB - Last synced at: almost 2 years ago - Pushed at: almost 4 years ago - Stars: 12 - Forks: 5

matchID-project/backend

Backend (Docker & API) for matchID project

Language: Python - Size: 9.95 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 11 - Forks: 14

tshu-w/ComEM

Code for the paper "Match, Compare, or Select? An Investigation of Large Language Models for Entity Matching" (COLING 2025)

Language: Python - Size: 158 KB - Last synced at: 2 days ago - Pushed at: 3 months ago - Stars: 11 - Forks: 2

ncn-foreigners/blocking

An R package for blocking records for record linkage / data deduplication based on approximate nearest neighbours algorithms.

Language: R - Size: 5.63 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 9 - Forks: 0

ScaDS/MovieGraphBenchmark

📽 Benchmark datasets for Entity Resolution on Knowledge Graphs

Language: Python - Size: 7.94 MB - Last synced at: about 18 hours ago - Pushed at: about 1 year ago - Stars: 9 - Forks: 1

aws-solutions-library-samples/guidance-for-patient-entity-resolution-with-aws-healthlake

AWS HealthLake patient matching with AWS Entity Resolution

Language: Python - Size: 543 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 8 - Forks: 1

vefthym/fairER

FairER: Entity Resolution with Fairness Constraints

Language: Python - Size: 113 MB - Last synced at: 7 days ago - Pushed at: over 1 year ago - Stars: 8 - Forks: 5

Evnsn/awsome-entity-resolution

A collection of awesome resources regarding Record Linkage.

Size: 13.7 KB - Last synced at: 10 days ago - Pushed at: 8 months ago - Stars: 7 - Forks: 0

dobraczka/sylloge

🗃️ Small library to simplify collecting and loading of entity alignment benchmark datasets

Language: Python - Size: 281 KB - Last synced at: 2 days ago - Pushed at: 10 months ago - Stars: 7 - Forks: 1

KirovVerst/qlink

Entity Resolution and Record Linkage library

Language: Python - Size: 4.84 MB - Last synced at: 1 day ago - Pushed at: about 2 years ago - Stars: 7 - Forks: 0

OlivierBinette/groupbyrule

Deduplicate data using fuzzy and deterministic matching rules.

Language: Python - Size: 11.9 MB - Last synced at: 12 months ago - Pushed at: about 2 years ago - Stars: 7 - Forks: 0

francetem/deduper

A general purpose deduplication framework

Language: Java - Size: 377 KB - Last synced at: over 1 year ago - Pushed at: about 6 years ago - Stars: 7 - Forks: 0

tshu-w/Uniblocker

Code and data for the paper: Towards Universal Dense Blocking for Entity Resolution

Language: Python - Size: 17.6 MB - Last synced at: 2 days ago - Pushed at: 6 months ago - Stars: 6 - Forks: 0

dobraczka/klinker

🧱 blocking methods for entity resolution

Language: Python - Size: 1.19 MB - Last synced at: 2 days ago - Pushed at: 7 months ago - Stars: 6 - Forks: 0

dobraczka/forayer

forayer is a library of first aid utilities for knowledge graph exploration with an entity centric approach.

Language: Jupyter Notebook - Size: 1.39 MB - Last synced at: 2 days ago - Pushed at: about 1 year ago - Stars: 6 - Forks: 0

Gaglia88/ruler

Scalable record-level matching rules

Language: Scala - Size: 2.44 MB - Last synced at: 14 days ago - Pushed at: about 5 years ago - Stars: 6 - Forks: 0

rs9000/DeepEntityMatching

Entity matching in PyTorch

Language: Python - Size: 321 KB - Last synced at: almost 2 years ago - Pushed at: about 6 years ago - Stars: 6 - Forks: 2

tshu-w/EMBer

Code and data for the paper "Bridging the Gap between Reality and Ideality of Entity Matching: A Revisiting and Benchmark Re-Construction" (IJCAI 2022)

Language: Python - Size: 29.8 MB - Last synced at: 2 days ago - Pushed at: 9 days ago - Stars: 5 - Forks: 2

DerwenAI/kleptosyn

Synthetic data generation for investigative graphs based on patterns of bad-actor tradecraft.

Language: Jupyter Notebook - Size: 1.88 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 5 - Forks: 0

ADBond/splinkclickhouse

Allows Clickhouse to be used as the execution engine for Splink

Language: Python - Size: 959 KB - Last synced at: 8 days ago - Pushed at: about 1 month ago - Stars: 5 - Forks: 0

tteofili/certa

CERTA - Computing Entity Resolution explanations with TriAngles

Language: Python - Size: 26.8 MB - Last synced at: 2 days ago - Pushed at: 4 months ago - Stars: 5 - Forks: 3

tilotech/tilores-langchain

This repository provides the building blocks for integrating LangChain, LangGraph, and the Tilores entity resolution system.

Language: Python - Size: 35.2 KB - Last synced at: 14 days ago - Pushed at: 5 months ago - Stars: 5 - Forks: 1

kuzudb/nobel-network

Data and code for Nobel Laureate academic genealogy network analysis and entity resolution

Language: Jupyter Notebook - Size: 385 KB - Last synced at: 6 days ago - Pushed at: 6 months ago - Stars: 5 - Forks: 0

CangyuanLi/floof

Fuzzy matching made easy

Language: Rust - Size: 359 KB - Last synced at: about 1 month ago - Pushed at: 11 months ago - Stars: 5 - Forks: 0

cleanzr/exchanger

Bayesian Entity Resolution with Exchangeable Random Partition Priors

Language: C++ - Size: 437 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 5 - Forks: 0

Nikoletos-K/Entity-resolution-SIGMOD-2020

📷🎥 Entity resolution system for SIGMOD 2020 programming contest

Language: C - Size: 19.2 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 5 - Forks: 0

cleanzr/dblinkR

An R interface for the dblink Spark application

Language: R - Size: 19 MB - Last synced at: 8 months ago - Pushed at: almost 4 years ago - Stars: 5 - Forks: 1

GemsLab/node2bits

Compact time- and attribute-aware node representations

Language: Python - Size: 36.1 KB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 5 - Forks: 6

remerjohnson/conda-reconcile

An Anaconda3 environment with relevant python libraries to support various linked data OpenRefine reconciliation scripts

Size: 13.7 KB - Last synced at: 6 months ago - Pushed at: about 6 years ago - Stars: 5 - Forks: 1

fgregg/smered

Mirror of https://bitbucket.org/resteorts/smered

Language: Java - Size: 4.48 MB - Last synced at: 7 days ago - Pushed at: about 8 years ago - Stars: 5 - Forks: 0

harpin-ai/toolkit-examples

Examples for trying out the harpin AI identity resolution and data quality toolkit

Language: Jupyter Notebook - Size: 599 KB - Last synced at: 23 days ago - Pushed at: 2 months ago - Stars: 4 - Forks: 0

teomores/Oracle_HPC_contest

Entity resolution on bank account data.

Language: Jupyter Notebook - Size: 1.76 MB - Last synced at: almost 2 years ago - Pushed at: about 2 years ago - Stars: 4 - Forks: 1

RecordLinkageIG/RecordLinkageIG.github.io

Blog of the American Statistical Association's Record Linkage Interest Group.

Language: HTML - Size: 6.24 MB - Last synced at: 12 months ago - Pushed at: over 2 years ago - Stars: 4 - Forks: 2

wbsg-uni-mannheim/winter (fork of olehmberg/winter)

WInte.r is a Java framework for end-to-end data integration. The WInte.r framework implements well-known methods for data pre-processing, schema matching, identity resolution, data fusion, and result evaluation.

Language: Java - Size: 18.6 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 4 - Forks: 1

tilotech/python-tilores-sdk

The tilores-sdk Python package is a small SDK to develop with the Tilores entity resolution system.

Language: Python - Size: 40 KB - Last synced at: 12 days ago - Pushed at: 5 months ago - Stars: 3 - Forks: 0

Related Topics
record-linkage (58), deduplication (37), python (23), entity-matching (20), data-matching (19), machine-learning (17), fuzzy-matching (14), data-integration (12), data-science (12), entity-linking (10), knowledge-graph (10), blocking (10), nlp (9), dedupe (8), named-entity-recognition (7), spark (7), entity-extraction (7), llm (6), identity-resolution (6), java (6), clustering (6), r-package (5), duplicate-detection (5), bert (5), elasticsearch (5), pandas (4), relation-extraction (4), natural-language-processing (4), python-library (4), link-discovery (4), sigmod-programming-contest (4), database (4), entity-alignment (4), product-matching (4), disambiguation (4), awesome (4), meta-blocking (4), string-matching (4), nlu (4), sentiment-analysis (4), deep-learning (4), string-similarity (4), data-fusion (3), ai (3), string-distance (3), entity-relationship (3), awesome-list (3), python3 (3), linked-data (3), data-management (3), networkx (3), compound-ai-systems (3), ner (3), pytorch (3), sql (3), large-language-models (3), graph-algorithms (3), matching (3), identity (3), data-engineering (3), benchmark (3), approximate-nearest-neighbor-search (2), openrefine (2), representation-learning (2), similarity-measures (2), entity-disambiguation (2), rust (2), r (2), streamlit (2), igraph (2), text-classification (2), network-analysis (2), transformers (2), natural-language-understanding (2), sentence-transformers (2), embeddings (2), ml (2), graphs (2), gnns (2), splink (2), langchain (2), agentic-rag (2), duckdb (2), resolution (2), fuzzymatch (2), entity (2), de-duplicating (2), unstructured-data (2), textgraphs (2), aws (2), dataset (2), neo4j (2), tokenization (2), text-mining (2), language-detection (2), jedai (2), similarity (2), data (2), reconciliation-service (2), zentity (2)