An open API service providing repository metadata for many open source software ecosystems.

GitHub / rounayak / Data-Profiling-Tool

The program compares two files at a time and does the following:
1. Gathers metadata on the individual tables (column count, record count, list of columns with data types, etc.).
2. Identifies matching columns between tables based on names as well as data. Machine learning is used to handle syntactic as well as semantic variations of column names for accurate matching.
3. Finds duplicate columns within a single table, with the option to deduplicate if required.
4. Finds columns with missing data or null values.
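The four profiling steps described above can be illustrated with a minimal, standard-library-only sketch. This is not the repository's code: the function names, the in-memory table layout (a dict mapping column name to a list of values), and the 0.8 threshold are all assumptions made for illustration. In particular, `difflib.SequenceMatcher` is a simple string-similarity stand-in for the machine-learning name matching the description mentions, and it only covers syntactic variations of column names, not semantic ones.

```python
from difflib import SequenceMatcher

# Values treated as "missing" in this sketch (an assumption, not the tool's rule).
MISSING = (None, "")


def profile(table):
    """Step 1: basic metadata for a table given as {column_name: [values]}."""
    return {
        "column_count": len(table),
        "record_count": max((len(v) for v in table.values()), default=0),
        # Observed Python type names per column, ignoring missing values.
        "columns": {
            name: sorted({type(v).__name__ for v in values if v not in MISSING})
            for name, values in table.items()
        },
    }


def normalize(name):
    """Lowercase a column name and drop separators such as '_' or spaces."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def name_similarity(a, b):
    """String similarity of two normalized column names, in [0, 1]."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def value_overlap(xs, ys):
    """Jaccard overlap of the distinct non-missing values of two columns."""
    a = {x for x in xs if x not in MISSING}
    b = {y for y in ys if y not in MISSING}
    return len(a & b) / len(a | b) if a | b else 0.0


def match_columns(left, right, threshold=0.8):
    """Step 2: pair columns whose names or data look alike (hypothetical rule)."""
    matches = []
    for lname, lvals in left.items():
        for rname, rvals in right.items():
            score = max(name_similarity(lname, rname), value_overlap(lvals, rvals))
            if score >= threshold:
                matches.append((lname, rname))
    return matches


def duplicate_columns(table):
    """Step 3: pairs of columns in one table that hold identical values."""
    seen, dups = {}, []
    for name, values in table.items():
        key = tuple(values)
        if key in seen:
            dups.append((seen[key], name))
        else:
            seen[key] = name
    return dups


def columns_with_missing(table):
    """Step 4: names of columns containing at least one missing value."""
    return [n for n, vals in table.items() if any(v in MISSING for v in vals)]


# Made-up sample data, used only to exercise the functions.
customers = {
    "Customer_ID": [1, 2, 3],
    "Name": ["Ann", "Bob", None],
    "cust_id_copy": [1, 2, 3],
}
orders = {"customer id": [1, 2, 3], "amount": [10.0, 20.5, 7.25]}

summary = profile(customers)
matches = match_columns(customers, orders)
dups = duplicate_columns(customers)
incomplete = columns_with_missing(customers)
```

In this sketch a pair of columns counts as a match when either the name score or the value overlap clears the threshold, so `Customer_ID` pairs with `customer id` by name while `cust_id_copy` pairs with it by data. A real tool would likely weigh the two signals together and read the tables from files rather than from in-memory dicts.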

JSON API: http://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rounayak%2FData-Profiling-Tool

Stars: 3
Forks: 0
Open issues: 0

License: None
Language: Python
Size: 1.95 KB
Dependencies parsed at: Pending

Created at: over 7 years ago
Updated at: almost 4 years ago
Pushed at: over 7 years ago
Last synced at: 5 months ago

Topics: data-profiling, python
