An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: data-cleansing

oreseckoa/Analysis-of-CRM-system-data

This project is dedicated to analyzing CRM system data for an online programming school.

Size: 3.27 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0

trumhacker-cyber/Data-Analytics-Certificate

This repository showcases my journey in the fascinating world of Data Analytics.

Size: 19 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0

data-forge/data-forge-ts

The JavaScript data transformation and analysis toolkit inspired by Pandas and LINQ.

Language: TypeScript - Size: 3.68 MB - Last synced at: 7 days ago - Pushed at: about 1 month ago - Stars: 1,359 - Forks: 78

hi-primus/optimus

Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark

Language: Python - Size: 110 MB - Last synced at: 7 days ago - Pushed at: 6 months ago - Stars: 1,512 - Forks: 233

Desbordante/desbordante-core

Desbordante is a high-performance data profiler that is capable of discovering many different patterns in data using various algorithms. It also allows running data cleaning scenarios using these algorithms. Desbordante has a console version and an easy-to-use web application.

Language: C++ - Size: 143 MB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 403 - Forks: 76

probcomp/PClean

A domain-specific probabilistic programming language for scalable Bayesian data cleaning

Language: Julia - Size: 1.36 MB - Last synced at: 15 days ago - Pushed at: 10 months ago - Stars: 223 - Forks: 33

DataPreprocessing/DataCleaning

Data Cleaning is a Python package for data preprocessing. It cleans a CSV file and returns the cleaned data frame, handling imputation, duplicate removal, special-character replacement, and more.

Language: Python - Size: 117 KB - Last synced at: 23 days ago - Pushed at: about 4 years ago - Stars: 8 - Forks: 3

marvrch/Titanic-ExploratoryDataAnalysis

This project focuses on cleaning and analyzing the Titanic dataset using Python. It explores patterns in the data through exploratory data analysis (EDA) and highlights the importance of data cleaning in preparing datasets for further analysis or machine learning.

Language: Jupyter Notebook - Size: 16.3 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

vaxdata22/Water-Quality-DW-on-Oracle-Database

This is an Oracle DB Data Warehouse and manual ETL demo on a specially formatted Water Quality dataset from DEFRA, UK. It is a personal academic-grade exercise to explore the basic concepts of data warehousing and manual ETL process from an academic perspective.

Language: Jupyter Notebook - Size: 380 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

bluestero/urlgenie

Python package to make URL extraction, generalization, validation, and filtration easy.

Language: Python - Size: 204 KB - Last synced at: 25 days ago - Pushed at: 12 months ago - Stars: 4 - Forks: 1

Rudra-G-23/SQL-Data-Warehouse-Project

This repo provides a step-by-step approach to building a modern data warehouse using PostgreSQL. It covers the ETL (Extract, Transform, Load) process, data modeling, exploratory data analysis (EDA), and advanced data analysis techniques.

Language: PLpgSQL - Size: 9.32 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

saya304/Data-Cleaning-and-Exploratory-Data-Analysis

This project focuses on data cleaning and exploratory data analysis (EDA) in Snowflake, transforming raw data into meaningful insights using SQL

Size: 89.8 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

jasonh119/TransactionTracker

Little Application for Transaction Aggregation, cleaning and Categorisation for Learning DS and LLMs

Language: Python - Size: 297 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

SamHollings/nhs_data_cleansing

A repo of reusable functions for cleansing data

Language: Python - Size: 52.7 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

data-integrations/wrangler

Wrangler Transform: A DMD system for transforming Big Data

Language: Java - Size: 5.75 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 91 - Forks: 56

TimKong21/PwC-Switzerland-Power-BI-in-Data-Analytics-Virtual-Case-Experience

Comprehensive Power BI dashboards showcasing insights on Call Centre Trends, Customer Retention, and Diversity & Inclusion to drive business impact.

Size: 4.43 MB - Last synced at: 2 months ago - Pushed at: about 1 year ago - Stars: 14 - Forks: 5

bakdata/dedupe

Java DSL for (online) deduplication

Language: Java - Size: 1.01 MB - Last synced at: about 2 months ago - Pushed at: 4 months ago - Stars: 20 - Forks: 2

HypertextAssassin0273/Excel_Data_Organizer_and_Cleaner-DS_Project

Data Structures project in C++11 language, uses custom Vector & String structures with Move Semantics (Rule of Five)

Language: C++ - Size: 1.39 MB - Last synced at: about 2 months ago - Pushed at: over 2 years ago - Stars: 9 - Forks: 2

Ashbyt/Python

Ashley Bythell - Python

Language: Jupyter Notebook - Size: 5.61 MB - Last synced at: 3 months ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

Skunkworks-Labs/data-management

The website is now described as an educational resource for data management, with the objective of educating, engaging, guiding, and providing resources.

Language: HTML - Size: 199 KB - Last synced at: 9 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

BDFD-LearningGround/Cousera_Google-Data-Analytics-Professional-Certificate

Quizzes & Assignment Solutions for Google Data Analytics Professional Certificate on Coursera. Also includes a few resources on the side that I found helpful.

Size: 38.2 MB - Last synced at: 10 months ago - Pushed at: about 3 years ago - Stars: 197 - Forks: 55

AhmdLx/PPP_loans_Analysis

Data Cleaning, Exploration, and Insights

Language: TSQL - Size: 534 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 1 - Forks: 0

brunocampos01/porto-seguro-safe-driver-prediction

Predict if a driver will file an insurance claim next year. (Kaggle Competition)

Language: Python - Size: 93.8 MB - Last synced at: about 2 months ago - Pushed at: over 3 years ago - Stars: 11 - Forks: 5

vbhvsingh0/CDC_immunization

This project explores the relationships between different vaccines and sex, age, and other basic features in the data.

Language: Python - Size: 2.72 MB - Last synced at: 3 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

ClimerLab/mrclean

Two Mixed Integer Programs for cleaning a data file.

Language: C++ - Size: 43.9 KB - Last synced at: about 2 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

decimus01/Spotify_songs_data_analysis

Analysis of songs from the period 18 October 2024 to 1 May 2024 from Spotify data.

Language: Jupyter Notebook - Size: 861 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

KevinVChin/Google-Data-Analytics-Professional-Certificate

The Google Data Analytics Professional Certificate program teaches how to clean and organize data for analysis, and how to complete analysis and calculations using spreadsheets, SQL, Tableau and R programming.

Language: HTML - Size: 1.97 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 1 - Forks: 0

LieseB-1746743/data-cleaning

Data cleaning tool.

Language: JavaScript - Size: 1.81 MB - Last synced at: 12 months ago - Pushed at: about 4 years ago - Stars: 9 - Forks: 5

fpjnijweide/autoencoder-pdb-cleaning

This is the source code for the paper "A probabilistic database approach to autoencoder-based data cleaning".

Language: Jupyter Notebook - Size: 242 MB - Last synced at: 17 days ago - Pushed at: almost 4 years ago - Stars: 4 - Forks: 0

Rahma-Farag/Rahma-Farag

Main Repository

Language: Jupyter Notebook - Size: 71.4 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

aminkhod/TA--Course-ofData-mining--Fall-2018 📦

Implementations and applications of methods from the Topics on Data Mining course

Language: Python - Size: 32.1 MB - Last synced at: about 1 year ago - Pushed at: over 5 years ago - Stars: 3 - Forks: 1

Abhigyan76/Pizza-Sales-Insight

Used SQL and Power BI to build an insightful dashboard

Size: 2.24 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

JoeRegnier/horkos

Data quality analysis and scoring system.

Language: Python - Size: 3.39 MB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 2 - Forks: 2

tneriaransom/data-analysis-portfolio

This repository houses a curated collection of projects designed to highlight my expertise in data analytics.

Size: 39.1 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

RizqiSeijuuro/final-project-kelompok-03-aditya-bariq

Weekly Sales Prediction on the Walmart Dataset. For submission as the Final Project of Studi Independen Batch 3

Language: Jupyter Notebook - Size: 21.9 MB - Last synced at: about 1 year ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 1

Irrelev4nt13/Customer-Personality-Analysis

📊Customer Personality Analysis, using various Data Mining techniques and Machine Learning algorithms.

Language: Jupyter Notebook - Size: 1.6 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

fitria-dwi/Business-Decision-Research

Language: Jupyter Notebook - Size: 218 KB - Last synced at: about 1 year ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

Somu-cSs/Water-Quality-Analysis-and-Prediction.

Interactive Dashboard Web-app :

Language: Jupyter Notebook - Size: 3.09 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

weichi21/KNN-Model-Car-Price-Prediction

Predictive modeling project by implementing KNN regression model.

Language: Jupyter Notebook - Size: 438 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

kshaikh23/NBA-Playoffs-Project

Statistical analysis comparing team play in the NBA regular season and playoffs. Uses a linear regression algorithm to predict players' playoff points per game based on their regular season stats. Collaborated with Stephan MacDougall.

Language: Jupyter Notebook - Size: 664 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

BDFD-LearningGround/Cousera_Applied-Data-Science-with-Python-Specialization-OP

Quizzes & Assignment Solutions for Applied Data Science with Python Specialization on Coursera. Also includes a few resources on the side that I found helpful.

Size: 7.81 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

Yaresh01/Finance-and-Risk-Analytics-project

Language: Jupyter Notebook - Size: 9.42 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

rahulodedra30/House-Recommendation-Based-on-Neighbourhood

Recommends houses based on neighbourhood using K-Means clustering on data scraped from Wikipedia

Language: Jupyter Notebook - Size: 572 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

ITISALLDATA/DATA-CLEANING-PROJECT-WITH-SQL

In this project, I cleaned up a large FIFA 2021 dataset with 18,000+ player records. The data was messy, with inconsistencies in 77 columns. I focused on making the data consistent and usable for analysis. This repository documents my step-by-step process, demonstrating how I transformed the data into a clean format.

Size: 20.5 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

dikshantnaik/Data-Cleaning-Assignment-Internship

A Python script that parses non-meaningful data into meaningful data and saves it to a .csv file

Language: Python - Size: 27.3 KB - Last synced at: over 1 year ago - Pushed at: about 3 years ago - Stars: 3 - Forks: 0

ajaymache/data-analysis-using-python

Exploratory data analysis 📊 using Python 🐍 of a used car 🚘 database taken from Kaggle

Language: Jupyter Notebook - Size: 49.3 MB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 193 - Forks: 89

extremecode/stress-detection-in-social-networks

stress detection in social networks

Language: R - Size: 5.26 MB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 2 - Forks: 2

ojasphansekar/Zillow-Home-Value-Prediction

XGBoost, LightGBM, LSTM, Linear Regression, Exploratory Data Analysis

Language: Jupyter Notebook - Size: 1.81 MB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 10 - Forks: 7

julianacastilloaraujo/Google-Data-Analytics

⭐️ Google Data Analytics + Coursera ⭐️ 👩‍💻 Data, data, everywhere (this course) 🔍 Skills: Spreadsheet, Data Cleansing, Data Analysis, Data Visualization (DataViz), SQL

Size: 43.9 KB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 2 - Forks: 0

AVJdataminer/Squeaky

R package for data cleaning and pre-processing for data science

Language: R - Size: 79.1 KB - Last synced at: over 1 year ago - Pushed at: almost 7 years ago - Stars: 0 - Forks: 1

AliiPmD/House_Prices_Advanced_Regression

Advanced Regression for House Prices with data preprocessing steps (like data exploration, cleansing, visualization, etc.) and training a model with a score of 0.945.

Language: Jupyter Notebook - Size: 661 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

muhammadhamzah8/Ecommerce-Shipping-Classification-Modeling

Exploratory Data Analysis & Modeling to predict whether the shipping deliveries will be received late or on-time by the customers

Language: Jupyter Notebook - Size: 34.4 MB - Last synced at: over 1 year ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 2

edohgoka/Predict_Success_Of_a_Restaurant

Predicting whether or not a restaurant will succeed.

Language: Jupyter Notebook - Size: 33.8 MB - Last synced at: almost 2 years ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 0

aaron-evans-cruz/Python-Portfolio-Projects

Python Portfolio Projects. Highlighting skills in Python, Pandas, data cleaning, correlation...

Language: Jupyter Notebook - Size: 2.48 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

AlisonYao/2020SummerResearch_ChineseResDatabase Fork of xuyou1999/2020SummerResearch_ChineseResDatabase

Language: Jupyter Notebook - Size: 29.7 MB - Last synced at: almost 2 years ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

kenlhlui/Brexit_referendum_data_cleaning

R data cleaning project for Brexit Referendum voting data.

Language: R - Size: 5.52 MB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

scrab017/tgdl

Analysis of physicians registered under The Tripura State Medical Council. Data scraped from https://tsmc.tripura.gov.in/doc_list

Language: HTML - Size: 1.25 MB - Last synced at: almost 2 years ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

AP-State-Skill-Development-Corporation/Data-Science-Using-Python-Internship-EB1

This repo was created to share the required/discussed files during the online internship training program on Data Science Using Python in May 2021

Language: Jupyter Notebook - Size: 21.8 MB - Last synced at: about 1 year ago - Pushed at: almost 4 years ago - Stars: 13 - Forks: 10

iweld/data_cleaning

An SQL data cleaning project

Size: 586 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 8 - Forks: 4

doratako/Data-Quality-Assurance

Data validation and data cleansing

Language: Jupyter Notebook - Size: 54.7 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

pmb-7684/Google-Data-Analytics-Professional-Certificate

Learning materials, assignments, and helpful resources for professional certification. Completed October 2022

Size: 6.84 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

pmb-7684/Applied-Data-Science-with-Python-Specialization

Coursera specialization taught by University of Michigan. Expected completion Date July 2023

Size: 5.86 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

AlecVail/Preparing_Data_Using_Alteryx

Alteryx Academy Challenge #363

Size: 41 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

siegstedt/predict_credit_card_approval

Commercial banks receive a lot of applications for credit cards. Many of them get rejected for many reasons, like high loan balances, low income levels, or too many inquiries on an individual's credit report. Manually analyzing these applications is mundane, error-prone, and time-consuming. Luckily, this task can be automated with the power of machine learning. Here is an automatic credit card approval predictor using machine learning techniques.

Language: Jupyter Notebook - Size: 32.2 KB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 3 - Forks: 1

DablewCodes/Nashville-Housing-Data-Cleaning

Performed Data Cleaning by using advanced SQL such as CTEs, Joins, Rank Functions, Aggregate Functions etc.

Size: 11.3 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

federicozukierman/data-cleaning-SQL

In this project I clean data from the Nashville (US) housing database

Size: 1000 Bytes - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

raihanjp98/DTS-KOMINFO-Data-Engineer-Career-Track-DQLab

A collection of scripts written to complete DQLab Data Engineer Career Track

Language: Python - Size: 28.9 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

grahman20/kDMI

kDMI employs two levels of horizontal partitioning (based on a decision tree and k-NN algorithm) of a data set, in order to find the records that are very similar to the one with missing value/s. Additionally, it uses a novel approach to automatically find the value of k for each record.

Language: Java - Size: 267 KB - Last synced at: 8 days ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

grahman20/FIMUS

FIMUS imputes numerical and categorical missing values by using a data set’s existing patterns including co-appearances of attribute values, correlations among the attributes and similarity of values belonging to an attribute.

Language: HTML - Size: 162 KB - Last synced at: 8 days ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 1

programindz/heartattackpredictor

A machine learning model using Support Vector Machine classification to predict chances of an individual having a heart attack based on features like age, sex, cholesterol, blood pressure, chest pain, heartbeat, etc.

Language: Jupyter Notebook - Size: 88.9 KB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

hazem-alabiad/taxi-tip-estimator

Taxi Tip Estimator (TTS) is a Data Mining project that uses the data collected by taxi drivers to estimate the tips given by customers.

Language: HTML - Size: 53.9 MB - Last synced at: about 1 month ago - Pushed at: about 4 years ago - Stars: 1 - Forks: 0

abccastro/Canada-PR-Data-Analysis-and-Visualization

Size: 28 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

AchmadFachturrohman/FGA2022-Data-Engineer

This is my Data Engineer portfolio

Language: Python - Size: 38.1 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

S-Vijay-vj/imdb-rating_Data-wrangling-and-exploration-using-SQL

Size: 927 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

AlexLamson/DataWrangler

Make quick and dirty data mining easier in Sublime Text

Language: Python - Size: 353 KB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 11 - Forks: 2

PedroChaparro/PI202202-alako-data

This repository contains all the files related to project's data collection, data normalization / cleansing and database management.

Language: Jupyter Notebook - Size: 581 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 4 - Forks: 1

agungbudiwirawan/Data_Science_in_Telco-Data_Cleansing

Data cleansing using Python: handling missing data values, handling outliers, and standardizing values.

Language: Jupyter Notebook - Size: 259 KB - Last synced at: 2 months ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

RochaErik/X-MoviesDatasetPt3-Tidy_up_data

Code for cleaning up data. Data from almost 46 thousand movies used.

Language: Jupyter Notebook - Size: 21.8 MB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

RochaErik/X-MoviesDatasetPt4-Merging_datasets

Code for cleaning and merging datasets.

Language: Jupyter Notebook - Size: 14.7 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

pariosur/food_waste_analysis

Exploratory Data Analysis of Food Waste and Food Loss Database (FAO)

Language: Jupyter Notebook - Size: 628 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

siegstedt/predict_blood_donation

This project works with data collected from the donor database of the Blood Transfusion Service Center in Hsin-Chu City, Taiwan. About every three months, the center sends its blood transfusion service bus to a university in Hsin-Chu City to collect donated blood. The dataset, obtained from the UCI Machine Learning Repository, consists of a random sample of 748 donors. The task is to predict if a blood donor will donate within a given time window. The work contains a full model-building process: from inspecting the dataset to using the tpot library to automate your Machine Learning pipeline.

Language: Python - Size: 23.4 KB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 2 - Forks: 3

siegstedt/super_bowl_halftime

Whether or not you like football, the Super Bowl is a spectacle. There's drama in the form of blowouts, comebacks, and controversy in the games themselves. There are the ridiculously expensive ads, some hilarious, others gut-wrenching, thought-provoking, and weird. The half-time shows with the biggest musicians in the world, sometimes riding giant mechanical tigers or leaping from the roof of the stadium. Here, we find out how some of the elements of this show interact with each other.

Language: Jupyter Notebook - Size: 98.6 KB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 2 - Forks: 1

bastians/address_transformation Fork of thilohuellmann/address_transformation

Transform unstructured, inconsistent or incomplete address data into structured and complete address data with Google Maps Geocoding API.

Language: Python - Size: 12.7 KB - Last synced at: over 2 years ago - Pushed at: over 5 years ago - Stars: 1 - Forks: 1

shubhankar5/scrub-system-for-de-identification

A scrub system for de-identification and cleaning of data to maintain its privacy from the world.

Language: Python - Size: 1.72 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 1

thecodemancer/Residential_property_prices_2020

In this code, we're applying data cleansing to this dataset so that we can properly work with it later. The goal is to build a data model with a fact table and dimension tables.

Language: Jupyter Notebook - Size: 2.5 MB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

AREschweiz/microcensus-geodata-cleaning

R code used to clean the geodata of the Swiss Mobility and Transport Microcensus (MTMC)

Size: 822 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 0

Dimas263/Preprocessing-Data-into-Train-Test-Val-Data

Python Preprocessing for Sales Project Notebook

Language: Jupyter Notebook - Size: 3.06 MB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 1 - Forks: 1

NataliaAssange/littlejsontools

Some little JSON tools for my own use that may also help you

Language: Python - Size: 11.7 KB - Last synced at: almost 2 years ago - Pushed at: almost 3 years ago - Stars: 3 - Forks: 0

RizqiSeijuuro/walmart-weekly-sales-prediction

Weekly Sales Prediction at Walmart Dataset

Language: Jupyter Notebook - Size: 6.21 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

SabrinaSuraya/Project2-WorkerGarment

Determine garment worker productivity. A regression problem.

Language: Python - Size: 12.7 KB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

data-forge/data-forge-fs

This library contains the file system extensions to Data-Forge that allow it to directly read and write CSV and JSON files in Node.js

Language: TypeScript - Size: 265 KB - Last synced at: 12 days ago - Pushed at: over 3 years ago - Stars: 10 - Forks: 2

mtimjones/dataprocessing

Data cleanse, clustering with Vector Quantization and Adaptive Resonance Theory

Language: C - Size: 37.1 KB - Last synced at: over 2 years ago - Pushed at: over 7 years ago - Stars: 9 - Forks: 1

zislam/CAIRAD

Implements the CAIRAD technique for detecting noisy values in a dataset for Weka

Language: Java - Size: 36.1 KB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 2 - Forks: 0

almeidacastrogabriela/Black_Friday_Analysis_DS815

Data manipulation and assessment using Pandas

Language: HTML - Size: 1.44 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

alkashef/cleaning-excel-data

Tidying and cleaning data in Excel sheets

Size: 2.93 KB - Last synced at: over 2 years ago - Pushed at: about 6 years ago - Stars: 2 - Forks: 0

derekngoh/HDB-Resale-Flat-Valuation

HDB flats resale price prediction. Neural network in Python. Machine learning models in R. Data pre-processing, feature engineering and feature selection mainly in R.

Language: Jupyter Notebook - Size: 5.45 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

dikoharyadhanto/Data-Preparation-Documentation

Learning documentation for the data cleansing stage

Language: HTML - Size: 900 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

Data-Wrangling-with-JavaScript/Chapter-6

Code examples for Chapter 6 of Data Wrangling with JavaScript

Language: JavaScript - Size: 154 KB - Last synced at: about 1 year ago - Pushed at: almost 3 years ago - Stars: 2 - Forks: 0

antonindanalet/microcensus-geodata-cleaning

R code used to clean the geodata of the Swiss Mobility and Transport Microcensus (MTMC)

Size: 8.79 KB - Last synced at: about 2 years ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 0

Ketan2010/TCS-Talent-Ocean

TCS Talent Ocean Challenge submission. Find suitable candidate for project based on skills.

Language: Jupyter Notebook - Size: 735 KB - Last synced at: about 2 years ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 0

Related Keywords
data-cleansing (112), data-cleaning (34), python (32), data-science (30), machine-learning (27), data-analysis (24), data-visualization (22), data (16), sql (14), pandas (14), data-wrangling (13), exploratory-data-analysis (11), data-analytics (8), data-mining (8), data-preprocessing (8), data-manipulation (8), data-preparation (7), numpy (7), preprocessing (6), matplotlib (5), data-transformation (5), feature-engineering (5), classification (5), jupyter-notebook (5), r (5), excel (5), feature-selection (4), deep-learning (4), etl (4), data-quality (4), python3 (4), data-exploration (4), decision-making (4), javascript (4), neural-networks (3), missing-value-imputation (3), missing-values (3), data-processing (3), machine-learning-algorithms (3), dataset (3), data-munging (3), regression-models (3), data-engineering (3), regression (3), logistic-regression (3), missing-data (3), random-forest (3), json (3), data-profiling (3), nodejs (3), scikit-learn (3), csv (3), visualization (3), kaggle-competition (2), xgboost (2), pandas-python (2), data-modeling (2), statistics (2), data-aggregation (2), data-visualisation (2), dataviz (2), data-privacy (2), sklearn (2), microsoft-power-bi (2), powerbi (2), preparation (2), parsing (2), big-data (2), analysis (2), weka (2), data-analysis-python (2), python-programming (2), web-scraping (2), seaborn-plots (2), linear-regression (2), java (2), predictive-modeling (2), knn-regression (2), plot (2), switzerland (2), automation (2), missing-data-imputation (2), clustering (2), missing-value-handling (2), mobility-data (2), cleaning-data (2), tableau-software (2), mining (2), r-programming (2), r-markdown (2), data-integrity (2), data-ethics (2), data-collection (2), geodata (2), data-calculations (2), professional-certificates (2), tabular-data (2), spreadsheets (2), imputation (2), data-warehouse (2)