GitHub topics: cleaning-dataset

Repositories

Geoffrey3wu/sales-data-sas-project

SAS-based data cleaning and sales reporting project

Language: SAS - Size: 0 Bytes - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 0 - Forks: 0

fgiorgia/data-cleaning-for-housing-data

Cleaning the Nashville Housing Data dataset.

Language: PLpgSQL - Size: 3.29 MB - Last synced at: 30 days ago - Pushed at: 30 days ago - Stars: 0 - Forks: 0

Shivmalge/SQL-Data-Analysis-Healthcare-Project

SQL - Healthcare Dataset Analysis

Size: 561 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

Alwin2397/MySQL_World_Corporate_Layoffs_Data_Analysis

MySQL project on world corporate layoffs: cleaning and analyzing a layoffs dataset.

Size: 0 Bytes - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

DouweHorsthuis/EEG_to_ERP_pipeline_stats_R

General pipeline for analyzing EEG data, in which raw EEG data is transformed into ERPs and the statistics are done in R (mixed-effects models).

Language: MATLAB - Size: 10.5 MB - Last synced at: 8 days ago - Pushed at: about 1 year ago - Stars: 12 - Forks: 4

josedanielchg/Efficient-Data-Storage-for-Predictive-Modeling

DataCamp project from the Associate Data Scientist track, focusing on optimizing dataset storage by transforming data types and filtering. Prepares data for efficient machine learning workflows.

Language: Jupyter Notebook - Size: 2.23 MB - Last synced at: 27 days ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

DipunMohapatra/Nashville-Housing-Dataset-Cleaning-Using-SQL

A data cleaning project for the Nashville Housing dataset, focused on handling missing values, removing duplicates, and standardising fields to improve data quality and reliability for real estate analysis.

Size: 5.14 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

abeylicious/SQL-Projects

Data cleaning, transformation, standardization and exploration of data in MySQL server

Language: TSQL - Size: 4.88 KB - Last synced at: 4 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

BaNamTheAnalyst/U.S-Housing-Market-Factors-Project

The impact of macroeconomic indicators on the housing price index in the United States during the period from 19xx to 2012.

Language: Jupyter Notebook - Size: 663 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

MoonmoonSamal/Data-Driven-Google-Ads-for-Listing-Sites-Analysis

Analyzed Google Ads performance to identify top channels, keywords, and geographical impact

Language: Jupyter Notebook - Size: 321 KB - Last synced at: 27 days ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

MoonmoonSamal/Meesho_Order_Financial_Analysis

Generating insights from Meesho sales data (Oct-Nov)

Language: Jupyter Notebook - Size: 200 KB - Last synced at: 27 days ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

vggm/neo4j_ML

Simple project that extracts, cleans, and processes a dataset and imports the data into a NoSQL database. Also implements a simple app for working with the data.

Language: Jupyter Notebook - Size: 52.3 MB - Last synced at: about 1 month ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

Manar20575/Data-Science-Project

Build a model that predicts whether an individual makes over $50,000 per year.

Language: Jupyter Notebook - Size: 5.01 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

Luc1eSky/finsim_data_exploration

This repo is an initial data exploration of the FinSim Game.

Language: Rich Text Format - Size: 1.58 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

RAQUELFONT/Master-s-Projects

A compilation of impactful projects undertaken during my master's degree studies. 🎓

Language: Jupyter Notebook - Size: 4.06 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

Chisom-chukwumerije/Netflix

Netflix is a streaming service that offers a wide variety of award-winning TV shows, movies, anime, documentaries, and more. The service primarily distributes original and acquired films and television shows from various genres and is available in multiple languages.

Size: 1.57 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

Kaybhee/SQL_DA_CLEANING

Size: 5.63 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

Photoroom/fast-dataset-cleaner

A simple tool for cleaning image datasets at a glance.

Language: TypeScript - Size: 3.55 MB - Last synced at: about 10 hours ago - Pushed at: over 1 year ago - Stars: 6 - Forks: 3

gambros/PortfolioProject-NashvilleHousingData

In this project I perform data cleaning using T-SQL to improve the quality of a dataset containing information about houses in Nashville, Tennessee.

Language: TSQL - Size: 3.91 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

YaroslavaVob/DataCleaning_Project

Data cleaning project for the 'Flats in Moscow and Moscow region' dataset.

Language: Jupyter Notebook - Size: 6.11 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

Kishawn-Dorman/Airbnb-Edinburgh-Housing-Dilemma-Analysis

Host and listing characteristics used to detect illegitimate rental listings.

Size: 14.7 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

ITISALLDATA/DATA-CLEANING-PROJECT-WITH-SQL

In this project, I cleaned up a large FIFA 2021 dataset with 18,000+ player records. The data was messy, with inconsistencies in 77 columns. I focused on making the data consistent and usable for analysis. This repository documents my step-by-step process, demonstrating how I transformed the data into a clean format.

Size: 20.5 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

divyansh1195/Halliburton-Landmark-Learning-ML-with-Python

Machine Learning with Python: Halliburton Landmark Learning

Language: Jupyter Notebook - Size: 12.5 MB - Last synced at: over 1 year ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 0

Korede34/DATA-CLEANING-PROJECT-WITH-SQL

Size: 62.5 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

Arkantos-13/Clean_Airbnb_Dataset

Just cleaning an Airbnb dataset with no more digging

Language: Jupyter Notebook - Size: 45.9 KB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

zaha2020/Machine_Learning

Machine Learning projects

Language: Jupyter Notebook - Size: 167 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

zaha2020/Data_Analytics

Language: Jupyter Notebook - Size: 5.32 MB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

shrav-6/datapreparation-entip

This project involves cleaning and preparing data for the entip project.

Language: Jupyter Notebook - Size: 2.93 KB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

DoriDoro/algoInvest_trade

Project 7 OpenClassrooms Path - AlgoInvest&Trade -- develop an algorithm to solve a problem

Language: Python - Size: 338 KB - Last synced at: 14 days ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

sudhansuku/IMDB-Movie-Analysis

This project aims to carry out an in-depth analysis of the IMDB movie dataset. Excel is used to draw insights.

Size: 11.5 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

praoiticica/Titanic-traditional-ML

Data classification on Titanic dataset using traditional ML methods.

Language: Jupyter Notebook - Size: 14.2 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

afsanamimii/Movie-review-analysis

Language: Jupyter Notebook - Size: 9.77 KB - Last synced at: almost 2 years ago - Pushed at: about 3 years ago - Stars: 1 - Forks: 0

Sriharish19/EDA-Hotel-Booking-Capstone-Project-1

After cleaning the data, EDA was performed using Python libraries such as Matplotlib and Seaborn to display the data and generate business insights that help hotels manage their inventories much more effectively.

Language: Jupyter Notebook - Size: 16.2 MB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

ConX/drpt

Tool for preparing a dataset for publishing by dropping, renaming, scaling, and obfuscating columns defined in a recipe.

Language: Python - Size: 68.4 KB - Last synced at: 6 days ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

JuanDanielMarin/Global-Superstore-Project

Performed data exploration and cleaning using SQL on a dataset about an e-commerce store to answer smart business questions.

Size: 7.51 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0

brianmaleek/project_workspace_2_tweepy

Wrangling and analyzing data from the WeRateDogs Twitter account, which rates people's dogs with a humorous comment about the dog.

Language: Jupyter Notebook - Size: 2.57 MB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

santurini/PCA-Kmeans-From-Scratch

Application of K-means algorithm on a music dataset after a dimensionality reduction with PCA.

Language: Jupyter Notebook - Size: 18 MB - Last synced at: about 2 years ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0

AbiolaBajo10/WeRateDogs-Twitter-Dataset

The dataset for this project is the tweet archive of Twitter user @dog_rates, also known as WeRateDogs. WeRateDogs is a Twitter account that rates people's dogs with a humorous comment about the dog. This dataset was carefully analysed to find meaningful insights.

Language: Jupyter Notebook - Size: 917 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

darshitparmar/Bike-Sharing-Data-Cleaning-and-Prep

Using the bike sharing data, I demonstrate skills in Data Cleaning and Preparation along with testing the data for normality and transforming it.

Language: HTML - Size: 2.49 MB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

mdbinger/School_District_Analysis

Determined whether student test scores are impacted by factors such as school size, school budget, and student grade for a city school district, using a Python script in a Jupyter notebook with the Pandas dependency. Cleaned city school district data to eliminate problematic data that was impacting our analysis of student success on standardized tests.

Language: Jupyter Notebook - Size: 572 KB - Last synced at: about 2 years ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0

Neel14-stack/ML-Tasks

Machine Learning Internship Assignment

Language: Jupyter Notebook - Size: 626 KB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

Ron0p/Dashboard

Dataset of 59 IPL matches from Kaggle, named IPL_Matches_2022.csv. The data analysis is in the IPL_2022.py file; Dash.py is the main application file, in which the GUI is created using Streamlit.

Language: Python - Size: 1.62 MB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 1

ZeroDarkHardy/School_District_Analysis

Analysis of district-wide school and student data, refactored to omit a data sample with potential academic dishonesty.

Language: Jupyter Notebook - Size: 1.76 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

junanda/preprocessing

Source code for training machine learning models and preprocessing text using Python.

Language: Python - Size: 6.84 KB - Last synced at: about 2 years ago - Pushed at: about 6 years ago - Stars: 0 - Forks: 0

jacquie0583/Cryptocurrencies

Unsupervised machine learning: cryptocurrency analysis using several models on cryptocurrency data in order to discover patterns and groups in the data. The analysis was done to create a report that includes which cryptocurrencies are on the trading market and how they could be grouped in order to create a classification system for potential new investments in the cryptocurrency market.

Language: Jupyter Notebook - Size: 9.81 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

c-morey/challenge-data-analysis

This repository provides a Jupyter notebook covering a basic data cleaning and exploratory data analysis process on a CSV file that was scraped from a real estate website in Belgium.

Language: Jupyter Notebook - Size: 84 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

BahramJannesar/ChocolateReveiwDataAnalysis

Data Analysis

Language: Jupyter Notebook - Size: 643 KB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 3 - Forks: 0

Related Keywords

cleaning-dataset 47 data-analysis 9 python 8 cleaning-data 8 data 8 data-visualization 7 visualization 6 sql 6 data-science 5 exploratory-data-analysis 5 seaborn 4 matplotlib 4 analysis 4 jupyter-notebook 4 data-cleaning 4 eda 3 numpy 3 pandas 3 machine-learning 3 python3 3 preprocessing-data 2 pca 2 database 2 sas 2 r 2 pca-analysis 2 airbnb 2 decision-tree-classifier 2 sql-server 2 binary-classification 2 wrangling-data 2 bagofwords 1 nlp-machine-learning 1 colab 1 assessing-data 1 random-forest 1 pthon 1 deep-neural-networks 1 dat-wrangling 1 data-analysis-sql 1 classification-algorithm 1 machine-learning-algorithms 1 machinelearning-python 1 ml 1 oversampling-technique 1 svm-model 1 scraping-api 1 algorithm 1 branch-bound-algorithm 1 finance-analysis-data 1 financial-data 1 greedy-algorithm 1 itertools 1 pandas-library 1 excel 1 imdb-dataset 1 insights 1 dashboard-application 1 scraping 1 streamlit-webapp 1 web 1 preprocessing 1 rnn-gru 1 rnn-tensorflow 1 svm-classifier 1 tensorflow2 1 word2vec-algorithm 1 unsupervised-machine-learning 1 chocolate 1 cocoa-percent 1 dataset 1 ingredients 1 rating 1 taste 1 gathering-data 1 tweepy-api 1 twitter 1 twitter-api 1 weratedogs 1 wrangling-cleaning 1 clustering-algorithm 1 kmeans-algorithm 1 kmeans-plus-plus 1 merging-data 1 pivot-tables 1 wrangling 1 school-district-analysis 1 deeplearning 1 outliers 1 yolov3 1 dash 1 k-means-clustering 1 feature-engineering 1 nosql 1 neo4j 1 dataset-management 1 modelling 1 manipulation 1 statistics 1 pipeline 1