Ecosyste.ms: Repos

An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: datapreparation

sfu-db/dataprep

Open-source low-code data preparation library in Python. Collect, clean, and visualize your data in Python with a few lines of code.

Language: Python - Size: 214 MB - Last synced: 3 days ago - Pushed: about 2 months ago - Stars: 1,941 - Forks: 200

AnjaliKumari021/Retail_Customer_Behavior_Analysis_using_SQL

Analysed retail data to understand customer behavior and transaction patterns using SQL.

Size: 717 KB - Last synced: 17 days ago - Pushed: 18 days ago - Stars: 0 - Forks: 0

visokio/omniscope-custom-blocks

Public repository for custom blocks for Omniscope

Language: Python - Size: 6.05 MB - Last synced: 5 days ago - Pushed: about 1 month ago - Stars: 5 - Forks: 3

huseyincenik/data_science

Data Science materials

Language: Jupyter Notebook - Size: 51.1 MB - Last synced: 27 days ago - Pushed: 27 days ago - Stars: 3 - Forks: 1

CoDS-GCS/KGFarm

A Holistic Platform for Automating Data Preparation

Language: Python - Size: 290 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 6 - Forks: 2

elalfredoignacio/Customer-segmentation

Customer segmentation project using clustering.

Language: HTML - Size: 2.42 MB - Last synced: about 2 months ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0

mehadihn/Data-Preparation-Techniques-Project

This project was completed for the data preparation techniques course.

Language: Jupyter Notebook - Size: 1.24 MB - Last synced: about 2 months ago - Pushed: 4 months ago - Stars: 0 - Forks: 0

MadhuBala11/DiabetesPrediction

In this project, I used logistic regression, a supervised machine learning algorithm, to predict whether a person has diabetes based on features such as age, blood pressure, glucose level, and body mass index. I used Python and popular libraries such as Pandas, Scikit-Learn, and Matplotlib to build the model.

Language: Jupyter Notebook - Size: 3.2 MB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 1 - Forks: 0

wsperger/dataprepping_generative_ai

A one-stop shop for tools to prepare datasets for generative AI

Language: Python - Size: 127 MB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 1 - Forks: 0

mahmudie/1_GDP_Analysis

India GDP Analysis using Python

Language: Jupyter Notebook - Size: 871 KB - Last synced: 7 months ago - Pushed: 7 months ago - Stars: 1 - Forks: 0

RafeyIqbalRahman/Data-Imputation-Techniques

This repository demonstrates data imputation using Scikit-Learn's SimpleImputer, KNNImputer, and IterativeImputer.

Language: Python - Size: 8.79 KB - Last synced: 8 months ago - Pushed: almost 4 years ago - Stars: 1 - Forks: 0

sfansaria/Data-Preparation-of-a-housing-dataset

Data Preparation and Data Visualization

Language: Jupyter Notebook - Size: 1.19 MB - Last synced: 9 months ago - Pushed: 9 months ago - Stars: 0 - Forks: 0

sunilbabu1981/Learning_Path_for_NLP

A comprehensive path for NLP

Language: Jupyter Notebook - Size: 255 KB - Last synced: 10 months ago - Pushed: about 4 years ago - Stars: 0 - Forks: 0

imuhammadaasim/Bikes_Sales_Data_Analysis

The Bikes Sales Analysis Excel Project is a practical exploration of sales data analysis using Microsoft Excel. This project showcases how Excel can be a powerful tool for data cleaning, preprocessing, visualization, and dashboard creation, all within a familiar spreadsheet environment.

Size: 224 KB - Last synced: 10 months ago - Pushed: 10 months ago - Stars: 0 - Forks: 0

venkatesh2022/logisticregression-telecom-churn

Language: Jupyter Notebook - Size: 1 MB - Last synced: 10 months ago - Pushed: almost 4 years ago - Stars: 0 - Forks: 0

rainaa0277/House-Price-Prediction-using-Linear-Regression

Building a house price prediction model for a real estate firm, based on various factors. Problem: Regression | Algorithm used: Linear Regression using OLS

Language: Jupyter Notebook - Size: 4.03 MB - Last synced: 11 months ago - Pushed: over 2 years ago - Stars: 1 - Forks: 0

prathmesh444/Sentiment-Analysis-of-BlackCoffer-Blogs

This project performs sentiment analysis by calculating text metrics to derive opinion, sentiment scores, readability, passive words, personal pronouns, etc.

Language: Jupyter Notebook - Size: 57.1 MB - Last synced: 11 months ago - Pushed: 11 months ago - Stars: 0 - Forks: 0

NAVEENDATAANALYST/HOTEL-RESERVATIONS-PREDICTION-IN-R

Can you correctly predict whether a customer will cancel their reservation? The dataset is available on Kaggle: https://www.kaggle.com/datasets/ahsan81/hotel-reservations-classification-dataset

Size: 453 KB - Last synced: about 1 year ago - Pushed: about 1 year ago - Stars: 0 - Forks: 0

JarredP/RStudio-projects-from-Data-Mining-Course

This repository contains several RMarkdown files that follow the tutorials from 'Introduction to data mining R examples' by M. Hahsler. The tutorials were completed during a Data Mining course taken as part of an MS in Applied Data Analytics.

Size: 51.8 KB - Last synced: about 1 year ago - Pushed: about 1 year ago - Stars: 0 - Forks: 0

ydataai/ydata-talkdatatome

Make your dataset talk to you. The AI assistant for data preparation.

Language: Python - Size: 9.77 KB - Last synced: 12 months ago - Pushed: 12 months ago - Stars: 3 - Forks: 0

NAVEENDATAANALYST/SPACESHIP-TITANIC-PASSENGER-TRANSPORT-PREDICTION

The data is available from the Kaggle competition (https://www.kaggle.com/competitions/spaceship-titanic). I participated in and completed the competition on my own.

Size: 284 KB - Last synced: about 1 year ago - Pushed: about 1 year ago - Stars: 0 - Forks: 0

NAVEENDATAANALYST/CUSTOMER-ANALYTICS-ON-USA-BASED-COMPANY-DATA

This is my 6th semester Essentials of Data Analytics project.

Size: 157 KB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0

deepu9962/Exploratory-Analysis-of-Geolocational-Data

This project involves the use of K-Means Clustering to find the best accommodation for students in Bangalore (or any other city of your choice) by classifying accommodation for incoming students on the basis of their preferences on amenities, budget and proximity to the location.

Language: Jupyter Notebook - Size: 4.17 MB - Last synced: 11 months ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0

fazildgr8/wrist_control_CNN_AWEAR

This is the cumulative repository for the research project "Deep Learning Approach to Robotic Prosthetic Wrist Control using EMG Signals", done in the AWEAR Lab. It contains all the data processing pipeline code, the custom data preprocessing library built for this project, and all the time-series CNN training Jupyter notebooks, using data collected within the AWEAR Lab, University at Buffalo.

Language: Jupyter Notebook - Size: 541 MB - Last synced: 8 months ago - Pushed: 8 months ago - Stars: 0 - Forks: 0

muharienal/nordstrom-products-prep

Nordstrom Products dataset preparation includes collection, discovery, cleaning, normalization, enrichment, and validation using SQL

Size: 565 KB - Last synced: 11 months ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0

zcebeci/odetector

Outlier Detection Using Cluster Analysis

Language: R - Size: 558 KB - Last synced: 12 days ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0

victorcouste/trifacta-flows-examples

Trifacta Flows Examples and Templates. Flows zip files, recipes and datasets.

Size: 2.65 MB - Last synced: over 1 year ago - Pushed: over 3 years ago - Stars: 5 - Forks: 2

Ashleshk/Tableau-10-A-Z-Hands-on-Tableau-Training-for-Data-Science-Udemy

Learn data visualization through Tableau 2020 and create opportunities for you or key decision-makers to discover data patterns such as customer purchase behavior, sales trends, or production bottlenecks. This course is on Udemy.

Size: 4.7 MB - Last synced: about 1 year ago - Pushed: almost 4 years ago - Stars: 2 - Forks: 1

ms8909/dptron

mltrons dptron: Dirty Data in, Clean Data Out!

Language: Python - Size: 75.5 MB - Last synced: 1 day ago - Pushed: over 1 year ago - Stars: 4 - Forks: 2

DaveChui/Data-Preparation-and-Cleaning---Geo-Data

Preparing and Cleaning Data

Language: Jupyter Notebook - Size: 26.4 KB - Last synced: over 1 year ago - Pushed: over 1 year ago - Stars: 1 - Forks: 0

shinanna/Tripadvisor_NLP_Analysis

NLP Analysis on Tripadvisor Restaurant Reviews

Language: Jupyter Notebook - Size: 2.72 MB - Last synced: about 1 year ago - Pushed: about 2 years ago - Stars: 0 - Forks: 0

chiranjeevbitm/US-House-Price-Prediction

The goal is to model the price of houses using the available independent variables. Management will then use this model to understand exactly how prices vary with those variables, adjust the firm's strategy accordingly, and concentrate on areas that will yield high returns. The model will also help management understand the pricing dynamics of a new market.

Language: Jupyter Notebook - Size: 728 KB - Last synced: about 1 year ago - Pushed: over 2 years ago - Stars: 0 - Forks: 0

hedata/rapid-hackathon-2018

Data preparation and logistic regression model training and testing for the Rapid Hackathon 2018 ORF challenge

Language: Jupyter Notebook - Size: 255 KB - Last synced: over 1 year ago - Pushed: about 3 years ago - Stars: 0 - Forks: 0

FlavioIsoni/Machine-Learning-Mastery-Course

Machine Learning Mastery Course (by Jason Brownlee)

Size: 1000 Bytes - Last synced: about 1 year ago - Pushed: over 3 years ago - Stars: 0 - Forks: 0

FlavioIsoni/Bootcamp-Machine-Learning-Analyst

Bootcamp - Machine Learning Analyst / Analista de Aprendizado de Máquina (by IGTI)

Size: 385 KB - Last synced: about 1 year ago - Pushed: over 3 years ago - Stars: 0 - Forks: 0

prakhargurawa/Titanic-Survival-Predictor

Predicting the survival rate of passengers using algorithms such as Logistic Regression, AdaBoost, Gradient Boosting, Decision Tree Classifiers, Extra Tree Classifiers, Random Forest Classifiers, and XGBoost, with appropriate data preprocessing techniques.

Language: Jupyter Notebook - Size: 53.7 KB - Last synced: about 1 year ago - Pushed: over 3 years ago - Stars: 1 - Forks: 0

anthonychristian1997/Transaction-PYTHON-DataPREPARATION-PracticeCase7

In this repository, I implement a data preparation process. Data preparation is the stage where we prepare data for machine learning processes or other things related to data analysis.

Language: Jupyter Notebook - Size: 8.79 KB - Last synced: 9 months ago - Pushed: over 3 years ago - Stars: 0 - Forks: 0

ankit013/Time-series-forecasting-and-sales-pipeline-prediction

Machine learning models built on real-time data

Language: R - Size: 57.6 KB - Last synced: 4 months ago - Pushed: almost 4 years ago - Stars: 0 - Forks: 0

ngupta23/data_prep_helper

A helper package for preparing and combining data from a variety of sources

Language: Python - Size: 50.8 KB - Last synced: 20 days ago - Pushed: almost 5 years ago - Stars: 0 - Forks: 0

vinayramegowda/DataPreparationPython

Data Preparation using python (Automobile Dataset)

Language: Jupyter Notebook - Size: 1.62 MB - Last synced: about 1 year ago - Pushed: almost 5 years ago - Stars: 0 - Forks: 0

Related Keywords
datapreparation (40), data-science (11), data (9), datapreprocessing (9), python (8), datacleaning (7), logistic-regression (6), datavisualization (5), dataprep (5), feature-engineering (5), data-visualization (5), machine-learning (4), r (4), machine-learning-algorithms (4), dataprocessing (4), clustering (3), dataanalysis (3), eda (3), artificial-intelligence (2), kmeans-clustering (2), datascience (2), feature-selection (2), analytics (2), sklearn-library (2), dataexploration (2), time-series (2), statistics (2), anaconda (2), artificial-neural-networks (2), data-structures (2), scikit-learn (2), sentiment-analysis (2), jupyter-notebook (2), python3 (2), business-analytics (2), pandas (2), googlecolab (2), natural-language-processing (2), exploratory-data-analysis (2), classification (2), data-exploration (2), deep-learning (2), filters (1), key-decision-makers (1), opportunities (1), production-bottlenecks (1), sales-trends (1), tableau (1), tableau-training (1), udemy (1), correlation-analysis (1), exception-handling (1), fcm (1), fraud-detection (1), fuzzy-clustering (1), novelty-detection (1), outlier-detection (1), outlier-removal (1), outliers (1), partitioning (1), pcm (1), surprise-exploration (1), examples (1), templates (1), trifacta (1), trifacta-flow (1), aggregation (1), barchart (1), dashboards (1), data-science-udemy (1), adaboost (1), decision-tree-classifiers (1), extra-tree-classifiers (1), random-forest-classifiers (1), titanic-survival-predictor (1), xg-boost (1), arima-model (1), automobile (1), ets (1), forecasting (1), hyperparameter-tuning (1), marketing (1), mlr (1), moving-average (1), r-programming (1), web-crawler-python (1), xgboost-algorithm (1), helpers (1), datascience-machinelearning (1), nlp (1), scraping (1), tripadvisor (1), wordcloud (1), featurelimination (1), lasso-regression (1), ridge-regression (1), hackathon (1), classification-algorithm (1), deep-neural-networks (1), gan (1)