An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: data-preparation

skrub-data/skrub

Machine learning with dataframes

Language: Python - Size: 12.4 MB - Last synced at: about 9 hours ago - Pushed at: about 10 hours ago - Stars: 1,360 - Forks: 121

NVIDIA/NeMo-Curator

Scalable data preprocessing and curation toolkit for LLMs

Language: Jupyter Notebook - Size: 7.66 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 879 - Forks: 124

MaheenNaaz9150/Task-1-Data-Cleaning-and-Preprocessing

Size: 2.13 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

ITRoselloSignoris/Movie-Recommendation-Model

Movie Recommendation Model - Personal Project

Language: Jupyter Notebook - Size: 5.71 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0

data-prep-kit/data-prep-kit

Open source project for data preparation of LLM application builders

Language: HTML - Size: 219 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 622 - Forks: 193

HaivuUK/lua-regression

A LuaLaTeX package for adding different polynomial regressions to graphs. Additionally calculates R-squared and confidence intervals.

Language: TeX - Size: 1.1 MB - Last synced at: 5 days ago - Pushed at: 6 days ago - Stars: 1 - Forks: 0

hi-primus/optimus

🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark

Language: Python - Size: 110 MB - Last synced at: 5 days ago - Pushed at: 5 months ago - Stars: 1,503 - Forks: 232

jackieocham/rest-metrics-data-analysis

Data analysis on sleep and health tracking data collected over many years

Language: SQL - Size: 72.3 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

Ishmal793/Lists-Tuples-Dictionaries-JSON-Sets

Beginner-friendly Python practice covering core collection types like lists, tuples, dictionaries, sets, and JSON with real-world problems.

Language: Jupyter Notebook - Size: 17.6 KB - Last synced at: 4 days ago - Pushed at: 8 days ago - Stars: 0 - Forks: 0

18520339/finding-similar-images

Finding similar images from image URLs using ImageHash

Language: Python - Size: 1.72 MB - Last synced at: 8 days ago - Pushed at: over 3 years ago - Stars: 8 - Forks: 2

RashadGarayev/Image-ClassificationNN

Image classification with an SVM and a simple neural network.

Language: Python - Size: 3.52 MB - Last synced at: 4 days ago - Pushed at: almost 5 years ago - Stars: 9 - Forks: 1

sergezaugg/xeno_canto_organizer

A Python tool to prepare Xeno-Canto audio files for machine learning projects

Language: Python - Size: 1.75 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 1 - Forks: 0

PacktWorkshops/The-Data-Science-Workshop

A New, Interactive Approach to Learning Data Science

Language: Jupyter Notebook - Size: 169 MB - Last synced at: 18 days ago - Pushed at: over 2 years ago - Stars: 226 - Forks: 218

Kukuster/SumStatsRehab

GWAS summary statistics files QC tool

Language: Python - Size: 1.87 MB - Last synced at: 13 days ago - Pushed at: 4 months ago - Stars: 38 - Forks: 6

imarranz/data-science-workflow-management

This repository is a collection of code, documentation, and other resources that support the management and automation of a Data Science project.

Language: Makefile - Size: 42 MB - Last synced at: 26 days ago - Pushed at: 26 days ago - Stars: 2 - Forks: 0

hegongshan/Storage-for-AI-Paper

Accelerating AI Training and Inference from Storage Perspective (Must-read Papers on Storage for AI)

Size: 14.6 KB - Last synced at: 28 days ago - Pushed at: 28 days ago - Stars: 11 - Forks: 1

arkapatra31/ML

Learning and Implementation of my Machine Learning Journey

Language: Jupyter Notebook - Size: 1.84 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

datacorner/dataprep-handbook

Time to get your data sorted! The Data Preparation Handbook, published by Manning within the MEAP release, is the go-to guide for handling messy data. All the book's code and resources can be found here.

Language: HTML - Size: 44.8 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 1

ndomah1/Data-Cleaning-in-MySQL

This project cleans and standardizes a global dataset of tech layoffs using MySQL, transforming raw data into an analysis-ready format.

Size: 211 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

labrijisaad/Prediction-du-cours-de-Bourse

Forecast Apple stock prices using Python, machine learning, and time series analysis. Compare performance of four models for comprehensive analysis and prediction.

Language: Jupyter Notebook - Size: 4.74 MB - Last synced at: 15 days ago - Pushed at: over 2 years ago - Stars: 7 - Forks: 2

hi-primus/bumblebee

🚕 A spreadsheet-like data preparation web app that works over Optimus (Pandas, Dask, cuDF, Dask-cuDF, Spark and Vaex)

Language: Vue - Size: 23 MB - Last synced at: 16 days ago - Pushed at: almost 2 years ago - Stars: 141 - Forks: 35

RezaMoammadi/Book-Data-Science

If you're eager to explore data science, data analysis, and machine learning, 'Uncovering Data Science with R' is the perfect starting point. This book offers a clear, hands-on introduction to the field, requiring no prior experience in analytics or programming.

Language: HTML - Size: 103 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

ksm26/Pretraining-LLMs

Master the essential steps of pretraining large language models (LLMs). Learn to create high-quality datasets, configure model architectures, execute training runs, and assess model performance for efficient and effective LLM pretraining.

Language: Jupyter Notebook - Size: 29.3 KB - Last synced at: 26 days ago - Pushed at: 9 months ago - Stars: 13 - Forks: 5

manishdevdi/Instacart-Market-Basket-Analysis

The objective of this project is to analyze 3 million grocery orders from more than 200,000 Instacart users and predict which previously purchased items will be in a user's next order. Customer segmentation and affinity analysis are done to study customer purchase patterns and to support better product marketing and cross-selling.

Language: Jupyter Notebook - Size: 6.59 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

ndomah/The-Data-Engineering-Academy

Materials from The Data Engineering Academy

Size: 18.5 MB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

developmentseed/label-maker

Data Preparation for Satellite Machine Learning

Language: Python - Size: 18.8 MB - Last synced at: 14 days ago - Pushed at: over 1 year ago - Stars: 465 - Forks: 111

dataclr/dataclr

Feature selection for tabular datasets using advanced filter and wrapper methods

Language: Python - Size: 107 KB - Last synced at: 12 days ago - Pushed at: about 1 month ago - Stars: 17 - Forks: 1

MattBlue00/polimi-thesis

Research Thesis Project at Politecnico di Milano, A.Y. 2023-2024

Language: Python - Size: 9.23 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

therichkid77/sheet2ai

sheet2ai is a Google Apps Script program designed to help users make their Google Sheets understandable by Large Language Models (LLMs), such as ChatGPT. It solves the problem of directly uploading a Google Sheet to an LLM, which often leads to confusion or misinterpretation by the AI.

Language: JavaScript - Size: 14.6 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

jackmnob/Python-Tableau-EDA-StockDash

Data cleaning, preparation, and manipulation (EDA) for an interactive stock market dashboard with Tableau - using pandas (Python) via JupyterLab

Language: Jupyter Notebook - Size: 503 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

francesco-pastori/effects-of-data-preparation-on-algorithms

Analyzing the effect of data preparation on different algorithms by introducing different problems into the dataset

Language: Jupyter Notebook - Size: 8.73 MB - Last synced at: about 2 months ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

ChaitanyaC22/Investment-Analysis-for-an-Asset-Management-Company

Data analysis to identify the best sectors, countries, and a suitable investment type for making investments.

Language: Jupyter Notebook - Size: 6.78 MB - Last synced at: 27 days ago - Pushed at: almost 4 years ago - Stars: 3 - Forks: 0

sbcgua/mockup_loader

ABAP unit testing framework: prepare test data in Excel, reuse it in ABAP code

Language: ABAP - Size: 992 KB - Last synced at: 16 days ago - Pushed at: about 2 months ago - Stars: 68 - Forks: 16

SwapnaleeNikam/Forage-TATA-Data-Visualization

This is a Virtual internship programme in which we will be using Excel, Tableau and Microsoft PPT for data cleaning, data analyzing, data visualization and creating data insights to answer business related questions.

Size: 31.1 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

ndomah/1.-The-Basics

1. The Basics from The Data Engineering Academy

Language: Python - Size: 14.4 MB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

dataiku/dss-plugin-timeseries-preparation

This Dataiku DSS plugin provides visual recipes to perform resampling, windowing, interval extraction, extrema extraction, and decomposition on time series data.

Language: Python - Size: 665 KB - Last synced at: 5 days ago - Pushed at: 6 days ago - Stars: 6 - Forks: 5

anquetos/jop2024-offre-culturelle

Language: Jupyter Notebook - Size: 1.69 MB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

LKEthridge/SDA_Project

A Statistical Data Analysis project from TripleTen

Language: Jupyter Notebook - Size: 2.8 MB - Last synced at: 26 days ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

Muneeb1030/FineTune-Tiny-Llama

Fine-tuning the Tiny Llama model to mimic my professor's writing style using the Llama Factory. The project involves data collection, preprocessing, preparation, fine-tuning, and evaluation.

Language: Jupyter Notebook - Size: 390 KB - Last synced at: about 1 month ago - Pushed at: 9 months ago - Stars: 2 - Forks: 0

proxyflux/living-history

Rule-based text classifier for information extraction, with named entity recognition to draw relations among entities

Language: Python - Size: 14.4 MB - Last synced at: 3 months ago - Pushed at: almost 3 years ago - Stars: 1 - Forks: 0

AiCorsair/Dataquest-Data-Science-Analysis-Projects

A repository dedicated to storing guided projects completed while learning data science concepts with Dataquest.

Language: Jupyter Notebook - Size: 74 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 11 - Forks: 3

Abhi-Pat/Text-Data-Analsis-Youtube-Case-Study-

This repository provides a comprehensive analysis of YouTube comments and related data, leveraging sentiment analysis, emoji usage, word cloud generation, and various graphical visualizations. Key steps include: Data Preparation, Sentiment, WordCloud, Emoji, Data Collection & Analysis

Language: Jupyter Notebook - Size: 3.9 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

SatyaCoder29/Superstore-Sales-Analysis-

Analyzed Superstore Sales Data to uncover trends, optimize sales, and improve profitability. Explored customer segments, regional performance, and product categories using Python and Power BI. Delivered actionable insights to enhance revenue, streamline inventory, and refine marketing strategies, driving data-informed decision-making.

Size: 5.07 MB - Last synced at: about 2 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

asavinov/prosto

Prosto is a data processing toolkit radically changing how data is processed by heavily relying on functions and operations with functions - an alternative to map-reduce and join-groupby

Language: Python - Size: 1.95 MB - Last synced at: about 10 hours ago - Pushed at: over 3 years ago - Stars: 90 - Forks: 5

ITRoselloSignoris/Fraud-Detection-and-Prevention-Model

Final Project for Edvai's Data Science & MLOps Bootcamp

Language: Jupyter Notebook - Size: 1.49 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 1

davsdb/platemeter-to-ML

Research project carried out in collaboration with the University of Tuscia

Language: Python - Size: 16.6 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

tigureis/House-Rent-Analysis

House Rent Data Cleaning and Preparation: Clean and preprocess house rent data for further analysis.

Language: Jupyter Notebook - Size: 524 KB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

Amitreddy14/2019-Election-Analysis-and-Swing-Prediction-Model

This project analyzes voter behavior in India's 2019 general election, identifying patterns across demographics, economic conditions, and social factors using statistical methods and machine learning. By assessing regional disparities and government policies, we aim to elucidate India's democratic process and improve election outcome forecasting.

Language: Python - Size: 477 KB - Last synced at: 16 days ago - Pushed at: 5 months ago - Stars: 0 - Forks: 1

alice-patrick/E-Commerce-Healthcare-Orders-Dataset-Data-Analysis-Using-Python

This project cleans and preprocesses an e-commerce orders dataset, focusing on healthcare-related orders. It provides visual insights into sales trends, customer behavior, delivery performance, and product popularity, comparing healthcare products with non-healthcare ones.

Language: Python - Size: 6.84 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

mituskillologies/data-science-sep24

Programs of Data Science batch @ MITU Skillologies, September 2024

Language: Jupyter Notebook - Size: 2.1 MB - Last synced at: about 2 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

hawra-nawi/Statistical-Modelling-of-Factors-Influencing-European-Football-Players-Potential-and-Wages

Explore the world of European football through comprehensive quantitative analysis, uncovering valuable insights into player attributes, potential, and wage determinants.

Language: HTML - Size: 14.3 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

soumyadip007/Data-Science-Using-Python-University-Course-Module

“Data science” is just about as broad of a term as they come. It may be easiest to describe what it is by listing its more concrete components: Data exploration & analysis. Included here: Pandas; NumPy; SciPy; a helping hand from Python's Standard Library.

Language: Jupyter Notebook - Size: 34.1 MB - Last synced at: 20 days ago - Pushed at: about 5 years ago - Stars: 45 - Forks: 46

HuuPhat842/DataUnderstanding_Python_DataWrangling

Using Python, this project performs exploratory data analysis (EDA), data cleaning, and generates insights, including identifying top-performing products, evaluating team performance, and categorizing transaction types based on specific criteria. Designed for robust data quality checks and business insight extraction.

Language: Jupyter Notebook - Size: 2.7 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

nragland37/Event-Optimization-Tool

R-based Shiny application that maps availability and identifies optimal engagement times to enhance participation within an organization

Language: R - Size: 32.7 MB - Last synced at: 9 days ago - Pushed at: 6 months ago - Stars: 3 - Forks: 0

AntBit96/Dataform_PoC

Template for basic data preparation

Language: JavaScript - Size: 25.4 KB - Last synced at: 22 days ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

maksymsur/spltr

`Spltr` is a simple PyTorch-based data loader and splitter. It can load arrays, matrices, Pandas DataFrames, or CSV files containing numerical data and then split them into train and test (validation) subsets in the form of PyTorch DataLoader objects.

Language: Python - Size: 99.6 KB - Last synced at: 7 days ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 0

lahmacunradio/analytics

Utils for analytics

Language: Python - Size: 176 KB - Last synced at: 2 days ago - Pushed at: about 2 years ago - Stars: 2 - Forks: 0

Florian-Katerndahl/ForesTiler

Create Image Tiles From Large Input Rasters According to a Classified Mask Vector File

Language: Python - Size: 52.7 KB - Last synced at: 1 day ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

Gracysapra/R-in-data-Science

This repository contains essential guides for data analysis using R, covering topics like data preparation, data reshaping, and data visualization. Each file focuses on fundamental techniques to manipulate, clean, and visualize data effectively using R programming.

Language: Jupyter Notebook - Size: 40 KB - Last synced at: 20 days ago - Pushed at: 7 months ago - Stars: 1 - Forks: 0

kozodoi/dptools

Python package with utilities for data processing, aggregation, feature engineering and data versioning

Language: Python - Size: 108 KB - Last synced at: 10 days ago - Pushed at: about 3 years ago - Stars: 4 - Forks: 2

furk4neg3/Sales-Forecasting

Created AI models to forecast Walmart's sales. Used different models, such as dense, LSTM, GRU, and a naive model. Different window and horizon sizes are used too. Compared the models visually at the end.

Language: Jupyter Notebook - Size: 448 KB - Last synced at: 2 months ago - Pushed at: 8 months ago - Stars: 2 - Forks: 0

Niranjana2599/PwC-Switzerland-Power-BI-Virtual-Case-Experience

The tasks involve data cleaning, analysis, and creating interactive dashboards to present actionable business insights using Power BI.

Size: 2.45 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

terilios/automated_data_scientist

Automated Data Scientist: An intelligent, adaptive data analysis tool that leverages AI-driven automation to dynamically plan, execute, and refine data science workflows. Automatically handles data preparation, analysis planning, code generation, and result interpretation using advanced language models.

Language: Python - Size: 207 KB - Last synced at: about 2 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 1

nadahamdy217/Movies-Data-ETL-using-Python-GCP

Developed a comprehensive ETL pipeline for movie data using Python, Docker, and a GCP Pub/Sub emulator. Successfully processed and published the data in a local Docker environment, showcasing advanced data engineering skills.

Language: Python - Size: 1.17 MB - Last synced at: 27 days ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

shettyvarshaa/ML-LAB

Machine Learning Lab Programs in the curriculum

Language: Python - Size: 749 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 2 - Forks: 1

Chaitanya1436/Student_Performance_Analysis

A project focused on analyzing college student performance using data on department, assessment scores, and performance labels. Implemented in Google Colab, the analysis includes data preprocessing, feature scaling, and exploratory data analysis to uncover insights and prepare the data for further analysis or modeling.

Language: Jupyter Notebook - Size: 127 KB - Last synced at: 16 days ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

jrrobison1/novel-ai-module-tools

Tools for preparing text for AI module generation in NovelAI. Includes formatting, text analysis, named entity recognition, and name replacement functionalities.

Language: Python - Size: 950 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

juliataborek/data-preparation

Size: 4.5 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

GithubAamna/SQL-Data-Cleaning---A-Beginner-Project

Used SQL in MySQL Workbench to clean data and visualized the results in Tableau.

Size: 99.6 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

salehjg/Shapenet2_Preparation

A Python script to convert and down-sample mesh data into point clouds using the FPS algorithm.

Language: Python - Size: 7.81 KB - Last synced at: about 1 month ago - Pushed at: over 3 years ago - Stars: 16 - Forks: 0

niladrridas/Supervised-Learning

Gain a comprehensive understanding of supervised learning techniques.

Size: 1.3 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

subhanjandas/Data-Cleaning-and-Preparation-of-Boston-Housing-Dataset---Python-Pandas

This project involves analysis of the Boston Housing Dataset using Python's Pandas library. Data cleaning is performed by dropping genuine outliers, resetting the index, and imputing missing values with the median of the columns. It is substituted with NaN for further analysis. The objective of this project is to clean and prepare the data

Language: HTML - Size: 1.12 MB - Last synced at: 6 days ago - Pushed at: 9 months ago - Stars: 0 - Forks: 1

MelvinJWallace/MelvinJW.github.io

A portfolio of projects completed using Python and SQL.

Language: CSS - Size: 8.56 MB - Last synced at: about 1 month ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

LeftCoastNerdGirl/Excel_crowdfunding_analysis

This project demonstrates the use of MS Excel for data cleansing & formatting to prepare for data analysis and visualization.

Size: 407 KB - Last synced at: about 2 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

M-Fatoni/Improving-Employee-Retention-by-Predicting-Employee-Attrition-Using-Machine-Learning

This project aims to leverage machine learning techniques to predict employee attrition, allowing organizations to identify at-risk employees and implement strategies to improve retention rates.

Language: Jupyter Notebook - Size: 1000 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 1 - Forks: 0

M-Fatoni/Predict-Clicked-Ads-Customer-Classification-by-using-Machine-Learning

This project aims to classify customers who are likely to click on ads using machine learning techniques. By predicting customer behavior, businesses can optimize their ad targeting strategies, resulting in improved ad performance and increased return on investment (ROI).

Language: Jupyter Notebook - Size: 1.35 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 1 - Forks: 0

M-Fatoni/Predict-Customer-Personality-to-boost-marketing-campaign-by-using-Machine-Learning

This project aims to enhance marketing campaign effectiveness by predicting customer personalities using machine learning techniques. By understanding customer personality traits, businesses can tailor their marketing strategies to better meet the needs and preferences of their target audience.

Language: Jupyter Notebook - Size: 2.45 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 1 - Forks: 0

Putriarrum/Predict-Customer-Personality-to-Boost-Up-Marketing-Campaign-Performance

This is a personal project on a marketing campaign, using a dataset from a big tech company provided by Rakamain Academy. I created a clustering model with Python (Sklearn) to find the best model for the dataset, which can be used to plan their next marketing strategy.

Language: Jupyter Notebook - Size: 2.49 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

ArchAngelAries/TagScribeR

A tool to streamline AI image captioning

Language: Python - Size: 190 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 6 - Forks: 0

AtikaAnjum/PRODIGY_ML_01

This repository includes a report about implementing a Linear Regression model to predict house prices using square footage, number of bedrooms, and number of bathrooms. The model demonstrated reliable performance and successfully predicted house prices for new input data.

Language: Jupyter Notebook - Size: 13.7 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

vineet416/EDA-Travel

EDA Travel data by PW Skills Data Analytics Course.

Language: Jupyter Notebook - Size: 1.01 MB - Last synced at: about 1 month ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

rubypoddar/HappinessScoreVisualizer

Visualizing and analyzing global happiness scores using data visualization techniques and statistical tests.

Language: Jupyter Notebook - Size: 779 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 1 - Forks: 0

tirthgala/Data-Science-For-Business

This repository contains what I learned in the Data Science for Business course while pursuing my Master's in Quantitative Management: Business Analytics at the Fuqua School of Business.

Language: R - Size: 5.92 MB - Last synced at: 10 months ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

jmarihawkins/neural-network-challenge-2

The purpose of this project is to develop a machine learning model that predicts employee attrition (whether an employee will leave the company) and department assignment (which department an employee belongs to) based on various factors. These factors include age, travel frequency, education level, job satisfaction, marital status, and more.

Language: Jupyter Notebook - Size: 35.2 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

wlodpawlowski/machine-learning-basic-datasets

Repository consisting of code snippets and projects from my personal lessons recorded at the University of San Francisco while learning machine learning.

Size: 1.95 KB - Last synced at: about 1 month ago - Pushed at: almost 6 years ago - Stars: 0 - Forks: 0

Naga-Manohar-Y/Global_Football_Transfer_Market_Analysis_

Delve into a decade of football's financial maneuvers across Europe's top leagues, uncovering strategic insights behind record-breaking transfers.

Language: Jupyter Notebook - Size: 36.7 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

Da-Rupal/Amazon_Sales_Dashbord

Exhilarated to share my first project with Unified Mentor Private Limited as a Data Analyst Intern.

Size: 4.27 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

StrangeCoder1729/Happy-vs-Sad-Image-Classification

This project uses a Convolutional Neural Network (CNN) to classify images as happy or sad. It includes data preprocessing, model training, and prediction on new images.

Language: Jupyter Notebook - Size: 144 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

sai-praseeda-atluri/PRODIGY_ML_02

This repository contains a project for customer segmentation using the K-Means clustering algorithm. The goal is to group customers into distinct clusters based on their demographic and purchasing behavior to better understand customer segments and target them effectively.

Size: 9.77 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

jmarihawkins/neural-network-challenge-1

The purpose of this project is to predict student loan repayment success using a neural network. Neural networks are computational models inspired by the human brain's structure and function, consisting of layers of interconnected nodes or "neurons" that can learn to recognize patterns in data.

Language: Jupyter Notebook - Size: 36.1 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

JyotiVGupta/8-Week-SQL-Challenge-Case-Study-1

SQL solution to the Case Study #1 - Danny's Diner

Size: 3.91 KB - Last synced at: 5 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

dataeducator/movie-studios-viability-project (fork of learn-co-curriculum/dsc-phase-1-project-v2-4)

This repository contains a Phase 1 Project for the Data Science Flex Program at the Flatiron School. This project uses sqlite3, pandas, numpy and exploratory data analysis using matplotlib and seaborn to analyze and discuss features of profitable movies.

Language: Jupyter Notebook - Size: 132 MB - Last synced at: 11 months ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 1

Sof-AI/fanfiction_project

A passion project focused on analyzing my own reading lists & fanworks hosted on Archive Of Our Own!

Language: Jupyter Notebook - Size: 48.4 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

hitesh22rana/sourcecollector

A simple tool to consolidate multiple files into a single .txt file. Perfect for feeding your files to AI tools without any fuss.

Language: Go - Size: 10.7 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 1 - Forks: 0

anujonthemove/keras-retinanet-utilities

Data preparation utilities for keras-retinanet

Language: Jupyter Notebook - Size: 2.93 KB - Last synced at: 11 months ago - Pushed at: about 6 years ago - Stars: 0 - Forks: 0

Rahma-Farag/Rahma-Farag

Main Repository

Language: Jupyter Notebook - Size: 71.4 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

chahelgupta/DEP-videogames-dataset

The data extraction and processing involved thorough exploration, preprocessing, and visualization of the "Video Game Sales with Ratings" dataset.

Language: Jupyter Notebook - Size: 2.11 MB - Last synced at: about 1 month ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

cevheryilmaz/Honey_Production_in_the_USA_in_Machine_Learning

Language: Jupyter Notebook - Size: 7.81 KB - Last synced at: 12 months ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 0

DataRish/MBTI-Personality-Predictor

This project predicts MBTI personality types from users' 50 most recent posts using NLP and ML techniques.

Language: Jupyter Notebook - Size: 24.3 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

Chan-dre-yi/industry-4.0-exploratory-data-analysis

An exploratory data analysis of an Industry 4.0 dataset uncovered insights indicating that Business Intelligence and IoT systems will have the greatest impact in the field over the next decade.

Language: MATLAB - Size: 1.32 MB - Last synced at: 12 months ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

Related Keywords
data-preparation (315), python (78), machine-learning (78), data-preprocessing (73), data-science (67), data-analysis (66), data-visualization (60), data-cleaning (52), pandas (34), exploratory-data-analysis (30), deep-learning (22), feature-engineering (22), classification (19), numpy (18), data (18), data-wrangling (17), sql (16), matplotlib (16), data-processing (15), python3 (15), logistic-regression (14), r (14), seaborn (14), eda (13), scikit-learn (12), machine-learning-algorithms (11), tableau (10), random-forest (10), predictive-modeling (9), clustering (9), jupyter-notebook (9), regression (9), linear-regression (9), tensorflow (9), data-mining (8), data-manipulation (8), statistics (8), data-analytics (8), feature-selection (7), nlp (7), neural-network (7), dataset (7), visualization (7), image-processing (7), data-cleansing (7), excel (7), statistical-analysis (6), neural-networks (6), feature-extraction (6), data-collection (6), opencv (6), data-transformation (6), artificial-intelligence (6), data-engineering (6), datasets (5), time-series-analysis (5), text-processing (5), data-exploration (5), plotly (5), docker (5), pca (5), supervised-learning (5), data-visualisation (5), data-quality (5), dashboard (5), keras (5), data-modeling (4), analytics (4), sklearn (4), preprocessing (4), image-classification (4), pytorch (4), train-test-split (4), deep-neural-networks (4), random-forest-classifier (4), model-training-and-evaluation (4), svm-classifier (4), powerbi (4), web-scraping (4), natural-language-processing (4), decision-tree-classifier (4), hypothesis-testing (4), analysis (4), named-entity-recognition (4), data-normalization (4), streamlit (4), ml (4), missing-values (4), decission-tree (4), pipeline (4), computer-vision (4), data-prep (4), large-language-models (4), sentiment-analysis (4), mysql (4), business-intelligence (3), business-analysis (3), imputation (3), tableau-public (3), webscraping (3)