An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: datapreprocessing

MadhukarSaiBabu/House-Price-prediction

This project aims to build a predictive model using ML and regression to estimate house prices, identify key value drivers, and support informed decision-making for stakeholders in real estate transactions.

Language: Jupyter Notebook - Size: 0 Bytes - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0

cereja-project/cereja

Cereja is a bundle of useful functions we don't want to rewrite and... just pure fun!

Language: Python - Size: 743 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 27 - Forks: 11

mri177/Vehicle-Curb-Weight-Prediction-using-Linear-Regressionon-Model

DS-M2

Language: Jupyter Notebook - Size: 10.7 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 0 - Forks: 0

divithraju/divith-aju-Hadoop-Pyspark-pipeline

This project demonstrates the creation of a scalable data processing pipeline for handling and analyzing log data from a hypothetical e-commerce platform. Leveraging Hadoop and PySpark, the pipeline is designed to process large volumes of log files, providing meaningful insights into user behavior, system performance, and sales metrics.

Language: Python - Size: 4.88 KB - Last synced at: 13 days ago - Pushed at: 10 months ago - Stars: 2 - Forks: 0

sanjay-ar/payment_default_prediction

Payment Default Prediction System is a machine learning pipeline that forecasts the likelihood of payment defaults using historical transaction data and client profiles. Designed to support risk assessment in finance, the system uses classification models to flag high-risk clients for early intervention.

Language: Python - Size: 143 MB - Last synced at: 14 days ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

ShovalBenjer/Power_Transform_Box-Cox_Supervised_ML

This project explores machine learning techniques, focusing on data preprocessing, model building, and evaluation. It includes data analysis, visualization, various algorithms, and performance comparison. Key topics: data cleaning, feature engineering, model selection, hyperparameter tuning, and evaluation metrics.

Language: Jupyter Notebook - Size: 7.67 MB - Last synced at: 22 days ago - Pushed at: 22 days ago - Stars: 0 - Forks: 1

SridharYadav07/Image_Segmentation-for-Disaster-Resilience

Image Segmentation for Disaster Resilience is a deep learning project developed for the FloodNet Challenge, focused on leveraging semantic segmentation to assist in flood impact analysis. Using a U-Net architecture, the model segments aerial imagery to detect key features such as flooded buildings, roads, water bodies, vegetation, and more.

Language: Jupyter Notebook - Size: 824 KB - Last synced at: 14 days ago - Pushed at: 22 days ago - Stars: 0 - Forks: 0

SeekAI-786/Electricity_Theft_Detection

Electricity theft is a major issue in regions like Karachi, where unauthorized consumption of electricity leads to significant losses for utility companies. This project provides a solution for detecting electricity theft using machine learning models. By analyzing various factors such as electricity usage, voltage fluctuations, and historical data

Language: Python - Size: 3.3 MB - Last synced at: 9 days ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

Shriyaak/MachineLearning.studyjournal.1

This repository contains my study and practice of key machine learning concepts, including:

Language: Jupyter Notebook - Size: 239 KB - Last synced at: 14 days ago - Pushed at: 28 days ago - Stars: 0 - Forks: 0

burhanahmed1/CryptoSynth

Bitcoin Sentiment Forecast is a multimodal approach to Bitcoin price forecasting using NLP and time series analysis.

Language: Jupyter Notebook - Size: 3.54 MB - Last synced at: 3 days ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

sandhiya0147/Data_Science

This repository holds my work, tasks, and notes from my data science specialization.

Language: Jupyter Notebook - Size: 1.12 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

ErdemOzgen/Data-Engineering-Roadmap

Roadmap for Data Engineering

Language: Java - Size: 1.98 MB - Last synced at: about 1 month ago - Pushed at: 12 months ago - Stars: 225 - Forks: 30

ENGRZULQARNAIN/ScrapySub

ScrapySub is a Python library designed to recursively scrape website content, including subpages. It fetches the visible text from web pages and stores it in a structured format for easy access and analysis. This library is particularly useful for NLP and AI developers who need to gather large amounts of web content for their projects.

Language: Python - Size: 25.4 KB - Last synced at: 9 days ago - Pushed at: 11 months ago - Stars: 5 - Forks: 0

MidoHossam14/Data-.S-.Tools-Project

Web Scraping Project

Language: Jupyter Notebook - Size: 4.69 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

MidoHossam14/DataMining-FinalProject

Hands-on Data Mining & Analytics Algorithms

Language: Jupyter Notebook - Size: 458 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

Zahit2121/Machine-Learning-Templates-for-Data-Cleaning-Data-Visualization-Data-Preprocessing-Model-Training

This repository offers templates for machine learning tasks, focusing on data cleaning, visualization, preprocessing, and model training. Each template provides clear steps and code snippets to streamline your workflow and improve project efficiency.

Language: Jupyter Notebook - Size: 1.96 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

MDalamin5/Capstone-Project-Adaptive-Tutoring-System-AI-Based-All-Experimental-Resources

This project is an AI-powered algebra tutor using the Phi-3 Mini model. It provides personalized learning through interactive chat, adapting to the student's level and offering detailed step-by-step solutions. Built with Streamlit for an engaging educational experience.

Language: Python - Size: 10.1 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

Machine-Learning-Related-Projects/Real-Fake-Job-Post

Real-Fake-Job-Post

Language: Jupyter Notebook - Size: 3.84 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

ImAliShaikh/Machine-Learning-Templates-for-Data-Cleaning-Data-Visualization-Data-Preprocessing-Model-Training

Language: Jupyter Notebook - Size: 2.08 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

PrinceMandhar/Machine-learning

A collection of hands-on Machine Learning projects using Python, covering classification, regression, deep learning, and data analysis with real-world datasets.

Language: HTML - Size: 2.53 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

MohammedSaim-Quadri/Intrusion_Detection-System

This project is an Intrusion Detection System (IDS) using machine learning (ML) and deep learning (DL) to detect network intrusions. It leverages the CICIDS2018 dataset to classify traffic as normal or malicious. Key features include data preprocessing, model training, hyperparameter tuning, and Docker containerization for scalable deployment.

Language: Python - Size: 8.6 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 4 - Forks: 0

MDalamin5/Data-Science-Machine-Learning-Basics

This repository is a comprehensive guide to Machine Learning algorithms, Python OOP, data preprocessing, and visualization using Pandas, NumPy, Seaborn, Scikit-learn, and more. It includes hands-on Jupyter notebooks, modular Python scripts, and a structured ML pipeline for training and evaluating models. 🚀

Language: Jupyter Notebook - Size: 48.8 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

tashi-2004/Apache-Airflow-Kafka-Spark-DeltaLake-Real-Time-Stream-Pipeline

This project implements a real-time data pipeline using Apache Airflow, Kafka, Apache Spark, and Delta Lake. It supports both batch (Coldpath) and real-time (Hotpath) data ingestion, processing, and storage. Airflow is used for orchestrating the data workflows.

Language: Python - Size: 12.5 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

mhmmdrzkya2000/DigitalSkillFair38_Data_Science_2025

Titanic EDA - Exploratory Data Analysis. This repository is the result of a one-week DigitalSkillFair38 Data Science training course that I took with Dibimbing.id, focused on the Exploratory Data Analysis (EDA) process applied to the Titanic dataset.

Language: Jupyter Notebook - Size: 8.4 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

ChenTaHung/Monotonic-Optimal-Binning

Monotonic Optimal Binning algorithm is a statistical approach to transform continuous variables into optimal and monotonic categorical variables.

Language: Python - Size: 7.44 MB - Last synced at: 19 days ago - Pushed at: over 1 year ago - Stars: 17 - Forks: 2

cesardushime/GenAI_Full-Course

Official Generative AI Mastery GitHub repository: your one-stop resource for mastering the practical and theoretical aspects of Generative AI.

Language: Jupyter Notebook - Size: 57.9 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

farzeennimran/AI_Recipe_Generator

A simple AI recipe generator using ML and DL models 🍔🍨🍷

Language: Python - Size: 5.51 MB - Last synced at: about 1 month ago - Pushed at: 12 months ago - Stars: 1 - Forks: 0

jigyasaG18/Fake-News-Prediction-App

The Fake News Prediction App repository offers a machine learning project that classifies news articles as fake or real. It uses a dataset of 20,000 articles and employs methods such as TF-IDF vectorization and lemmatization, achieving ~95% classification accuracy with a random forest classifier.

Language: Jupyter Notebook - Size: 45.4 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

45Harry/Potato_Disease_Classification

Potato Disease Classification - Training, REST API, and front end for testing

Language: Jupyter Notebook - Size: 55.3 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

itsjakup/YouTube_Comments_Sentiment_Analysis

YouTube Comments Sentiment Analysis classifies user comments as positive, negative, or neutral. This helps creators understand audience reactions and improve content. By using text preprocessing, TF-IDF for feature extraction, and models like Naive Bayes, we can effectively gauge viewer sentiments and gain valuable insights.

Language: Jupyter Notebook - Size: 1.2 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0
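
For readers unfamiliar with the TF-IDF feature extraction named in the description above, here is a minimal sketch in plain Python. It is not taken from the repository: the function name `tfidf`, the whitespace tokenizer, and the sample comments are all assumptions made for illustration, and the weighting is the basic unsmoothed `tf * log(N / df)` form rather than any particular library's variant.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document (plain tf * idf, no smoothing)."""
    # Hypothetical tokenizer: lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            # Term frequency times inverse document frequency.
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Made-up sample comments, for illustration only.
comments = ["great video", "bad video", "great great content"]
vectors = tfidf(comments)
```

A term that occurs in every document gets weight zero under this form, which is why rarer words such as "bad" end up carrying more of the signal passed to a classifier like Naive Bayes.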

nafisalawalidris/Fraud-Detection-with-Supervised-Learning

This repository contains a basic fraud detection system utilising supervised learning techniques to identify potentially fraudulent credit card transactions. The project establishes a baseline model that addresses the challenges of credit card fraud in financial institutions.

Language: Jupyter Notebook - Size: 4.76 MB - Last synced at: 2 months ago - Pushed at: 7 months ago - Stars: 7 - Forks: 1

BatthulaVinay/Predictive-Maintenance-for-Industrial-Equipment

This project focuses on Predictive Maintenance for industrial equipment using machine learning. The goal is to predict potential machine failures before they occur, enabling proactive maintenance and reducing downtime.

Language: Jupyter Notebook - Size: 1.21 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

jigyasaG18/Multiple-Disease-Detection-App

This repository contains the implementation of a Multiple Disease Detection System, which employs advanced machine learning techniques for early detection and prediction of prevalent diseases, including diabetes, heart disease, and Parkinson's disease. The system utilizes a variety of patient health metrics such as demographics and medical history.

Language: Jupyter Notebook - Size: 71.3 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

Barathkalyan/GoldVision-price-predictor

GoldVision is a time series forecasting project that predicts gold price trends using Facebook Prophet. The project incorporates custom seasonality and advanced visualization techniques to provide accurate forecasts and insights.

Language: Python - Size: 75.2 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

Krishnasarathy/SDA-Student-Dropout-Analysis-

In India, particularly in states like Gujarat, school dropout rates are a major concern. To help address this issue, we developed a machine learning-based solution that analyzes the key factors behind student dropouts.

Language: JavaScript - Size: 10.8 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

benzerinsio/SeattleWeather-PowerBI

📊 Interactive analysis of Seattle weather data with Power BI, using data preprocessed in SQLite to explore temperatures, precipitation, and weather types.

Size: 34.2 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

yadavkaushal/DataScience-E-Commerce-shopping-details

This project analyzes customer purchase data including details such as location, company, credit card usage, browser info, job roles, and purchase price. It explores patterns in payment methods, spending behavior, and online transactions. Using Pandas, Matplotlib, and Seaborn, we clean, analyze, and visualize key trends to derive actionable insights.

Language: Jupyter Notebook - Size: 0 Bytes - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

IngestAI/embedditor

⚡ GUI for editing LLM vector embeddings. No more blind chunking. Upload content in any file extension, join and split chunks, edit metadata and embedding tokens + remove stop-words and punctuation with one click, add images, and download in .veml to share it with your team.

Language: PHP - Size: 1.74 MB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 224 - Forks: 15

Munawar-code/car_price_predictor

This project is an ML-based car price prediction system. The model is built in Jupyter Notebook for training and evaluation, while a simple one-page website was developed in PyCharm to provide an interface for users to input car details and get price predictions.

Language: Jupyter Notebook - Size: 1.03 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

Shimul-Zahan/All-Practices-TukiTaki

This is a repository for practice tasks and learning new things. The environment is already set up, so there is no need to set up a new project or environment.

Language: Python - Size: 5.44 MB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

Vikas-B-S/Customer_Churn_Analysis

A machine learning model for predicting customer churn using Python. Features data preprocessing, multiple classification models (Logistic Regression, Random Forest, XGBoost), and performance evaluation. Built with Pandas, Scikit-learn, and Seaborn for visualization.

Language: Jupyter Notebook - Size: 223 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

sehbakhan/Hackathon_xto10x

Develop a high-performance ML model to predict airline profitability and provide actionable insights for operational optimization.

Size: 5.86 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

zxurie/Fuel-Efficiency-Prediction--Python

This is a learning project for practice, using regression on the Auto MPG dataset to predict fuel efficiency. It involves building a Neural Network with RMSprop, MSE loss, data normalization, and early stopping to prevent overfitting. Key: normalize data, use MAE/MSE, optimize model.

Size: 1.95 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0
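
The "normalize data, use MAE/MSE" advice in the description above can be sketched in a few lines of plain Python. This is an illustrative sketch, not code from the repository: the helper names `standardize`, `mae`, and `mse` and the sample numbers are assumptions, and z-score standardization is only one common way to normalize.

```python
from statistics import mean, pstdev


def standardize(values):
    """Z-score normalization: subtract the mean, divide by the population std."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def mae(y_true, y_pred):
    """Mean absolute error between targets and predictions."""
    return mean([abs(t - p) for t, p in zip(y_true, y_pred)])


def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    return mean([(t - p) ** 2 for t, p in zip(y_true, y_pred)])


# Made-up feature values and predictions, for illustration only.
scaled = standardize([1.0, 2.0, 3.0])
error_abs = mae([3.0, 5.0], [2.0, 7.0])
error_sq = mse([3.0, 5.0], [2.0, 7.0])
```

MSE penalizes large misses more heavily than MAE, so reporting both gives a fuller picture of a regression model's errors.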

sadikurrahamanfahim/DataVize

Web-based application with various operations for data science

Language: Python - Size: 18.6 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

pjaiswalusf/Stroke-Prediction

This project leverages machine learning to predict stroke risk using XGBoost, Random Forest, and Logistic Regression. It incorporates advanced data preprocessing, class imbalance handling with SMOTE, and hyperparameter optimization using Optuna. Model interpretability is enhanced with SHAP to identify key risk factors.

Language: Jupyter Notebook - Size: 4.69 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

FaisalAhmed21/CSE422-Artificial-Intelligence

Language: Jupyter Notebook - Size: 16.2 MB - Last synced at: 3 months ago - Pushed at: 7 months ago - Stars: 4 - Forks: 0

vishallmaurya/VEDA

veda_lib: a Python library designed to streamline the transition from raw data to machine learning models. It automates and simplifies data preprocessing, cleaning, and balancing, addressing the time-consuming and complex aspects of these tasks to provide clean and ready-to-use data.

Language: Python - Size: 59 MB - Last synced at: 16 days ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

Shoaib-Akther-Asif/Country-wise-Quality-of-Life-Overview

Data scraping with Selenium & visualizing the results through interactive dashboards in Tableau Public.

Language: Jupyter Notebook - Size: 1.14 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

BatthulaVinay/House-Price-prediction

This project aims to analyze and predict house prices based on various features such as location, size, and amenities. The dataset is processed and explored using Python, and machine learning models are applied to generate accurate price predictions.

Language: Jupyter Notebook - Size: 1.07 MB - Last synced at: 2 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

Sameer6690/Sexism_Detection

This project classifies sexist and non-sexist language using three machine learning models: Logistic Regression, Support Vector Machine (SVM), and a Neural Network. The models are trained after preprocessing the dataset and extracting features.

Language: Jupyter Notebook - Size: 1.4 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

yash-rewalia/Stock-closing-price-prediction-using-regression

The ultimate business objective is to leverage the regression model to provide accurate predictions of the closing price of AMRN stock, enabling stakeholders to make well-informed investment decisions, manage risks effectively, optimize portfolios, build early warning systems that flag fraud cases, and align investment strategies with financial goals.

Language: Jupyter Notebook - Size: 7.58 MB - Last synced at: 4 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

Prarthana-Singh/WhatsApp-Chat-Analyzer

Language: Python - Size: 14.6 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

BatthulaVinay/EV-population

This repository contains a Jupyter Notebook focused on analyzing Electric Vehicle (EV) population data. The notebook includes data visualizations, exploratory analysis, and key insights.

Language: Jupyter Notebook - Size: 843 KB - Last synced at: 2 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

Rajitha0411/ML

The Global Superstore dataset is a comprehensive collection of sales data spanning multiple years, regions, and product categories. This rich dataset encapsulates critical business metrics including sales revenue, profit, order quantity, and shipping cost, making it ideal for various data analysis and machine learning projects.

Language: Jupyter Notebook - Size: 6.84 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

BatthulaVinay/Basic-Linear-Regression

This project demonstrates Basic Linear Regression using Python. The notebook includes dataset loading, exploratory data analysis, model training, evaluation, and visualization of results.

Language: Jupyter Notebook - Size: 122 KB - Last synced at: 2 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

BatthulaVinay/phone-usage-analysis

This project analyzes phone usage patterns in India and predicts the primary use of mobile devices based on various features. The notebook covers data preprocessing, exploratory data analysis (EDA), and model training using multiple classification algorithms.

Language: Jupyter Notebook - Size: 1.29 MB - Last synced at: 2 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

zeeza18/Movie-Recommendation-System

A Movie Recommendation System using machine learning to suggest movies based on user preferences. Built with Python 3.7 and utilizes the TMDB movie metadata for recommendations.

Language: Jupyter Notebook - Size: 1.3 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

MDalamin5/Machine-Learning-2.0

Machine-Learning-2.0: A comprehensive repository documenting my journey to master ML from scratch. It includes core algorithms, advanced techniques, data preprocessing, feature engineering, and real-world projects. Follow my structured approach, inspired by "100 Days of ML," featuring Python implementations, tools, and insightful resources.

Language: Jupyter Notebook - Size: 25.1 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

Safwan2003/RandomForest_Heart_Disease_Prediction

A machine learning project using Random Forest Classifier to predict heart disease. Includes data preprocessing (with binning), feature selection, and model evaluation.

Language: Jupyter Notebook - Size: 4.86 MB - Last synced at: 2 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

Gurpreet0022/Crop-Fertilizers-Recommendation-System-using-ML-

This repository is a part of AICTE - Shell Internship on 'Green Skills using AI technologies' Cycle 3.

Language: Jupyter Notebook - Size: 1.28 MB - Last synced at: 7 days ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

manishkolla/Movie-Genre-Recognition-from-Posters

CNN-based method for movie genre classification from posters, with data pre-processing (one-hot encoding, missing values, imbalance, resizing). Performance compared to LeNet, AlexNet, VGG, ResNet-50, Logistic Regression, and Random Forest.

Language: Jupyter Notebook - Size: 12.3 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

manishkolla/Zillow-Home-Value-Prediction

To address the impact of rising house prices on the economy, we built a machine learning model resistant to market trends. We experimented with Random Forest and Linear Regression models, employing sophisticated imputation methods like median state price replacement, KNN imputation, and forward/backward filling to minimize errors.

Language: Jupyter Notebook - Size: 9.29 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0
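
Two of the imputation ideas named in the description above, median replacement and forward/backward filling, can be sketched in plain Python as follows. This is a simplified illustration, not the repository's code: the function names and the sample series are assumptions, the median is taken over a single series rather than per state, and KNN imputation is not covered.

```python
from statistics import median


def fill_median(values):
    """Replace each missing entry (None) with the median of the known entries."""
    known = [v for v in values if v is not None]
    if not known:
        return list(values)  # Nothing to impute from.
    m = median(known)
    return [m if v is None else v for v in values]


def fill_forward_backward(values):
    """Forward-fill gaps, then backward-fill any gaps left at the start."""
    filled = list(values)
    last = None
    for i, v in enumerate(filled):  # Forward pass: carry the last known value.
        if v is None:
            filled[i] = last
        else:
            last = v
    nxt = None
    for i in range(len(filled) - 1, -1, -1):  # Backward pass for leading gaps.
        if filled[i] is None:
            filled[i] = nxt
        else:
            nxt = filled[i]
    return filled


# Made-up price series with gaps, for illustration only.
prices = [None, 200.0, None, 260.0, None]
by_median = fill_median(prices)
by_fill = fill_forward_backward(prices)
```

Forward/backward filling suits ordered data such as monthly prices, where neighboring values are informative, while median replacement ignores order entirely.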

MonzerDev/Fake-News-Detection

A robust fake news detection system leveraging machine learning models (SVM and Random Forest) to identify political misinformation. Includes preprocessing, training, and evaluation scripts with datasets available for download.

Language: Jupyter Notebook - Size: 284 KB - Last synced at: 2 months ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

NickLitwinow/Kaggle-Titanic

This project tackles the Kaggle Titanic challenge. The objective is to build a predictive model to determine which passengers survived the Titanic disaster based on various features such as passenger class, age, gender, and other attributes. The project covers data cleaning, preprocessing, exploratory data analysis, and model building.

Language: Jupyter Notebook - Size: 468 KB - Last synced at: 2 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

addytrunks/machine-learning

A comprehensive repository documenting key machine learning algorithms, implementation details, and practical examples.

Language: Jupyter Notebook - Size: 140 MB - Last synced at: about 2 months ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

dineshkumarkotha/Impact-of-Alcohol-Consumption-on-Public-Health

Impact of Alcohol Consumption on Public Health

Size: 227 KB - Last synced at: 3 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

BatthulaVinay/Divorce-Status-Prediction-EDA-and-ML

This project focuses on analyzing data related to divorce status to uncover insights, trends, and predictive models. The analysis is conducted using Python in a Jupyter Notebook environment.

Language: Jupyter Notebook - Size: 689 KB - Last synced at: 2 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

BatthulaVinay/GenZ_DatingApp-EDA-and-ML

This project focuses on analyzing data from a GenZ Dating App to uncover insights, trends, and predictive models. The analysis is conducted using Python in a Jupyter Notebook environment.

Language: Jupyter Notebook - Size: 332 KB - Last synced at: 2 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

AryanPatill/Fuel-Efficiency-Prediction--Python

This is a learning project for practice, using regression on the Auto MPG dataset to predict fuel efficiency. It involves building a Neural Network with RMSprop, MSE loss, data normalization, and early stopping to prevent overfitting. Key: normalize data, use MAE/MSE, optimize model.

Language: Jupyter Notebook - Size: 0 Bytes - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

robbytbg/Port2

Portfolio Project

Language: Jupyter Notebook - Size: 10 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

Rutuja0732/EDA-using-Titanic-Dataset

Here, I have performed Exploratory Data Analysis (EDA) on the Titanic dataset from Kaggle to learn and gain a deeper understanding of EDA concepts.

Language: Python - Size: 218 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

AntoinePinto/StringPairFinder

Algorithm designed to match strings by similarity

Language: Python - Size: 333 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 1 - Forks: 1

sarojinisharon/Predicting-Credit-Card-Approvals

This project builds a machine learning model to predict credit card approval from a dataset. It involves data preprocessing, handling missing values, encoding categorical variables, and applying logistic regression. The model is optimized with hyperparameter tuning using GridSearchCV to improve performance.

Language: Jupyter Notebook - Size: 0 Bytes - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

MOULALIMS/Book-Recommendation

The Book Recommendation System is a machine learning project that leverages collaborative filtering techniques to provide personalized book suggestions to users.

Language: Jupyter Notebook - Size: 157 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

AyushAI/NextEra-Supplies-A-Business-Analytical-Case-Study

This project demonstrates an end-to-end data analytics workflow for NextEra Supplies, combining MySQL for database management, Power BI for dynamic visualization, and Python for data exploration and preprocessing. Advanced deep learning techniques (LSTM) are used for accurate sales forecasting, providing actionable insights to drive strategic decisions.

Language: Jupyter Notebook - Size: 22.8 MB - Last synced at: 3 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

rohancodestack/ML-Model-Deployment-on-AWS-SageMaker

Designed and deployed a scalable machine learning pipeline on AWS to detect fraudulent transactions, leveraging SageMaker for model deployment, real-time inference, and feedback-based retraining. Ensured secure data handling with S3 and tenant isolation for a multi-tenant SaaS LMS application.

Language: Python - Size: 5.86 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

MvMukesh/autoKYC

Named Entity Extraction with OpenCV, Pytesseract, Spacy (OCR + NER), BIO Labelling

Size: 3.27 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 4 - Forks: 0

Ansuman21/Predicting_Vehicle_Weight_with_Linear-Regression

Data Science Project (K-Fold Cross Validation M2)

Language: Jupyter Notebook - Size: 2.28 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

cliffordtutu/Excel-Project-Coffee-Sales-Analysis

The Coffee Sales Analysis Excel Project is a practical exploration of sales data analysis using Microsoft Excel. This project showcases how Excel can be a powerful tool for data cleaning, preprocessing, visualization, and dashboard creation, all within a familiar spreadsheet environment.

Size: 0 Bytes - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

tanisha10101/Machine-Learning-LAB

This repository contains experiments on data visualization, cleaning, preprocessing, and machine learning techniques. It covers topics like prediction models, reinforcement learning, SVM, and data sampling to provide hands-on experience in data science and machine learning.

Size: 1.25 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

Priyanshu7639/Data_Visualization_Dashboard

An interactive data visualization tool that combines traditional plotting capabilities with modern AI assistance. It allows users to create and modify visualizations through natural language commands, making data exploration accessible to users of all skill levels.

Language: Python - Size: 15.6 KB - Last synced at: 8 days ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

Kawai-Senpai/UltraClean

UltraClean is a fast and efficient Python library for cleaning and preprocessing text data for AI/ML tasks and data processing.

Language: Python - Size: 38.1 KB - Last synced at: 9 days ago - Pushed at: 6 months ago - Stars: 1 - Forks: 0

snowfela/SDV

Mini project on synthetic data generation, implementing the CTGAN algorithm on tabular data

Language: Python - Size: 1.86 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

Ganga-Suren/NFL-Running-Back-Analysis

Analyzing NFL running back performance using historical data and machine learning models.

Language: PHP - Size: 9.45 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

Muhammad-Sheraz-ds/Predicting-Insurance-Claim

In this comprehensive machine learning project, I executed the entire machine learning life cycle. I designed a streamlined, visually appealing Streamlit interface that lets users enter their information easily, and handed off well-documented, easily modifiable code.

Language: Jupyter Notebook - Size: 74.3 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 3 - Forks: 1

sarahloree/Project-3--Credit-Card-User-Churn-Prediction

This is the third project I completed as part of the Advanced Machine Learning module of my post-graduate certification in AI/Machine Learning from the University of Texas' McCombs School of Business.

Language: Jupyter Notebook - Size: 3.56 MB - Last synced at: 12 days ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

m-karthik-kumar/Personalized-Dietary-Guidance-with-Gen-AI

The algorithm utilizes Generative AI and Natural Language Processing (NLP) to analyze the nutritional content of packaged food products. The system considers personalized health conditions, such as allergies and dietary needs, to provide tailored recommendations, helping individuals make safer and more informed food choices.

Language: Jupyter Notebook - Size: 170 KB - Last synced at: 2 months ago - Pushed at: 9 months ago - Stars: 3 - Forks: 0

Michael-Insights/Portfolio

This repository showcases my projects and skills in Data Analytics, Data Science, and Machine Learning. It includes hands-on work in data analysis, predictive modeling, and machine learning algorithms, aimed at solving real-world problems.

Language: Jupyter Notebook - Size: 210 KB - Last synced at: about 1 month ago - Pushed at: 6 months ago - Stars: 1 - Forks: 0

lijesh010/ML_Project_Car_Price_Prediction_using_LinearRegression

This repository presents a data-driven exploration into predicting car prices using a machine learning model based on linear regression, aimed at aiding a Chinese automobile company's entry into the competitive US market.

Language: Jupyter Notebook - Size: 5.51 MB - Last synced at: 19 days ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 0

hk151109/House-Price-Prediction_Using-Linear-Regression

This repository contains the implementation of a Linear Regression model to predict house prices based on features such as square footage, number of bedrooms, and bathrooms.

Language: Python - Size: 8.59 MB - Last synced at: 3 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

pavankethavath/Car_dekho_car_price_prediction

A Streamlit web app utilizing Python, scikit-learn, and pandas for used car price prediction. Features data preprocessing (scaling, encoding), Random Forest model optimization with GridSearchCV, and interactive user input handling. Achieves an R² score of 0.9028, showcasing skills in machine learning, data engineering, and deployment.

Language: Python - Size: 182 MB - Last synced at: 20 days ago - Pushed at: 7 months ago - Stars: 2 - Forks: 0
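
The R² score quoted in the entry above is the coefficient of determination, 1 - SS_res / SS_tot. The following is a minimal sketch of that formula and is not taken from the repository; the function name and inputs are hypothetical.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over after prediction.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the true values around their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are equal")
    return 1.0 - ss_res / ss_tot
```

A perfect prediction gives 1.0, and always predicting the mean of the true values gives 0.0, so a value such as 0.9028 means the model explains most of the variance in the target.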

jigyasaG18/Movie-Recommendation-System-Project

This repository features a personalized movie recommendation system that offers tailored suggestions to users. It leverages a dataset of 5,000 English-language films and utilizes data processing, feature engineering, and a cosine similarity algorithm to analyze user preferences. The system includes an intuitive user interface for easy navigation.

Language: Jupyter Notebook - Size: 12.9 MB - Last synced at: 4 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0
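
As a rough illustration of the cosine similarity ranking mentioned in the entry above, the sketch below ranks items by the cosine of the angle between their feature vectors. It is not the repository's code: the function names, titles, and feature vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention used here: a zero vector matches nothing
    return dot / (norm_a * norm_b)


def rank_similar(title, catalog):
    """Return the other titles in `catalog`, most similar to `title` first."""
    target = catalog[title]
    scores = {
        other: cosine_similarity(target, vec)
        for other, vec in catalog.items()
        if other != title
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical feature vectors (for example, genre indicator flags).
catalog = {"A": [1, 1, 0], "B": [1, 0, 0], "C": [0, 0, 1]}
recommendations = rank_similar("A", catalog)
```

Cosine similarity ignores vector length, so two items with the same mix of features score as similar even when one has larger raw counts.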

jigyasaG18/Power-BI-Dashboard-Project

The Ecommerce Sales Analysis Dashboard project utilizes Power BI to provide detailed insights into ecommerce sales data, enabling stakeholders to track key performance metrics and uncover trends. This interactive dashboard allows users to explore the data in real time, offering features such as drill-down capabilities and customizable filters.

Size: 155 KB - Last synced at: 4 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

jigyasaG18/HR-Analytics-Power-BI-Dashboard

The HR Analytics Power BI Dashboard project focuses on developing a comprehensive tool to analyze and visualize key performance indicators related to employee attrition and retention. It features interactive visualizations that enable HR professionals to explore data, identify trends, and make informed decisions. The dashboard integrates the data.

Size: 17.5 MB - Last synced at: 4 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

jigyasaG18/Fake-News-Prediction-Project

The Fake News Prediction App repository offers a machine learning project that classifies news articles as fake or real. It uses a dataset of 20,000 articles and employs methods such as TF-IDF vectorization and the Porter stemming algorithm, achieving around 97% classification accuracy with a logistic regression model.

Language: Jupyter Notebook - Size: 47.7 MB - Last synced at: 4 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0
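
The TF-IDF vectorization named in the entry above can be sketched in a few lines. This is a simplified illustration and not the repository's code: it uses the unsmoothed textbook weighting tf × log(N / df), whitespace tokenization, and no stemming, whereas libraries typically apply smoothed variants. The function name and sample documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Weight = (term count / document length) * log(N / document frequency).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append(
            {
                term: (count / total) * math.log(n_docs / df[term])
                for term, count in counts.items()
            }
        )
    return vectors


# Hypothetical headlines for illustration only.
vectors = tf_idf(["breaking news today", "news update"])
```

A term that appears in every document, such as "news" here, gets weight zero, while terms unique to one document get positive weight. The resulting vectors are the kind of input a classifier such as logistic regression would be trained on.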

jigyasaG18/Financial-Risk-Analysis-Project

The Credit Card Financial Risk Analysis Dashboard is a real-time Power BI tool designed to provide insights into credit card transactions and customer demographics. It features interactive visualizations, efficient data processing, and actionable insights to support decision-making. Utilizing data from a SQL database, the dashboard tracks key metrics

Size: 2.54 MB - Last synced at: 4 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

jigyasaG18/Data-Analysis-using-MS-Excel

This project analyzes real-time data from Ambuvians Healthcare, a health products startup. It included data cleaning, such as removing duplicates and addressing missing values, followed by analyses to reveal insights into sales trends, customer demographics, and purchasing behaviors. Visualizations were built in MS Excel, including bar and pie charts.

Size: 3.11 MB - Last synced at: 4 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

bibek36/Dialogue-Summarization-with-Generative-AI

Welcome to the Dialogue Summarization with Generative AI project! In this project, your main goal is to perform dialogue summarization using cutting-edge language models and investigate how different input techniques impact the quality of generated summaries.

Language: Jupyter Notebook - Size: 10.7 KB - Last synced at: 2 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

GIPSYDANGER-1/PreppyData

We provide automatic data preprocessing for anyone.

Language: HTML - Size: 67.4 KB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

panastasiadis/data-mining-operations

This repository contains three KNIME workflows that analyze the Air Traffic Passenger Statistics dataset from the San Francisco International Airport. The workflows include tasks such as classification comparison, regression analysis, and outlier detection using various machine learning techniques.

Size: 739 KB - Last synced at: 13 days ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0

Related Keywords
datapreprocessing 416, python 142, machine-learning 135, data-science 87, data 59, datavisualization 58, datacleaning 57, data-visualization 57, pandas 56, exploratory-data-analysis 50, machine-learning-algorithms 49, dataanalysis 41, feature-engineering 36, numpy 36, matplotlib 32, linear-regression 32, python3 32, jupyter-notebook 30, eda 30, deep-learning 29, logistic-regression 28, random-forest 25, machinelearning 24, regression 23, nlp 22, datascience 21, classification 19, visualization 19, sklearn 19, data-analysis 19, natural-language-processing 18, seaborn 18, tensorflow 17, scikit-learn 16, streamlit 16, decision-trees 15, r 14, modelevaluation 14, datapreparation 14, neural-network 13, feature-selection 13, xgboost 13, random-forest-classifier 10, feature-extraction 10, naive-bayes-classifier 9, predictive-modeling 9, regression-models 9, kmeans-clustering 9, modeltraining 9, sentiment-analysis 9, statistics 9, nlp-machine-learning 9, prediction 9, powerbi 9, dataset 9, dataprocessing 8, artificial-intelligence 8, model 8, cross-validation 8, datacollection 8, ml 8, modelbuilding 8, clustering 8, hyperparameter-tuning 8, database 8, matplotlib-pyplot 7, decision-tree-classifier 7, dataanalytics 7, knn-classification 7, datawrangling 7, flask 7, keras 7, dashboard 7, classification-algorithm 7, sql 7, excel 7, outlier-detection 7, normalization 6, unsupervised-learning 6, supervised-machine-learning 6, datamining 6, svm 6, tableau 6, pandas-python 6, pytorch 6, nltk 6, computer-vision 5, datamanipulation 5, webscraping 5, pivot-tables 5, ai 5, hyperparameter-optimization 5, supervised-learning 5, sklearn-library 5, cnn 5, svm-classifier 5, statistical-analysis 5, streamlit-webapp 5, randomforest 5, dataexploration 5