An open API service providing repository metadata for many open source software ecosystems.

Topic: "data-preprocessing"

zzw922cn/Automatic_Speech_Recognition

End-to-end Automatic Speech Recognition for Mandarin and English in TensorFlow

Language: Python - Size: 5.53 MB - Last synced at: 27 days ago - Pushed at: about 2 years ago - Stars: 2,843 - Forks: 534

skrub-data/skrub

Machine learning with dataframes

Language: Python - Size: 12.4 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 1,380 - Forks: 128

data-prep-kit/data-prep-kit

Open source project for data preparation for LLM application builders

Language: HTML - Size: 220 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 646 - Forks: 193

Western-OC2-Lab/AutoML-Implementation-for-Static-and-Dynamic-Data-Analytics

Implementation/Tutorial of using Automated Machine Learning (AutoML) methods for static/batch and online/continual learning

Language: Jupyter Notebook - Size: 5.43 MB - Last synced at: 3 days ago - Pushed at: 12 months ago - Stars: 629 - Forks: 112

machinelearnjs/machinelearnjs

Machine Learning library for the web and Node.

Language: TypeScript - Size: 2.76 MB - Last synced at: 13 days ago - Pushed at: about 1 month ago - Stars: 542 - Forks: 53

akanz1/klib

Easy to use Python library of customized functions for cleaning and analyzing data.

Language: Python - Size: 47.2 MB - Last synced at: 2 days ago - Pushed at: 10 days ago - Stars: 511 - Forks: 55

Desbordante/desbordante-core

Desbordante is a high-performance data profiler that can discover many different patterns in data using various algorithms. It can also run data cleaning scenarios using these algorithms. Desbordante has a console version and an easy-to-use web application.

Language: C++ - Size: 143 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 401 - Forks: 76

shamspias/customizable-gpt-chatbot

A dynamic, scalable AI chatbot built with Django REST framework, supporting custom training from PDFs, documents, websites, and YouTube videos. Leveraging OpenAI's GPT-3.5, Pinecone, FAISS, and Celery for seamless integration and performance.

Language: Python - Size: 229 KB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 389 - Forks: 84

msamogh/nonechucks

Deal with bad samples in your dataset dynamically, use Transforms as Filters, and more!

Language: Python - Size: 25.4 KB - Last synced at: 4 days ago - Pushed at: over 2 years ago - Stars: 377 - Forks: 27

harunurrashid97/100-Days-Of-ML-Code

A day-to-day plan for this challenge. Covers both theoretical and practical aspects

Language: Jupyter Notebook - Size: 11.8 MB - Last synced at: 5 months ago - Pushed at: over 2 years ago - Stars: 223 - Forks: 109

TirendazAcademy/PANDAS-TUTORIAL

Jupyter Notebooks and Data Sets for Pandas Library

Language: Jupyter Notebook - Size: 11.6 MB - Last synced at: about 1 month ago - Pushed at: 11 months ago - Stars: 209 - Forks: 175

HasnainRaz/SemSegPipeline

A simpler way of reading and augmenting image segmentation data into TensorFlow

Language: Python - Size: 41 KB - Last synced at: 10 days ago - Pushed at: almost 5 years ago - Stars: 144 - Forks: 27

thepanacealab/SMMT

Social Media Mining Toolkit (SMMT) main repository

Language: Python - Size: 521 KB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 134 - Forks: 37

triton-inference-server/dali_backend

The Triton backend that allows running GPU-accelerated data pre-processing pipelines implemented in DALI's python API.

Language: C++ - Size: 24 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 133 - Forks: 33

dansuh17/segan-pytorch

SEGAN pytorch implementation https://arxiv.org/abs/1703.09452

Language: Python - Size: 82 KB - Last synced at: 25 days ago - Pushed at: about 6 years ago - Stars: 108 - Forks: 32

TensorMSA/tensormsa

Deep learning GUI framework for enterprise

Language: Python - Size: 84.3 MB - Last synced at: over 1 year ago - Pushed at: about 7 years ago - Stars: 108 - Forks: 18

Mohan-Zhang-u/mzutils

Language: Python - Size: 324 KB - Last synced at: 15 days ago - Pushed at: over 1 year ago - Stars: 104 - Forks: 9

asavinov/prosto

Prosto is a data processing toolkit radically changing how data is processed by heavily relying on functions and operations with functions - an alternative to map-reduce and join-groupby

Language: Python - Size: 1.95 MB - Last synced at: 6 days ago - Pushed at: over 3 years ago - Stars: 90 - Forks: 5

HypoX64/candock

A time series signal analysis and classification framework

Language: Python - Size: 1.39 MB - Last synced at: 16 days ago - Pushed at: almost 2 years ago - Stars: 85 - Forks: 29

wangxb96/Awesome-EdgeAI

Resources of our survey paper "Optimizing Edge AI: A Comprehensive Survey on Data, Model, and System Strategies"

Size: 3.64 MB - Last synced at: 3 days ago - Pushed at: 4 months ago - Stars: 83 - Forks: 8

nursnaaz/25DaysInMachineLearning

I will update this repository with statistics content and materials for learning machine learning with Python

Language: Jupyter Notebook - Size: 293 MB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 57 - Forks: 66

LaureBerti/Learn2Clean

Learn2Clean: Optimizing the Sequence of Tasks for Data Preparation and Cleaning

Language: Python - Size: 34.6 MB - Last synced at: 30 days ago - Pushed at: over 2 years ago - Stars: 51 - Forks: 20

danielhanchen/sciblox

sciblox - Easier Data Science and Machine Learning

Language: HTML - Size: 1.38 MB - Last synced at: 11 days ago - Pushed at: almost 8 years ago - Stars: 50 - Forks: 1

soumyadip007/Data-Science-Using-Python-University-Course-Module

“Data science” is just about as broad of a term as they come. It may be easiest to describe what it is by listing its more concrete components: Data exploration & analysis. Included here: Pandas; NumPy; SciPy; a helping hand from Python's Standard Library.

Language: Jupyter Notebook - Size: 34.1 MB - Last synced at: about 1 month ago - Pushed at: about 5 years ago - Stars: 45 - Forks: 46

Elysian01/Data-Purifier

A Python library for Automated Exploratory Data Analysis, Automated Data Cleaning, and Automated Data Preprocessing For Machine Learning and Natural Language Processing Applications in Python.

Language: Jupyter Notebook - Size: 7.51 MB - Last synced at: 28 days ago - Pushed at: about 3 years ago - Stars: 44 - Forks: 6

teamreboott/data-modori

Language: Python - Size: 3.56 MB - Last synced at: 8 days ago - Pushed at: over 1 year ago - Stars: 41 - Forks: 3

Rpita623/Movie-Recommendation-System-using-R_Project

Movie Recommendation System: Project using R and Machine learning

Language: R - Size: 1.06 MB - Last synced at: about 1 month ago - Pushed at: over 3 years ago - Stars: 40 - Forks: 31

repetere/modelscript 📦

REPO MOVED TO https://github.com/repetere/jsonstack-data - Data Science and Machine learning in JavaScript

Language: JavaScript - Size: 5.73 MB - Last synced at: 25 days ago - Pushed at: almost 3 years ago - Stars: 39 - Forks: 5

Kukuster/SumStatsRehab

GWAS summary statistics files QC tool

Language: Python - Size: 1.87 MB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 38 - Forks: 6

mattkearns/automated-data-preprocessing

A command-line utility program for automating the trivial, frequently occurring data preparation tasks: missing value interpolation, outlier removal, and encoding categorical variables.

Language: Python - Size: 442 KB - Last synced at: 6 months ago - Pushed at: over 6 years ago - Stars: 34 - Forks: 15

ELToulemonde/dataPreparation

Data preparation for data science projects.

Language: R - Size: 5.18 MB - Last synced at: 15 days ago - Pushed at: almost 2 years ago - Stars: 31 - Forks: 10

maet3608/nuts-ml

Flow-based data pre-processing for deep learning

Language: Python - Size: 67.3 MB - Last synced at: about 23 hours ago - Pushed at: over 4 years ago - Stars: 31 - Forks: 10

KwokHing/YandexCatBoost-Python-Demo

Demo on the capability of Yandex CatBoost gradient boosting classifier on a fictitious IBM HR dataset obtained from Kaggle. Data exploration, cleaning, preprocessing and model tuning are performed on the dataset

Language: Jupyter Notebook - Size: 743 KB - Last synced at: about 1 month ago - Pushed at: over 5 years ago - Stars: 30 - Forks: 16

Pooja-Bhojwani/linked-eed

The aim is to build a job recommender system that takes skills from LinkedIn and jobs from Indeed and returns the best available jobs for your skills.

Language: Python - Size: 443 KB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 29 - Forks: 17

TsLu1s/atlantic

Atlantic: Automated Data Preprocessing Framework for Machine Learning

Language: Python - Size: 1.94 MB - Last synced at: 29 days ago - Pushed at: 4 months ago - Stars: 28 - Forks: 4

vlivashkin/GPUParallel

Joblib-like interface for parallel GPU computations (e.g. data preprocessing)

Language: Python - Size: 111 KB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 28 - Forks: 2

caxelos/Thesis-Project

University Thesis project

Language: C++ - Size: 67.3 MB - Last synced at: 12 months ago - Pushed at: almost 2 years ago - Stars: 26 - Forks: 6

MahtaFetrat/ManaTTS-Persian-Speech-Dataset

ManaTTS is the largest open Persian speech dataset with 100+ hours of transcribed audio. Includes data collection pipeline and tools. Suitable for Persian text-to-speech models.

Language: Jupyter Notebook - Size: 16.4 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 25 - Forks: 1

nicomignoni/tab2img

A tool to convert tabular data into images so that it can be used by CNNs. Inspired by the "DeepInsight" paper.

Language: Python - Size: 497 KB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 25 - Forks: 5

YakshHaranwala/PTRAIL

PTRAIL is a state-of-the-art parallel computation library for mobility data preprocessing and feature extraction.

Language: Python - Size: 143 MB - Last synced at: 12 days ago - Pushed at: 6 months ago - Stars: 25 - Forks: 7

twardoch/split-markdown4gpt

A Python tool for splitting large Markdown files into smaller sections based on a specified token limit. This is particularly useful for processing large Markdown files with GPT models, as it allows the models to handle the data in manageable chunks.

Language: Python - Size: 78.1 KB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 24 - Forks: 2

ipriyaaanshu/lung-cancer-detection

This is a project based on Data Science Bowl 2017. I did my best to propose a solution for the problem but I am still new to Deep Learning so my solution is not the optimal one but it can definitely be improved with some fine tuning and better resources.

Language: Jupyter Notebook - Size: 22.6 MB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 23 - Forks: 23

azaz9026/Medicine-Recommendation-System

A Medicine Recommendation System in machine learning (ML) is a software application designed to assist healthcare professionals and patients in selecting the most appropriate medication based on various factors such as medical history, symptoms, demographics, and drug interactions

Language: Jupyter Notebook - Size: 405 KB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 22 - Forks: 5

abrazinskas/machine-learning-data-pipeline

Pipeline module for parallel real-time data processing for machine learning models development and production purposes.

Size: 3.41 MB - Last synced at: 23 days ago - Pushed at: over 5 years ago - Stars: 22 - Forks: 2

lex-hue/Stock-Predictor-V4 📦

A reinforcement learning model specialized in stock prediction utilizing deep learning techniques, incorporating reward mechanisms, compatible with any machine equipped with Python.

Language: Python - Size: 148 KB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 21 - Forks: 8

buabaj/xplore

A Python package built for data scientists/analysts and AI/ML engineers for exploring the features of a dataset in a minimal number of lines of code, for quick analysis before data wrangling and feature extraction.

Language: Python - Size: 1.74 MB - Last synced at: about 10 hours ago - Pushed at: about 4 years ago - Stars: 21 - Forks: 11

suraj-maniyar/Stock-Trading-Using-Machine-Learning

A comprehensive approach for stock trading implemented using Neural Network and Reinforcement Learning separately.

Language: Python - Size: 12.4 MB - Last synced at: over 1 year ago - Pushed at: almost 7 years ago - Stars: 21 - Forks: 9

DataFog/datafog-python

Open source PII detection and anonymization tool: easy-to-use, configurable, and extensible

Language: Python - Size: 78.3 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 20 - Forks: 5

gyrdym/ml_preprocessing

Implementation of popular data preprocessing algorithms for Machine learning

Language: Dart - Size: 5.44 MB - Last synced at: about 1 month ago - Pushed at: about 3 years ago - Stars: 20 - Forks: 1

ammarshaikh123/Projects-on-Data-Cleaning-and-Manipulation

This repository contains projects I have worked on for Data Cleaning and Manipulation in Python.

Language: Jupyter Notebook - Size: 8.55 MB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 20 - Forks: 16

hemangjoshi37a/PersonalGoalAssistant

AI-driven Personal Goal Assistant: Reinforcement learning-powered software mimics user behavior, interacts with computer inputs, and autonomously achieves goals in finance, social networking, and productivity. Open-source, Python-based RL agent.

Language: Python - Size: 20.5 KB - Last synced at: 26 days ago - Pushed at: about 2 years ago - Stars: 19 - Forks: 2

AlexanderSouthan/pyPreprocessing

Especially useful for preprocessing of datasets like Raman spectra, infrared spectra, UV/Vis spectra, but also HPLC data and many other types of data. pyPreprocessing includes baseline correction, smoothing, filtering, normalization and transformation.

Language: Python - Size: 104 KB - Last synced at: 29 days ago - Pushed at: 2 months ago - Stars: 17 - Forks: 4

dataclr/dataclr

Feature selection for tabular datasets using advanced filter and wrapper methods

Language: Python - Size: 107 KB - Last synced at: 30 days ago - Pushed at: 2 months ago - Stars: 17 - Forks: 1

Vidhi1290/Deep-Learning-for-EEG-Emotion-Classification

This repository contains a Python code script for performing emotion classification using EEG (Electroencephalogram) data. Emotion classification from EEG signals is an important application in neuroscience and human-computer interaction. The code leverages deep learning techniques to analyze EEG data and predict emotional states.

Language: Jupyter Notebook - Size: 1.79 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 17 - Forks: 2

am1tyadav/teal

Library of TensorFlow layers for audio data processing and data augmentation

Language: Python - Size: 2.09 MB - Last synced at: almost 2 years ago - Pushed at: about 3 years ago - Stars: 17 - Forks: 6

klarEDA/klar-EDA

A python library for automated exploratory data analysis

Language: Python - Size: 346 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 17 - Forks: 21

ISTE-VESIT-ORG/Machinera-2020

This is an AI Series where we will cover Machine Learning and Deep Learning topics from the very basics.

Size: 16.4 MB - Last synced at: almost 2 years ago - Pushed at: about 4 years ago - Stars: 17 - Forks: 0

elbaulp/DPASF

Final project for my MSc in Data Science. This is a library for Data Pre-processing Algorithms for Streaming in Flink (DPASF)

Language: Scala - Size: 1.24 MB - Last synced at: over 1 year ago - Pushed at: almost 6 years ago - Stars: 17 - Forks: 3

sourcecode369/ml-algorithms-on-scikit-and-keras

Implementation scripts of machine learning algorithms in Scikit-learn and Keras for complete novices.

Language: Jupyter Notebook - Size: 26.6 MB - Last synced at: about 2 years ago - Pushed at: almost 7 years ago - Stars: 17 - Forks: 12

parvvaresh/Satellite_data

This repository provides Python code for converting satellite data into a format suitable for deep learning models. It supports various deep learning architectures, including Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Long Short-Term Memory networks (LSTMs).

Language: Python - Size: 15.4 MB - Last synced at: 18 days ago - Pushed at: 6 months ago - Stars: 16 - Forks: 0

Aayushpatel007/topicrankpy

A Python package to get useful information from documents using TopicRank Algorithm.

Language: Python - Size: 72.3 KB - Last synced at: 20 days ago - Pushed at: almost 2 years ago - Stars: 16 - Forks: 3

Western-OC2-Lab/MSANA-Online-Data-Stream-Analytics-And-Concept-Drift-Adaptation

Data stream analytics: Implement online learning methods to address concept drift in dynamic data streams. Code for the paper entitled "A Multi-Stage Automated Online Network Data Stream Analytics Framework for IIoT Systems" published in IEEE Transactions on Industrial Informatics.

Language: Jupyter Notebook - Size: 10.2 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 16 - Forks: 4

ksbg/sparklanes

A lightweight data processing framework for Apache Spark

Language: Python - Size: 185 KB - Last synced at: 7 days ago - Pushed at: over 2 years ago - Stars: 16 - Forks: 5

hegongshan/Storage-for-AI-Paper

Accelerating AI Training and Inference from Storage Perspective (Must-read Papers on Storage for AI)

Size: 16.6 KB - Last synced at: 15 days ago - Pushed at: 16 days ago - Stars: 15 - Forks: 2

habedi/feature-factory

A high-performance feature engineering library for Rust powered by Apache DataFusion 🦀

Language: Rust - Size: 87.9 KB - Last synced at: 10 days ago - Pushed at: about 1 month ago - Stars: 15 - Forks: 0

kaviles22/EEG_SignalsClassification

Preprocessing, analysis and classification of EEG signals into 4 classes.

Language: Jupyter Notebook - Size: 812 MB - Last synced at: 4 months ago - Pushed at: about 2 years ago - Stars: 15 - Forks: 6

orbxball/timit-preprocessor

Extract mfcc vectors and phones from TIMIT dataset

Language: Shell - Size: 6.84 KB - Last synced at: about 1 month ago - Pushed at: about 2 years ago - Stars: 15 - Forks: 0

VainF/nyuv2-python-toolkit (fork of ankurhanda/nyuv2-meta-data)

nyuv2 toolbox for data extraction and loading.

Language: Python - Size: 22.5 MB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 15 - Forks: 2

Hyprnx/used-cars-prices-prediction

This repo contains all the source code and obtained data for the used cars prices

Language: Jupyter Notebook - Size: 40.8 MB - Last synced at: 6 days ago - Pushed at: about 3 years ago - Stars: 15 - Forks: 4

krypticmouse/10-Days-of-Statistics-and-Data-Preprocessing

List of all the resources I used during 10 days of Statistics and Data Preprocessing.

Size: 18.6 KB - Last synced at: about 1 year ago - Pushed at: over 4 years ago - Stars: 15 - Forks: 4

Ashwin-op/Machine-Learning-Series

Datasets and Codes for the ML Series

Language: Python - Size: 2.93 KB - Last synced at: almost 2 years ago - Pushed at: about 5 years ago - Stars: 15 - Forks: 2

SagarGaniga/Data-Preprocessing

Data preprocessing is a data mining technique that involves transforming raw data into an understandable format.

Language: Jupyter Notebook - Size: 422 KB - Last synced at: almost 2 years ago - Pushed at: about 7 years ago - Stars: 15 - Forks: 21

KhaledAshrafH/ChatGPT-Sentiment-Analysis

This project aims to perform sentiment analysis on tweets related to ChatGPT, a popular language model developed by OpenAI. The dataset used for training and testing consists of 219,293 tweets collected over a month. Each tweet is classified as positive ("good"), negative ("bad"), or neutral ("neutral").

Language: Jupyter Notebook - Size: 25.3 MB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 14 - Forks: 1

hectorpadin1/Network-Intrusion-Detection-System

This project evaluates and compares different machine learning techniques for network intrusion detection.

Language: Jupyter Notebook - Size: 4.77 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 14 - Forks: 3

Gowtham1729/Android-App-Malware-Detector

A Deep Learning Model for detecting Malware Applications

Language: Python - Size: 165 MB - Last synced at: 6 days ago - Pushed at: almost 4 years ago - Stars: 14 - Forks: 0

sharmaroshan/Numpy-and-Pandas

NumPy and Pandas are among the most important building blocks of knowledge for getting started in data science, analytics, machine learning, business intelligence, and business analytics. This tutorial helps beginners learn the core concepts of NumPy and Pandas and get started with machine learning and data science.

Language: HTML - Size: 3.02 MB - Last synced at: over 1 year ago - Pushed at: about 5 years ago - Stars: 13 - Forks: 6

halil/sau-ml

SAU Machine Learning Training Content

Language: Python - Size: 14.8 MB - Last synced at: almost 2 years ago - Pushed at: about 7 years ago - Stars: 13 - Forks: 3

mikeqfu/pyhelpers

PyHelpers: An open-source toolkit for facilitating Python users' data manipulation tasks

Language: Python - Size: 8.75 MB - Last synced at: 28 days ago - Pushed at: about 1 month ago - Stars: 12 - Forks: 3

CleverInsight/cognito

🚀🤖 Cognito - Simplifies AutoML Data Preprocessing.

Language: Python - Size: 950 KB - Last synced at: 11 months ago - Pushed at: about 1 year ago - Stars: 12 - Forks: 11

DolbyUUU/byte_pair_encoding_BPE_subword_tokenization_implementation_python

Byte-Pair Encoding (BPE) (subword-based tokenization) algorithm implementations from scratch in Python

Language: Python - Size: 449 KB - Last synced at: 5 months ago - Pushed at: over 2 years ago - Stars: 12 - Forks: 0

Amshra267/Cassandra-Udyam

Contains our Approach for the competition organized at Udyam'21

Language: Jupyter Notebook - Size: 3.83 MB - Last synced at: over 1 year ago - Pushed at: about 4 years ago - Stars: 12 - Forks: 2

Bharat-Reddy/Bank-Marketing-Analysis

The data relates to direct marketing campaigns (phone calls) of a Portuguese banking institution. The classification goal is to predict whether the client will subscribe to a term deposit.

Language: Jupyter Notebook - Size: 2.34 MB - Last synced at: about 2 years ago - Pushed at: about 6 years ago - Stars: 12 - Forks: 9

rbhatia46/Data-Preprocessing-Template

This repository includes all the data preprocessing required before using a dataset with a machine learning model. Please refer to the README for usage instructions.

Language: Python - Size: 1.95 KB - Last synced at: 30 days ago - Pushed at: almost 7 years ago - Stars: 12 - Forks: 10

alireza-heidarii/Real-Time-Data-Cleaning-Pipeline-for-Medical-and-Healthcare-Data

A real-time data cleaning pipeline for medical and healthcare data using Apache Spark, SparkNLP, Spark Streaming, and Kafka.

Language: Python - Size: 11.7 KB - Last synced at: 20 days ago - Pushed at: about 2 months ago - Stars: 11 - Forks: 0

AiCorsair/Dataquest-Data-Science-Analysis-Projects

A repository dedicated to storing guided projects completed while learning data science concepts with Dataquest.

Language: Jupyter Notebook - Size: 74 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 11 - Forks: 3

hxycorn/Twitter-Sentiment-Analysis-about-ChatGPT

A quantitative study of over 1.25 million tweets about ChatGPT, employing data scraping, data cleaning, EDA, topic modeling, and sentiment analysis.

Language: Jupyter Notebook - Size: 6.06 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 11 - Forks: 6

kuleafenu/customizable-web-crawler

This web crawler can be customized to scrape almost all types of websites.

Language: Python - Size: 231 KB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 11 - Forks: 4

IMsumitkumar/No-code-ML-platform-DashB.ai

A no-code machine learning pipeline and data visualization platform | perform with learning

Language: JavaScript - Size: 12.1 MB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 11 - Forks: 4

font-bakers/knead

A command line tool for preprocessing, manipulating and serializing font files for deep learning applications.

Language: Python - Size: 1.7 MB - Last synced at: 9 days ago - Pushed at: almost 6 years ago - Stars: 11 - Forks: 1

amir-hojjati/Data-Analysis-Online-Retail-Transactions

This repository is created to represent the processing and the analysis that has been done on this online retail dataset.

Language: Jupyter Notebook - Size: 37.7 MB - Last synced at: over 1 year ago - Pushed at: about 6 years ago - Stars: 11 - Forks: 10

giusalfieri/IPA_Project

Aircraft detection in satellite images using computer vision and machine learning.

Language: C++ - Size: 273 MB - Last synced at: 3 days ago - Pushed at: 9 months ago - Stars: 10 - Forks: 0

ojasphansekar/Zillow-Home-Value-Prediction

XGBoost, LightGBM, LSTM, Linear Regression, Exploratory Data Analysis

Language: Jupyter Notebook - Size: 1.81 MB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 10 - Forks: 7

mattborghi/EngineML

Study notebooks made for learning machine learning for the Hawk team

Language: Jupyter Notebook - Size: 40.4 MB - Last synced at: about 1 year ago - Pushed at: over 6 years ago - Stars: 9 - Forks: 5

bilalhameed248/Urdu-To-English-Machine-Translation

Fine-tuned a pretrained Urdu-to-English machine translation model using the Hugging Face Trainer API on a custom dataset.

Language: Jupyter Notebook - Size: 60.5 KB - Last synced at: 28 days ago - Pushed at: over 1 year ago - Stars: 8 - Forks: 2

ThecoderPinar/House-Price-Prediction-Project

🏠 This project focuses on predicting house prices using advanced regression techniques. It involves comprehensive data preprocessing, feature engineering, and model selection. The aim is to develop an accurate predictive model for real estate prices.

Language: Jupyter Notebook - Size: 6.79 MB - Last synced at: 10 days ago - Pushed at: over 1 year ago - Stars: 8 - Forks: 1

Yash22222/IBM-CSRBOX-Internship-Project

The objective of the Data Analytics internship at CSRBOX is to provide interns with hands-on experience in applying data analytics techniques to real-world projects in the field of corporate social responsibility (CSR). Interns will gain practical skills in data collection, cleaning, analysis, visualization, and reporting, while working on projects

Language: Jupyter Notebook - Size: 5.28 MB - Last synced at: 19 days ago - Pushed at: over 1 year ago - Stars: 8 - Forks: 6

KhaledAshrafH/Logistic-Regression

This program implements logistic regression from scratch using the gradient descent algorithm in Python to predict whether customers will purchase a new car based on their age and salary.

Language: Python - Size: 12.7 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 8 - Forks: 3

alotlikar1010/PW-Skills-Data-Master-Assignment

Assignment Solution of PW Skills Data Master Course

Language: Jupyter Notebook - Size: 1.77 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 8 - Forks: 5

basiralab/Kaggle-BrainNetPrediction-Toolbox

A Python toolbox for predicting brain network (graph) evolution over time from a single observation. The codes of the 20 competing Kaggle teams along with the competition datasets are made available.

Language: Python - Size: 4.92 MB - Last synced at: 9 days ago - Pushed at: over 2 years ago - Stars: 8 - Forks: 3

d4rk-lucif3r/LuciferML 📦

Semi-Auto Machine Learning Library by d4rk-lucif3r

Language: Python - Size: 2.52 MB - Last synced at: 7 months ago - Pushed at: almost 3 years ago - Stars: 8 - Forks: 6

Related Topics
machine-learning (611), python (463), data-visualization (365), data-science (349), data-analysis (258), pandas (200), exploratory-data-analysis (183), data-cleaning (178), deep-learning (159), feature-engineering (148), numpy (119), classification (108), logistic-regression (101), scikit-learn (99), python3 (97), jupyter-notebook (96), matplotlib (94), eda (87), machine-learning-algorithms (86), seaborn (85), data-preparation (73), random-forest (70), feature-selection (68), linear-regression (63), model-evaluation (63), sklearn (60), data (59), natural-language-processing (58), tensorflow (58), predictive-modeling (58), nlp (56), regression (48), data-analytics (47), hyperparameter-tuning (46), data-mining (46), neural-networks (44), supervised-learning (39), clustering (37), artificial-intelligence (37), keras (36), regression-models (35), data-wrangling (34), pytorch (34), visualization (33), feature-extraction (32), neural-network (32), streamlit (32), data-processing (31), decision-trees (30), random-forest-classifier (29), computer-vision (28), ai (28), model-training (26), k-means-clustering (25), xgboost (25), data-engineering (24), decision-tree-classifier (24), sql (24), r (24), unsupervised-learning (24), cross-validation (24), time-series-analysis (23), sentiment-analysis (23), powerbi (22), decision-tree (21), outlier-detection (21), data-exploration (21), text-classification (21), cnn (21), statistical-analysis (20), image-classification (20), dimensionality-reduction (20), time-series (20), gradient-boosting (20), data-collection (20), pipeline (19), lstm (19), convolutional-neural-networks (19), data-augmentation (18), data-manipulation (18), image-processing (18), prediction (18), knn (18), pca (18), model-selection (18), data-transformation (17), tableau (17), web-scraping (17), statistics (17), flask (17), confusion-matrix (17), ensemble-learning (16), naive-bayes-classifier (16), regression-analysis (16), plotly (16), preprocessing (15), deep-neural-networks (15), matplotlib-pyplot (15), feature-scaling (15), smote (15)