An open API service providing repository metadata for many open source software ecosystems.

Topic: "data-preprocessing"

zzw922cn/Automatic_Speech_Recognition

End-to-end Automatic Speech Recognition for Mandarin and English in TensorFlow

Language: Python - Size: 5.53 MB - Last synced at: about 2 months ago - Pushed at: over 2 years ago - Stars: 2,842 - Forks: 533

skrub-data/skrub

Machine learning with dataframes

Language: Python - Size: 12.2 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 1,422 - Forks: 149

data-prep-kit/data-prep-kit

Open source project for data preparation for GenAI applications

Language: HTML - Size: 223 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 723 - Forks: 201

Western-OC2-Lab/AutoML-Implementation-for-Static-and-Dynamic-Data-Analytics

Implementation/Tutorial of using Automated Machine Learning (AutoML) methods for static/batch and online/continual learning

Language: Jupyter Notebook - Size: 5.43 MB - Last synced at: 2 months ago - Pushed at: about 1 year ago - Stars: 629 - Forks: 112

machinelearnjs/machinelearnjs

Machine Learning library for the web and Node.

Language: TypeScript - Size: 2.76 MB - Last synced at: 16 days ago - Pushed at: 24 days ago - Stars: 542 - Forks: 53

akanz1/klib

Easy to use Python library of customized functions for cleaning and analyzing data.

Language: Python - Size: 47.2 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 511 - Forks: 55

Desbordante/desbordante-core

Desbordante is a high-performance data profiler capable of discovering many different patterns in data using various algorithms. It also supports running data cleaning scenarios with these algorithms. Desbordante has a console version and an easy-to-use web application.

Language: C++ - Size: 146 MB - Last synced at: 25 days ago - Pushed at: about 1 month ago - Stars: 406 - Forks: 76

shamspias/customizable-gpt-chatbot

A dynamic, scalable AI chatbot built with Django REST framework, supporting custom training from PDFs, documents, websites, and YouTube videos. Leveraging OpenAI's GPT-3.5, Pinecone, FAISS, and Celery for seamless integration and performance.

Language: Python - Size: 229 KB - Last synced at: about 2 months ago - Pushed at: over 1 year ago - Stars: 390 - Forks: 86

msamogh/nonechucks

Deal with bad samples in your dataset dynamically, use Transforms as Filters, and more!

Language: Python - Size: 25.4 KB - Last synced at: 7 days ago - Pushed at: almost 3 years ago - Stars: 377 - Forks: 27

harunurrashid97/100-Days-Of-ML-Code

A day-to-day plan for this challenge. Covers both theoretical and practical aspects.

Language: Jupyter Notebook - Size: 11.8 MB - Last synced at: 8 months ago - Pushed at: over 2 years ago - Stars: 223 - Forks: 109

TirendazAcademy/PANDAS-TUTORIAL

Jupyter Notebooks and Data Sets for Pandas Library

Language: Jupyter Notebook - Size: 11.6 MB - Last synced at: about 2 months ago - Pushed at: about 1 year ago - Stars: 220 - Forks: 177

HasnainRaz/SemSegPipeline

A simpler way of reading and augmenting image segmentation data into TensorFlow

Language: Python - Size: 41 KB - Last synced at: 12 days ago - Pushed at: about 5 years ago - Stars: 144 - Forks: 27

triton-inference-server/dali_backend

The Triton backend that allows running GPU-accelerated data pre-processing pipelines implemented in DALI's Python API.

Language: C++ - Size: 24 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 135 - Forks: 34

thepanacealab/SMMT

Social Media Mining Toolkit (SMMT) main repository

Language: Python - Size: 521 KB - Last synced at: 3 months ago - Pushed at: over 2 years ago - Stars: 134 - Forks: 37

dansuh17/segan-pytorch

SEGAN pytorch implementation https://arxiv.org/abs/1703.09452

Language: Python - Size: 82 KB - Last synced at: 3 months ago - Pushed at: over 6 years ago - Stars: 108 - Forks: 32

TensorMSA/tensormsa

Deep learning GUI framework for enterprise

Language: Python - Size: 84.3 MB - Last synced at: over 1 year ago - Pushed at: over 7 years ago - Stars: 108 - Forks: 18

Mohan-Zhang-u/mzutils

Language: Python - Size: 324 KB - Last synced at: 7 days ago - Pushed at: almost 2 years ago - Stars: 104 - Forks: 9

asavinov/prosto

Prosto is a data processing toolkit radically changing how data is processed by heavily relying on functions and operations with functions - an alternative to map-reduce and join-groupby

Language: Python - Size: 1.95 MB - Last synced at: 18 days ago - Pushed at: over 3 years ago - Stars: 91 - Forks: 5

wangxb96/Awesome-EdgeAI

Resources of our survey paper "Optimizing Edge AI: A Comprehensive Survey on Data, Model, and System Strategies"

Size: 3.64 MB - Last synced at: 15 days ago - Pushed at: 6 months ago - Stars: 87 - Forks: 8

HypoX64/candock

A time series signal analysis and classification framework

Language: Python - Size: 1.39 MB - Last synced at: 3 months ago - Pushed at: about 2 years ago - Stars: 85 - Forks: 29

nursnaaz/25DaysInMachineLearning

A repository, updated over time, for learning machine learning with Python, including statistics content and materials

Language: Jupyter Notebook - Size: 293 MB - Last synced at: almost 2 years ago - Pushed at: over 4 years ago - Stars: 57 - Forks: 66

LaureBerti/Learn2Clean

Learn2Clean: Optimizing the Sequence of Tasks for Data Preparation and Cleaning

Language: Python - Size: 34.6 MB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 51 - Forks: 20

danielhanchen/sciblox

sciblox - Easier Data Science and Machine Learning

Language: HTML - Size: 1.38 MB - Last synced at: 7 days ago - Pushed at: almost 8 years ago - Stars: 50 - Forks: 1

soumyadip007/Data-Science-Using-Python-University-Course-Module

“Data science” is just about as broad of a term as they come. It may be easiest to describe what it is by listing its more concrete components: Data exploration & analysis. Included here: Pandas; NumPy; SciPy; a helping hand from Python's Standard Library.

Language: Jupyter Notebook - Size: 34.1 MB - Last synced at: 19 days ago - Pushed at: about 5 years ago - Stars: 46 - Forks: 46

Elysian01/Data-Purifier

A Python library for Automated Exploratory Data Analysis, Automated Data Cleaning, and Automated Data Preprocessing For Machine Learning and Natural Language Processing Applications in Python.

Language: Jupyter Notebook - Size: 7.51 MB - Last synced at: 28 days ago - Pushed at: about 3 years ago - Stars: 44 - Forks: 6

teamreboott/data-modori

Language: Python - Size: 3.56 MB - Last synced at: 8 days ago - Pushed at: over 1 year ago - Stars: 41 - Forks: 3

Kukuster/SumStatsRehab

GWAS summary statistics files QC tool

Language: Python - Size: 1.87 MB - Last synced at: about 1 month ago - Pushed at: 7 months ago - Stars: 40 - Forks: 6

Rpita623/Movie-Recommendation-System-using-R_Project

Movie Recommendation System: Project using R and Machine learning

Language: R - Size: 1.06 MB - Last synced at: 3 months ago - Pushed at: over 3 years ago - Stars: 40 - Forks: 31

repetere/modelscript 📦

REPO MOVED TO https://github.com/repetere/jsonstack-data - Data Science and Machine learning in JavaScript

Language: JavaScript - Size: 5.73 MB - Last synced at: 25 days ago - Pushed at: almost 3 years ago - Stars: 39 - Forks: 5

mattkearns/automated-data-preprocessing

A command-line utility program for automating the trivial, frequently occurring data preparation tasks: missing value interpolation, outlier removal, and encoding categorical variables.

Language: Python - Size: 442 KB - Last synced at: 8 months ago - Pushed at: over 6 years ago - Stars: 34 - Forks: 15

MahtaFetrat/ManaTTS-Persian-Speech-Dataset

ManaTTS is the largest open Persian speech dataset with 114+ hours of transcribed audio. Includes data collection pipeline and tools. Suitable for Persian text-to-speech models.

Language: Jupyter Notebook - Size: 16.4 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 31 - Forks: 1

ELToulemonde/dataPreparation

Data preparation for data science projects.

Language: R - Size: 5.18 MB - Last synced at: 11 days ago - Pushed at: about 2 years ago - Stars: 31 - Forks: 10

maet3608/nuts-ml

Flow-based data pre-processing for deep learning

Language: Python - Size: 67.3 MB - Last synced at: about 1 month ago - Pushed at: over 4 years ago - Stars: 31 - Forks: 10

hegongshan/Storage-for-AI-Paper

Accelerating AI Training and Inference from Storage Perspective (Must-read Papers on Storage for AI)

Size: 28.3 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 30 - Forks: 3

KwokHing/YandexCatBoost-Python-Demo

Demo on the capability of Yandex CatBoost gradient boosting classifier on a fictitious IBM HR dataset obtained from Kaggle. Data exploration, cleaning, preprocessing and model tuning are performed on the dataset

Language: Jupyter Notebook - Size: 743 KB - Last synced at: 3 months ago - Pushed at: over 5 years ago - Stars: 30 - Forks: 16

TsLu1s/atlantic

Atlantic: Automated Data Preprocessing Framework for Machine Learning

Language: Python - Size: 1.94 MB - Last synced at: 2 months ago - Pushed at: 6 months ago - Stars: 29 - Forks: 4

Pooja-Bhojwani/linked-eed

The aim is to build a job recommender system that takes skills from LinkedIn and jobs from Indeed and returns the best available jobs for your skills.

Language: Python - Size: 443 KB - Last synced at: almost 2 years ago - Pushed at: almost 4 years ago - Stars: 29 - Forks: 17

Smart-Shaped/chaM3Leon

By Smart Shaped s.r.l. (https://www.smartshaped.com/)

Language: Java - Size: 899 KB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 28 - Forks: 2

vlivashkin/GPUParallel

Joblib-like interface for parallel GPU computations (e.g. data preprocessing)

Language: Python - Size: 111 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 28 - Forks: 2

YakshHaranwala/PTRAIL

PTRAIL is a state-of-the-art parallel computation library for mobility data preprocessing and feature extraction.

Language: Python - Size: 143 MB - Last synced at: 8 days ago - Pushed at: 8 months ago - Stars: 26 - Forks: 7

caxelos/Thesis-Project

University Thesis project

Language: C++ - Size: 67.3 MB - Last synced at: about 1 year ago - Pushed at: about 2 years ago - Stars: 26 - Forks: 6

nicomignoni/tab2img

A tool to convert tabular data into images so that it can be used by CNNs. Inspired by the "DeepInsight" paper.

Language: Python - Size: 497 KB - Last synced at: 25 days ago - Pushed at: 6 months ago - Stars: 25 - Forks: 5

twardoch/split-markdown4gpt

A Python tool for splitting large Markdown files into smaller sections based on a specified token limit. This is particularly useful for processing large Markdown files with GPT models, as it allows the models to handle the data in manageable chunks.

Language: Python - Size: 78.1 KB - Last synced at: 4 days ago - Pushed at: 12 days ago - Stars: 24 - Forks: 2

ipriyaaanshu/lung-cancer-detection

This is a project based on Data Science Bowl 2017. I did my best to propose a solution for the problem but I am still new to Deep Learning so my solution is not the optimal one but it can definitely be improved with some fine tuning and better resources.

Language: Jupyter Notebook - Size: 22.6 MB - Last synced at: over 1 year ago - Pushed at: almost 7 years ago - Stars: 23 - Forks: 23

azaz9026/Medicine-Recommendation-System

A Medicine Recommendation System in machine learning (ML) is a software application designed to assist healthcare professionals and patients in selecting the most appropriate medication based on various factors such as medical history, symptoms, demographics, and drug interactions

Language: Jupyter Notebook - Size: 405 KB - Last synced at: 3 months ago - Pushed at: about 1 year ago - Stars: 22 - Forks: 5

abrazinskas/machine-learning-data-pipeline

Pipeline module for parallel real-time data processing for machine learning models development and production purposes.

Size: 3.41 MB - Last synced at: 13 days ago - Pushed at: over 5 years ago - Stars: 22 - Forks: 2

lex-hue/Stock-Predictor-V4 📦

A reinforcement learning model specialized in stock prediction utilizing deep learning techniques, incorporating reward mechanisms, compatible with any machine equipped with Python.

Language: Python - Size: 148 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 21 - Forks: 8

buabaj/xplore

A Python package built for data scientists/analysts and AI/ML engineers to explore the features of a dataset in a minimal number of lines of code, for quick analysis before data wrangling and feature extraction.

Language: Python - Size: 1.74 MB - Last synced at: 5 days ago - Pushed at: about 4 years ago - Stars: 21 - Forks: 11

suraj-maniyar/Stock-Trading-Using-Machine-Learning

A comprehensive approach for stock trading implemented using Neural Network and Reinforcement Learning separately.

Language: Python - Size: 12.4 MB - Last synced at: almost 2 years ago - Pushed at: about 7 years ago - Stars: 21 - Forks: 9

gyrdym/ml_preprocessing

Implementation of popular data preprocessing algorithms for Machine learning

Language: Dart - Size: 5.44 MB - Last synced at: 3 months ago - Pushed at: about 3 years ago - Stars: 20 - Forks: 1

ammarshaikh123/Projects-on-Data-Cleaning-and-Manipulation

This repository contains projects I have worked on for Data Cleaning and Manipulation in Python.

Language: Jupyter Notebook - Size: 8.55 MB - Last synced at: over 2 years ago - Pushed at: over 5 years ago - Stars: 20 - Forks: 16

hemangjoshi37a/PersonalGoalAssistant

AI-driven Personal Goal Assistant: Reinforcement learning-powered software mimics user behavior, interacts with computer inputs, and autonomously achieves goals in finance, social networking, and productivity. Open-source, Python-based RL agent.

Language: Python - Size: 20.5 KB - Last synced at: 3 months ago - Pushed at: about 2 years ago - Stars: 19 - Forks: 2

AlexanderSouthan/pyPreprocessing

Especially useful for preprocessing of datasets like Raman spectra, infrared spectra, UV/Vis spectra, but also HPLC data and many other types of data. pyPreprocessing includes baseline correction, smoothing, filtering, normalization and transformation.

Language: Python - Size: 107 KB - Last synced at: 23 days ago - Pushed at: 3 months ago - Stars: 17 - Forks: 4

dataclr/dataclr

Feature selection for tabular datasets using advanced filter and wrapper methods

Language: Python - Size: 107 KB - Last synced at: 29 days ago - Pushed at: 4 months ago - Stars: 17 - Forks: 1

Vidhi1290/Deep-Learning-for-EEG-Emotion-Classification

This repository contains a Python code script for performing emotion classification using EEG (Electroencephalogram) data. Emotion classification from EEG signals is an important application in neuroscience and human-computer interaction. The code leverages deep learning techniques to analyze EEG data and predict emotional states.

Language: Jupyter Notebook - Size: 1.79 MB - Last synced at: 3 months ago - Pushed at: almost 2 years ago - Stars: 17 - Forks: 2

am1tyadav/teal

Library of TensorFlow layers for audio data processing and data augmentation

Language: Python - Size: 2.09 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 17 - Forks: 6

klarEDA/klar-EDA

A python library for automated exploratory data analysis

Language: Python - Size: 346 KB - Last synced at: over 2 years ago - Pushed at: over 3 years ago - Stars: 17 - Forks: 21

ISTE-VESIT-ORG/Machinera-2020

This is an AI Series where we will cover Machine Learning and Deep Learning topics from the very basics.

Size: 16.4 MB - Last synced at: almost 2 years ago - Pushed at: over 4 years ago - Stars: 17 - Forks: 0

elbaulp/DPASF

Final project for my MSc in Data Science. This is a library for Data Pre-processing Algorithms for Streaming in Flink (DPASF)

Language: Scala - Size: 1.24 MB - Last synced at: over 1 year ago - Pushed at: about 6 years ago - Stars: 17 - Forks: 3

sourcecode369/ml-algorithms-on-scikit-and-keras

Implementation scripts of machine learning algorithms in Scikit-learn and Keras for complete novices.

Language: Jupyter Notebook - Size: 26.6 MB - Last synced at: over 2 years ago - Pushed at: almost 7 years ago - Stars: 17 - Forks: 12

habedi/feature-factory

A high-performance feature engineering library for Rust powered by Apache DataFusion 🦀

Language: Rust - Size: 87.9 KB - Last synced at: 10 days ago - Pushed at: 3 months ago - Stars: 16 - Forks: 0

parvvaresh/Satellite_data

This repository provides Python code for converting satellite data into a format suitable for deep learning models. It supports various deep learning architectures, including Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Long Short-Term Memory networks (LSTMs).

Language: Python - Size: 15.4 MB - Last synced at: 3 months ago - Pushed at: 8 months ago - Stars: 16 - Forks: 0

Aayushpatel007/topicrankpy

A Python package to get useful information from documents using TopicRank Algorithm.

Language: Python - Size: 72.3 KB - Last synced at: 17 days ago - Pushed at: about 2 years ago - Stars: 16 - Forks: 3

orbxball/timit-preprocessor

Extract mfcc vectors and phones from TIMIT dataset

Language: Shell - Size: 6.84 KB - Last synced at: 2 months ago - Pushed at: over 2 years ago - Stars: 16 - Forks: 0

Western-OC2-Lab/MSANA-Online-Data-Stream-Analytics-And-Concept-Drift-Adaptation

Data stream analytics: Implement online learning methods to address concept drift in dynamic data streams. Code for the paper entitled "A Multi-Stage Automated Online Network Data Stream Analytics Framework for IIoT Systems" published in IEEE Transactions on Industrial Informatics.

Language: Jupyter Notebook - Size: 10.2 MB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 16 - Forks: 4

ksbg/sparklanes

A lightweight data processing framework for Apache Spark

Language: Python - Size: 185 KB - Last synced at: 7 days ago - Pushed at: over 2 years ago - Stars: 16 - Forks: 5

kaviles22/EEG_SignalsClassification

Preprocessing, analysis and classification of EEG signals into 4 classes.

Language: Jupyter Notebook - Size: 812 MB - Last synced at: 6 months ago - Pushed at: about 2 years ago - Stars: 15 - Forks: 6

VainF/nyuv2-python-toolkit (fork of ankurhanda/nyuv2-meta-data)

nyuv2 toolbox for data extraction and loading.

Language: Python - Size: 22.5 MB - Last synced at: over 2 years ago - Pushed at: about 3 years ago - Stars: 15 - Forks: 2

krypticmouse/10-Days-of-Statistics-and-Data-Preprocessing

List of all the resources I used during 10 days of Statistics and Data Preprocessing.

Size: 18.6 KB - Last synced at: about 1 year ago - Pushed at: over 4 years ago - Stars: 15 - Forks: 4

Ashwin-op/Machine-Learning-Series

Datasets and Codes for the ML Series

Language: Python - Size: 2.93 KB - Last synced at: almost 2 years ago - Pushed at: about 5 years ago - Stars: 15 - Forks: 2

SagarGaniga/Data-Preprocessing

Data preprocessing is a data mining technique that involves transforming raw data into an understandable format.

Language: Jupyter Notebook - Size: 422 KB - Last synced at: about 2 years ago - Pushed at: about 7 years ago - Stars: 15 - Forks: 21

KhaledAshrafH/ChatGPT-Sentiment-Analysis

This project aims to perform sentiment analysis on tweets related to ChatGPT, a popular language model developed by OpenAI. The dataset used for training and testing consists of 219,293 tweets collected over a month. Each tweet is classified as positive ("good"), negative ("bad"), or neutral ("neutral").

Language: Jupyter Notebook - Size: 25.3 MB - Last synced at: 3 months ago - Pushed at: 7 months ago - Stars: 14 - Forks: 1

hectorpadin1/Network-Intrusion-Detection-System

This project evaluates and compares different machine learning techniques for network intrusion detection.

Language: Jupyter Notebook - Size: 4.77 MB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 14 - Forks: 3

Hyprnx/used-cars-prices-prediction

This repo contains all the source code and obtained data for the used cars prices

Language: Jupyter Notebook - Size: 40.8 MB - Last synced at: 8 days ago - Pushed at: about 3 years ago - Stars: 14 - Forks: 4

Gowtham1729/Android-App-Malware-Detector

A Deep Learning Model for detecting Malware Applications

Language: Python - Size: 165 MB - Last synced at: 8 days ago - Pushed at: about 4 years ago - Stars: 14 - Forks: 0

sharmaroshan/Numpy-and-Pandas

NumPy and Pandas are among the most important building blocks for getting started in Data Science, Analytics, Machine Learning, Business Intelligence, and Business Analytics. This tutorial helps beginners learn the core concepts of NumPy and Pandas and get started with Machine Learning and Data Science.

Language: HTML - Size: 3.02 MB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 13 - Forks: 6

rbhatia46/Data-Preprocessing-Template

This repository includes all the data preprocessing required before using a dataset with a machine learning model. Please refer to the README for usage instructions.

Language: Python - Size: 1.95 KB - Last synced at: about 1 month ago - Pushed at: about 7 years ago - Stars: 13 - Forks: 10

halil/sau-ml

SAU Machine Learning Training Materials

Language: Python - Size: 14.8 MB - Last synced at: almost 2 years ago - Pushed at: over 7 years ago - Stars: 13 - Forks: 3

agrawal-priyank/Web-Scraper-Sentiment-Analysis-TripAdvisor

Academic project for Advances in Data Science and Architecture course

Language: R - Size: 167 KB - Last synced at: 29 days ago - Pushed at: over 7 years ago - Stars: 13 - Forks: 10

mikeqfu/pyhelpers

PyHelpers: An open-source toolkit for facilitating Python users' data manipulation tasks

Language: Python - Size: 8.81 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 12 - Forks: 3

CleverInsight/cognito

🚀🤖 Cognito - Simplifies AutoML Data Preprocessing.

Language: Python - Size: 950 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 12 - Forks: 11

DolbyUUU/byte_pair_encoding_BPE_subword_tokenization_implementation_python

Byte-Pair Encoding (BPE) (subword-based tokenization) algorithm implementations from scratch in Python

Language: Python - Size: 449 KB - Last synced at: 7 months ago - Pushed at: over 2 years ago - Stars: 12 - Forks: 0

Amshra267/Cassandra-Udyam

Contains our Approach for the competition organized at Udyam'21

Language: Jupyter Notebook - Size: 3.83 MB - Last synced at: almost 2 years ago - Pushed at: about 4 years ago - Stars: 12 - Forks: 2

raj1603chdry/Fake-News-Detection-System

Fake News Detection System for detecting whether news is fake or not. The model is trained using "Liar, Liar Pants on Fire": A New Benchmark Dataset for Fake News Detection. Link for dataset: https://arxiv.org/abs/1705.00648.

Language: Jupyter Notebook - Size: 31.3 MB - Last synced at: 20 days ago - Pushed at: over 5 years ago - Stars: 12 - Forks: 13

Bharat-Reddy/Bank-Marketing-Analysis

The data relates to direct marketing campaigns (phone calls) of a Portuguese banking institution. The classification goal is to predict whether the client will subscribe to a term deposit.

Language: Jupyter Notebook - Size: 2.34 MB - Last synced at: over 2 years ago - Pushed at: about 6 years ago - Stars: 12 - Forks: 9

MigoXLab/awesome-data-quality

A comprehensive collection of data quality resources, tools, papers, and projects across various data types including traditional data, LLM pretraining/fine-tuning data, multimodal data, and more. Essential reference for researchers and practitioners in data-centric AI.

Size: 45.9 KB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 11 - Forks: 2

alireza-heidarii/Real-Time-Data-Cleaning-Pipeline-for-Medical-and-Healthcare-Data

A real-time data cleaning pipeline for medical and healthcare data using Apache Spark, SparkNLP, Spark Streaming, and Kafka.

Language: Python - Size: 11.7 KB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 11 - Forks: 0

AiCorsair/Dataquest-Data-Science-Analysis-Projects

A repository dedicated to storing guided projects completed while learning data science concepts with Dataquest.

Language: Jupyter Notebook - Size: 74 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 11 - Forks: 3

hxycorn/Twitter-Sentiment-Analysis-about-ChatGPT

A quantitative study of over 1.25 million tweets about ChatGPT, employing data scraping, data cleaning, EDA, topic modeling, and sentiment analysis.

Language: Jupyter Notebook - Size: 6.06 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 11 - Forks: 6

kuleafenu/customizable-web-crawler

This web crawler can be customized to scrape almost all types of websites.

Language: Python - Size: 231 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 11 - Forks: 4

IMsumitkumar/No-code-ML-platform-DashB.ai

A no-code machine learning pipeline and data visualization platform | perform with learning

Language: JavaScript - Size: 12.1 MB - Last synced at: over 2 years ago - Pushed at: over 4 years ago - Stars: 11 - Forks: 4

font-bakers/knead

A command line tool for preprocessing, manipulating and serializing font files for deep learning applications.

Language: Python - Size: 1.7 MB - Last synced at: 7 days ago - Pushed at: about 6 years ago - Stars: 11 - Forks: 1

amir-hojjati/Data-Analysis-Online-Retail-Transactions

This repository is created to represent the processing and the analysis that has been done on this online retail dataset.

Language: Jupyter Notebook - Size: 37.7 MB - Last synced at: almost 2 years ago - Pushed at: about 6 years ago - Stars: 11 - Forks: 10

giusalfieri/IPA_Project

Aircraft detection in satellite images using computer vision and machine learning.

Language: C++ - Size: 273 MB - Last synced at: about 1 month ago - Pushed at: 12 months ago - Stars: 10 - Forks: 0

ojasphansekar/Zillow-Home-Value-Prediction

XGBoost, LightGBM, LSTM, Linear Regression, Exploratory Data Analysis

Language: Jupyter Notebook - Size: 1.81 MB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 10 - Forks: 7

mattborghi/EngineML

Study notebooks made for learning machine learning for the Hawk team

Language: Jupyter Notebook - Size: 40.4 MB - Last synced at: about 1 year ago - Pushed at: almost 7 years ago - Stars: 9 - Forks: 5

bilalhameed248/Urdu-To-English-Machine-Translation

A pretrained Urdu-to-English machine translation model, fine-tuned on a custom dataset using the Hugging Face Trainer API.

Language: Jupyter Notebook - Size: 60.5 KB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 8 - Forks: 2

ThecoderPinar/House-Price-Prediction-Project

🏠 This project focuses on predicting house prices using advanced regression techniques. It involves comprehensive data preprocessing, feature engineering, and model selection. The aim is to develop an accurate predictive model for real estate prices.

Language: Jupyter Notebook - Size: 6.79 MB - Last synced at: 2 months ago - Pushed at: over 1 year ago - Stars: 8 - Forks: 1

Yash22222/IBM-CSRBOX-Internship-Project

The objective of the Data Analytics internship at CSRBOX is to provide interns with hands-on experience in applying data analytics techniques to real-world projects in the field of corporate social responsibility (CSR). Interns will gain practical skills in data collection, cleaning, analysis, visualization, and reporting, while working on projects

Language: Jupyter Notebook - Size: 5.28 MB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 8 - Forks: 6

KhaledAshrafH/Logistic-Regression

This program implements logistic regression from scratch using the gradient descent algorithm in Python to predict whether customers will purchase a new car based on their age and salary.

Language: Python - Size: 12.7 KB - Last synced at: about 1 month ago - Pushed at: almost 2 years ago - Stars: 8 - Forks: 3

Related Topics
machine-learning (656), python (489), data-visualization (397), data-science (372), data-analysis (281), pandas (211), data-cleaning (202), exploratory-data-analysis (196), deep-learning (164), feature-engineering (158), numpy (124), classification (120), scikit-learn (117), logistic-regression (110), jupyter-notebook (105), python3 (100), matplotlib (96), machine-learning-algorithms (95), eda (91), seaborn (88), random-forest (75), data-preparation (74), linear-regression (71), feature-selection (68), model-evaluation (67), tensorflow (64), data (63), natural-language-processing (62), predictive-modeling (62), sklearn (62), nlp (57), data-mining (54), regression (52), data-analytics (50), hyperparameter-tuning (49), neural-networks (45), artificial-intelligence (43), clustering (41), supervised-learning (40), keras (38), data-wrangling (38), pytorch (36), visualization (36), neural-network (36), data-processing (35), regression-models (35), feature-extraction (34), decision-trees (33), streamlit (33), random-forest-classifier (31), xgboost (31), computer-vision (31), r (29), model-training (28), ai (28), sql (28), data-engineering (27), cross-validation (27), sentiment-analysis (26), decision-tree-classifier (26), unsupervised-learning (25), k-means-clustering (25), time-series-analysis (24), outlier-detection (24), time-series (23), powerbi (23), text-classification (22), lstm (22), data-exploration (22), dimensionality-reduction (21), statistical-analysis (21), decision-tree (21), cnn (21), data-collection (21), image-classification (21), image-processing (20), confusion-matrix (20), pipeline (20), gradient-boosting (20), convolutional-neural-networks (20), prediction (20), data-transformation (20), model-selection (19), data-augmentation (19), data-manipulation (19), pca (19), tableau (18), knn (18), naive-bayes-classifier (18), flask (18), web-scraping (18), statistics (18), svm (18), deep-neural-networks (16), big-data (16), data-modeling (16), matplotlib-pyplot (16), ensemble-learning (16), regression-analysis (16), ml (16)