An open API service providing repository metadata for many open source software ecosystems.

Topic: "data-preprocessing"

zzw922cn/Automatic_Speech_Recognition

End-to-end Automatic Speech Recognition for Mandarin and English in TensorFlow

Language: Python - Size: 5.53 MB - Last synced at: 7 months ago - Pushed at: almost 3 years ago - Stars: 2,842 - Forks: 533

skrub-data/skrub

Machine learning with dataframes

Language: Python - Size: 14.3 MB - Last synced at: 3 days ago - Pushed at: 6 days ago - Stars: 1,533 - Forks: 185

data-prep-kit/data-prep-kit

Open source project for data preparation for GenAI applications

Language: HTML - Size: 245 MB - Last synced at: 6 days ago - Pushed at: 12 days ago - Stars: 867 - Forks: 232

Western-OC2-Lab/AutoML-Implementation-for-Static-and-Dynamic-Data-Analytics

Implementation/Tutorial of using Automated Machine Learning (AutoML) methods for static/batch and online/continual learning

Language: Jupyter Notebook - Size: 5.43 MB - Last synced at: 8 months ago - Pushed at: over 1 year ago - Stars: 629 - Forks: 112

machinelearnjs/machinelearnjs

Machine Learning library for the web and Node.

Language: TypeScript - Size: 2.94 MB - Last synced at: 3 days ago - Pushed at: 28 days ago - Stars: 542 - Forks: 54

akanz1/klib

Easy to use Python library of customized functions for cleaning and analyzing data.

Language: Python - Size: 47.7 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 520 - Forks: 56

Desbordante/desbordante-core

Desbordante is a high-performance data profiler capable of discovering many different patterns in data using various algorithms. It also supports running data cleaning scenarios with these algorithms. Desbordante has a console version and an easy-to-use web application.

Language: C++ - Size: 152 MB - Last synced at: 1 day ago - Pushed at: 2 days ago - Stars: 453 - Forks: 87

shamspias/customizable-gpt-chatbot

A dynamic, scalable AI chatbot built with Django REST framework, supporting custom training from PDFs, documents, websites, and YouTube videos. Leveraging OpenAI's GPT-3.5, Pinecone, FAISS, and Celery for seamless integration and performance.

Language: Python - Size: 229 KB - Last synced at: 7 months ago - Pushed at: almost 2 years ago - Stars: 390 - Forks: 86

msamogh/nonechucks

Deal with bad samples in your dataset dynamically, use Transforms as Filters, and more!

Language: Python - Size: 25.4 KB - Last synced at: 23 days ago - Pushed at: over 3 years ago - Stars: 378 - Forks: 27

harunurrashid97/100-Days-Of-ML-Code

A day-to-day plan for this challenge. Covers both theoretical and practical aspects

Language: Jupyter Notebook - Size: 11.8 MB - Last synced at: 5 months ago - Pushed at: almost 3 years ago - Stars: 228 - Forks: 111

TirendazAcademy/PANDAS-TUTORIAL

Jupyter Notebooks and Data Sets for Pandas Library

Language: Jupyter Notebook - Size: 11.6 MB - Last synced at: 7 months ago - Pushed at: over 1 year ago - Stars: 220 - Forks: 177

HasnainRaz/SemSegPipeline

A simpler way of reading and augmenting image segmentation data into TensorFlow

Language: Python - Size: 41 KB - Last synced at: 6 months ago - Pushed at: over 5 years ago - Stars: 144 - Forks: 27

triton-inference-server/dali_backend

The Triton backend that allows running GPU-accelerated data pre-processing pipelines implemented in DALI's python API.

Language: C++ - Size: 24 MB - Last synced at: about 22 hours ago - Pushed at: 2 days ago - Stars: 139 - Forks: 33

thepanacealab/SMMT

Social Media Mining Toolkit (SMMT) main repository

Language: Python - Size: 521 KB - Last synced at: 4 months ago - Pushed at: about 3 years ago - Stars: 137 - Forks: 37

dansuh17/segan-pytorch

SEGAN pytorch implementation https://arxiv.org/abs/1703.09452

Language: Python - Size: 82 KB - Last synced at: 8 months ago - Pushed at: almost 7 years ago - Stars: 108 - Forks: 32

TensorMSA/tensormsa

Deep learning GUI framework for enterprise

Language: Python - Size: 84.3 MB - Last synced at: about 2 years ago - Pushed at: almost 8 years ago - Stars: 108 - Forks: 18

Mohan-Zhang-u/mzutils

Language: Python - Size: 324 KB - Last synced at: 3 months ago - Pushed at: over 2 years ago - Stars: 104 - Forks: 9

wangxb96/Awesome-EdgeAI

Resources of our survey paper "Optimizing Edge AI: A Comprehensive Survey on Data, Model, and System Strategies"

Size: 3.64 MB - Last synced at: 16 days ago - Pushed at: 3 months ago - Stars: 98 - Forks: 11

asavinov/prosto

Prosto is a data processing toolkit radically changing how data is processed by heavily relying on functions and operations with functions - an alternative to map-reduce and join-groupby

Language: Python - Size: 1.95 MB - Last synced at: 25 days ago - Pushed at: about 4 years ago - Stars: 93 - Forks: 5

HypoX64/candock

A time series signal analysis and classification framework

Language: Python - Size: 1.39 MB - Last synced at: 8 months ago - Pushed at: over 2 years ago - Stars: 85 - Forks: 29

nursnaaz/25DaysInMachineLearning

A repository, updated over time, for learning machine learning with Python, including statistics content and materials

Language: Jupyter Notebook - Size: 293 MB - Last synced at: over 2 years ago - Pushed at: about 5 years ago - Stars: 57 - Forks: 66

hegongshan/Storage-for-AI-Paper

Accelerating AI Training and Inference from Storage Perspective (Must-read Papers on Storage for AI)

Size: 29.3 KB - Last synced at: 4 days ago - Pushed at: 8 days ago - Stars: 56 - Forks: 5

LaureBerti/Learn2Clean

Learn2Clean: Optimizing the Sequence of Tasks for Data Preparation and Cleaning

Language: Python - Size: 34.6 MB - Last synced at: about 2 months ago - Pushed at: almost 3 years ago - Stars: 52 - Forks: 20

danielhanchen/sciblox

sciblox - Easier Data Science and Machine Learning

Language: HTML - Size: 1.38 MB - Last synced at: 3 months ago - Pushed at: over 8 years ago - Stars: 50 - Forks: 1

soumyadip007/Data-Science-Using-Python-University-Course-Module

“Data science” is just about as broad of a term as they come. It may be easiest to describe what it is by listing its more concrete components: Data exploration & analysis. Included here: Pandas; NumPy; SciPy; a helping hand from Python's Standard Library.

Language: Jupyter Notebook - Size: 34.1 MB - Last synced at: 6 months ago - Pushed at: over 5 years ago - Stars: 46 - Forks: 46

Elysian01/Data-Purifier

A Python library for Automated Exploratory Data Analysis, Automated Data Cleaning, and Automated Data Preprocessing For Machine Learning and Natural Language Processing Applications in Python.

Language: Jupyter Notebook - Size: 7.51 MB - Last synced at: 3 months ago - Pushed at: over 3 years ago - Stars: 45 - Forks: 6

Kukuster/SumStatsRehab

GWAS summary statistics files QC tool

Language: Python - Size: 1.87 MB - Last synced at: 7 months ago - Pushed at: about 1 year ago - Stars: 40 - Forks: 6

teamreboott/data-modori

Language: Python - Size: 3.56 MB - Last synced at: about 2 months ago - Pushed at: about 2 years ago - Stars: 40 - Forks: 3

Rpita623/Movie-Recommendation-System-using-R_Project

Movie Recommendation System: Project using R and Machine learning

Language: R - Size: 1.06 MB - Last synced at: 9 months ago - Pushed at: about 4 years ago - Stars: 40 - Forks: 31

repetere/modelscript 📦

REPO MOVED TO https://github.com/repetere/jsonstack-data - Data Science and Machine learning in JavaScript

Language: JavaScript - Size: 5.73 MB - Last synced at: 3 months ago - Pushed at: over 3 years ago - Stars: 39 - Forks: 5

mattkearns/automated-data-preprocessing

A command-line utility program for automating the trivial, frequently occurring data preparation tasks: missing value interpolation, outlier removal, and encoding categorical variables.

Language: Python - Size: 442 KB - Last synced at: about 1 year ago - Pushed at: almost 7 years ago - Stars: 34 - Forks: 15

ELToulemonde/dataPreparation

Data preparation for data science projects.

Language: R - Size: 5.22 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 31 - Forks: 10

MahtaFetrat/ManaTTS-Persian-Speech-Dataset

ManaTTS is the largest open Persian speech dataset with 114+ hours of transcribed audio. Includes data collection pipeline and tools. Suitable for Persian text-to-speech models.

Language: Jupyter Notebook - Size: 16.4 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 31 - Forks: 1

maet3608/nuts-ml

Flow-based data pre-processing for deep learning

Language: Python - Size: 67.3 MB - Last synced at: 3 months ago - Pushed at: almost 5 years ago - Stars: 31 - Forks: 10

Smart-Shaped/chaM3Leon

By Smart Shaped s.r.l. (https://www.smartshaped.com/)

Language: Java - Size: 1.59 MB - Last synced at: 16 days ago - Pushed at: about 2 months ago - Stars: 30 - Forks: 2

KwokHing/YandexCatBoost-Python-Demo

Demo on the capability of Yandex CatBoost gradient boosting classifier on a fictitious IBM HR dataset obtained from Kaggle. Data exploration, cleaning, preprocessing and model tuning are performed on the dataset

Language: Jupyter Notebook - Size: 743 KB - Last synced at: 9 months ago - Pushed at: about 6 years ago - Stars: 30 - Forks: 16

Sajid030/anime-recommendation-system

Personalized anime recommendations based on collaborative filtering. Discover your next favorite anime!

Language: Jupyter Notebook - Size: 19.7 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 29 - Forks: 10

TsLu1s/atlantic

Atlantic: Automated Data Preprocessing Framework for Machine Learning

Language: Python - Size: 1.94 MB - Last synced at: 4 months ago - Pushed at: 11 months ago - Stars: 29 - Forks: 6

Pooja-Bhojwani/linked-eed

The aim is to build a job recommender system that takes skills from LinkedIn and jobs from Indeed and returns the best available jobs for you according to your skills.

Language: Python - Size: 443 KB - Last synced at: over 2 years ago - Pushed at: about 4 years ago - Stars: 29 - Forks: 17

vlivashkin/GPUParallel

Joblib-like interface for parallel GPU computations (e.g. data preprocessing)

Language: Python - Size: 111 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 28 - Forks: 2

twardoch/split-markdown4gpt

A Python tool for splitting large Markdown files into smaller sections based on a specified token limit. This is particularly useful for processing large Markdown files with GPT models, as it allows the models to handle the data in manageable chunks.

Language: Python - Size: 78.1 KB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 27 - Forks: 2

caxelos/Thesis-Project

Gaze estimation algorithms in C++ using a laptop's camera as input and machine learning algorithms to predict gaze direction.

Language: Python - Size: 74.7 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 26 - Forks: 6

nicomignoni/tab2img

A tool to convert tabular data into images so that it can be used by CNNs. Inspired by the "DeepInsight" paper.

Language: Python - Size: 497 KB - Last synced at: about 2 months ago - Pushed at: 12 months ago - Stars: 26 - Forks: 5

YakshHaranwala/PTRAIL

PTRAIL is a state-of-the-art parallel computation library for mobility data preprocessing and feature extraction.

Language: Python - Size: 143 MB - Last synced at: 3 months ago - Pushed at: about 1 year ago - Stars: 26 - Forks: 7

MigoXLab/awesome-data-quality

A comprehensive collection of data quality resources, tools, papers, and projects across various data types including traditional data, LLM pretraining/fine-tuning data, multimodal data, and more. Essential reference for researchers and practitioners in data-centric AI.

Size: 71.3 KB - Last synced at: 1 day ago - Pushed at: 4 months ago - Stars: 23 - Forks: 4

ipriyaaanshu/lung-cancer-detection

This is a project based on Data Science Bowl 2017. I did my best to propose a solution for the problem but I am still new to Deep Learning so my solution is not the optimal one but it can definitely be improved with some fine tuning and better resources.

Language: Jupyter Notebook - Size: 22.6 MB - Last synced at: almost 2 years ago - Pushed at: over 7 years ago - Stars: 23 - Forks: 23

azaz9026/Medicine-Recommendation-System

A Medicine Recommendation System in machine learning (ML) is a software application designed to assist healthcare professionals and patients in selecting the most appropriate medication based on various factors such as medical history, symptoms, demographics, and drug interactions

Language: Jupyter Notebook - Size: 405 KB - Last synced at: 9 months ago - Pushed at: over 1 year ago - Stars: 22 - Forks: 5

abrazinskas/machine-learning-data-pipeline

Pipeline module for parallel real-time data processing for machine learning models development and production purposes.

Size: 3.41 MB - Last synced at: 4 months ago - Pushed at: about 6 years ago - Stars: 22 - Forks: 2

lex-hue/Stock-Predictor-V4 📦

A reinforcement learning model specialized in stock prediction utilizing deep learning techniques, incorporating reward mechanisms, compatible with any machine equipped with Python.

Language: Python - Size: 148 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 21 - Forks: 8

buabaj/xplore

A Python package built for data scientists/analysts and AI/ML engineers for exploring the features of a dataset in a minimal number of lines of code, for quick analysis before data wrangling and feature extraction.

Language: Python - Size: 1.74 MB - Last synced at: 26 days ago - Pushed at: over 4 years ago - Stars: 21 - Forks: 11

suraj-maniyar/Stock-Trading-Using-Machine-Learning

A comprehensive approach for stock trading implemented using Neural Network and Reinforcement Learning separately.

Language: Python - Size: 12.4 MB - Last synced at: over 2 years ago - Pushed at: over 7 years ago - Stars: 21 - Forks: 9

CogitatorTech/feature-factory

A feature engineering library for Rust 🦀 with Python bindings 🐍 (WIP)

Language: Rust - Size: 118 KB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 20 - Forks: 0

gyrdym/ml_preprocessing

Implementation of popular data preprocessing algorithms for Machine learning

Language: Dart - Size: 5.44 MB - Last synced at: 9 months ago - Pushed at: over 3 years ago - Stars: 20 - Forks: 1

ammarshaikh123/Projects-on-Data-Cleaning-and-Manipulation

This repository contains projects I have worked on for Data Cleaning and Manipulation in Python.

Language: Jupyter Notebook - Size: 8.55 MB - Last synced at: almost 3 years ago - Pushed at: about 6 years ago - Stars: 20 - Forks: 16

hemangjoshi37a/PersonalGoalAssistant

AI-driven Personal Goal Assistant: Reinforcement learning-powered software mimics user behavior, interacts with computer inputs, and autonomously achieves goals in finance, social networking, and productivity. Open-source, Python-based RL agent.

Language: Python - Size: 20.5 KB - Last synced at: about 2 months ago - Pushed at: over 2 years ago - Stars: 19 - Forks: 1

klarEDA/klar-EDA

A python library for automated exploratory data analysis

Language: Python - Size: 262 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 18 - Forks: 23

AlexanderSouthan/pyPreprocessing

Especially useful for preprocessing of datasets like Raman spectra, infrared spectra, UV/Vis spectra, but also HPLC data and many other types of data. pyPreprocessing includes baseline correction, smoothing, filtering, normalization and transformation.

Language: Python - Size: 107 KB - Last synced at: about 2 months ago - Pushed at: 8 months ago - Stars: 18 - Forks: 4

dataclr/dataclr

Feature selection for tabular datasets using advanced filter and wrapper methods

Language: Python - Size: 107 KB - Last synced at: 3 months ago - Pushed at: 10 months ago - Stars: 17 - Forks: 1

Vidhi1290/Deep-Learning-for-EEG-Emotion-Classification

This repository contains a Python code script for performing emotion classification using EEG (Electroencephalogram) data. Emotion classification from EEG signals is an important application in neuroscience and human-computer interaction. The code leverages deep learning techniques to analyze EEG data and predict emotional states.

Language: Jupyter Notebook - Size: 1.79 MB - Last synced at: 9 months ago - Pushed at: over 2 years ago - Stars: 17 - Forks: 2

am1tyadav/teal

Library of TensorFlow layers for audio data processing and data augmentation

Language: Python - Size: 2.09 MB - Last synced at: over 2 years ago - Pushed at: almost 4 years ago - Stars: 17 - Forks: 6

ISTE-VESIT-ORG/Machinera-2020

This is an AI Series where we will cover Machine Learning and Deep Learning topics from the very basics.

Size: 16.4 MB - Last synced at: over 2 years ago - Pushed at: almost 5 years ago - Stars: 17 - Forks: 0

elbaulp/DPASF

My MSc on Data Science final project. This is a library for Data Pre-processing Algorithms for Streaming in Flink (DPASF)

Language: Scala - Size: 1.24 MB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 17 - Forks: 3

sourcecode369/ml-algorithms-on-scikit-and-keras

Implementation scripts of Machine Learning algorithms on Scikit-learn and Keras for complete novices.

Language: Jupyter Notebook - Size: 26.6 MB - Last synced at: almost 3 years ago - Pushed at: over 7 years ago - Stars: 17 - Forks: 12

KhaledAshrafH/ChatGPT-Sentiment-Analysis

This project aims to perform sentiment analysis on tweets related to ChatGPT, a popular language model developed by OpenAI. The dataset used for training and testing consists of 219,293 tweets collected over a month. Each tweet is classified as positive ("good"), negative ("bad"), or neutral ("neutral").

Language: Jupyter Notebook - Size: 25.3 MB - Last synced at: 4 months ago - Pushed at: about 1 year ago - Stars: 16 - Forks: 1

parvvaresh/Satellite_data

This repository provides Python code for converting satellite data into a format suitable for deep learning models. It supports various deep learning architectures, including Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Long Short-Term Memory networks (LSTMs).

Language: Python - Size: 15.4 MB - Last synced at: 8 months ago - Pushed at: about 1 year ago - Stars: 16 - Forks: 0

Aayushpatel007/topicrankpy

A Python package to get useful information from documents using TopicRank Algorithm.

Language: Python - Size: 72.3 KB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 16 - Forks: 3

orbxball/timit-preprocessor

Extract mfcc vectors and phones from TIMIT dataset

Language: Shell - Size: 6.84 KB - Last synced at: 8 months ago - Pushed at: almost 3 years ago - Stars: 16 - Forks: 0

Western-OC2-Lab/MSANA-Online-Data-Stream-Analytics-And-Concept-Drift-Adaptation

Data stream analytics: Implement online learning methods to address concept drift in dynamic data streams. Code for the paper entitled "A Multi-Stage Automated Online Network Data Stream Analytics Framework for IIoT Systems" published in IEEE Transactions on Industrial Informatics.

Language: Jupyter Notebook - Size: 10.2 MB - Last synced at: almost 3 years ago - Pushed at: almost 3 years ago - Stars: 16 - Forks: 4

ksbg/sparklanes

A lightweight data processing framework for Apache Spark

Language: Python - Size: 185 KB - Last synced at: 3 months ago - Pushed at: about 3 years ago - Stars: 16 - Forks: 5

kaviles22/EEG_SignalsClassification

Preprocessing, analysis and classification of EEG signals into 4 classes.

Language: Jupyter Notebook - Size: 812 MB - Last synced at: 11 months ago - Pushed at: over 2 years ago - Stars: 15 - Forks: 6

VainF/nyuv2-python-toolkit (fork of ankurhanda/nyuv2-meta-data)

nyuv2 toolbox for data extraction and loading.

Language: Python - Size: 22.5 MB - Last synced at: almost 3 years ago - Pushed at: over 3 years ago - Stars: 15 - Forks: 2

krypticmouse/10-Days-of-Statistics-and-Data-Preprocessing

List of all the resources I used during 10 days of Statistics and Data Preprocessing.

Size: 18.6 KB - Last synced at: over 1 year ago - Pushed at: almost 5 years ago - Stars: 15 - Forks: 4

Ashwin-op/Machine-Learning-Series

Datasets and Codes for the ML Series

Language: Python - Size: 2.93 KB - Last synced at: over 2 years ago - Pushed at: over 5 years ago - Stars: 15 - Forks: 2

SagarGaniga/Data-Preprocessing

Data preprocessing is a data mining technique that involves transforming raw data into an understandable format.

Language: Jupyter Notebook - Size: 422 KB - Last synced at: over 2 years ago - Pushed at: over 7 years ago - Stars: 15 - Forks: 21

mikeqfu/pyhelpers

PyHelpers: An open-source toolkit for facilitating Python users' data manipulation tasks

Language: Python - Size: 8.88 MB - Last synced at: 7 days ago - Pushed at: 9 days ago - Stars: 14 - Forks: 3

Gowtham1729/Android-App-Malware-Detector

A Deep Learning Model for detecting Malware Applications

Language: Python - Size: 165 MB - Last synced at: about 2 months ago - Pushed at: 5 months ago - Stars: 14 - Forks: 0

hectorpadin1/Network-Intrusion-Detection-System

This project evaluates and compares different machine learning techniques for network intrusion detection.

Language: Jupyter Notebook - Size: 4.77 MB - Last synced at: almost 3 years ago - Pushed at: about 3 years ago - Stars: 14 - Forks: 3

Hyprnx/used-cars-prices-prediction

This repo contains all the source code and obtained data for the used cars prices

Language: Jupyter Notebook - Size: 40.8 MB - Last synced at: 3 months ago - Pushed at: over 3 years ago - Stars: 14 - Forks: 4

AiCorsair/Dataquest-Data-Science-Analysis-Projects

A repository dedicated to storing guided projects completed while learning data science concepts with Dataquest.

Language: Jupyter Notebook - Size: 74 MB - Last synced at: about 2 months ago - Pushed at: 12 months ago - Stars: 13 - Forks: 3

sharmaroshan/Numpy-and-Pandas

NumPy and Pandas are among the most important building blocks of knowledge for getting started in Data Science, Analytics, Machine Learning, Business Intelligence, and Business Analytics. This tutorial helps beginners learn the core concepts of NumPy and Pandas and get started with Machine Learning and Data Science.

Language: HTML - Size: 3.02 MB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 13 - Forks: 6

rbhatia46/Data-Preprocessing-Template

This repository includes all the data preprocessing required before using a dataset with a machine learning model. Please refer to the README for usage instructions.

Language: Python - Size: 1.95 KB - Last synced at: 7 months ago - Pushed at: over 7 years ago - Stars: 13 - Forks: 10

halil/sau-ml

SAU Machine Learning Educational Materials

Language: Python - Size: 14.8 MB - Last synced at: over 2 years ago - Pushed at: over 7 years ago - Stars: 13 - Forks: 3

agrawal-priyank/Web-Scraper-Sentiment-Analysis-TripAdvisor

Academic project for Advances in Data Science and Architecture course

Language: R - Size: 167 KB - Last synced at: 6 months ago - Pushed at: almost 8 years ago - Stars: 13 - Forks: 10

CleverInsight/cognito

🚀🤖 Cognito - Simplifies AutoML Data Preprocessing.

Language: Python - Size: 950 KB - Last synced at: about 1 month ago - Pushed at: almost 2 years ago - Stars: 12 - Forks: 11

DolbyUUU/byte_pair_encoding_BPE_subword_tokenization_implementation_python

Byte-Pair Encoding (BPE) (subword-based tokenization) algorithm implementations from scratch in Python

Language: Python - Size: 449 KB - Last synced at: about 1 year ago - Pushed at: almost 3 years ago - Stars: 12 - Forks: 0

Amshra267/Cassandra-Udyam

Contains our Approach for the competition organized at Udyam'21

Language: Jupyter Notebook - Size: 3.83 MB - Last synced at: over 2 years ago - Pushed at: over 4 years ago - Stars: 12 - Forks: 2

raj1603chdry/Fake-News-Detection-System

Fake News Detection System for detecting whether news is fake or not. The model is trained using "Liar, Liar Pants on Fire": A New Benchmark Dataset for Fake News Detection. Link for dataset: https://arxiv.org/abs/1705.00648.

Language: Jupyter Notebook - Size: 31.3 MB - Last synced at: 6 months ago - Pushed at: almost 6 years ago - Stars: 12 - Forks: 13

font-bakers/knead

A command line tool for preprocessing, manipulating and serializing font files for deep learning applications.

Language: Python - Size: 1.7 MB - Last synced at: 3 months ago - Pushed at: over 6 years ago - Stars: 12 - Forks: 1

Bharat-Reddy/Bank-Marketing-Analysis

The data is related to direct marketing campaigns (phone calls) of a Portuguese banking institution. The classification goal is to predict whether the client will subscribe to a term deposit.

Language: Jupyter Notebook - Size: 2.34 MB - Last synced at: almost 3 years ago - Pushed at: over 6 years ago - Stars: 12 - Forks: 9

alireza-heidarii/Real-Time-Data-Cleaning-Pipeline-for-Medical-and-Healthcare-Data

A real-time data cleaning pipeline for medical and healthcare data using Apache Spark, SparkNLP, Spark Streaming, and Kafka.

Language: Python - Size: 11.7 KB - Last synced at: 8 months ago - Pushed at: 9 months ago - Stars: 11 - Forks: 0

hxycorn/Twitter-Sentiment-Analysis-about-ChatGPT

A quantitative study of over 1.25 million tweets about ChatGPT, employing data scraping, data cleaning, EDA, topic modeling, and sentiment analysis.

Language: Jupyter Notebook - Size: 6.06 MB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 11 - Forks: 6

kuleafenu/customizable-web-crawler

This web crawler can be customized to scrape almost all types of websites.

Language: Python - Size: 231 KB - Last synced at: over 2 years ago - Pushed at: almost 4 years ago - Stars: 11 - Forks: 4

IMsumitkumar/No-code-ML-platform-DashB.ai

A no-code machine learning pipeline and data visualization platform | perform with learning

Language: JavaScript - Size: 12.1 MB - Last synced at: almost 3 years ago - Pushed at: about 5 years ago - Stars: 11 - Forks: 4

amir-hojjati/Data-Analysis-Online-Retail-Transactions

This repository is created to represent the processing and the analysis that has been done on this online retail dataset.

Language: Jupyter Notebook - Size: 37.7 MB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 11 - Forks: 10

giusalfieri/IPA_Project

Aircraft detection in satellite images using computer vision and machine learning.

Language: C++ - Size: 273 MB - Last synced at: 7 months ago - Pushed at: over 1 year ago - Stars: 10 - Forks: 0

rifkyahmadsaputra/Prediction-Bitcoin-Price-with-Gated-Recurrent-Unit-RNN

In this project, I created a model for predicting the bitcoin price with a Gated Recurrent Unit (GRU). The GRU is a gating mechanism in recurrent neural networks (RNNs) similar to long short-term memory (LSTM); it is simpler to compute and faster than LSTM because it has fewer gates.

Language: Jupyter Notebook - Size: 1.69 MB - Last synced at: 5 months ago - Pushed at: about 5 years ago - Stars: 10 - Forks: 5

ojasphansekar/Zillow-Home-Value-Prediction

XGBoost, LightGBM, LSTM, Linear Regression, Exploratory Data Analysis

Language: Jupyter Notebook - Size: 1.81 MB - Last synced at: about 2 years ago - Pushed at: almost 6 years ago - Stars: 10 - Forks: 7

mattborghi/EngineML

Study notebooks made for learning machine learning for the Hawk team

Language: Jupyter Notebook - Size: 40.4 MB - Last synced at: about 2 months ago - Pushed at: about 7 years ago - Stars: 10 - Forks: 5

DataPreprocessing/DataCleaning

Data Cleaning is a python package for data preprocessing. This cleans the CSV file and returns the cleaned data frame. It does the work of imputation, removing duplicates, replacing special characters, and many more.

Language: Python - Size: 117 KB - Last synced at: 8 days ago - Pushed at: over 4 years ago - Stars: 9 - Forks: 4

bilalhameed248/Urdu-To-English-Machine-Translation

Fine-tuned a pretrained Urdu-to-English machine translation model using the Hugging Face Trainer API on a custom dataset.

Language: Jupyter Notebook - Size: 60.5 KB - Last synced at: 9 months ago - Pushed at: about 2 years ago - Stars: 8 - Forks: 2

Related Topics
machine-learning (472), python (344), data-visualization (262), data-science (248), data-analysis (174), pandas (152), data-cleaning (134), feature-engineering (122), deep-learning (118), exploratory-data-analysis (117), scikit-learn (97), numpy (85), classification (85), jupyter-notebook (69), matplotlib (67), eda (65), logistic-regression (64), seaborn (63), machine-learning-algorithms (62), model-evaluation (58), python3 (57), tensorflow (54), data (49), feature-selection (49), random-forest (48), data-preparation (47), natural-language-processing (46), predictive-modeling (46), linear-regression (46), nlp (41), data-analytics (37), hyperparameter-tuning (37), neural-networks (36), data-mining (36), artificial-intelligence (34), sklearn (34), regression (34), clustering (32), streamlit (29), pytorch (29), model-training (29), supervised-learning (28), neural-network (27), keras (26), feature-extraction (26), data-processing (25), data-wrangling (24), data-engineering (23), regression-models (23), sentiment-analysis (23), xgboost (23), computer-vision (22), random-forest-classifier (21), visualization (21), decision-trees (19), time-series (19), prediction (19), flask (19), ai (18), dataset (17), text-classification (17), sql (16), data-manipulation (16), image-classification (16), cross-validation (16), r (16), dimensionality-reduction (16), cnn (16), kaggle (15), image-processing (15), unsupervised-learning (15), data-transformation (15), statistics (15), convolutional-neural-networks (14), time-series-analysis (14), knn (14), data-exploration (14), model-training-and-evaluation (14), docker (14), svm (14), powerbi (14), data-collection (14), lstm (13), binary-classification (13), pipeline (13), decision-tree (13), outlier-detection (13), web-scraping (13), model-selection (13), big-data (13), gradient-boosting (13), tableau (12), imbalanced-data (12), spark (12), smote (12), naive-bayes-classifier (12), ml (12), k-means-clustering (12), decision-tree-classifier (12), llm (11)