An open API service providing repository metadata for many open source software ecosystems.

Topic: "data-mining"

JaidedAI/EasyOCR

Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, Cyrillic, and more.

Language: Python - Size: 154 MB - Last synced at: 7 days ago - Pushed at: 8 months ago - Stars: 26,519 - Forks: 3,314

academic/awesome-datascience

📝 An awesome Data Science repository for learning data science and applying it to real-world problems.

Size: 1.28 MB - Last synced at: 3 days ago - Pushed at: 12 days ago - Stars: 26,252 - Forks: 6,054

eriklindernoren/ML-From-Scratch

Machine Learning From Scratch. Bare bones NumPy implementations of machine learning models and algorithms with a focus on accessibility. Aims to cover everything from linear regression to deep learning.

Language: Python - Size: 540 KB - Last synced at: about 20 hours ago - Pushed at: over 1 year ago - Stars: 24,471 - Forks: 4,647

EthicalML/awesome-production-machine-learning

A curated list of awesome open source libraries to deploy, monitor, version and scale your machine learning

Size: 2.36 MB - Last synced at: 5 days ago - Pushed at: 8 days ago - Stars: 18,412 - Forks: 2,342

microsoft/LightGBM

A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.

Language: C++ - Size: 23.2 MB - Last synced at: 6 days ago - Pushed at: 9 days ago - Stars: 17,175 - Forks: 3,875

piskvorky/gensim

Topic Modelling for Humans

Language: Python - Size: 101 MB - Last synced at: 6 days ago - Pushed at: 3 months ago - Stars: 15,995 - Forks: 4,395

rasbt/python-machine-learning-book

The "Python Machine Learning (1st edition)" book code repository and info resource

Language: Jupyter Notebook - Size: 152 MB - Last synced at: 5 days ago - Pushed at: 6 months ago - Stars: 12,407 - Forks: 4,417

tangyudi/Ai-Learn

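An artificial intelligence learning roadmap that collects nearly 200 hands-on cases and projects, with free accompanying teaching materials, suitable for complete beginners and geared toward job-ready practice. Covers popular areas including Python, mathematics, machine learning, data analysis, deep learning, computer vision, natural language processing, PyTorch, tensorflow, machine-learning, deep-learning, data-analysis, data-mining, mathematics, data-science, artificial-intelligence, python, tensorflow, tensorflow2, caffe, keras, pytorch, algorithm, numpy, pandas, matplotlib, seaborn, nlp, and cv.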

Size: 1.27 MB - Last synced at: 4 days ago - Pushed at: 11 months ago - Stars: 10,993 - Forks: 2,447

yzhao062/pyod

A Python Library for Outlier and Anomaly Detection, Integrating Classical and Deep Learning Techniques

Language: Python - Size: 39.3 MB - Last synced at: 7 days ago - Pushed at: 12 days ago - Stars: 9,167 - Forks: 1,411

yzhao062/anomaly-detection-resources

Anomaly detection related books, papers, videos, and toolboxes

Language: Python - Size: 232 KB - Last synced at: 6 days ago - Pushed at: 18 days ago - Stars: 8,699 - Forks: 1,771

catboost/catboost

A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.

Language: C++ - Size: 1.66 GB - Last synced at: 6 days ago - Pushed at: 7 days ago - Stars: 8,378 - Forks: 1,217

sktime/sktime

A unified framework for machine learning with time series

Language: Python - Size: 76.9 MB - Last synced at: 6 days ago - Pushed at: 10 days ago - Stars: 8,368 - Forks: 1,531

jivoi/awesome-ml-for-cybersecurity

:octocat: Machine Learning for Cyber Security

Size: 225 KB - Last synced at: 5 days ago - Pushed at: 9 months ago - Stars: 7,541 - Forks: 1,793

MontFerret/ferret

Declarative web scraping

Language: Go - Size: 4.26 MB - Last synced at: about 20 hours ago - Pushed at: 3 days ago - Stars: 5,808 - Forks: 305

faridrashidi/kaggle-solutions

🏅 Collection of Kaggle Solutions and Ideas 🏅

Language: HTML - Size: 33.2 MB - Last synced at: 3 days ago - Pushed at: about 1 month ago - Stars: 5,668 - Forks: 2,123

biolab/orange3

🍊 📊 💡 Orange: Interactive data analysis

Language: Python - Size: 98.2 MB - Last synced at: 5 days ago - Pushed at: 16 days ago - Stars: 5,163 - Forks: 1,053

rasbt/mlxtend

A library of extension and helper modules for Python's data analysis and machine learning libraries.

Language: Python - Size: 94 MB - Last synced at: 6 days ago - Pushed at: 4 months ago - Stars: 4,996 - Forks: 878

r0f1/datascience

Curated list of Python resources for data science.

Size: 718 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 4,370 - Forks: 700

microsoft/RD-Agent

Research and development (R&D) is crucial for the enhancement of industrial productivity, especially in the AI era, where the core aspects of R&D are mainly focused on data and models. We are committed to automating these high-value generic R&D processes through our open source R&D automation tool RD-Agent, which lets AI drive data-driven AI.

Language: Python - Size: 49.5 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 4,321 - Forks: 373

deanmalmgren/textract

extract text from any document. no muss. no fuss.

Language: HTML - Size: 4.31 MB - Last synced at: 1 day ago - Pushed at: 5 months ago - Stars: 4,120 - Forks: 626

alibaba/Alink

Alink is the Machine Learning algorithm platform based on Flink, developed by the PAI team of Alibaba computing platform.

Language: Java - Size: 18 MB - Last synced at: 3 days ago - Pushed at: 11 months ago - Stars: 3,605 - Forks: 800

rob-med/awesome-TS-anomaly-detection

List of tools & datasets for anomaly detection on time-series data.

Size: 141 KB - Last synced at: 11 days ago - Pushed at: 7 months ago - Stars: 3,055 - Forks: 454

Kanaries/graphic-walker

An open source alternative to Tableau. Embeddable visual analytic

Language: TypeScript - Size: 3.22 MB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 2,813 - Forks: 146

automeris-io/WebPlotDigitizer

Computer vision assisted tool to extract numerical data from plot images.

Language: JavaScript - Size: 47.4 MB - Last synced at: 28 days ago - Pushed at: 12 months ago - Stars: 2,759 - Forks: 380

tirthajyoti/Papers-Literature-ML-DL-RL-AI

Highly cited and useful papers related to machine learning, deep learning, AI, game theory, reinforcement learning

Size: 495 MB - Last synced at: about 1 month ago - Pushed at: about 2 years ago - Stars: 2,562 - Forks: 755

dblalock/bolt

10x faster matrix and vector operations

Language: C++ - Size: 338 MB - Last synced at: 28 days ago - Pushed at: over 2 years ago - Stars: 2,479 - Forks: 173

WZBSocialScienceCenter/pdftabextract

A set of tools for extracting tables from PDF files helping to do data mining on (OCR-processed) scanned documents.

Language: Python - Size: 138 MB - Last synced at: 28 days ago - Pushed at: almost 3 years ago - Stars: 2,235 - Forks: 371

invoice-x/invoice2data

Extract structured data from PDF invoices

Language: Python - Size: 2.24 MB - Last synced at: 12 days ago - Pushed at: 13 days ago - Stars: 1,959 - Forks: 501

PaddlePaddle/Research

novel deep learning research works with PaddlePaddle

Language: Python - Size: 289 MB - Last synced at: about 1 month ago - Pushed at: 9 months ago - Stars: 1,734 - Forks: 785

youngfish42/Awesome-FL

Comprehensive and timely academic information on federated learning (papers, frameworks, datasets, tutorials, workshops)

Language: Python - Size: 5.11 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 1,697 - Forks: 195

404notf0und/AI-for-Security-Learning

Security scenarios, AI-based security algorithms, and industry practice in security data analysis

Size: 127 KB - Last synced at: 14 days ago - Pushed at: almost 4 years ago - Stars: 1,695 - Forks: 343

benedekrozemberczki/awesome-fraud-detection-papers

A curated list of data mining papers about fraud detection.

Language: Python - Size: 490 KB - Last synced at: 10 days ago - Pushed at: about 1 year ago - Stars: 1,693 - Forks: 311

safe-graph/graph-fraud-detection-papers

A curated list of graph-based fraud, anomaly, and outlier detection papers & resources

Size: 240 KB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 1,555 - Forks: 267

Yimeng-Zhang/feature-engineering-and-feature-selection

A Guide for Feature Engineering and Feature Selection, with implementations and examples in Python.

Language: Jupyter Notebook - Size: 1.28 MB - Last synced at: 5 days ago - Pushed at: over 2 years ago - Stars: 1,549 - Forks: 416

sepandhaghighi/pycm

Multi-class confusion matrix library in Python

Language: Python - Size: 11.5 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 1,475 - Forks: 126

eBay/tsv-utils

eBay's TSV Utilities: Command line tools for large, tabular data files. Filtering, statistics, sampling, joins and more.

Language: D - Size: 2.77 MB - Last synced at: 27 days ago - Pushed at: over 2 years ago - Stars: 1,444 - Forks: 82

demidovakatya/vvedenie-mashinnoe-obuchenie

📝 A collection of resources on machine learning

Size: 2.21 MB - Last synced at: about 2 months ago - Pushed at: over 4 years ago - Stars: 1,430 - Forks: 327

zslucky/awesome-AI-books

Some awesome AI-related books and PDFs for learning and downloading, plus some playground models for learning

Language: Jupyter Notebook - Size: 607 KB - Last synced at: 4 days ago - Pushed at: about 2 years ago - Stars: 1,416 - Forks: 342

WenjieDu/PyPOTS

A Python toolkit/library for reality-centric machine/deep learning and data mining on partially-observed time series, including SOTA neural network models for scientific analysis tasks of imputation/classification/clustering/forecasting/anomaly detection/cleaning on incomplete industrial (irregularly-sampled) multivariate TS with NaN missing values

Language: Python - Size: 4.02 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 1,408 - Forks: 139

qingsongedu/awesome-AI-for-time-series-papers

A professional list of Papers, Tutorials, and Surveys on AI for Time Series in top AI conferences and journals.

Size: 923 KB - Last synced at: 18 days ago - Pushed at: about 1 year ago - Stars: 1,391 - Forks: 135

CIRCL/AIL-framework

AIL framework - Analysis Information Leak framework. Project moved to https://github.com/ail-project

Language: Python - Size: 96.8 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 1,327 - Forks: 284

PatMartin/Dex

Dex : The Data Explorer -- A data visualization tool written in Java/Groovy/JavaFX capable of powerful ETL and publishing web visualizations.

Language: JavaScript - Size: 167 MB - Last synced at: 7 days ago - Pushed at: about 6 years ago - Stars: 1,320 - Forks: 308

alan-turing-institute/CleverCSV

CleverCSV is a Python package for handling messy CSV files. It provides a drop-in replacement for the builtin CSV module with improved dialect detection, and comes with a handy command line application for working with CSV files.

Language: Python - Size: 3.47 MB - Last synced at: 13 days ago - Pushed at: 24 days ago - Stars: 1,291 - Forks: 77

annoviko/pyclustering

pyclustering is a Python, C++ data mining library.

Language: Python - Size: 33.4 MB - Last synced at: 9 days ago - Pushed at: about 1 year ago - Stars: 1,184 - Forks: 254

aeon-toolkit/aeon

A toolkit for machine learning from time series

Language: Python - Size: 104 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 1,164 - Forks: 209

lightaime/deep_gcns_torch

PyTorch repo for DeepGCNs (ICCV'2019 Oral, TPAMI'2021), DeeperGCN (arXiv'2020), and GNN1000 (ICML'2021): https://www.deepgcns.org

Language: Python - Size: 7.02 MB - Last synced at: about 1 month ago - Pushed at: almost 3 years ago - Stars: 1,158 - Forks: 155

nfstream/nfstream

NFStream: a Flexible Network Data Analysis Framework.

Language: Python - Size: 115 MB - Last synced at: 4 days ago - Pushed at: 9 months ago - Stars: 1,142 - Forks: 134

sunlabuiuc/PyHealth

A Deep Learning Python Toolkit for Healthcare Applications.

Language: Python - Size: 120 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 1,109 - Forks: 295

K0lb3/UnityPy

UnityPy is a Python module that makes it possible to extract/unpack and edit Unity assets

Language: Python - Size: 30.4 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 1,006 - Forks: 142

ipython-books/cookbook-2nd

IPython Cookbook, Second Edition, by Cyrille Rossant, Packt Publishing 2018

Language: Python - Size: 45.7 MB - Last synced at: 29 days ago - Pushed at: about 3 years ago - Stars: 959 - Forks: 255

TheAlgorithms/R

Collection of various algorithms implemented in R.

Language: R - Size: 1.02 MB - Last synced at: 1 day ago - Pushed at: 22 days ago - Stars: 945 - Forks: 314

Minqi824/ADBench

Official implementation of "ADBench: Anomaly Detection Benchmark", NeurIPS 2022.

Language: Python - Size: 2.01 GB - Last synced at: about 1 month ago - Pushed at: almost 2 years ago - Stars: 930 - Forks: 139

yueliu1999/Awesome-Deep-Graph-Clustering

Awesome Deep Graph Clustering is a collection of SOTA, novel deep graph clustering methods (papers, codes, and datasets).

Language: Python - Size: 999 KB - Last synced at: 12 days ago - Pushed at: 4 months ago - Stars: 903 - Forks: 142

GoogleCloudPlatform/DataflowJavaSDK 📦

Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.

Size: 12.9 MB - Last synced at: 10 days ago - Pushed at: over 4 years ago - Stars: 857 - Forks: 320

safe-graph/graph-adversarial-learning-literature

A curated list of adversarial attacks and defenses papers on graph-structured data.

Size: 544 KB - Last synced at: 4 days ago - Pushed at: over 1 year ago - Stars: 851 - Forks: 132

leomaurodesenv/game-datasets

🎮 A curated list of awesome game datasets and tools for artificial intelligence in games

Size: 8.52 MB - Last synced at: 7 days ago - Pushed at: 3 months ago - Stars: 842 - Forks: 55

elki-project/elki

ELKI Data Mining Toolkit

Language: Java - Size: 54.9 MB - Last synced at: 30 days ago - Pushed at: about 2 months ago - Stars: 808 - Forks: 325

process-intelligence-solutions/pm4py

Official public repository for PM4Py (Process Mining for Python) — an open-source library for exploring, analyzing, and optimizing business processes with Python.

Language: Python - Size: 114 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 804 - Forks: 308

jerlendds/osintbuddy

Node graphs, OSINT data mining, and plugins. Connect unstructured and public data for transformative insights. The rewrite can be found @ osintbuddy/osintbuddy

Language: TypeScript - Size: 28.5 MB - Last synced at: 29 days ago - Pushed at: about 1 year ago - Stars: 787 - Forks: 72

ipython-books/cookbook-2nd-code

Code of the IPython Cookbook, Second Edition, by Cyrille Rossant, Packt Publishing 2018 [read-only repository]

Language: Jupyter Notebook - Size: 44.3 MB - Last synced at: 29 days ago - Pushed at: over 3 years ago - Stars: 737 - Forks: 434

ail-project/ail-framework

AIL framework - Analysis Information Leak framework

Language: Python - Size: 93.8 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 726 - Forks: 101

arbox/data-science-with-ruby

Practical Data Science with Ruby based tools.

Language: Ruby - Size: 212 KB - Last synced at: 5 days ago - Pushed at: almost 2 years ago - Stars: 712 - Forks: 50

ashishpatel26/Amazing-Feature-Engineering

Feature engineering is the process of using domain knowledge to extract features from raw data via data mining techniques. These features can be used to improve the performance of machine learning algorithms. Feature engineering can be considered as applied machine learning itself.

Language: Jupyter Notebook - Size: 1.26 MB - Last synced at: 29 days ago - Pushed at: 11 months ago - Stars: 702 - Forks: 262

dataproofer/Dataproofer

A proofreader for your data

Language: JavaScript - Size: 23.5 MB - Last synced at: 5 days ago - Pushed at: about 2 years ago - Stars: 693 - Forks: 53

jphall663/interpretable_machine_learning_with_python

Examples of techniques for training interpretable ML models, explaining ML models, and debugging ML models for accuracy, discrimination, and security.

Language: Jupyter Notebook - Size: 34.7 MB - Last synced at: 29 days ago - Pushed at: 11 months ago - Stars: 676 - Forks: 207

yzhao062/combo

(AAAI' 20) A Python Toolbox for Machine Learning Model Combination

Language: Python - Size: 4.95 MB - Last synced at: 1 day ago - Pushed at: over 2 years ago - Stars: 650 - Forks: 106

business-science/timetk

Time series analysis in the `tidyverse`

Language: R - Size: 112 MB - Last synced at: 4 days ago - Pushed at: 11 months ago - Stars: 628 - Forks: 104

McGill-DMaS/Kam1n0-Community

The Kam1n0 Assembly Analysis Platform

Language: C - Size: 463 MB - Last synced at: 3 days ago - Pushed at: about 2 years ago - Stars: 626 - Forks: 128

chris-greening/instascrape 📦

Powerful and flexible Instagram scraping library for Python, providing easy-to-use and expressive tools for accessing data programmatically

Language: Python - Size: 18 MB - Last synced at: about 1 year ago - Pushed at: about 2 years ago - Stars: 616 - Forks: 110

stepthom/text_mining_resources

Resources for learning about Text Mining and Natural Language Processing

Size: 707 KB - Last synced at: 5 days ago - Pushed at: over 2 years ago - Stars: 577 - Forks: 199

holgerbrandl/krangl 📦

krangl is a {K}otlin DSL for data w{rangl}ing

Language: Kotlin - Size: 21.4 MB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 559 - Forks: 50

bonzanini/Book-SocialMediaMiningPython

Companion code for the book "Mastering Social Media Mining with Python"

Language: Python - Size: 4.88 MB - Last synced at: 5 months ago - Pushed at: almost 3 years ago - Stars: 550 - Forks: 266

chaoss/grimoirelab

GrimoireLab: platform for software development analytics and insights

Language: Roff - Size: 264 MB - Last synced at: 1 day ago - Pushed at: 3 days ago - Stars: 531 - Forks: 206

FanzhenLiu/Awesome-Deep-Community-Detection

Deep and conventional community detection related papers, implementations, datasets, and tools.

Size: 2.04 MB - Last synced at: 7 days ago - Pushed at: over 1 year ago - Stars: 527 - Forks: 95

programminghistorian/jekyll

Jekyll-based static site for The Programming Historian

Language: HTML - Size: 896 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 526 - Forks: 227

hackingmaterials/matminer

Data mining for materials science

Language: HTML - Size: 41.7 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 517 - Forks: 199

jchao01/TradingView-data-scraper

Extract price and indicator data from TradingView charts to create ML datasets

Language: Python - Size: 21.5 KB - Last synced at: about 2 months ago - Pushed at: almost 3 years ago - Stars: 513 - Forks: 114

h2oai/mli-resources

H2O.ai Machine Learning Interpretability Resources

Language: Jupyter Notebook - Size: 65.8 MB - Last synced at: about 1 year ago - Pushed at: over 4 years ago - Stars: 477 - Forks: 134

CogComp/cogcomp-nlp

CogComp's Natural Language Processing Libraries and Demos: Modules include lemmatizer, ner, pos, prep-srl, quantifier, question type, relation-extraction, similarity, temporal normalizer, tokenizer, transliteration, verb-sense, and more.

Language: Java - Size: 85.5 MB - Last synced at: about 2 months ago - Pushed at: almost 2 years ago - Stars: 475 - Forks: 144

serengil/chefboost

A Lightweight Decision Tree Framework supporting regular algorithms: ID3, C4.5, CART, CHAID and Regression Trees; some advanced techniques: Gradient Boosting, Random Forest and Adaboost w/categorical features support for Python

Language: Python - Size: 1.09 MB - Last synced at: 30 days ago - Pushed at: about 1 month ago - Stars: 472 - Forks: 101

lzz19980125/awesome-time-series-segmentation-papers

This repository contains a reading list of papers on Time Series Segmentation. This repository is still being continuously improved.

Language: MATLAB - Size: 836 KB - Last synced at: 18 days ago - Pushed at: 3 months ago - Stars: 460 - Forks: 10

JiashuWu/Books

My book list

Size: 4.36 GB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 434 - Forks: 301

kk7nc/RMDL

RMDL: Random Multimodel Deep Learning for Classification

Language: Python - Size: 223 MB - Last synced at: 29 days ago - Pushed at: almost 2 years ago - Stars: 430 - Forks: 122

KnowageLabs/Knowage-Server

Knowage is the professional open source suite for modern business analytics over traditional sources and big data systems.

Language: Java - Size: 347 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 421 - Forks: 229

chuanconggao/PrefixSpan-py

The shortest yet efficient Python implementation of the sequential pattern mining algorithm PrefixSpan, closed sequential pattern mining algorithm BIDE, and generator sequential pattern mining algorithm FEAT.

Language: Python - Size: 66.4 KB - Last synced at: 22 days ago - Pushed at: almost 5 years ago - Stars: 418 - Forks: 92

khuyentran1401/awesome-Python-data-science-books

Probably the best curated list of data science books in Python

Size: 209 KB - Last synced at: 11 days ago - Pushed at: over 2 years ago - Stars: 408 - Forks: 127

ScriptSmith/instamancer

Scrape Instagram's API with Puppeteer

Language: TypeScript - Size: 5.4 MB - Last synced at: 7 days ago - Pushed at: over 2 years ago - Stars: 402 - Forks: 61

airbnb/artificial-adversary

🗣️ Tool to generate adversarial text examples and test machine learning models against them

Language: Python - Size: 116 KB - Last synced at: 4 days ago - Pushed at: over 3 years ago - Stars: 402 - Forks: 57

Desbordante/desbordante-core

Desbordante is a high-performance data profiler capable of discovering many different patterns in data using various algorithms. It also supports running data cleaning scenarios with these algorithms. Desbordante has a console version and an easy-to-use web application.

Language: C++ - Size: 143 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 401 - Forks: 76

Fraud-Detection-Handbook/fraud-detection-handbook

Reproducible Machine Learning for Credit Card Fraud Detection - Practical Handbook

Language: Jupyter Notebook - Size: 21.1 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 395 - Forks: 148

yzhao062/SUOD

(MLSys' 21) An Acceleration System for Large-scale Unsupervised Heterogeneous Outlier Detection (Anomaly Detection)

Language: Python - Size: 10.9 MB - Last synced at: 29 days ago - Pushed at: about 2 months ago - Stars: 386 - Forks: 49

matrix-profile-foundation/matrixprofile

A Python 3 library making time series data mining tasks, utilizing matrix profile algorithms, accessible to everyone.

Language: Python - Size: 6.69 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 382 - Forks: 63

liyangbit/PyDataLab

Open-source material for the WeChat official account (ID: PyDataLab)

Language: Jupyter Notebook - Size: 14.8 MB - Last synced at: 5 months ago - Pushed at: about 3 years ago - Stars: 382 - Forks: 237

ScriptSmith/reaper

Social media scraping / data collection tool for the Facebook, Twitter, Reddit, YouTube, Pinterest, and Tumblr APIs

Language: Python - Size: 7.33 MB - Last synced at: about 1 month ago - Pushed at: over 6 years ago - Stars: 382 - Forks: 67

lefterisloukas/edgar-crawler

The only open-source toolkit that can download SEC EDGAR financial reports and extract textual data from specific item sections into nice & clean structured JSON files.

Language: Python - Size: 63 MB - Last synced at: 28 days ago - Pushed at: about 2 months ago - Stars: 365 - Forks: 100

IBM/AutoMLPipeline.jl

A package that makes it trivial to create and evaluate machine learning pipeline architectures.

Language: Julia - Size: 21.7 MB - Last synced at: 7 days ago - Pushed at: 3 months ago - Stars: 363 - Forks: 28

JoaquinAmatRodrigo/Estadistica-con-R

Personal notes on statistics, machine learning, and the R programming language

Language: R - Size: 274 MB - Last synced at: 7 days ago - Pushed at: over 4 years ago - Stars: 362 - Forks: 289

kinverarity1/lasio

Python library for reading and writing well data using Log ASCII Standard (LAS) files

Language: Lasso - Size: 5.02 MB - Last synced at: 1 day ago - Pushed at: about 1 month ago - Stars: 359 - Forks: 156

ZhiningLiu1998/imbalanced-ensemble

🛠️ Class-imbalanced Ensemble Learning Toolbox. | A library for class-imbalanced / long-tail machine learning

Language: Python - Size: 16.8 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 358 - Forks: 52

sergioburdisso/pyss3

A Python package implementing a new interpretable machine learning model for text classification (with visualization tools for Explainable AI :octocat:)

Language: Python - Size: 102 MB - Last synced at: 2 days ago - Pushed at: 4 months ago - Stars: 341 - Forks: 44

Related Topics
machine-learning (1,292), data-science (1,282), python (1,234), data-analysis (888), data-visualization (718), data (278), python3 (275), r (271), pandas (243), classification (226), deep-learning (213), clustering (204), jupyter-notebook (199), machine-learning-algorithms (157), nlp (143), natural-language-processing (130), dataset (129), artificial-intelligence (128), web-scraping (126), sql (121), numpy (118), database (114), scikit-learn (111), apriori-algorithm (111), text-mining (107), data-cleaning (105), data-mining-algorithms (103), statistics (103), datascience (102), java (99), random-forest (99), scraper (96), big-data (96), visualization (91), sentiment-analysis (90), matplotlib (86), exploratory-data-analysis (86), neural-network (84), data-engineering (83), scraping (82), data-analytics (79), association-rules (78), data-structures (76), time-series (75), sklearn (74), webscraping (73), twitter (73), ai (71), decision-trees (71), unsupervised-learning (68), feature-engineering (61), logistic-regression (59), javascript (59), anomaly-detection (57), regression (56), crawler (54), kaggle (54), algorithms (53), weka (52), information-retrieval (52), business-intelligence (51), feature-selection (50), analytics (50), api (49), seaborn (49), tensorflow (49), spark (48), selenium (47), linear-regression (47), data-preprocessing (46), apriori (46), text-classification (45), data-processing (45), data-extraction (45), recommender-system (45), decision-tree (45), knn (44), supervised-learning (44), tableau (44), clustering-algorithm (44), analysis (43), prediction (42), rstudio (42), powerbi (42), automation (41), association-rule-mining (41), time-series-analysis (41), twitter-api (41), beautifulsoup (40), excel (39), datamining (39), eda (39), naive-bayes-classifier (39), predictive-modeling (38), scrapy (38), preprocessing (38), nlp-machine-learning (38), data-mining-python (37), nodejs (37), social-network-analysis (36)