Topic: "data-mining"
JaidedAI/EasyOCR
Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.
Language: Python - Size: 154 MB - Last synced at: 7 days ago - Pushed at: 8 months ago - Stars: 26,519 - Forks: 3,314

academic/awesome-datascience
📝 An awesome Data Science repository to learn from and apply to real-world problems.
Size: 1.28 MB - Last synced at: 3 days ago - Pushed at: 12 days ago - Stars: 26,252 - Forks: 6,054

eriklindernoren/ML-From-Scratch
Machine Learning From Scratch. Bare bones NumPy implementations of machine learning models and algorithms with a focus on accessibility. Aims to cover everything from linear regression to deep learning.
Language: Python - Size: 540 KB - Last synced at: about 20 hours ago - Pushed at: over 1 year ago - Stars: 24,471 - Forks: 4,647

EthicalML/awesome-production-machine-learning
A curated list of awesome open source libraries to deploy, monitor, version and scale your machine learning
Size: 2.36 MB - Last synced at: 5 days ago - Pushed at: 8 days ago - Stars: 18,412 - Forks: 2,342

microsoft/LightGBM
A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
Language: C++ - Size: 23.2 MB - Last synced at: 6 days ago - Pushed at: 9 days ago - Stars: 17,175 - Forks: 3,875

piskvorky/gensim
Topic Modelling for Humans
Language: Python - Size: 101 MB - Last synced at: 6 days ago - Pushed at: 3 months ago - Stars: 15,995 - Forks: 4,395

rasbt/python-machine-learning-book
The "Python Machine Learning (1st edition)" book code repository and info resource
Language: Jupyter Notebook - Size: 152 MB - Last synced at: 5 days ago - Pushed at: 6 months ago - Stars: 12,407 - Forks: 4,417

tangyudi/Ai-Learn
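AI learning roadmap: a collection of nearly 200 hands-on cases and projects, with free accompanying course materials, from zero-background introduction to job-ready practice. Covers popular areas including Python, mathematics, machine learning, data analysis, deep learning, computer vision, natural language processing, PyTorch tensorflow machine-learning, deep-learning data-analysis data-mining mathematics data-science artificial-intelligence python tensorflow tensorflow2 caffe keras pytorch algorithm numpy pandas matplotlib seaborn nlp cv, and more.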
Size: 1.27 MB - Last synced at: 4 days ago - Pushed at: 11 months ago - Stars: 10,993 - Forks: 2,447

yzhao062/pyod
A Python Library for Outlier and Anomaly Detection, Integrating Classical and Deep Learning Techniques
Language: Python - Size: 39.3 MB - Last synced at: 7 days ago - Pushed at: 12 days ago - Stars: 9,167 - Forks: 1,411

yzhao062/anomaly-detection-resources
Anomaly detection related books, papers, videos, and toolboxes
Language: Python - Size: 232 KB - Last synced at: 6 days ago - Pushed at: 18 days ago - Stars: 8,699 - Forks: 1,771

catboost/catboost
A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.
Language: C++ - Size: 1.66 GB - Last synced at: 6 days ago - Pushed at: 7 days ago - Stars: 8,378 - Forks: 1,217

sktime/sktime
A unified framework for machine learning with time series
Language: Python - Size: 76.9 MB - Last synced at: 6 days ago - Pushed at: 10 days ago - Stars: 8,368 - Forks: 1,531

jivoi/awesome-ml-for-cybersecurity
Machine Learning for Cyber Security
Size: 225 KB - Last synced at: 5 days ago - Pushed at: 9 months ago - Stars: 7,541 - Forks: 1,793

MontFerret/ferret
Declarative web scraping
Language: Go - Size: 4.26 MB - Last synced at: about 20 hours ago - Pushed at: 3 days ago - Stars: 5,808 - Forks: 305

faridrashidi/kaggle-solutions
🏅 Collection of Kaggle Solutions and Ideas 🏅
Language: HTML - Size: 33.2 MB - Last synced at: 3 days ago - Pushed at: about 1 month ago - Stars: 5,668 - Forks: 2,123

biolab/orange3
🍊 📊 💡 Orange: Interactive data analysis
Language: Python - Size: 98.2 MB - Last synced at: 5 days ago - Pushed at: 16 days ago - Stars: 5,163 - Forks: 1,053

rasbt/mlxtend
A library of extension and helper modules for Python's data analysis and machine learning libraries.
Language: Python - Size: 94 MB - Last synced at: 6 days ago - Pushed at: 4 months ago - Stars: 4,996 - Forks: 878

r0f1/datascience
Curated list of Python resources for data science.
Size: 718 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 4,370 - Forks: 700

microsoft/RD-Agent
Research and development (R&D) is crucial for the enhancement of industrial productivity, especially in the AI era, where the core aspects of R&D are mainly focused on data and models. We are committed to automating these high-value generic R&D processes through our open source R&D automation tool RD-Agent, which lets AI drive data-driven AI.
Language: Python - Size: 49.5 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 4,321 - Forks: 373

deanmalmgren/textract
extract text from any document. no muss. no fuss.
Language: HTML - Size: 4.31 MB - Last synced at: 1 day ago - Pushed at: 5 months ago - Stars: 4,120 - Forks: 626

alibaba/Alink
Alink is the Machine Learning algorithm platform based on Flink, developed by the PAI team of Alibaba computing platform.
Language: Java - Size: 18 MB - Last synced at: 3 days ago - Pushed at: 11 months ago - Stars: 3,605 - Forks: 800

rob-med/awesome-TS-anomaly-detection
List of tools & datasets for anomaly detection on time-series data.
Size: 141 KB - Last synced at: 11 days ago - Pushed at: 7 months ago - Stars: 3,055 - Forks: 454

Kanaries/graphic-walker
An open source alternative to Tableau. Embeddable visual analytics
Language: TypeScript - Size: 3.22 MB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 2,813 - Forks: 146

automeris-io/WebPlotDigitizer
Computer vision assisted tool to extract numerical data from plot images.
Language: JavaScript - Size: 47.4 MB - Last synced at: 28 days ago - Pushed at: 12 months ago - Stars: 2,759 - Forks: 380

tirthajyoti/Papers-Literature-ML-DL-RL-AI
Highly cited and useful papers related to machine learning, deep learning, AI, game theory, reinforcement learning
Size: 495 MB - Last synced at: about 1 month ago - Pushed at: about 2 years ago - Stars: 2,562 - Forks: 755

dblalock/bolt
10x faster matrix and vector operations
Language: C++ - Size: 338 MB - Last synced at: 28 days ago - Pushed at: over 2 years ago - Stars: 2,479 - Forks: 173

WZBSocialScienceCenter/pdftabextract
A set of tools for extracting tables from PDF files helping to do data mining on (OCR-processed) scanned documents.
Language: Python - Size: 138 MB - Last synced at: 28 days ago - Pushed at: almost 3 years ago - Stars: 2,235 - Forks: 371

invoice-x/invoice2data
Extract structured data from PDF invoices
Language: Python - Size: 2.24 MB - Last synced at: 12 days ago - Pushed at: 13 days ago - Stars: 1,959 - Forks: 501

PaddlePaddle/Research
novel deep learning research works with PaddlePaddle
Language: Python - Size: 289 MB - Last synced at: about 1 month ago - Pushed at: 9 months ago - Stars: 1,734 - Forks: 785

youngfish42/Awesome-FL
Comprehensive and timely academic information on federated learning (papers, frameworks, datasets, tutorials, workshops)
Language: Python - Size: 5.11 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 1,697 - Forks: 195

404notf0und/AI-for-Security-Learning
Security scenarios, AI-based security algorithms, and industry practice in security data analysis
Size: 127 KB - Last synced at: 14 days ago - Pushed at: almost 4 years ago - Stars: 1,695 - Forks: 343

benedekrozemberczki/awesome-fraud-detection-papers
A curated list of data mining papers about fraud detection.
Language: Python - Size: 490 KB - Last synced at: 10 days ago - Pushed at: about 1 year ago - Stars: 1,693 - Forks: 311

safe-graph/graph-fraud-detection-papers
A curated list of graph-based fraud, anomaly, and outlier detection papers & resources
Size: 240 KB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 1,555 - Forks: 267

Yimeng-Zhang/feature-engineering-and-feature-selection
A Guide for Feature Engineering and Feature Selection, with implementations and examples in Python.
Language: Jupyter Notebook - Size: 1.28 MB - Last synced at: 5 days ago - Pushed at: over 2 years ago - Stars: 1,549 - Forks: 416

sepandhaghighi/pycm
Multi-class confusion matrix library in Python
Language: Python - Size: 11.5 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 1,475 - Forks: 126

eBay/tsv-utils
eBay's TSV Utilities: Command line tools for large, tabular data files. Filtering, statistics, sampling, joins and more.
Language: D - Size: 2.77 MB - Last synced at: 27 days ago - Pushed at: over 2 years ago - Stars: 1,444 - Forks: 82

demidovakatya/vvedenie-mashinnoe-obuchenie
📝 A collection of machine learning resources
Size: 2.21 MB - Last synced at: about 2 months ago - Pushed at: over 4 years ago - Stars: 1,430 - Forks: 327

zslucky/awesome-AI-books
Some awesome AI-related books and PDFs for learning and downloading, plus some playground models for learning
Language: Jupyter Notebook - Size: 607 KB - Last synced at: 4 days ago - Pushed at: about 2 years ago - Stars: 1,416 - Forks: 342

WenjieDu/PyPOTS
A Python toolkit/library for reality-centric machine/deep learning and data mining on partially-observed time series, including SOTA neural network models for scientific analysis tasks of imputation/classification/clustering/forecasting/anomaly detection/cleaning on incomplete industrial (irregularly-sampled) multivariate TS with NaN missing values
Language: Python - Size: 4.02 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 1,408 - Forks: 139

qingsongedu/awesome-AI-for-time-series-papers
A professional list of Papers, Tutorials, and Surveys on AI for Time Series in top AI conferences and journals.
Size: 923 KB - Last synced at: 18 days ago - Pushed at: about 1 year ago - Stars: 1,391 - Forks: 135

CIRCL/AIL-framework
AIL framework - Analysis Information Leak framework. Project moved to https://github.com/ail-project
Language: Python - Size: 96.8 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 1,327 - Forks: 284

PatMartin/Dex
Dex : The Data Explorer -- A data visualization tool written in Java/Groovy/JavaFX capable of powerful ETL and publishing web visualizations.
Language: JavaScript - Size: 167 MB - Last synced at: 7 days ago - Pushed at: about 6 years ago - Stars: 1,320 - Forks: 308

alan-turing-institute/CleverCSV
CleverCSV is a Python package for handling messy CSV files. It provides a drop-in replacement for the builtin CSV module with improved dialect detection, and comes with a handy command line application for working with CSV files.
Language: Python - Size: 3.47 MB - Last synced at: 13 days ago - Pushed at: 24 days ago - Stars: 1,291 - Forks: 77

annoviko/pyclustering
pyclustering is a Python, C++ data mining library.
Language: Python - Size: 33.4 MB - Last synced at: 9 days ago - Pushed at: about 1 year ago - Stars: 1,184 - Forks: 254

aeon-toolkit/aeon
A toolkit for machine learning from time series
Language: Python - Size: 104 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 1,164 - Forks: 209

lightaime/deep_gcns_torch
PyTorch Repo for DeepGCNs (ICCV'2019 Oral, TPAMI'2021), DeeperGCN (arXiv'2020) and GNN1000 (ICML'2021): https://www.deepgcns.org
Language: Python - Size: 7.02 MB - Last synced at: about 1 month ago - Pushed at: almost 3 years ago - Stars: 1,158 - Forks: 155

nfstream/nfstream
NFStream: a Flexible Network Data Analysis Framework.
Language: Python - Size: 115 MB - Last synced at: 4 days ago - Pushed at: 9 months ago - Stars: 1,142 - Forks: 134

sunlabuiuc/PyHealth
A Deep Learning Python Toolkit for Healthcare Applications.
Language: Python - Size: 120 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 1,109 - Forks: 295

K0lb3/UnityPy
UnityPy is a Python module that makes it possible to extract/unpack and edit Unity assets
Language: Python - Size: 30.4 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 1,006 - Forks: 142

ipython-books/cookbook-2nd
IPython Cookbook, Second Edition, by Cyrille Rossant, Packt Publishing 2018
Language: Python - Size: 45.7 MB - Last synced at: 29 days ago - Pushed at: about 3 years ago - Stars: 959 - Forks: 255

TheAlgorithms/R
Collection of various algorithms implemented in R.
Language: R - Size: 1.02 MB - Last synced at: 1 day ago - Pushed at: 22 days ago - Stars: 945 - Forks: 314

Minqi824/ADBench
Official implementation of "ADBench: Anomaly Detection Benchmark", NeurIPS 2022.
Language: Python - Size: 2.01 GB - Last synced at: about 1 month ago - Pushed at: almost 2 years ago - Stars: 930 - Forks: 139

yueliu1999/Awesome-Deep-Graph-Clustering
Awesome Deep Graph Clustering is a collection of SOTA, novel deep graph clustering methods (papers, codes, and datasets).
Language: Python - Size: 999 KB - Last synced at: 12 days ago - Pushed at: 4 months ago - Stars: 903 - Forks: 142

GoogleCloudPlatform/DataflowJavaSDK 📦
Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
Size: 12.9 MB - Last synced at: 10 days ago - Pushed at: over 4 years ago - Stars: 857 - Forks: 320

safe-graph/graph-adversarial-learning-literature
A curated list of adversarial attacks and defenses papers on graph-structured data.
Size: 544 KB - Last synced at: 4 days ago - Pushed at: over 1 year ago - Stars: 851 - Forks: 132

leomaurodesenv/game-datasets
🎮 A curated list of awesome game datasets and tools for artificial intelligence in games
Size: 8.52 MB - Last synced at: 7 days ago - Pushed at: 3 months ago - Stars: 842 - Forks: 55

elki-project/elki
ELKI Data Mining Toolkit
Language: Java - Size: 54.9 MB - Last synced at: 30 days ago - Pushed at: about 2 months ago - Stars: 808 - Forks: 325

process-intelligence-solutions/pm4py
Official public repository for PM4Py (Process Mining for Python) — an open-source library for exploring, analyzing, and optimizing business processes with Python.
Language: Python - Size: 114 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 804 - Forks: 308

jerlendds/osintbuddy
Node graphs, OSINT data mining, and plugins. Connect unstructured and public data for transformative insights. The rewrite can be found @ osintbuddy/osintbuddy
Language: TypeScript - Size: 28.5 MB - Last synced at: 29 days ago - Pushed at: about 1 year ago - Stars: 787 - Forks: 72

ipython-books/cookbook-2nd-code
Code of the IPython Cookbook, Second Edition, by Cyrille Rossant, Packt Publishing 2018 [read-only repository]
Language: Jupyter Notebook - Size: 44.3 MB - Last synced at: 29 days ago - Pushed at: over 3 years ago - Stars: 737 - Forks: 434

ail-project/ail-framework
AIL framework - Analysis Information Leak framework
Language: Python - Size: 93.8 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 726 - Forks: 101

arbox/data-science-with-ruby
Practical Data Science with Ruby based tools.
Language: Ruby - Size: 212 KB - Last synced at: 5 days ago - Pushed at: almost 2 years ago - Stars: 712 - Forks: 50

ashishpatel26/Amazing-Feature-Engineering
Feature engineering is the process of using domain knowledge to extract features from raw data via data mining techniques. These features can be used to improve the performance of machine learning algorithms. Feature engineering can be considered as applied machine learning itself.
Language: Jupyter Notebook - Size: 1.26 MB - Last synced at: 29 days ago - Pushed at: 11 months ago - Stars: 702 - Forks: 262

dataproofer/Dataproofer
A proofreader for your data
Language: JavaScript - Size: 23.5 MB - Last synced at: 5 days ago - Pushed at: about 2 years ago - Stars: 693 - Forks: 53

jphall663/interpretable_machine_learning_with_python
Examples of techniques for training interpretable ML models, explaining ML models, and debugging ML models for accuracy, discrimination, and security.
Language: Jupyter Notebook - Size: 34.7 MB - Last synced at: 29 days ago - Pushed at: 11 months ago - Stars: 676 - Forks: 207

yzhao062/combo
(AAAI' 20) A Python Toolbox for Machine Learning Model Combination
Language: Python - Size: 4.95 MB - Last synced at: 1 day ago - Pushed at: over 2 years ago - Stars: 650 - Forks: 106

business-science/timetk
Time series analysis in the `tidyverse`
Language: R - Size: 112 MB - Last synced at: 4 days ago - Pushed at: 11 months ago - Stars: 628 - Forks: 104

McGill-DMaS/Kam1n0-Community
The Kam1n0 Assembly Analysis Platform
Language: C - Size: 463 MB - Last synced at: 3 days ago - Pushed at: about 2 years ago - Stars: 626 - Forks: 128

chris-greening/instascrape 📦
Powerful and flexible Instagram scraping library for Python, providing easy-to-use and expressive tools for accessing data programmatically
Language: Python - Size: 18 MB - Last synced at: about 1 year ago - Pushed at: about 2 years ago - Stars: 616 - Forks: 110

stepthom/text_mining_resources
Resources for learning about Text Mining and Natural Language Processing
Size: 707 KB - Last synced at: 5 days ago - Pushed at: over 2 years ago - Stars: 577 - Forks: 199

holgerbrandl/krangl 📦
krangl is a {K}otlin DSL for data w{rangl}ing
Language: Kotlin - Size: 21.4 MB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 559 - Forks: 50

bonzanini/Book-SocialMediaMiningPython
Companion code for the book "Mastering Social Media Mining with Python"
Language: Python - Size: 4.88 MB - Last synced at: 5 months ago - Pushed at: almost 3 years ago - Stars: 550 - Forks: 266

chaoss/grimoirelab
GrimoireLab: platform for software development analytics and insights
Language: Roff - Size: 264 MB - Last synced at: 1 day ago - Pushed at: 3 days ago - Stars: 531 - Forks: 206

FanzhenLiu/Awesome-Deep-Community-Detection
Deep and conventional community detection related papers, implementations, datasets, and tools.
Size: 2.04 MB - Last synced at: 7 days ago - Pushed at: over 1 year ago - Stars: 527 - Forks: 95

programminghistorian/jekyll
Jekyll-based static site for The Programming Historian
Language: HTML - Size: 896 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 526 - Forks: 227

hackingmaterials/matminer
Data mining for materials science
Language: HTML - Size: 41.7 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 517 - Forks: 199

jchao01/TradingView-data-scraper
Extract price and indicator data from TradingView charts to create ML datasets
Language: Python - Size: 21.5 KB - Last synced at: about 2 months ago - Pushed at: almost 3 years ago - Stars: 513 - Forks: 114

h2oai/mli-resources
H2O.ai Machine Learning Interpretability Resources
Language: Jupyter Notebook - Size: 65.8 MB - Last synced at: about 1 year ago - Pushed at: over 4 years ago - Stars: 477 - Forks: 134

CogComp/cogcomp-nlp
CogComp's Natural Language Processing Libraries and Demos: Modules include lemmatizer, ner, pos, prep-srl, quantifier, question type, relation-extraction, similarity, temporal normalizer, tokenizer, transliteration, verb-sense, and more.
Language: Java - Size: 85.5 MB - Last synced at: about 2 months ago - Pushed at: almost 2 years ago - Stars: 475 - Forks: 144

serengil/chefboost
A Lightweight Decision Tree Framework supporting regular algorithms: ID3, C4.5, CART, CHAID and Regression Trees; some advanced techniques: Gradient Boosting, Random Forest and Adaboost w/categorical features support for Python
Language: Python - Size: 1.09 MB - Last synced at: 30 days ago - Pushed at: about 1 month ago - Stars: 472 - Forks: 101

lzz19980125/awesome-time-series-segmentation-papers
This repository contains a reading list of papers on Time Series Segmentation. This repository is still being continuously improved.
Language: MATLAB - Size: 836 KB - Last synced at: 18 days ago - Pushed at: 3 months ago - Stars: 460 - Forks: 10

JiashuWu/Books
My book list
Size: 4.36 GB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 434 - Forks: 301

kk7nc/RMDL
RMDL: Random Multimodel Deep Learning for Classification
Language: Python - Size: 223 MB - Last synced at: 29 days ago - Pushed at: almost 2 years ago - Stars: 430 - Forks: 122

KnowageLabs/Knowage-Server
Knowage is the professional open source suite for modern business analytics over traditional sources and big data systems.
Language: Java - Size: 347 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 421 - Forks: 229

chuanconggao/PrefixSpan-py
The shortest yet efficient Python implementation of the sequential pattern mining algorithm PrefixSpan, closed sequential pattern mining algorithm BIDE, and generator sequential pattern mining algorithm FEAT.
Language: Python - Size: 66.4 KB - Last synced at: 22 days ago - Pushed at: almost 5 years ago - Stars: 418 - Forks: 92

khuyentran1401/awesome-Python-data-science-books
Probably the best curated list of data science books in Python
Size: 209 KB - Last synced at: 11 days ago - Pushed at: over 2 years ago - Stars: 408 - Forks: 127

ScriptSmith/instamancer
Scrape Instagram's API with Puppeteer
Language: TypeScript - Size: 5.4 MB - Last synced at: 7 days ago - Pushed at: over 2 years ago - Stars: 402 - Forks: 61

airbnb/artificial-adversary
🗣️ Tool to generate adversarial text examples and test machine learning models against them
Language: Python - Size: 116 KB - Last synced at: 4 days ago - Pushed at: over 3 years ago - Stars: 402 - Forks: 57

Desbordante/desbordante-core
Desbordante is a high-performance data profiler capable of discovering many different patterns in data using various algorithms. It can also run data cleaning scenarios using these algorithms. Desbordante has a console version and an easy-to-use web application.
Language: C++ - Size: 143 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 401 - Forks: 76

Fraud-Detection-Handbook/fraud-detection-handbook
Reproducible Machine Learning for Credit Card Fraud Detection - Practical Handbook
Language: Jupyter Notebook - Size: 21.1 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 395 - Forks: 148

yzhao062/SUOD
(MLSys' 21) An Acceleration System for Large-scale Unsupervised Heterogeneous Outlier Detection (Anomaly Detection)
Language: Python - Size: 10.9 MB - Last synced at: 29 days ago - Pushed at: about 2 months ago - Stars: 386 - Forks: 49

matrix-profile-foundation/matrixprofile
A Python 3 library making time series data mining tasks, utilizing matrix profile algorithms, accessible to everyone.
Language: Python - Size: 6.69 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 382 - Forks: 63

liyangbit/PyDataLab
Open-source code for the WeChat official account (ID: PyDataLab)
Language: Jupyter Notebook - Size: 14.8 MB - Last synced at: 5 months ago - Pushed at: about 3 years ago - Stars: 382 - Forks: 237

ScriptSmith/reaper
Social media scraping / data collection tool for the Facebook, Twitter, Reddit, YouTube, Pinterest, and Tumblr APIs
Language: Python - Size: 7.33 MB - Last synced at: about 1 month ago - Pushed at: over 6 years ago - Stars: 382 - Forks: 67

lefterisloukas/edgar-crawler
The only open-source toolkit that can download SEC EDGAR financial reports and extract textual data from specific item sections into nice & clean structured JSON files.
Language: Python - Size: 63 MB - Last synced at: 28 days ago - Pushed at: about 2 months ago - Stars: 365 - Forks: 100

IBM/AutoMLPipeline.jl
A package that makes it trivial to create and evaluate machine learning pipeline architectures.
Language: Julia - Size: 21.7 MB - Last synced at: 7 days ago - Pushed at: 3 months ago - Stars: 363 - Forks: 28

JoaquinAmatRodrigo/Estadistica-con-R
Personal notes on statistics, machine learning, and the R programming language
Language: R - Size: 274 MB - Last synced at: 7 days ago - Pushed at: over 4 years ago - Stars: 362 - Forks: 289

kinverarity1/lasio
Python library for reading and writing well data using Log ASCII Standard (LAS) files
Language: Lasso - Size: 5.02 MB - Last synced at: 1 day ago - Pushed at: about 1 month ago - Stars: 359 - Forks: 156

ZhiningLiu1998/imbalanced-ensemble
🛠️ Class-imbalanced Ensemble Learning Toolbox. | Class-imbalanced/long-tail machine learning library
Language: Python - Size: 16.8 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 358 - Forks: 52

sergioburdisso/pyss3
A Python package implementing a new interpretable machine learning model for text classification (with visualization tools for Explainable AI)
Language: Python - Size: 102 MB - Last synced at: 2 days ago - Pushed at: 4 months ago - Stars: 341 - Forks: 44
