GitHub topics: dask
1337kuzey/Real-Time-Fraud-Detection
This repository contains a real-time fraud detection system that leverages Kafka for data streaming and FastAPI for machine learning inference. It provides robust monitoring through Prometheus and Grafana, ensuring you can track performance and detect anomalies effectively. 🐙✨
Language: Python - Size: 1.35 MB - Last synced at: about 5 hours ago - Pushed at: about 6 hours ago - Stars: 0 - Forks: 0

dmitryglhf/autodask
AutoML Library Based on Dask with Bee Colony Optimization
Language: Python - Size: 1.66 MB - Last synced at: about 6 hours ago - Pushed at: about 7 hours ago - Stars: 3 - Forks: 0

Calmon43/task-scheduler
Task-scheduler is a lightweight tool that helps users automate and manage recurring tasks efficiently. It allows you to set schedules, monitor task execution, and receive notifications for task completion.
Size: 3.91 KB - Last synced at: about 13 hours ago - Pushed at: about 14 hours ago - Stars: 0 - Forks: 0

PylarBear/pybear
pybear is a Python computing library that augments data analytics functionality found in the popular numpy, scikit-learn, dask, and dask_ml libraries.
Language: Python - Size: 49 MB - Last synced at: about 15 hours ago - Pushed at: about 15 hours ago - Stars: 0 - Forks: 0

fugue-project/fugue
A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rewrites.
Language: Python - Size: 5.98 MB - Last synced at: about 1 hour ago - Pushed at: 3 months ago - Stars: 2,088 - Forks: 94

capitalone/datacompy
Pandas, Polars, Spark, and Snowpark DataFrame comparison for humans and more!
Language: Python - Size: 11.6 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 576 - Forks: 143

casangi/graphviper
Dask Based MapReduce for Multi Xarray Datasets.
Language: Python - Size: 2.61 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 1 - Forks: 2

shushantrishav/Microsoft-Malware-Prediction
A data science project to predict the probability of a machine encountering malware based on telemetry data collected from Microsoft Defender. Built using Python, Dask, LightGBM, and essential data science libraries for handling large-scale structured data.
Language: Jupyter Notebook - Size: 33.2 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 1

pydata/xarray
N-D labeled arrays and datasets in Python
Language: Python - Size: 47.7 MB - Last synced at: 1 day ago - Pushed at: 3 days ago - Stars: 3,870 - Forks: 1,150

rapidsai/cudf
cuDF - GPU DataFrame Library
Language: C++ - Size: 158 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 8,992 - Forks: 951

dask/dask
Parallel computing with task scheduling
Language: Python - Size: 120 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 13,273 - Forks: 1,774

thewtex/ngff-zarr
A lean and kind Open Microscopy Environment (OME) Next Generation File Format (NGFF) Zarr implementation.
Language: Python - Size: 774 KB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 42 - Forks: 9

nebari-dev/nebari
🪴 Nebari - your open source data science platform
Language: Python - Size: 16.3 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 296 - Forks: 102

bioio-devs/bioio-lif
A BioIO reader plugin for reading LIF (Leica Image File) images.
Language: Python - Size: 298 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 1 - Forks: 1

bioio-devs/bioio-ome-tiff
A BioIO reader plugin for reading Tiff files in the OME format.
Language: Python - Size: 132 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 1 - Forks: 2

bioio-devs/bioio-ome-tiled-tiff
A BioIO reader plugin for reading tiled tiff files in the OME format.
Language: Python - Size: 173 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 1 - Forks: 0

bioio-devs/bioio-tiff-glob
A BioIO reader plugin for reading Tiff Glob images.
Language: Python - Size: 53.7 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

bioio-devs/bioio
Image reading, metadata management, and image writing for Microscopy images in Python
Language: Python - Size: 8.7 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 88 - Forks: 7

bioio-devs/bioio-czi
A BioIO reader plugin for reading CZI files.
Language: Python - Size: 2.11 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 4 - Forks: 1

narwhals-dev/narwhals
Lightweight and extensible compatibility layer between dataframe libraries!
Language: Python - Size: 10.6 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 1,125 - Forks: 149

Ouranosinc/xclim
Library of derived climate variables, i.e. climate indicators, based on xarray.
Language: Python - Size: 60.5 MB - Last synced at: about 9 hours ago - Pushed at: 1 day ago - Stars: 360 - Forks: 67

dask/dask-jobqueue
Deploy Dask on job schedulers like PBS, SLURM, and SGE
Language: Python - Size: 755 KB - Last synced at: about 17 hours ago - Pushed at: 18 days ago - Stars: 249 - Forks: 144

nci/scores
scores: Metrics for the verification, evaluation and optimisation of forecasts, predictions or models.
Language: Jupyter Notebook - Size: 18.5 MB - Last synced at: about 9 hours ago - Pushed at: about 19 hours ago - Stars: 169 - Forks: 32

basnijholt/adaptive-scheduler
Run many functions (adaptively) on many cores (>10k-100k) using mpi4py.futures, ipyparallel, loky, or dask-mpi. 🎉
Language: Python - Size: 1010 KB - Last synced at: about 23 hours ago - Pushed at: 3 days ago - Stars: 30 - Forks: 11

myryfe/dataframely
A declarative, 🐻‍❄️-native data frame validation library.
Language: Python - Size: 290 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

miniufo/xgrads
Parse and read ctl and associated binary file commonly used by GrADS into xarray
Language: Jupyter Notebook - Size: 16.6 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 76 - Forks: 27

nebari-dev/nebari-docs
📖 Documentation for Nebari
Size: 51.1 MB - Last synced at: about 11 hours ago - Pushed at: 6 days ago - Stars: 16 - Forks: 35

interfaces-programacao-proj/API-BIdaSESA
Aplicação Flask BI da SESA
Language: Python - Size: 20.5 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 1

stumpy-dev/stumpy
STUMPY is a powerful and scalable Python library for modern time series analysis
Language: Python - Size: 127 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 3,935 - Forks: 333

pytroll/satpy
Python package for earth-observing satellite data processing
Language: Python - Size: 24.6 MB - Last synced at: about 9 hours ago - Pushed at: 7 days ago - Stars: 1,119 - Forks: 311

bioio-devs/bioio-base
Typing, base classes, and more for BioIO projects.
Language: Python - Size: 2.88 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

polyaxon/traceml
Engine for ML/Data tracking, visualization, explainability, drift detection, and dashboards for Polyaxon.
Language: Python - Size: 118 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 517 - Forks: 44

Matrix030/SteamLens
High-performance sentiment analysis platform for Steam reviews. Built with Python, Dask & transformers to process millions of reviews in minutes. Features AI topic assignment, sentiment separation by themes, GPU acceleration, and Streamlit web interface for game developers and data scientists.
Language: Jupyter Notebook - Size: 223 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 0 - Forks: 0

bioio-devs/bioio-nd2
A BioIO reader plugin for reading ND2 images.
Language: Python - Size: 59.6 KB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 0 - Forks: 1

dask/distributed
A distributed task scheduler for Dask
Language: Python - Size: 342 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 1,631 - Forks: 734

Nixtla/mlforecast
Scalable machine 🤖 learning for time series forecasting.
Language: Python - Size: 29.8 MB - Last synced at: 6 days ago - Pushed at: 3 months ago - Stars: 1,032 - Forks: 99

TimeEval/TimeEval
Evaluation Tool for Anomaly Detection Algorithms on Time Series
Language: Jupyter Notebook - Size: 24.9 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 126 - Forks: 18

data-apis/array-api-compat
Compatibility layer for common array libraries to support the Array API
Language: Python - Size: 1.8 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 97 - Forks: 35

akbaritabar/course-session-on-Other-Computational-Social-Science-Skills
Materials for the course "'Other' Computational Social Science Skills" at the fourth MPIDR Summer Incubator Program, June 10, 2025, Rostock, Germany (and online)
Size: 647 KB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 1 - Forks: 0

LDO-CERT/orochi
The Volatility Collaborative GUI
Language: JavaScript - Size: 73 MB - Last synced at: 9 days ago - Pushed at: 10 days ago - Stars: 246 - Forks: 21

ORNL/flowcept
Runtime data integration system that empowers any data processing system to capture and query workflow provenance using data observability and code instrumentation.
Language: Python - Size: 52.7 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 5 - Forks: 5

casangi/astrohack
Antenna panel and position corrections.
Language: Jupyter Notebook - Size: 131 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 9 - Forks: 3

fschuch/xcompact3d_toolbox
A set of tools for pre and postprocessing prepared for the high-order Navier-Stokes solver XCompact3d
Language: Python - Size: 44.3 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 11 - Forks: 6

bioio-devs/bioio-imageio
A BioIO reader plugin for reading simple image and movie formats
Language: Python - Size: 93.8 KB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 0 - Forks: 0

msoechting/lexcube
Lexcube: 3D Data Cube Visualization in Jupyter Notebooks
Language: TypeScript - Size: 6.93 MB - Last synced at: 11 days ago - Pushed at: about 1 month ago - Stars: 152 - Forks: 9

hi-primus/optimus
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Language: Python - Size: 110 MB - Last synced at: 7 days ago - Pushed at: 7 months ago - Stars: 1,512 - Forks: 232

dask-contrib/dask-deltatable
A Delta Lake reader for Dask
Language: Python - Size: 260 KB - Last synced at: 13 days ago - Pushed at: 9 months ago - Stars: 52 - Forks: 17

dask/dask-cloudprovider
Cloud provider cluster managers for Dask. Supports AWS, Google Cloud, Azure, and more.
Language: Python - Size: 826 KB - Last synced at: 14 days ago - Pushed at: 16 days ago - Stars: 142 - Forks: 111

wigging/pythonic
Examples of the Python programming language
Language: Python - Size: 3.94 MB - Last synced at: 16 days ago - Pushed at: 17 days ago - Stars: 9 - Forks: 4

crate/sqlalchemy-cratedb
SQLAlchemy dialect for CrateDB.
Language: Python - Size: 1.54 MB - Last synced at: 15 days ago - Pushed at: 16 days ago - Stars: 6 - Forks: 2

p2p-ld/numpydantic
Type annotations for specifying, validating, and serializing arrays with arbitrary backends in Pydantic (and beyond)
Language: Python - Size: 913 KB - Last synced at: 15 days ago - Pushed at: about 1 month ago - Stars: 108 - Forks: 1

brent-stone/hybrid_cloud_AI_SaaS
Full-stack "simple and free" on-prem AI/ML Software as a Service (SaaS) deployment reference. The backend uses pure-python data & AI ecosystem tooling: FastAPI, Prefect, Dask, SQLAlchemy, PostgreSQL, MinIO, and Redis. React Router 7 and TailwindCSS frontend. Traefik proxy, Google OpenID Connect, Docker Compose, and Cloudflare deployment.
Language: TypeScript - Size: 144 KB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 0 - Forks: 0

xarray-contrib/xeofs
Comprehensive EOF analysis in Python with xarray: A versatile, multidimensional, and scalable tool for advanced climate data analysis
Language: Python - Size: 44.8 MB - Last synced at: 5 days ago - Pushed at: 5 months ago - Stars: 119 - Forks: 23

interTwin-eu/dask-flood-mapper
Map floods with Sentinel-1 radar images. We replicate in this package the work of Bauer-Marschallinger et al. (2022) on the TU Wien Bayesian-based flood mapping algorithm. This implementation is entirely based on Dask and data access via STAC with odc-stac.
Language: Python - Size: 19.3 MB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 5 - Forks: 3

pangeo-data/climpred
🌎 Verification of weather and climate forecasts 🌍
Language: Python - Size: 58.1 MB - Last synced at: 7 days ago - Pushed at: about 1 month ago - Stars: 246 - Forks: 48

AllenCellModeling/aicsimageio
Image Reading, Metadata Conversion, and Image Writing for Microscopy Images in Python
Language: Python - Size: 173 MB - Last synced at: 8 days ago - Pushed at: 5 months ago - Stars: 215 - Forks: 50

climate-service-center/index_calculator
Calculate climate indicators based on xclim
Language: Jupyter Notebook - Size: 113 MB - Last synced at: 3 days ago - Pushed at: 10 days ago - Stars: 7 - Forks: 6

umr-lops/xsarsea
Scientific functions to compute radar or geophysical parameters from satellite images over the ocean
Language: Python - Size: 4.15 MB - Last synced at: 24 days ago - Pushed at: 24 days ago - Stars: 11 - Forks: 8

geoxarray/geoxarray
Geolocation utilities for xarray
Language: Python - Size: 384 KB - Last synced at: 24 days ago - Pushed at: 24 days ago - Stars: 104 - Forks: 8

shauryashaurya/learn-data-munging
Notes on Data Engineering with Pandas, PySpark, Dask, Ray, Arrow DataFusion, Polars etc.
Language: Jupyter Notebook - Size: 631 MB - Last synced at: 8 days ago - Pushed at: 24 days ago - Stars: 47 - Forks: 21

xarray-contrib/flox
Fast & furious GroupBy operations for dask.array
Language: Python - Size: 1.75 MB - Last synced at: 9 days ago - Pushed at: 19 days ago - Stars: 131 - Forks: 20

bioio-devs/bioio-ome-zarr
A BioIO reader plugin for reading Zarr files in the OME format.
Language: Python - Size: 208 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 1 - Forks: 2

itamarst/eliot
Eliot: the logging system that tells you *why* it happened
Language: Python - Size: 1.9 MB - Last synced at: 11 days ago - Pushed at: 4 months ago - Stars: 1,146 - Forks: 71

Euro-BioImaging/EuBI-Bridge
A Python-based tool for parallelized conversion of image datasets from various formats to OME-Zarr, with support for distributed processing and multi-dimensional concatenation.
Language: Python - Size: 2.79 MB - Last synced at: 7 days ago - Pushed at: about 1 month ago - Stars: 7 - Forks: 0

UofU-Cryosphere/isnoda
HRRR-ISNOBAL model setup using a conda environment and utilities to compare its output.
Language: HTML - Size: 13.3 MB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 10 - Forks: 3

scipp/sciline
Build scientific pipelines for your data
Language: Python - Size: 3.01 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 10 - Forks: 2

vre-hub/environments
VRE user environment images for workflows and notebooks
Language: C++ - Size: 3.67 MB - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 2 - Forks: 3

lykmapipo/Python-Joblib-Cookbook
A step-by-step guide to master various aspects of Joblib for parallel computing in Python
Language: Python - Size: 44.9 KB - Last synced at: 18 days ago - Pushed at: over 1 year ago - Stars: 7 - Forks: 1

jmcarpenter2/swifter
A package which efficiently applies any function to a pandas dataframe or series in the fastest available manner
Language: Python - Size: 2.15 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 2,611 - Forks: 104

pytroll/pyresample
Geospatial image resampling in Python
Language: Python - Size: 16.8 MB - Last synced at: 22 days ago - Pushed at: about 1 month ago - Stars: 365 - Forks: 96

CoffeaTeam/coffea-casa
Repository with configuration setup of a prototype of analysis facility - "coffea-casa"
Language: Python - Size: 11.3 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 17 - Forks: 19

TGSAI/mdio-python
Cloud native, scalable storage engine for various types of energy data.
Language: Python - Size: 7.78 MB - Last synced at: 8 days ago - Pushed at: 18 days ago - Stars: 38 - Forks: 14

DataCanvasIO/HyperGBM
A full pipeline AutoML tool for tabular data
Language: Python - Size: 11 MB - Last synced at: 22 days ago - Pushed at: 2 months ago - Stars: 349 - Forks: 47

tkp-archive/paperboy
A web frontend for scheduling Jupyter notebook reports
Language: Python - Size: 12.5 MB - Last synced at: 20 days ago - Pushed at: 7 months ago - Stars: 253 - Forks: 25

ranaroussi/pystore
Fast data store for Pandas time-series data
Language: Python - Size: 155 KB - Last synced at: 10 days ago - Pushed at: 11 months ago - Stars: 578 - Forks: 101

mars-project/mars
Mars is a tensor-based unified framework for large-scale data computation which scales numpy, pandas, scikit-learn and Python functions.
Language: Python - Size: 37 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 2,731 - Forks: 327

ESDS-Leipzig/cubo
On-Demand Earth System Data Cubes (ESDCs) in Python
Language: Python - Size: 1.63 MB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 189 - Forks: 14

gjoseph92/stackstac
Turn a STAC catalog into a dask-based xarray
Language: Python - Size: 56.2 MB - Last synced at: 22 days ago - Pushed at: 10 months ago - Stars: 257 - Forks: 52

JiaweiZhuang/xESMF
Universal Regridder for Geospatial Data
Language: Python - Size: 2.79 MB - Last synced at: about 1 month ago - Pushed at: over 3 years ago - Stars: 278 - Forks: 48

lhoestq/pandas-image-methods
Image methods for pandas dataframes using Pillow
Language: Python - Size: 25.4 KB - Last synced at: 21 days ago - Pushed at: 5 months ago - Stars: 3 - Forks: 0

WilliamZhang20/Distributed-Computing
Implemented distributed computing algorithms
Language: Python - Size: 0 Bytes - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

NVIDIA-Merlin/models
Merlin Models is a collection of deep learning recommender system model reference implementations
Language: Python - Size: 113 MB - Last synced at: 27 days ago - Pushed at: about 1 year ago - Stars: 278 - Forks: 51

aws-samples/amazon-sagemaker-local-mode
Amazon SageMaker Local Mode Examples
Language: Python - Size: 5.94 MB - Last synced at: 15 days ago - Pushed at: about 2 months ago - Stars: 257 - Forks: 63

eoap/dask-app-package
CWL and Dask using calrissian
Language: Jupyter Notebook - Size: 2.44 MB - Last synced at: 29 days ago - Pushed at: 29 days ago - Stars: 0 - Forks: 0

dask-contrib/dask-sql
Distributed SQL Engine in Python using Dask
Language: Python - Size: 3.35 MB - Last synced at: 4 days ago - Pushed at: 10 months ago - Stars: 405 - Forks: 71

s-v-b/IFEBY310
Website for course IFEBY310 (Big Data Technologies) at Université Paris Cité
Language: HTML - Size: 83 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 1

meldig/pdal-parallelizer
A Python app (CLI/API) to parallelize your PDAL pipelines
Language: Python - Size: 150 KB - Last synced at: about 1 month ago - Pushed at: almost 2 years ago - Stars: 12 - Forks: 3

wienkers/marEx
Marine Extremes detection, identification, and tracking/merging for Exascale Climate data
Language: Python - Size: 54.1 MB - Last synced at: 8 days ago - Pushed at: about 1 month ago - Stars: 2 - Forks: 1

dask-contrib/dask-awkward
Native Dask collection for awkward arrays, and the library to use it.
Language: Python - Size: 1.63 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 64 - Forks: 19

NCAR/ncar-jobqueue
Utilities for configuring dask-jobqueue with appropriate settings for NCAR clusters
Language: Python - Size: 152 KB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 14 - Forks: 4

eresearchqut/dasktest
Dask on Aqua using PBS
Language: Jupyter Notebook - Size: 223 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

bytehub-ai/bytehub
ByteHub: making feature stores simple
Language: Python - Size: 363 KB - Last synced at: about 1 month ago - Pushed at: about 4 years ago - Stars: 60 - Forks: 4

Descanonge/xarray-histogram
Compute histograms from XArray data using BoostHistogram
Language: Python - Size: 218 KB - Last synced at: 29 days ago - Pushed at: 29 days ago - Stars: 7 - Forks: 0

ds2-lab/Wukong
Wukong: A scalable and locality-enhanced serverless parallel framework (ACM SoCC'20)
Language: Python - Size: 14.6 MB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 74 - Forks: 16

google/xarray-beam
Distributed Xarray with Apache Beam
Language: Python - Size: 289 KB - Last synced at: 9 days ago - Pushed at: about 2 months ago - Stars: 147 - Forks: 8

CNES/zcollection
Python library for manipulating data split into a collection of groups stored in Zarr format.
Language: Python - Size: 1010 KB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 13 - Forks: 3

baggiponte/awesome-pandas-alternatives
Awesome list of alternative dataframe libraries in Python.
Size: 21.5 KB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 48 - Forks: 2

iamtekson/geospatial-data-analysis-python
This repo contains the most common tools used in geospatial analysis with Python!
Language: Jupyter Notebook - Size: 46.9 MB - Last synced at: 13 days ago - Pushed at: 12 months ago - Stars: 41 - Forks: 31

backtick-se/cowait
Containerized distributed programming framework for Python
Language: Python - Size: 5.69 MB - Last synced at: 21 days ago - Pushed at: over 2 years ago - Stars: 53 - Forks: 4

ratt-ru/codex-africanus
Radio Astronomy Algorithms Library
Language: Python - Size: 1.52 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 20 - Forks: 10
