An open API service providing repository metadata for many open source software ecosystems.

Topic: "dask-dataframes"

dask-contrib/dask-deltatable

A Delta Lake reader for Dask

Language: Python - Size: 274 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 53 - Forks: 17

momentoscope/hextof-processor

Code for preprocessing data from the HEXTOF instrument at FLASH, DESY in Hamburg (DE)

Language: Jupyter Notebook - Size: 12.7 MB - Last synced at: 3 days ago - Pushed at: over 2 years ago - Stars: 7 - Forks: 4

sbl-sdsc/df-parallel

Comparison of Dataframe libraries for parallel processing of large tabular files on CPU and GPU.

Language: Jupyter Notebook - Size: 3.33 MB - Last synced at: 3 months ago - Pushed at: about 1 year ago - Stars: 6 - Forks: 3

Circadiaware/circalizer

Flexible stacked visualization of circadian data from multiple sources and devices

Language: Jupyter Notebook - Size: 5.49 MB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 3 - Forks: 1

maltzsama/sumeh

Sumeh is a unified data quality validation framework supporting multiple backends (PySpark, Dask, Polars, DuckDB, Pandas) with centralized rule configuration.

Language: Python - Size: 1.69 MB - Last synced at: about 1 month ago - Pushed at: 2 months ago - Stars: 2 - Forks: 0

pranjalpruthi/bhedi

βHΞDI (Biomarker-based Heuristic Engine for Dengue Identification) is a computational tool designed for the identification of Dengue virus serotypes in wastewater next-generation sequencing data.

Language: Go - Size: 6.76 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

MBrugnaroto/Trips-Data-Pipeline

This repository presents a simple streaming data pipeline to get statistics per vehicle from a data source.

Language: Python - Size: 42.2 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 1 - Forks: 0

fares-ds/pandas

Notebooks for complete data analysis and data visualization projects using Pandas, NumPy, Matplotlib, and seaborn

Language: Jupyter Notebook - Size: 8.07 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 0

rishigoutam/nycdemo

Code for a talk on wrangling large datasets in pandas

Language: Jupyter Notebook - Size: 115 KB - Last synced at: over 2 years ago - Pushed at: about 3 years ago - Stars: 1 - Forks: 3

IncubatorShokuhou/dask-tutorial (fork of dask/dask-tutorial)

Dask tutorial; Chinese translation of the Dask tutorial

Language: Jupyter Notebook - Size: 87.2 MB - Last synced at: over 2 years ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 1

somjit101/NYC-Taxi-Demand-Prediction

This is a time series forecasting and regression solution to project the number of pick-ups at and around a given region at a given time in New York City, USA.

Language: Jupyter Notebook - Size: 12.5 MB - Last synced at: 5 months ago - Pushed at: almost 4 years ago - Stars: 1 - Forks: 0

kaladabrio2020/HELPS_

Language: Jupyter Notebook - Size: 1.58 MB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

HamedAlemo/dask-tutorial

A tutorial to learn Dask DataArray and Dask DataFrames with examples from geospatial data catalogs.

Language: Jupyter Notebook - Size: 18.5 MB - Last synced at: 5 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

Ramy-Badr-Ahmed/Higgs-Dataset-Training

Training Higgs Dataset with Keras - https://doi.org/10.5281/zenodo.13133945

Language: Python - Size: 3.46 MB - Last synced at: about 2 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

Navya0203/BigData-Recommender-System-along-with-Sentiment-Analysis-Using-Dask-and-Pyspark

This repository develops an advanced recommendation system to enhance the e-commerce shopping experience by automating product suggestions and analyzing user preferences through machine learning techniques and big data technologies.

Language: Jupyter Notebook - Size: 588 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 1

D3struf/Distributed-Collaborative-Filtering-Book-Recommendation-System

Uses Dask as a distributed framework; inspired by the work of https://github.com/entbappy/ML-Based-Book-Recommender-System

Language: Jupyter Notebook - Size: 370 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

jackmadden246/Toothbrush

This is a Xander project involving a Python script containing toothbrush sales data, which is scheduled to run in a cloud environment

Language: Python - Size: 7.82 MB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

stivenramireza/pocs

POCs in order to explore new technologies.

Language: Python - Size: 675 KB - Last synced at: over 2 years ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

sarmad9987/Machine-learning-with-Dask

This project compares machine learning workflows on Pandas DataFrames and Dask DataFrames.

Language: Jupyter Notebook - Size: 107 KB - Last synced at: over 2 years ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0

nafisa-samia/Analysis-of-Crimes-in-Chicago-using-Dask (fork of kuntala-c/Crimes-Analysis-in-Chicago-using-Dask)

Data Analysis on an extensive dataset of crimes in Chicago (2005-2016) using Dask

Language: Jupyter Notebook - Size: 1.02 MB - Last synced at: over 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

kuntala-c/Crimes-Analysis-in-Chicago-using-Dask

Data Analysis on an extensive dataset of crimes in Chicago (2005-2016) using Dask

Language: Jupyter Notebook - Size: 2.26 MB - Last synced at: over 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 3

Related Topics
dask (12), python (7), data-visualization (5), pandas (4), data-science (3), dask-distributed (3), dask-array (3), scikit-learn (2), pyspark (2), cuda-toolkit (2), pyspark-dataframes (2), s3-bucket (2), data-analysis (2), pandas-dataframe (2), machine-learning (2), data (2), matplotlib (2), numpy (2), dask-ml (2), random-forest-regression (1), time-series-analysis (1), time-series-forecasting (1), xgboost-regression (1), uci-machine-learning (1), uci-dataset (1), dask-cudf (1), dataframes (1), gpu-computing (1), parallel-processing (1), rapidsai (1), arpes (1), performance-analysis (1), new-york-taxi-demand-prediction (1), k-means-clustering (1), fourier-transform (1), fft-analysis (1), exponential-moving-average (1), data-cleaning (1), clustering (1), baseline-model (1), plotly (1), parallel-programming (1), matplotlib-pyplot (1), dash (1), sql (1), postgres (1), keras (1), higgs-boson (1), cupy (1), binary-classification (1), polars-extensions (1), polars-dataframe (1), polars (1), pandas-library (1), keras-tensorflow (1), duckdb-extension (1), matplotlib-python (1), duckdb (1), data-quality-report (1), data-quality-measurement (1), data-quality-framework (1), data-quality-checks (1), data-quality-assessment (1), data-quality-analysis (1), data-quality (1), ultrafast-spectroscopy (1), solid-state-physics (1), photoemission (1), pes (1), mpes (1), materials-science (1), free-electron-laser (1), fel (1), distributed-processing (1), condensed-matter-physics (1), fastapi (1), dagster (1), cassandra (1), blockchain (1), beautifulsoup (1), apache-airflow (1), python3 (1), php (1), mariadb-database (1), ec2-instance (1), cronjob (1), cloudwatch (1), aws-lambda (1), geospatial-data (1), geospatial-analysis (1), geospatial (1), delayed (1), chinese-translation (1), visualizations (1), sleep-tracker (1), sleep-research (1), sleep-cycles (1), datashader (1), circadian-rhythm (1), circadian-data (1)