GitHub topics: datafusion
kwai/blaze
Blazing-fast query execution engine speaks Apache Spark language and has Arrow-DataFusion at its core.
Language: Rust - Size: 10.1 MB - Last synced at: about 7 hours ago - Pushed at: 3 days ago - Stars: 1,506 - Forks: 161

kamu-data/kamu-cli
Next-generation decentralized data lakehouse and a multi-party stream processing network
Language: Rust - Size: 37.6 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 324 - Forks: 15

jcsherin/datablok
Novel and high-performance applications of database building blocks (Apache DataFusion, Arrow & Parquet)
Language: Rust - Size: 528 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

lakesoul-io/LakeSoul
LakeSoul is an end-to-end, realtime and cloud native Lakehouse framework with fast data ingestion, concurrent update and incremental data analytics on cloud storages for both BI and AI applications.
Language: Java - Size: 35.3 MB - Last synced at: 1 day ago - Pushed at: 2 days ago - Stars: 2,882 - Forks: 410

apache/datafusion
Apache DataFusion SQL Query Engine
Language: Rust - Size: 147 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 7,527 - Forks: 1,554

datafusion-contrib/datafusion-dft
Batteries included CLI, TUI, and server implementations for DataFusion.
Language: Rust - Size: 15.8 MB - Last synced at: 1 day ago - Pushed at: about 1 month ago - Stars: 160 - Forks: 15

arkflow-rs/arkflow
High performance Rust stream processing engine seamlessly integrates AI capabilities, providing powerful real-time data processing and intelligent analysis.
Language: Rust - Size: 2.87 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 1,117 - Forks: 34

lakehq/sail
LakeSail's computation framework with a mission to unify batch processing, stream processing, and compute-intensive AI workloads.
Language: Rust - Size: 4.79 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 856 - Forks: 44

roapi/roapi
Create full-fledged APIs for slowly moving datasets without writing a single line of code.
Language: Rust - Size: 1.48 MB - Last synced at: 3 days ago - Pushed at: 29 days ago - Stars: 3,328 - Forks: 201

milenkovicm/adhesive
Apache Datafusion JVM User Defined Functions (UDF), integration nobody asked for 😀
Language: Rust - Size: 54.7 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 4 - Forks: 1

madesroches/micromegas
Scalable Observability
Language: Rust - Size: 3.24 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 25 - Forks: 5

ClickHouse/ClickBench
ClickBench: a Benchmark For Analytical Databases
Language: HTML - Size: 14.4 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 852 - Forks: 219

XiangpengHao/liquid-cache
Distributed pushdown cache for DataFusion
Language: Rust - Size: 6.42 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 196 - Forks: 21

ibis-project/ibis
the portable Python dataframe library
Language: Python - Size: 176 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 5,934 - Forks: 657

JanKaul/iceberg-rust
Unofficial rust implementation of Apache Iceberg with integration for Datafusion
Language: Rust - Size: 4.9 MB - Last synced at: 7 days ago - Pushed at: 9 days ago - Stars: 207 - Forks: 30

apache/datafusion-comet
Apache DataFusion Comet Spark Accelerator
Language: Rust - Size: 18.3 MB - Last synced at: 8 days ago - Pushed at: 9 days ago - Stars: 994 - Forks: 225

biodatageeks/polars-bio
Blazing-Fast Bioinformatic Operations on Python DataFrames
Language: Python - Size: 9.08 MB - Last synced at: 8 days ago - Pushed at: 9 days ago - Stars: 68 - Forks: 22

nimtable/iceberg-compaction
Compaction runtime for Apache Iceberg.
Language: Rust - Size: 765 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 56 - Forks: 8

datafusion-contrib/datafusion-materialized-views
Incremental view maintenance & query rewriting for materialized views in DataFusion
Language: Rust - Size: 90.8 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 40 - Forks: 3

datafusion-contrib/datafusion-functions-extra
Various additional function packages for Apache DataFusion (unofficial)
Language: Rust - Size: 52.7 KB - Last synced at: 8 days ago - Pushed at: 9 months ago - Stars: 9 - Forks: 6

milenkovicm/ballista_delta
Datafusion Ballista support for Delta Table (showcase project)
Language: Rust - Size: 497 KB - Last synced at: 11 days ago - Pushed at: 12 days ago - Stars: 3 - Forks: 1

tansu-io/tansu
Apache Kafka® compatible broker with S3, PostgreSQL, Apache Iceberg and Delta Lake
Language: Rust - Size: 3.41 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 405 - Forks: 11

systemxlabs/datafusion-remote-table
A DataFusion table provider for executing SQL queries on remote databases.
Language: Rust - Size: 467 KB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 8 - Forks: 3

duyet/ballista
Example of Ballista Rust
Language: Rust - Size: 35.7 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 0 - Forks: 0

datafusion-contrib/datafusion-postgres
Postgres protocol frontend for DataFusion
Language: Rust - Size: 379 KB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 71 - Forks: 15

splitgraph/seafowl
Analytical database for data-driven Web applications 🪶
Language: Rust - Size: 4.47 MB - Last synced at: 9 days ago - Pushed at: 5 months ago - Stars: 488 - Forks: 17

PRQL/prql-query 📦
Query and transform data with PRQL
Language: Rust - Size: 1.32 MB - Last synced at: 13 days ago - Pushed at: almost 2 years ago - Stars: 134 - Forks: 7

sal-openlab/datafusion-server
Rust DataFusion Server
Language: Rust - Size: 1.57 MB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 18 - Forks: 3

myryfe/dataframely
A declarative, 🐻❄️-native data frame validation library.
Language: Python - Size: 290 KB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 0 - Forks: 0

metrico/influxdb3-community
Community InfluxDB 3.0 "IOx" static builds + containers + Examples for Developers & Integrators. Experiment with low-cost storage, unlimited cardinality and FlightSQL APIs
Language: Shell - Size: 232 KB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 45 - Forks: 2

datafusion-contrib/datafusion-objectstore-s3
S3 as an ObjectStore for DataFusion
Language: Rust - Size: 73.2 KB - Last synced at: 1 day ago - Pushed at: over 2 years ago - Stars: 64 - Forks: 14

apache/datafusion-benchmarks
Apache DataFusion Benchmarks
Language: Python - Size: 123 KB - Last synced at: 5 days ago - Pushed at: 4 months ago - Stars: 20 - Forks: 8

mrasu/dataharpoon
An MCP-ready query engine that connects to your data — wherever it lives
Language: Rust - Size: 139 KB - Last synced at: 13 days ago - Pushed at: 21 days ago - Stars: 3 - Forks: 0

shauryashaurya/learn-data-munging
Notes on Data Engineering with Pandas, PySpark, Dask, Ray, Arrow DataFusion, Polars etc.
Language: Jupyter Notebook - Size: 631 MB - Last synced at: about 13 hours ago - Pushed at: 2 months ago - Stars: 48 - Forks: 21

milenkovicm/ballista_python
Ballista cluster pyarrow udf support
Language: Rust - Size: 396 KB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 1 - Forks: 0

ModelarData/ModelarDB-RS
ModelarDB: Model-Based Time Series Management from Edge to Client
Language: Rust - Size: 1.65 MB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 14 - Forks: 5

GeorgeLeePatterson/clickhouse-arrow
ClickHouse Native Protocol Rust Client w/ Arrow Compatibility
Language: Rust - Size: 1.78 MB - Last synced at: 26 days ago - Pushed at: 26 days ago - Stars: 1 - Forks: 0

irtimmer/rust-kql
Kusto Query Language parser and planner for DataFusion
Language: Rust - Size: 120 KB - Last synced at: 27 days ago - Pushed at: 27 days ago - Stars: 6 - Forks: 0

FluoLab/datafusion
Fusing high-resolution volumes with spectro-temporal low-resolution images
Language: Jupyter Notebook - Size: 4.75 MB - Last synced at: 28 days ago - Pushed at: 28 days ago - Stars: 2 - Forks: 0

PragmaAI/yelp-datapipeline
🍽️ Yelp Data Pipeline & Analytics Dashboard End-to-end data engineering pipeline processing Yelp dataset with Rust transforms, Apache Airflow orchestration, and interactive Streamlit analytics. Features business insights, user engagement analysis, and city performance comparisons. 🚀 Docker-ready • 📊 Interactive Dashboard • ⚡ High-performance R
Language: Jupyter Notebook - Size: 0 Bytes - Last synced at: 29 days ago - Pushed at: 29 days ago - Stars: 0 - Forks: 0

wheretrue/exon
Exon is an OLAP query engine specifically for biology and life science applications.
Language: Rust - Size: 59.3 MB - Last synced at: 27 days ago - Pushed at: 4 months ago - Stars: 66 - Forks: 7

baggiponte/awesome-pandas-alternatives
Awesome list of alternative dataframe libraries in Python.
Size: 21.5 KB - Last synced at: 23 days ago - Pushed at: over 2 years ago - Stars: 52 - Forks: 2

MaciekLesiczka/azof
Lakehouse with time travel
Language: Rust - Size: 94.6 MB - Last synced at: 10 days ago - Pushed at: about 2 months ago - Stars: 3 - Forks: 0

grouzen/zio-apache-arrow
Scala ZIO-powered Apache Arrow library
Language: Scala - Size: 496 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 21 - Forks: 1

clflushopt/datafusion-tpch
Native Rust TPCH support for Datafusion using tpchgen
Language: Rust - Size: 48.8 KB - Last synced at: 15 days ago - Pushed at: about 2 months ago - Stars: 3 - Forks: 3

splitgraph/seafowl-gcsfuse
Scale to zero Seafowl hosting with Cloud Run
Language: Dockerfile - Size: 13.7 KB - Last synced at: 9 days ago - Pushed at: about 2 years ago - Stars: 37 - Forks: 0

paradedb/pg_analytics 📦
DuckDB-powered data lake analytics from Postgres
Language: Rust - Size: 814 KB - Last synced at: about 2 months ago - Pushed at: 4 months ago - Stars: 523 - Forks: 24

burhanahmed1/CryptoSynth
Bitcoin Sentiment Forecast is a Multimodal approach to Bitcoin price forecasting using NLP and Time Series Analysis
Language: Jupyter Notebook - Size: 3.54 MB - Last synced at: about 1 month ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

milenkovicm/ballista_extensions
Extending datafusion ballista to support custom made logical and physical operators
Language: Rust - Size: 43.9 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

fmenat/MultiviewCropClassification
Public repository of our IGARSS 2023 submission
Language: Python - Size: 132 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 16 - Forks: 1

svenslaggare/gitrends
Web-based behavior code analysis tool.
Language: TypeScript - Size: 1.82 MB - Last synced at: 15 days ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

apache/datafusion-testing
Apache DataFusion SQL Query Engine Testing
Size: 191 MB - Last synced at: 5 days ago - Pushed at: 4 months ago - Stars: 1 - Forks: 5

surfingreg/rust-in-memory-db-with-chart
Implement a fast, in-memory, time-series database. Query data using SQL and visualize in ChartJS over websocket.
Language: Rust - Size: 358 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

milenkovicm/wasaffi 📦
Datafusion WASM User Defined Functions
Language: Rust - Size: 1.22 MB - Last synced at: 3 months ago - Pushed at: about 1 year ago - Stars: 9 - Forks: 0

duo-rs/duo
A lightweight Logging and Tracing observability solution for Rust, built with Apache Arrow, Apache Parquet and Apache DataFusion.
Language: Rust - Size: 2.51 MB - Last synced at: 4 months ago - Pushed at: 10 months ago - Stars: 73 - Forks: 7

QizhiPei/MathFusion
MathFusion: Enhancing Mathematic Problem-solving of LLM through Instruction Fusion
Language: Python - Size: 9.33 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

roeap/flight-fusion
Language: Rust - Size: 4.3 MB - Last synced at: 4 months ago - Pushed at: over 1 year ago - Stars: 7 - Forks: 2

jorgecarleitao/datafusion-python 📦
A Python library to run analytics workloads with the performance of Rust, the flexibility of Python and O(1) cost in moving data between the two. Uses Apache Arrow in-memory format and respective query engine DataFusion.
Language: Rust - Size: 125 KB - Last synced at: 1 day ago - Pushed at: about 4 years ago - Stars: 61 - Forks: 4

milenkovicm/lightfusion
LightGBM Inference on Datafusion
Language: Rust - Size: 9.83 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

milenkovicm/torchfusion
Torchfusion is a very opinionated torch inference on datafusion.
Language: Rust - Size: 92.8 KB - Last synced at: 4 months ago - Pushed at: 5 months ago - Stars: 3 - Forks: 0

hienduyph/fusionj
An Incomplete DataFusion Query Engine implemented in Java
Language: Java - Size: 281 KB - Last synced at: 14 days ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

metrico/kompactor
Parquet + Metadata Compactor for InfluxDB 3 Core
Language: TypeScript - Size: 150 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 4 - Forks: 0

lostmygithubaccount/ibis-bench
A composable data system benchmark in a Python package.
Language: Python - Size: 965 KB - Last synced at: about 1 month ago - Pushed at: 11 months ago - Stars: 2 - Forks: 1

datafusion-contrib/datafusion-java
Java binding to Apache DataFusion
Language: Java - Size: 479 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 74 - Forks: 13

andriidemus/exo
Toy Data REPL
Language: Rust - Size: 116 KB - Last synced at: about 19 hours ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

duhanmin/arrow-sql-yarn
Executes SQL on the DataFusion/Polars engines via JNI
Language: Java - Size: 26.9 MB - Last synced at: about 2 months ago - Pushed at: 8 months ago - Stars: 1 - Forks: 0

hengfeiyang/how-query-engines-work-zh-CN
How Query Engines Work (Chinese edition)
Size: 1.77 MB - Last synced at: 5 months ago - Pushed at: about 1 year ago - Stars: 8 - Forks: 2

hw2499/etl-engine
ETL engine: a lightweight, cross-platform ETL engine that unifies stream and batch processing for data extraction, transformation and loading
Language: Go - Size: 1.6 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 68 - Forks: 13

dadepo/df_extras
A collection of user defined functions, from your favourite databases, in Apache Datafusion
Language: Rust - Size: 161 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

splitgraph/experimental-datafusion-webassembly
proof-of-concept: compile datafusion to `wasm32-wasi` (run in `wasmedge`) and `wasm32-unknown-unknown` (run in browser)
Size: 104 KB - Last synced at: 9 days ago - Pushed at: over 2 years ago - Stars: 4 - Forks: 1

DFKI-Earth-And-Space-Applications/MVCC_IGARSS
Public repository of our IGARSS 2023 submission
Language: Python - Size: 132 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 1

f-aguzzi/ChemFuseKit
Chemometrics library for data fusion, model training and prediction of data from multiple sensor sources.
Language: Jupyter Notebook - Size: 22.2 MB - Last synced at: 3 months ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 1

PyramidGithub/data_fusion
Apache Spark Comet Wsl2
Language: Rust - Size: 3.21 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

selvakrishnan/DataFusion_CDAP_Wrangler_Directives
Google Cloud Data Fusion - Data Transformation Logic using CDAP Wrangler Directives.
Size: 83 KB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

selvakrishnan/DataFusion_Airflow_Trigger
A simple DAG for triggering the Cloud Data Fusion Pipeline using Apache Airflow.
Language: Python - Size: 2.93 KB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

rurumimic/apache-datafusion
Language: Rust - Size: 4.88 KB - Last synced at: 5 months ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

stormasm/dply-rs Fork of vincev/dply-rs
A dataframe manipulation tool inspired by dplyr.
Language: Rust - Size: 508 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

Mboubaker/Lidar_Evidential_occupancy_grid_mapping-
This repository presents an approach to building 2D evidential occupancy grid maps with Lidar data
Language: Jupyter Notebook - Size: 7.45 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 1

jychen7/datafusion-bigtable 📦
Bigtable data source for Apache Arrow Datafusion
Language: Rust - Size: 34.2 KB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 2 - Forks: 0

datafusion-contrib/datafusion-c
C language bindings for DataFusion
Language: C - Size: 5.75 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 13 - Forks: 3

dsaad68/azurefunction-deltatable-pipeline-with-rust
A Delta Table pipeline in Rust, triggered by Azure Functions responding to blob storage events in a specific container subfolder. The pipeline processes CSV files, updating or creating Delta Tables as needed, using merges for row changes.
Language: Rust - Size: 1.05 MB - Last synced at: 1 day ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

jfrazier-eth/file-fusion
A file explorer for data warehouses
Language: TypeScript - Size: 1.07 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

zhxiaogg/dfq
A CLI for running SQLs over various data sources.
Language: Rust - Size: 26.4 KB - Last synced at: 14 days ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

Caoxuheng/HyMS
Hyperspectral Image Super-resolution via Multi-stage Scheme without Employing Spatial Degradation
Language: Python - Size: 91.8 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 3 - Forks: 3

matsadler/bishop
Query MongoDB via Apache Arrow and DataFusion
Language: Rust - Size: 37.1 KB - Last synced at: 5 months ago - Pushed at: over 4 years ago - Stars: 2 - Forks: 0

blaze-init/spark-blaze-extension 📦
Blazing-fast query execution engine speaks Apache Spark language and has Arrow-DataFusion at its core.
Language: Shell - Size: 288 MB - Last synced at: 4 months ago - Pushed at: over 3 years ago - Stars: 11 - Forks: 4

IFF-0303/lvh-fusion Fork of AshleyLab/lvh-fusion
Size: 65.4 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

datafusion-contrib/datafusion-objectstore-azure 📦
Azure Storage as an ObjectStore for DataFusion
Language: Rust - Size: 23.4 KB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 1

treebee/elixir-arrow
Experimental Elixir bindings for Apache Arrow including Parquet and DataFusion
Language: Rust - Size: 130 KB - Last synced at: over 2 years ago - Pushed at: over 4 years ago - Stars: 27 - Forks: 3

datafusion-contrib/datafusion-python 📦
Python binding for DataFusion
Language: Python - Size: 234 KB - Last synced at: over 2 years ago - Pushed at: about 3 years ago - Stars: 59 - Forks: 13

Georgsiedel/data_fusion_dipl
Code featured in the diploma thesis, which is written in German. The full text is linked in the README.
Language: R - Size: 124 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

gongouveia/DataFusion-2021-22
Time series analysis, state estimation, stratification, classification and data mining
Language: Jupyter Notebook - Size: 8.03 MB - Last synced at: over 2 years ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0

sscosta/datafusion-on-demand
Set of Airflow DAGs to create and destroy a Cloud Data Fusion instance
Language: Python - Size: 4.88 KB - Last synced at: over 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

jychen7/BigQL
SQL Query Layer for Google Cloud Bigtable
Language: Python - Size: 110 KB - Last synced at: about 1 month ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

ivangonzalezacuna/datafusion_collect_transform_data
Functions for the main process to collect and store the data received via MQTT and to merge all the entries of each sensor into one
Language: Go - Size: 1.58 MB - Last synced at: over 2 years ago - Pushed at: about 5 years ago - Stars: 0 - Forks: 0
