GitHub topics: datafusion
milenkovicm/adhesive
Apache DataFusion JVM User Defined Functions (UDFs): the integration nobody asked for 😀
Language: Rust - Size: 55.7 KB - Last synced at: about 17 hours ago - Pushed at: about 20 hours ago - Stars: 5 - Forks: 1
ClickHouse/ClickBench
ClickBench: a Benchmark For Analytical Databases
Language: HTML - Size: 19.2 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 901 - Forks: 237
biodatageeks/polars-bio
Blazing-Fast Bioinformatic Operations on Python DataFrames
Language: Python - Size: 33.1 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 100 - Forks: 25
systemxlabs/datafusion-remote-table
A DataFusion table provider for executing SQL on remote databases.
Language: Rust - Size: 728 KB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 12 - Forks: 3
datafusion-contrib/datafusion-postgres
Postgres protocol frontend for DataFusion
Language: Rust - Size: 1.34 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 106 - Forks: 22
kamu-data/kamu-cli
Next-generation decentralized data lakehouse and a multi-party stream processing network
Language: Rust - Size: 39.4 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 333 - Forks: 15
XiangpengHao/liquid-cache
Distributed pushdown cache for DataFusion
Language: Rust - Size: 12.7 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 304 - Forks: 33
lakehq/sail
LakeSail's computation framework with a mission to unify batch processing, stream processing, and compute-intensive AI workloads.
Language: Rust - Size: 6.47 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 1,021 - Forks: 60
nimtable/iceberg-compaction
Compaction runtime for Apache Iceberg.
Language: Rust - Size: 854 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 104 - Forks: 11
GeorgeLeePatterson/clickhouse-datafusion
ClickHouse in DataFusion!
Language: Rust - Size: 495 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 12 - Forks: 2
arkflow-rs/arkflow
High-performance Rust stream processing engine that seamlessly integrates AI capabilities, providing powerful real-time data processing and intelligent analysis.
Language: Rust - Size: 2.69 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 1,203 - Forks: 39
apache/datafusion-comet
Apache DataFusion Comet Spark Accelerator
Language: Scala - Size: 25.6 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 1,056 - Forks: 247
apache/datafusion-sandbox
DataFusion Test Sandbox
Language: Rust - Size: 135 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 1 - Forks: 0
apache/auron
The Auron accelerator for distributed computing frameworks (e.g., Spark) leverages native vectorized execution to accelerate query processing
Language: Rust - Size: 11 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 1,629 - Forks: 187
apache/datafusion
Apache DataFusion SQL Query Engine
Language: Rust - Size: 161 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 7,939 - Forks: 1,712
ibis-project/ibis
the portable Python dataframe library
Language: Python - Size: 180 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 6,178 - Forks: 680
tansu-io/tansu
Apache Kafka® compatible broker with S3, PostgreSQL, SQLite, Apache Iceberg and Delta Lake
Language: Rust - Size: 3.81 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 511 - Forks: 18
JanKaul/iceberg-rust
Unofficial Rust implementation of Apache Iceberg with integration for DataFusion
Language: Rust - Size: 5.18 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 222 - Forks: 34
qdeli187/apache-airflow-providers-apache-datafusion-ballista
🍃 Run Apache Datafusion Ballista workflows within Airflow
Language: Python - Size: 108 KB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 0 - Forks: 0
madesroches/micromegas
Scalable Observability
Language: Rust - Size: 43.4 MB - Last synced at: 7 days ago - Pushed at: 9 days ago - Stars: 33 - Forks: 5
influxdata/datafusion-udf-wasm
DataFusion UDFs (User Defined Functions) via WebAssembly
Language: Rust - Size: 396 KB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 0 - Forks: 0
lakesoul-io/LakeSoul
LakeSoul is an end-to-end, real-time, cloud-native lakehouse framework with fast data ingestion, concurrent updates, and incremental data analytics on cloud storage for both BI and AI applications.
Language: Java - Size: 36.2 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 3,049 - Forks: 411
datafusion-contrib/datafusion-extra-functions
Various additional function packages for Apache DataFusion (unofficial)
Language: Rust - Size: 112 KB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 9 - Forks: 7
milenkovicm/ballista_delta
Datafusion Ballista support for Delta Table (showcase project)
Language: Rust - Size: 358 KB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 5 - Forks: 2
datafusion-contrib/datafusion-c
C language bindings for DataFusion
Language: C - Size: 5.78 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 21 - Forks: 6
splitgraph/seafowl
Analytical database for data-driven Web applications 🪶
Language: Rust - Size: 4.47 MB - Last synced at: 12 days ago - Pushed at: 8 months ago - Stars: 499 - Forks: 18
datafusion-contrib/datafusion-materialized-views
Incremental view maintenance & query rewriting for materialized views in DataFusion
Language: Rust - Size: 109 KB - Last synced at: 11 days ago - Pushed at: 22 days ago - Stars: 49 - Forks: 5
GeorgeLeePatterson/clickhouse-arrow
ClickHouse Native Protocol Rust Client w/ Arrow Compatibility
Language: Rust - Size: 1.22 MB - Last synced at: 14 days ago - Pushed at: 15 days ago - Stars: 35 - Forks: 7
baggiponte/awesome-pandas-alternatives
Awesome list of alternative dataframe libraries in Python.
Size: 21.5 KB - Last synced at: 5 days ago - Pushed at: almost 3 years ago - Stars: 54 - Forks: 2
sal-openlab/datafusion-server
Rust DataFusion Server
Language: Rust - Size: 1.64 MB - Last synced at: 14 days ago - Pushed at: 17 days ago - Stars: 23 - Forks: 3
apache/datafusion-testing
Apache DataFusion SQL Query Engine Testing
Size: 279 MB - Last synced at: 5 days ago - Pushed at: 17 days ago - Stars: 1 - Forks: 6
apache/datafusion-benchmarks
Apache DataFusion Benchmarks
Language: Python - Size: 124 KB - Last synced at: 5 days ago - Pushed at: about 1 month ago - Stars: 22 - Forks: 11
duyet/ballista
Example of Ballista Rust
Language: Rust - Size: 35.8 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 1 - Forks: 0
datafusion-contrib/datafusion-dft
Batteries included CLI, TUI, and server implementations for DataFusion.
Language: Rust - Size: 15.9 MB - Last synced at: 12 days ago - Pushed at: 15 days ago - Stars: 165 - Forks: 19
ModelarData/ModelarDB-RS
ModelarDB: Model-Based Time Series Management from Edge to Client
Language: Rust - Size: 1.89 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 17 - Forks: 5
roapi/roapi
Create full-fledged APIs for slowly moving datasets without writing a single line of code.
Language: Rust - Size: 1.48 MB - Last synced at: 23 days ago - Pushed at: 4 months ago - Stars: 3,348 - Forks: 205
milenkovicm/ballista_python
Ballista cluster PyArrow UDF support
Language: Rust - Size: 307 KB - Last synced at: 26 days ago - Pushed at: 26 days ago - Stars: 2 - Forks: 0
datafusion-contrib/datafusion-objectstore-s3
S3 as an ObjectStore for DataFusion
Language: Rust - Size: 73.2 KB - Last synced at: 22 days ago - Pushed at: over 2 years ago - Stars: 66 - Forks: 14
SemyonSinchenko/graphframes-rs
GraphFrames but in DataFusion
Language: Rust - Size: 81.1 KB - Last synced at: 23 days ago - Pushed at: about 2 months ago - Stars: 7 - Forks: 1
withterm/term
Lightning-fast data validation for Rust. Built on Arrow/DataFusion with OpenTelemetry observability.
Language: Rust - Size: 771 KB - Last synced at: 22 days ago - Pushed at: 24 days ago - Stars: 26 - Forks: 0
wheretrue/exon
Exon is an OLAP query engine specifically for biology and life science applications.
Language: Rust - Size: 59.3 MB - Last synced at: about 18 hours ago - Pushed at: 7 months ago - Stars: 70 - Forks: 7
jcsherin/datablok
Novel and high-performance applications of database building blocks (Apache DataFusion, Arrow & Parquet)
Language: Rust - Size: 26.1 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0
PRQL/prql-query 📦
Query and transform data with PRQL
Language: Rust - Size: 1.32 MB - Last synced at: about 1 month ago - Pushed at: about 2 years ago - Stars: 137 - Forks: 7
MaciekLesiczka/azof
Lakehouse with time travel
Language: Rust - Size: 94.6 MB - Last synced at: 16 days ago - Pushed at: 5 months ago - Stars: 4 - Forks: 0
shauryashaurya/learn-data-munging
Notes on Data Engineering with Pandas, PySpark, Dask, Ray, Arrow DataFusion, Polars etc.
Language: Jupyter Notebook - Size: 631 MB - Last synced at: 21 days ago - Pushed at: 3 months ago - Stars: 50 - Forks: 21
lyteabovenyte/RDE
Power Your Data Pipeline with Rust-Data-Engineering
Language: Rust - Size: 734 KB - Last synced at: about 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0
milenkovicm/torchfusion
Torchfusion is a very opinionated Torch inference integration for DataFusion.
Language: Rust - Size: 93.8 KB - Last synced at: 25 days ago - Pushed at: 6 months ago - Stars: 5 - Forks: 0
james-ralph8555/1brc
1brc https://www.morling.dev/blog/one-billion-row-challenge/
Language: Shell - Size: 751 KB - Last synced at: 12 days ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0
myryfe/dataframely
A declarative, 🐻❄️-native data frame validation library.
Language: Python - Size: 290 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0
milenkovicm/ballista_extensions
Extending DataFusion Ballista to support custom logical and physical operators
Language: Rust - Size: 53.7 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0
milenkovicm/lightfusion
LightGBM Inference on Datafusion
Language: Rust - Size: 9.81 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0
irtimmer/rust-kql
Kusto Query Language parser and planner for DataFusion
Language: Rust - Size: 141 KB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 7 - Forks: 1
refluxdb/influxdb3-community
Unlocked InfluxDB 3.0 "IOx" Community Builds + Examples for Developers & Integrators. Experiment with low-cost storage, unlimited cardinality and FlightSQL APIs
Language: Shell - Size: 285 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 49 - Forks: 2
mrasu/dataharpoon
An MCP-ready query engine that connects to your data — wherever it lives
Language: Rust - Size: 139 KB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 3 - Forks: 0
FluoLab/datafusion
Fusing high-resolution volumes with spectro-temporal low-resolution images
Language: Jupyter Notebook - Size: 4.75 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 2 - Forks: 0
PragmaAI/yelp-datapipeline
🍽️ Yelp Data Pipeline & Analytics Dashboard End-to-end data engineering pipeline processing Yelp dataset with Rust transforms, Apache Airflow orchestration, and interactive Streamlit analytics. Features business insights, user engagement analysis, and city performance comparisons. 🚀 Docker-ready • 📊 Interactive Dashboard • ⚡ High-performance R
Language: Jupyter Notebook - Size: 0 Bytes - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0
grouzen/zio-apache-arrow
Scala ZIO-powered Apache Arrow library
Language: Scala - Size: 496 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 21 - Forks: 1
clflushopt/datafusion-tpch
Native Rust TPCH support for Datafusion using tpchgen
Language: Rust - Size: 48.8 KB - Last synced at: 4 months ago - Pushed at: 5 months ago - Stars: 3 - Forks: 3
splitgraph/seafowl-gcsfuse
Scale to zero Seafowl hosting with Cloud Run
Language: Dockerfile - Size: 13.7 KB - Last synced at: 12 days ago - Pushed at: over 2 years ago - Stars: 37 - Forks: 0
paradedb/pg_analytics 📦
DuckDB-powered data lake analytics from Postgres
Language: Rust - Size: 814 KB - Last synced at: 5 months ago - Pushed at: 8 months ago - Stars: 523 - Forks: 24
burhanahmed1/CryptoSynth
Bitcoin Sentiment Forecast is a multimodal approach to Bitcoin price forecasting using NLP and time series analysis
Language: Jupyter Notebook - Size: 3.54 MB - Last synced at: 5 days ago - Pushed at: 6 months ago - Stars: 0 - Forks: 1
fmenat/MultiviewCropClassification
Public repository of our IGARSS 2023 submission
Language: Python - Size: 132 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 16 - Forks: 1
svenslaggare/gitrends
Web-based behavior code analysis tool.
Language: TypeScript - Size: 1.82 MB - Last synced at: 4 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0
surfingreg/rust-in-memory-db-with-chart
Implement a fast, in-memory, time-series database. Query data using SQL and visualize in ChartJS over websocket.
Language: Rust - Size: 358 KB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0
milenkovicm/wasaffi 📦
Datafusion WASM User Defined Functions
Language: Rust - Size: 1.22 MB - Last synced at: 6 months ago - Pushed at: over 1 year ago - Stars: 9 - Forks: 0
duo-rs/duo
A lightweight Logging and Tracing observability solution for Rust, built with Apache Arrow, Apache Parquet and Apache DataFusion.
Language: Rust - Size: 2.51 MB - Last synced at: 7 months ago - Pushed at: about 1 year ago - Stars: 73 - Forks: 7
QizhiPei/MathFusion
MathFusion: Enhancing Mathematic Problem-solving of LLM through Instruction Fusion
Language: Python - Size: 9.33 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0
roeap/flight-fusion
Language: Rust - Size: 4.3 MB - Last synced at: 7 months ago - Pushed at: almost 2 years ago - Stars: 7 - Forks: 2
jorgecarleitao/datafusion-python 📦
A Python library to run analytics workloads with the performance of Rust, the flexibility of Python and O(1) cost in moving data between the two. Uses Apache Arrow in-memory format and respective query engine DataFusion.
Language: Rust - Size: 125 KB - Last synced at: 22 days ago - Pushed at: over 4 years ago - Stars: 61 - Forks: 4
hienphamlabs/fusionj
An Incomplete DataFusion Query Engine implemented in Java
Language: Java - Size: 281 KB - Last synced at: 10 days ago - Pushed at: 9 months ago - Stars: 1 - Forks: 0
metrico/kompactor
Parquet + Metadata Compactor for InfluxDB 3 Core
Language: TypeScript - Size: 150 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 4 - Forks: 0
lostmygithubaccount/ibis-bench
A composable data system benchmark in a Python package.
Language: Python - Size: 965 KB - Last synced at: 2 months ago - Pushed at: about 1 year ago - Stars: 2 - Forks: 1
datafusion-contrib/datafusion-java
Java binding to Apache DataFusion
Language: Java - Size: 479 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 74 - Forks: 13
andriidemus/exo
Toy Data REPL
Language: Rust - Size: 116 KB - Last synced at: 21 days ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0
duhanmin/arrow-sql-yarn
Executes SQL on the DataFusion/Polars engines via JNI
Language: Java - Size: 26.9 MB - Last synced at: 5 months ago - Pushed at: 12 months ago - Stars: 1 - Forks: 0
hengfeiyang/how-query-engines-work-zh-CN
How Query Engines Work (Chinese edition)
Size: 1.77 MB - Last synced at: 20 days ago - Pushed at: over 1 year ago - Stars: 8 - Forks: 2
hw2499/etl-engine
ETL engine: a lightweight, cross-platform ETL engine unifying stream and batch processing (data extraction, transformation, loading)
Language: Go - Size: 1.6 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 68 - Forks: 13
dadepo/df_extras
A collection of user defined functions, from your favourite databases, in Apache Datafusion
Language: Rust - Size: 161 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0
splitgraph/experimental-datafusion-webassembly
proof-of-concept: compile datafusion to `wasm32-wasi` (run in `wasmedge`) and `wasm32-unknown-unknown` (run in browser)
Size: 104 KB - Last synced at: 12 days ago - Pushed at: about 3 years ago - Stars: 4 - Forks: 1
DFKI-Earth-And-Space-Applications/MVCC_IGARSS
Public repository of our IGARSS 2023 submission
Language: Python - Size: 132 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 1
f-aguzzi/ChemFuseKit
Chemometrics library for data fusion, model training and prediction of data from multiple sensor sources.
Language: Jupyter Notebook - Size: 22.2 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 1
PyramidGithub/data_fusion
Apache Spark Comet Wsl2
Language: Rust - Size: 3.21 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0
selvakrishnan/DataFusion_CDAP_Wrangler_Directives
Google Cloud Data Fusion - Data Transformation Logics using CDAP Wrangler Directives.
Size: 83 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0
selvakrishnan/DataFusion_Airflow_Trigger
A simple dag for triggering the Cloud Data Fusion Pipeline using Apache Airflow.
Language: Python - Size: 2.93 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0
rurumimic/apache-datafusion
Language: Rust - Size: 4.88 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0
stormasm/dply-rs Fork of vincev/dply-rs
A dataframe manipulation tool inspired by dplyr.
Language: Rust - Size: 508 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0
Mboubaker/Lidar_Evidential_occupancy_grid_mapping-
This repository presents an approach to building 2D evidential occupancy grid maps with Lidar data
Language: Jupyter Notebook - Size: 7.45 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 1
jychen7/datafusion-bigtable 📦
Bigtable data source for Apache Arrow Datafusion
Language: Rust - Size: 34.2 KB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 2 - Forks: 0
dsaad68/azurefunction-deltatable-pipeline-with-rust
A Delta Table pipeline in Rust, triggered by Azure Functions responding to blob storage events in a specific container subfolder. The pipeline processes CSV files, updating or creating Delta Tables as needed, using merges for row changes.
Language: Rust - Size: 1.05 MB - Last synced at: 21 days ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0
jfrazier-eth/file-fusion
A file explorer for data warehouses
Language: TypeScript - Size: 1.07 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0
zhxiaogg/dfq
A CLI for running SQL queries over various data sources.
Language: Rust - Size: 26.4 KB - Last synced at: 20 days ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0
Caoxuheng/HyMS
Hyperspectral Image Super-resolution via Multi-stage Scheme without Employing Spatial Degradation
Language: Python - Size: 91.8 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 3 - Forks: 3
matsadler/bishop
Query MongoDB via Apache Arrow and DataFusion
Language: Rust - Size: 37.1 KB - Last synced at: 8 months ago - Pushed at: over 4 years ago - Stars: 2 - Forks: 0
blaze-init/spark-blaze-extension 📦
Blazing-fast query execution engine that speaks the Apache Spark language and has Arrow-DataFusion at its core.
Language: Shell - Size: 288 MB - Last synced at: 7 months ago - Pushed at: over 3 years ago - Stars: 11 - Forks: 4
IFF-0303/lvh-fusion Fork of AshleyLab/lvh-fusion
Size: 65.4 KB - Last synced at: over 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0
datafusion-contrib/datafusion-objectstore-azure 📦
Azure Storage as an ObjectStore for DataFusion
Language: Rust - Size: 23.4 KB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 1
treebee/elixir-arrow
Experimental Elixir bindings for Apache Arrow including Parquet and DataFusion
Language: Rust - Size: 130 KB - Last synced at: over 2 years ago - Pushed at: over 4 years ago - Stars: 27 - Forks: 3
datafusion-contrib/datafusion-python 📦
Python binding for DataFusion
Language: Python - Size: 234 KB - Last synced at: over 2 years ago - Pushed at: over 3 years ago - Stars: 59 - Forks: 13
Georgsiedel/data_fusion_dipl
Code featured in the diploma thesis, which is written in German. Full text available via the link in the Readme below.
Language: R - Size: 124 KB - Last synced at: over 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0
gongouveia/DataFusion-2021-22
Time series analysis, state estimation, stratification, classification and data mining
Language: Jupyter Notebook - Size: 8.03 MB - Last synced at: over 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0