Topic: "apache-arrow"
pixie-io/pixie
Instant Kubernetes-Native Application Observability
Language: C++ - Size: 114 MB - Last synced at: 1 day ago - Pushed at: 2 days ago - Stars: 5,965 - Forks: 463

lancedb/lance
Modern columnar data format for ML and LLMs implemented in Rust. Convert from parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, Pyarrow, and PyTorch with more integrations coming.
Language: Rust - Size: 21.7 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 4,515 - Forks: 285

aws/aws-sdk-pandas
pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
Language: Python - Size: 17.1 MB - Last synced at: 1 day ago - Pushed at: 3 days ago - Stars: 4,005 - Forks: 705

polarsignals/frostdb
❄️ Coolest database around 🧊 Embeddable column database written in Go.
Language: Go - Size: 14.2 MB - Last synced at: 4 days ago - Pushed at: 8 days ago - Stars: 1,407 - Forks: 67

scikit-hep/awkward
Manipulate JSON-like data with NumPy-like idioms.
Language: Python - Size: 26.2 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 874 - Forks: 90

visgl/loaders.gl
Loaders for big data visualization.
Language: TypeScript - Size: 293 MB - Last synced at: 6 days ago - Pushed at: 16 days ago - Stars: 745 - Forks: 203

developmentseed/lonboard
A Python library for fast, interactive geospatial vector data visualization in Jupyter.
Language: Python - Size: 133 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 736 - Forks: 39

geopolars/geopolars
Geospatial extensions for Polars
Language: Rust - Size: 5.93 MB - Last synced at: 6 days ago - Pushed at: 8 months ago - Stars: 695 - Forks: 24

unum-cloud/ustore
Multi-Modal Database replacing MongoDB, Neo4J, and Elastic with 1 faster ACID solution, with NetworkX and Pandas interfaces, and bindings for C 99, C++ 17, Python 3, Java, GoLang 🗄️
Language: C++ - Size: 6.56 MB - Last synced at: 13 days ago - Pushed at: over 1 year ago - Stars: 587 - Forks: 34

kylebarron/parquet-wasm
Rust-based WebAssembly bindings to read and write Apache Parquet data
Language: Rust - Size: 2.5 MB - Last synced at: 10 days ago - Pushed at: 14 days ago - Stars: 569 - Forks: 20

geoarrow/geoarrow
Specification for storing geospatial data in Apache Arrow
Size: 63.5 KB - Last synced at: 22 days ago - Pushed at: about 1 month ago - Stars: 456 - Forks: 25

1duo/awesome-ai-infrastructures
Infrastructures™ for Machine Learning Training/Inference in Production.
Size: 11.8 MB - Last synced at: 12 days ago - Pushed at: almost 6 years ago - Stars: 411 - Forks: 73

geoarrow/geoarrow-rs
GeoArrow in Rust, Python, and JavaScript (WebAssembly) with vectorized geometry operations
Language: Rust - Size: 13.8 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 317 - Forks: 26

apache/arrow-julia
Official Julia implementation of Apache Arrow
Language: Julia - Size: 2.04 MB - Last synced at: 10 days ago - Pushed at: 12 days ago - Stars: 289 - Forks: 64

nevi-me/rust-dataframe 📦
A Rust DataFrame implementation, built on Apache Arrow
Language: Rust - Size: 253 KB - Last synced at: 5 days ago - Pushed at: over 4 years ago - Stars: 281 - Forks: 20

cldellow/sqlite-parquet-vtable
A SQLite vtable extension to read Parquet files
Language: C++ - Size: 404 KB - Last synced at: 6 days ago - Pushed at: almost 4 years ago - Stars: 271 - Forks: 31

abs-tudelft/fletcher
Fletcher: A framework to integrate FPGA accelerators with Apache Arrow
Language: VHDL - Size: 8.04 MB - Last synced at: 22 days ago - Pushed at: over 1 year ago - Stars: 225 - Forks: 31

scikit-hep/awkward-0.x 📦
Manipulate arrays of complex data structures as easily as Numpy.
Language: Python - Size: 6.42 MB - Last synced at: 20 days ago - Pushed at: about 4 years ago - Stars: 214 - Forks: 39

G-Research/ParquetSharp
ParquetSharp is a .NET library for reading and writing Apache Parquet files.
Language: C# - Size: 1.73 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 199 - Forks: 52

google/space
Unified storage framework for the entire machine learning lifecycle
Language: Python - Size: 825 KB - Last synced at: 4 days ago - Pushed at: about 1 year ago - Stars: 156 - Forks: 8

apache/arrow-go
Official Go implementation of Apache Arrow
Language: Assembly - Size: 19.1 MB - Last synced at: 3 days ago - Pushed at: 6 days ago - Stars: 154 - Forks: 27

nanoporetech/pod5-file-format
Pod5: a high performance file format for nanopore reads.
Language: C++ - Size: 28.7 MB - Last synced at: 15 days ago - Pushed at: 5 months ago - Stars: 147 - Forks: 20

mattf96s/QuackDB
Open-source in-browser DuckDB SQL editor
Language: TypeScript - Size: 3.6 MB - Last synced at: 7 days ago - Pushed at: 11 days ago - Stars: 145 - Forks: 7

kylebarron/arro3
A minimal Python library for Apache Arrow, connecting to the Rust arrow crate
Language: Rust - Size: 3.3 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 140 - Forks: 11

geoarrow/deck.gl-layers
deck.gl layers for rendering GeoArrow data
Language: TypeScript - Size: 2.59 MB - Last synced at: 6 days ago - Pushed at: 2 months ago - Stars: 117 - Forks: 8

kylebarron/arrow-js-ffi
Zero-copy reading of Arrow data from WebAssembly
Language: TypeScript - Size: 360 KB - Last synced at: 6 days ago - Pushed at: 10 months ago - Stars: 115 - Forks: 9

mongodb-labs/mongo-arrow
MongoDB integrations for Apache Arrow. Export MongoDB documents to numpy array, parquet files, and pandas dataframes in one line of code.
Language: Python - Size: 505 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 102 - Forks: 16

cmudig/falcon-vis Fork of vega/falcon
Cross-filter millions (or even billions) of data entries with no interaction delay
Language: Jupyter Notebook - Size: 131 MB - Last synced at: 19 days ago - Pushed at: over 1 year ago - Stars: 99 - Forks: 2

igor-suhorukov/openstreetmap_h3
High-performance data loader for OSM planet dumps. Transforms an OpenStreetMap World/Region PBF dump into a lossless PostGIS pgsnapshot OSM schema representation partitioned by H3 regions, and/or into ArrowIPC/Parquet dumps.
Language: Java - Size: 6.06 MB - Last synced at: 17 days ago - Pushed at: 3 months ago - Stars: 92 - Forks: 8

man-group/sparrow
C++20 idiomatic APIs for the Apache Arrow Columnar Format
Language: C++ - Size: 1.46 MB - Last synced at: 5 days ago - Pushed at: 6 days ago - Stars: 85 - Forks: 18

duo-rs/duo
A lightweight Logging and Tracing observability solution for Rust, built with Apache Arrow, Apache Parquet and Apache DataFusion.
Language: Rust - Size: 2.51 MB - Last synced at: 26 days ago - Pushed at: 7 months ago - Stars: 73 - Forks: 7

abdenlab/oxbow
Read specialized NGS formats as data frames in R, Python, and more.
Language: Rust - Size: 15.8 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 70 - Forks: 8

red-data-tools/red_amber
A dataframe library for Rubyists.
Language: Ruby - Size: 5.25 MB - Last synced at: 20 days ago - Pushed at: 21 days ago - Stars: 70 - Forks: 14

cldellow/csv2parquet
Convert a CSV to a parquet file.
Language: Python - Size: 97.7 KB - Last synced at: 6 days ago - Pushed at: over 2 years ago - Stars: 64 - Forks: 14

elixir-explorer/adbc
Apache Arrow ADBC bindings for Elixir
Language: C++ - Size: 4.92 MB - Last synced at: 13 days ago - Pushed at: about 1 month ago - Stars: 63 - Forks: 17

UWHustle/hustle
In-memory, columnar, arrow-based database.
Language: C++ - Size: 13.8 MB - Last synced at: 4 days ago - Pushed at: over 2 years ago - Stars: 46 - Forks: 7

baggiponte/awesome-pandas-alternatives
Awesome list of alternative dataframe libraries in Python.
Size: 21.5 KB - Last synced at: 11 days ago - Pushed at: over 2 years ago - Stars: 44 - Forks: 3

apache/arrow-java
Official Java implementation of Apache Arrow
Language: Java - Size: 23.8 MB - Last synced at: 7 days ago - Pushed at: 9 days ago - Stars: 40 - Forks: 41

influxdata/flightsql-dbapi
DB API 2 interface for Flight SQL with SQLAlchemy extras.
Language: Python - Size: 188 KB - Last synced at: 1 day ago - Pushed at: 27 days ago - Stars: 38 - Forks: 5

neo4j-product-examples/ds-graphconnect-2022-demo
Language: Jupyter Notebook - Size: 337 KB - Last synced at: 12 months ago - Pushed at: almost 3 years ago - Stars: 38 - Forks: 4

tradewelltech/beavers
Python stream processing for analytics
Language: Python - Size: 591 KB - Last synced at: 1 day ago - Pushed at: about 1 month ago - Stars: 37 - Forks: 2

animeshtrivedi/ArrowExample
Java read and write example for Apache Arrow
Language: Java - Size: 56.6 KB - Last synced at: about 2 years ago - Pushed at: over 7 years ago - Stars: 33 - Forks: 11

geoarrow/geoarrow-js
TypeScript implementation of GeoArrow
Language: TypeScript - Size: 308 KB - Last synced at: 9 days ago - Pushed at: 2 months ago - Stars: 28 - Forks: 6

tradewelltech/protarrow
Convert from protobuf to arrow and back
Language: Python - Size: 9.06 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 25 - Forks: 3

kylebarron/arrow-wasm
Building block library for using Apache Arrow in Rust WebAssembly modules.
Language: Rust - Size: 272 KB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 22 - Forks: 5

JosiahParry/arrow-extendr
Integration between arrow-rs and extendr
Language: Rust - Size: 66.4 KB - Last synced at: 8 days ago - Pushed at: 4 months ago - Stars: 22 - Forks: 2

webysther/aws-glue-docker 📦
🐋 Docker image for AWS Glue Spark/Python
Language: Dockerfile - Size: 56.6 KB - Last synced at: 5 months ago - Pushed at: over 1 year ago - Stars: 22 - Forks: 8

spirom/arrow-simpledb
Query processing for an extremely simple, in-memory, columnar database using Apache Arrow to represent tables
Language: C++ - Size: 190 KB - Last synced at: about 1 month ago - Pushed at: over 3 years ago - Stars: 22 - Forks: 5

grouzen/zio-apache-arrow
Scala ZIO-powered Apache Arrow library
Language: Scala - Size: 482 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 21 - Forks: 1

sonhmai/how-sqlite-works
A book about how SQLite works. Rewriting SQLite in Rust for learning and fun, and writing the book I wished I had when I started.
Language: Rust - Size: 16 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 21 - Forks: 1

mbrobbel/narrow
An experimental (work-in-progress) statically typed implementation of Apache Arrow
Language: Rust - Size: 1.3 MB - Last synced at: 1 day ago - Pushed at: 2 days ago - Stars: 19 - Forks: 5

kszucs/firebolt
Arrow implementation in Mojo
Language: Mojo - Size: 21.5 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 19 - Forks: 1

qwshen/spark-flight-connector
A Spark Connector that reads data from / writes data to Arrow-Flight end-points with Arrow-Flight and Flight-SQL
Language: Java - Size: 163 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 19 - Forks: 3

rpy2/rpy2-arrow
Share Apache Arrow datasets between Python and R.
Language: Python - Size: 664 KB - Last synced at: about 8 hours ago - Pushed at: about 1 month ago - Stars: 17 - Forks: 3

graphext/lector
A fast reader for messy CSV files with optional type inference.
Language: Python - Size: 239 KB - Last synced at: 11 days ago - Pushed at: about 2 months ago - Stars: 17 - Forks: 0

madesroches/micromegas
Scalable Observability
Language: Rust - Size: 2.7 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 16 - Forks: 4

amoeba/QLArrow
WIP QuickLook plugin for Apache Arrow and Parquet
Language: C - Size: 23 MB - Last synced at: 5 days ago - Pushed at: 4 months ago - Stars: 16 - Forks: 1

renesugar/FileConvert
Converts between file formats such as CSV and Parquet
Language: C - Size: 3.65 MB - Last synced at: about 2 years ago - Pushed at: over 7 years ago - Stars: 14 - Forks: 1

ModelarData/ModelarDB-RS
ModelarDB: Model-Based Time Series Management from Edge to Client
Language: Rust - Size: 1.56 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 13 - Forks: 5

datafusion-contrib/datafusion-c
C language bindings for DataFusion
Language: C - Size: 5.75 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 13 - Forks: 3

kat-co/cl-apache-arrow
This is a library for working with Apache Arrow and Parquet data.
Language: Common Lisp - Size: 51.8 KB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 13 - Forks: 2

Benjamin-Philip/serde_arrow
Serialization and deserialization to Apache Arrow for Erlang
Language: Erlang - Size: 158 KB - Last synced at: 8 days ago - Pushed at: 5 months ago - Stars: 11 - Forks: 1

Desdaemon/polars_dart
Dart bindings for the polars library
Language: Dart - Size: 968 KB - Last synced at: 4 days ago - Pushed at: about 1 year ago - Stars: 11 - Forks: 1

cldellow/parquet-metadata
Dump metadata about a Parquet file.
Language: Python - Size: 39.1 KB - Last synced at: 6 days ago - Pushed at: over 3 years ago - Stars: 11 - Forks: 3

arkady-emelyanov/pyarrow-flight
Apache Arrow Flight example
Language: Python - Size: 1000 Bytes - Last synced at: 21 days ago - Pushed at: over 4 years ago - Stars: 11 - Forks: 3

cpg314/polarhouse
Interoperability between Polars and Clickhouse
Language: Rust - Size: 87.9 KB - Last synced at: 15 days ago - Pushed at: about 2 months ago - Stars: 9 - Forks: 1

unum-cloud/udsb
Unlimited Data-Science Benchmarks for Numeric, Tabular and Graph Workloads
Language: Jupyter Notebook - Size: 3.57 MB - Last synced at: 4 days ago - Pushed at: about 2 years ago - Stars: 9 - Forks: 1

ljishen/bitar
Simplify accessing hardware compression/decompression accelerators
Language: C++ - Size: 541 KB - Last synced at: 11 days ago - Pushed at: about 2 years ago - Stars: 9 - Forks: 2

perspective-community/arrow-wasm-cpp 📦
Standalone Apache Arrow compiled to WebAssembly, extracted from https://github.com/finos/perspective
Language: CMake - Size: 88.9 KB - Last synced at: 2 months ago - Pushed at: over 1 year ago - Stars: 8 - Forks: 0

amoeba/arrow-python-js-ipc-example
Example showing how to send Arrow RecordBatches from a Python backend to a web browser.
Language: JavaScript - Size: 26.4 KB - Last synced at: 5 days ago - Pushed at: over 2 years ago - Stars: 8 - Forks: 2

poopoothegorilla/fastframe
DataFrame project that utilizes Apache Arrow
Language: Go - Size: 218 KB - Last synced at: about 1 year ago - Pushed at: almost 5 years ago - Stars: 7 - Forks: 0

rupurt/zodbc
A blazing fast ODBC Zig client
Language: Zig - Size: 125 KB - Last synced at: 24 days ago - Pushed at: about 1 year ago - Stars: 6 - Forks: 2

Sebastiaan-Alvarez-Rodriguez/arrow-spark-publication
Implementation connecting Arrow to Spark, effectively making all code related to reading in Spark redundant.
Language: C++ - Size: 9.12 MB - Last synced at: about 2 years ago - Pushed at: about 4 years ago - Stars: 6 - Forks: 4

spaghettifunk/norman
Realtime distributed OLAP datastore written in Go, designed to answer OLAP queries with low latency. In active development.
Language: Go - Size: 370 KB - Last synced at: 9 days ago - Pushed at: 26 days ago - Stars: 5 - Forks: 0

lykmapipo/Python-Spark-Log-Analysis
Python scripts to process, and analyze log files using PySpark.
Language: Python - Size: 131 KB - Last synced at: 21 days ago - Pushed at: 9 months ago - Stars: 5 - Forks: 0

animeshtrivedi/benchmarking-arrow
Benchmarking Arrow/Java
Language: Java - Size: 226 KB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 5 - Forks: 1

mluttikh/xml2arrow
Efficiently convert XML data to Apache Arrow format for high-performance data processing
Language: Rust - Size: 224 KB - Last synced at: 1 day ago - Pushed at: about 1 month ago - Stars: 4 - Forks: 0

cupiddb/cupiddb
In-memory Columnar Database
Language: Rust - Size: 49.8 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 4 - Forks: 0

lykmapipo/NYC-TLC-Trip-Data
Python scripts to download, process, and analyze the New York City Taxi and Limousine Commission (TLC) Trip Record Data dataset
Language: Jupyter Notebook - Size: 100 MB - Last synced at: 21 days ago - Pushed at: 8 months ago - Stars: 4 - Forks: 1

roeap/flight-sql-client-node
A Flight SQL client for Node.js
Language: Rust - Size: 1.11 MB - Last synced at: 11 days ago - Pushed at: over 1 year ago - Stars: 4 - Forks: 2

alexkreidler/parquet2arrow
A fast and simple command-line (CLI) tool to convert a Parquet file to an Apache Arrow file
Language: Rust - Size: 11.7 KB - Last synced at: 20 days ago - Pushed at: about 3 years ago - Stars: 4 - Forks: 1

pachadotdev/tradestatistics-plumber-api
tradestatistics.io API, reads from PostgreSQL and provides tidy CSV and Apache Arrow data
Language: R - Size: 166 KB - Last synced at: 21 days ago - Pushed at: 8 months ago - Stars: 3 - Forks: 2

firelink-data/evolution
🦖 Evolve your fixed-length data files into Apache Arrow tables, fully parallelized!
Language: Rust - Size: 242 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 3 - Forks: 0

marwan116/aws-parquet
A toolkit that provides an object-oriented interface for working with parquet datasets on AWS.
Language: Python - Size: 43.9 KB - Last synced at: 3 months ago - Pushed at: almost 2 years ago - Stars: 3 - Forks: 0

tiwater/rerun-query
Query and extract entity data from Rerun data files.
Language: Rust - Size: 7.94 MB - Last synced at: 12 days ago - Pushed at: 13 days ago - Stars: 2 - Forks: 0

amoeba/arrow-opentelemetry-example
Example of using OpenTelemetry and Apache Arrow
Language: Python - Size: 115 KB - Last synced at: 5 days ago - Pushed at: 4 months ago - Stars: 2 - Forks: 0

tradestatistics/plumber-api Fork of pachadotdev/tradestatistics-plumber-api
tradestatistics.io API, reads from PostgreSQL and provides tidy CSV and Apache Arrow data
Language: R - Size: 166 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 2 - Forks: 0

droher/diachronic
Get daily historical snapshots of every article on any Wiki, formatted as Parquet files
Language: Python - Size: 52.7 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0

roeap/adx-arrow
Kusto client library optimized for data science workloads
Language: Rust - Size: 52.7 KB - Last synced at: about 1 month ago - Pushed at: almost 3 years ago - Stars: 2 - Forks: 0

dantrim/parquet-writer
A C++ library for easily writing Parquet files containing columns of (mostly) any type you wish.
Language: C++ - Size: 1.03 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 2 - Forks: 2

joewood/react-iceberg
React Components to visualize Apache Iceberg tables
Language: TypeScript - Size: 1.35 MB - Last synced at: about 1 month ago - Pushed at: over 3 years ago - Stars: 2 - Forks: 1

matsadler/bishop
Query MongoDB via Apache Arrow and DataFusion
Language: Rust - Size: 37.1 KB - Last synced at: about 2 months ago - Pushed at: about 4 years ago - Stars: 2 - Forks: 0

apache/arrow-dotnet
Official .NET implementation of Apache Arrow
Size: 0 Bytes - Last synced at: 6 days ago - Pushed at: 24 days ago - Stars: 1 - Forks: 0

amoeba/arrow-cpp-conan-example
Example using conan to package and use libarrow
Language: CMake - Size: 7.81 KB - Last synced at: 5 days ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

tradestatistics/database-postgresql Fork of pachadotdev/tradestatistics-database-postgresql
Tidy trade data from UN COMTRADE and also countries, commodities, units, and reporting system tables. Writes to PostgreSQL.
Language: R - Size: 51.4 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 1 - Forks: 0

voutilad/redpanda-flight-rs
An Apache Arrow Flight proxy for Redpanda
Language: Rust - Size: 225 KB - Last synced at: about 2 months ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 1

amoeba/arrow-flight-playground
Various examples related to Apache Arrow Flight.
Language: C++ - Size: 438 KB - Last synced at: 5 days ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 0

neo4j-field/dataflow-flex-pyarrow-to-gds
Google Dataflow Flex Templates (in Python) for large scale Graph Loading with GDS and Apache Arrow
Language: Python - Size: 216 KB - Last synced at: 14 days ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 2

ZhengqiaoWang/ArrowDocsZhCN
Apache Arrow Chinese Document. Apache Arrow 中文文档手册
Language: C++ - Size: 130 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

antoniocachuan/gentle-introduction-apache-arrow
python
Language: Python - Size: 1.95 KB - Last synced at: almost 2 years ago - Pushed at: about 6 years ago - Stars: 1 - Forks: 1
