An open API service providing repository metadata for many open source software ecosystems.

Topic: "apache-arrow"

pixie-io/pixie

Instant Kubernetes-Native Application Observability

Language: C++ - Size: 114 MB - Last synced at: 1 day ago - Pushed at: 2 days ago - Stars: 5,965 - Forks: 463

lancedb/lance

Modern columnar data format for ML and LLMs implemented in Rust. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, PyArrow, and PyTorch, with more integrations coming.

Language: Rust - Size: 21.7 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 4,515 - Forks: 285

aws/aws-sdk-pandas

pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).

Language: Python - Size: 17.1 MB - Last synced at: 1 day ago - Pushed at: 3 days ago - Stars: 4,005 - Forks: 705

polarsignals/frostdb

❄️ Coolest database around 🧊 Embeddable column database written in Go.

Language: Go - Size: 14.2 MB - Last synced at: 4 days ago - Pushed at: 8 days ago - Stars: 1,407 - Forks: 67

scikit-hep/awkward

Manipulate JSON-like data with NumPy-like idioms.

Language: Python - Size: 26.2 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 874 - Forks: 90

visgl/loaders.gl

Loaders for big data visualization. Website:

Language: TypeScript - Size: 293 MB - Last synced at: 6 days ago - Pushed at: 16 days ago - Stars: 745 - Forks: 203

developmentseed/lonboard

A Python library for fast, interactive geospatial vector data visualization in Jupyter.

Language: Python - Size: 133 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 736 - Forks: 39

geopolars/geopolars

Geospatial extensions for Polars

Language: Rust - Size: 5.93 MB - Last synced at: 6 days ago - Pushed at: 8 months ago - Stars: 695 - Forks: 24

unum-cloud/ustore

Multi-Modal Database replacing MongoDB, Neo4J, and Elastic with 1 faster ACID solution, with NetworkX and Pandas interfaces, and bindings for C 99, C++ 17, Python 3, Java, GoLang 🗄️

Language: C++ - Size: 6.56 MB - Last synced at: 13 days ago - Pushed at: over 1 year ago - Stars: 587 - Forks: 34

kylebarron/parquet-wasm

Rust-based WebAssembly bindings to read and write Apache Parquet data

Language: Rust - Size: 2.5 MB - Last synced at: 10 days ago - Pushed at: 14 days ago - Stars: 569 - Forks: 20

geoarrow/geoarrow

Specification for storing geospatial data in Apache Arrow

Size: 63.5 KB - Last synced at: 22 days ago - Pushed at: about 1 month ago - Stars: 456 - Forks: 25

1duo/awesome-ai-infrastructures

Infrastructures™ for Machine Learning Training/Inference in Production.

Size: 11.8 MB - Last synced at: 12 days ago - Pushed at: almost 6 years ago - Stars: 411 - Forks: 73

geoarrow/geoarrow-rs

GeoArrow in Rust, Python, and JavaScript (WebAssembly) with vectorized geometry operations

Language: Rust - Size: 13.8 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 317 - Forks: 26

apache/arrow-julia

Official Julia implementation of Apache Arrow

Language: Julia - Size: 2.04 MB - Last synced at: 10 days ago - Pushed at: 12 days ago - Stars: 289 - Forks: 64

nevi-me/rust-dataframe 📦

A Rust DataFrame implementation, built on Apache Arrow

Language: Rust - Size: 253 KB - Last synced at: 5 days ago - Pushed at: over 4 years ago - Stars: 281 - Forks: 20

cldellow/sqlite-parquet-vtable

A SQLite vtable extension to read Parquet files

Language: C++ - Size: 404 KB - Last synced at: 6 days ago - Pushed at: almost 4 years ago - Stars: 271 - Forks: 31

abs-tudelft/fletcher

Fletcher: A framework to integrate FPGA accelerators with Apache Arrow

Language: VHDL - Size: 8.04 MB - Last synced at: 22 days ago - Pushed at: over 1 year ago - Stars: 225 - Forks: 31

scikit-hep/awkward-0.x 📦

Manipulate arrays of complex data structures as easily as Numpy.

Language: Python - Size: 6.42 MB - Last synced at: 20 days ago - Pushed at: about 4 years ago - Stars: 214 - Forks: 39

G-Research/ParquetSharp

ParquetSharp is a .NET library for reading and writing Apache Parquet files.

Language: C# - Size: 1.73 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 199 - Forks: 52

google/space

Unified storage framework for the entire machine learning lifecycle

Language: Python - Size: 825 KB - Last synced at: 4 days ago - Pushed at: about 1 year ago - Stars: 156 - Forks: 8

apache/arrow-go

Official Go implementation of Apache Arrow

Language: Assembly - Size: 19.1 MB - Last synced at: 3 days ago - Pushed at: 6 days ago - Stars: 154 - Forks: 27

nanoporetech/pod5-file-format

Pod5: a high performance file format for nanopore reads.

Language: C++ - Size: 28.7 MB - Last synced at: 15 days ago - Pushed at: 5 months ago - Stars: 147 - Forks: 20

mattf96s/QuackDB

Open-source in-browser DuckDB SQL editor

Language: TypeScript - Size: 3.6 MB - Last synced at: 7 days ago - Pushed at: 11 days ago - Stars: 145 - Forks: 7

kylebarron/arro3

A minimal Python library for Apache Arrow, connecting to the Rust arrow crate

Language: Rust - Size: 3.3 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 140 - Forks: 11

geoarrow/deck.gl-layers

deck.gl layers for rendering GeoArrow data

Language: TypeScript - Size: 2.59 MB - Last synced at: 6 days ago - Pushed at: 2 months ago - Stars: 117 - Forks: 8

kylebarron/arrow-js-ffi

Zero-copy reading of Arrow data from WebAssembly

Language: TypeScript - Size: 360 KB - Last synced at: 6 days ago - Pushed at: 10 months ago - Stars: 115 - Forks: 9

mongodb-labs/mongo-arrow

MongoDB integrations for Apache Arrow. Export MongoDB documents to NumPy arrays, Parquet files, and pandas DataFrames in one line of code.

Language: Python - Size: 505 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 102 - Forks: 16

cmudig/falcon-vis Fork of vega/falcon

Cross-filter millions (or even billions) of data entries with no interaction delay

Language: Jupyter Notebook - Size: 131 MB - Last synced at: 19 days ago - Pushed at: over 1 year ago - Stars: 99 - Forks: 2

igor-suhorukov/openstreetmap_h3

High-performance data loader for OSM planet dumps. Transforms an OpenStreetMap World/Region PBF dump into a lossless PostGIS pgsnapshot OSM schema representation partitioned by H3 regions, and/or into ArrowIPC/Parquet dumps.

Language: Java - Size: 6.06 MB - Last synced at: 17 days ago - Pushed at: 3 months ago - Stars: 92 - Forks: 8

man-group/sparrow

C++20 idiomatic APIs for the Apache Arrow Columnar Format

Language: C++ - Size: 1.46 MB - Last synced at: 5 days ago - Pushed at: 6 days ago - Stars: 85 - Forks: 18

duo-rs/duo

A lightweight Logging and Tracing observability solution for Rust, built with Apache Arrow, Apache Parquet and Apache DataFusion.

Language: Rust - Size: 2.51 MB - Last synced at: 26 days ago - Pushed at: 7 months ago - Stars: 73 - Forks: 7

abdenlab/oxbow

Read specialized NGS formats as data frames in R, Python, and more.

Language: Rust - Size: 15.8 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 70 - Forks: 8

red-data-tools/red_amber

A dataframe library for Rubyists.

Language: Ruby - Size: 5.25 MB - Last synced at: 20 days ago - Pushed at: 21 days ago - Stars: 70 - Forks: 14

cldellow/csv2parquet

Convert a CSV to a parquet file.

Language: Python - Size: 97.7 KB - Last synced at: 6 days ago - Pushed at: over 2 years ago - Stars: 64 - Forks: 14

elixir-explorer/adbc

Apache Arrow ADBC bindings for Elixir

Language: C++ - Size: 4.92 MB - Last synced at: 13 days ago - Pushed at: about 1 month ago - Stars: 63 - Forks: 17

UWHustle/hustle

In-memory, columnar, arrow-based database.

Language: C++ - Size: 13.8 MB - Last synced at: 4 days ago - Pushed at: over 2 years ago - Stars: 46 - Forks: 7

baggiponte/awesome-pandas-alternatives

Awesome list of alternative dataframe libraries in Python.

Size: 21.5 KB - Last synced at: 11 days ago - Pushed at: over 2 years ago - Stars: 44 - Forks: 3

apache/arrow-java

Official Java implementation of Apache Arrow

Language: Java - Size: 23.8 MB - Last synced at: 7 days ago - Pushed at: 9 days ago - Stars: 40 - Forks: 41

influxdata/flightsql-dbapi

DB API 2 interface for Flight SQL with SQLAlchemy extras.

Language: Python - Size: 188 KB - Last synced at: 1 day ago - Pushed at: 27 days ago - Stars: 38 - Forks: 5

neo4j-product-examples/ds-graphconnect-2022-demo

Language: Jupyter Notebook - Size: 337 KB - Last synced at: 12 months ago - Pushed at: almost 3 years ago - Stars: 38 - Forks: 4

tradewelltech/beavers

Python stream processing for analytics

Language: Python - Size: 591 KB - Last synced at: 1 day ago - Pushed at: about 1 month ago - Stars: 37 - Forks: 2

animeshtrivedi/ArrowExample

Java read and write example for Apache Arrow

Language: Java - Size: 56.6 KB - Last synced at: about 2 years ago - Pushed at: over 7 years ago - Stars: 33 - Forks: 11

geoarrow/geoarrow-js

TypeScript implementation of GeoArrow

Language: TypeScript - Size: 308 KB - Last synced at: 9 days ago - Pushed at: 2 months ago - Stars: 28 - Forks: 6

tradewelltech/protarrow

Convert from protobuf to arrow and back

Language: Python - Size: 9.06 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 25 - Forks: 3

kylebarron/arrow-wasm

Building block library for using Apache Arrow in Rust WebAssembly modules.

Language: Rust - Size: 272 KB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 22 - Forks: 5

JosiahParry/arrow-extendr

Integration between arrow-rs and extendr

Language: Rust - Size: 66.4 KB - Last synced at: 8 days ago - Pushed at: 4 months ago - Stars: 22 - Forks: 2

webysther/aws-glue-docker 📦

🐋 Docker image for AWS Glue Spark/Python

Language: Dockerfile - Size: 56.6 KB - Last synced at: 5 months ago - Pushed at: over 1 year ago - Stars: 22 - Forks: 8

spirom/arrow-simpledb

Query processing for an extremely simple, in-memory, columnar database using Apache Arrow to represent tables

Language: C++ - Size: 190 KB - Last synced at: about 1 month ago - Pushed at: over 3 years ago - Stars: 22 - Forks: 5

grouzen/zio-apache-arrow

Scala ZIO-powered Apache Arrow library

Language: Scala - Size: 482 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 21 - Forks: 1

sonhmai/how-sqlite-works

A book about how SQLite works. Rewriting SQLite in Rust for learning and fun, and writing the book I wished I had when I started.

Language: Rust - Size: 16 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 21 - Forks: 1

mbrobbel/narrow

An experimental (work-in-progress) statically typed implementation of Apache Arrow

Language: Rust - Size: 1.3 MB - Last synced at: 1 day ago - Pushed at: 2 days ago - Stars: 19 - Forks: 5

kszucs/firebolt

Arrow implementation in Mojo

Language: Mojo - Size: 21.5 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 19 - Forks: 1

qwshen/spark-flight-connector

A Spark Connector that reads data from / writes data to Arrow-Flight end-points with Arrow-Flight and Flight-SQL

Language: Java - Size: 163 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 19 - Forks: 3

rpy2/rpy2-arrow

Share Apache Arrow datasets between Python and R.

Language: Python - Size: 664 KB - Last synced at: about 8 hours ago - Pushed at: about 1 month ago - Stars: 17 - Forks: 3

graphext/lector

A fast reader for messy CSV files with optional type inference.

Language: Python - Size: 239 KB - Last synced at: 11 days ago - Pushed at: about 2 months ago - Stars: 17 - Forks: 0

madesroches/micromegas

Scalable Observability

Language: Rust - Size: 2.7 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 16 - Forks: 4

amoeba/QLArrow

WIP QuickLook plugin for Apache Arrow and Parquet

Language: C - Size: 23 MB - Last synced at: 5 days ago - Pushed at: 4 months ago - Stars: 16 - Forks: 1

renesugar/FileConvert

Converts between file formats such as CSV and Parquet

Language: C - Size: 3.65 MB - Last synced at: about 2 years ago - Pushed at: over 7 years ago - Stars: 14 - Forks: 1

ModelarData/ModelarDB-RS

ModelarDB: Model-Based Time Series Management from Edge to Client

Language: Rust - Size: 1.56 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 13 - Forks: 5

datafusion-contrib/datafusion-c

C language bindings for DataFusion

Language: C - Size: 5.75 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 13 - Forks: 3

kat-co/cl-apache-arrow

This is a library for working with Apache Arrow and Parquet data.

Language: Common Lisp - Size: 51.8 KB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 13 - Forks: 2

Benjamin-Philip/serde_arrow

Serialization and deserialization to Apache Arrow for Erlang

Language: Erlang - Size: 158 KB - Last synced at: 8 days ago - Pushed at: 5 months ago - Stars: 11 - Forks: 1

Desdaemon/polars_dart

Dart bindings for the polars library

Language: Dart - Size: 968 KB - Last synced at: 4 days ago - Pushed at: about 1 year ago - Stars: 11 - Forks: 1

cldellow/parquet-metadata

Dump metadata about a Parquet file.

Language: Python - Size: 39.1 KB - Last synced at: 6 days ago - Pushed at: over 3 years ago - Stars: 11 - Forks: 3

arkady-emelyanov/pyarrow-flight

Apache Arrow Flight example

Language: Python - Size: 1000 Bytes - Last synced at: 21 days ago - Pushed at: over 4 years ago - Stars: 11 - Forks: 3

cpg314/polarhouse

Interoperability between Polars and Clickhouse

Language: Rust - Size: 87.9 KB - Last synced at: 15 days ago - Pushed at: about 2 months ago - Stars: 9 - Forks: 1

unum-cloud/udsb

Unlimited Data-Science Benchmarks for Numeric, Tabular and Graph Workloads

Language: Jupyter Notebook - Size: 3.57 MB - Last synced at: 4 days ago - Pushed at: about 2 years ago - Stars: 9 - Forks: 1

ljishen/bitar

Simplify accessing hardware compression/decompression accelerators

Language: C++ - Size: 541 KB - Last synced at: 11 days ago - Pushed at: about 2 years ago - Stars: 9 - Forks: 2

perspective-community/arrow-wasm-cpp 📦

Standalone Apache Arrow compiled to WebAssembly, extracted from https://github.com/finos/perspective

Language: CMake - Size: 88.9 KB - Last synced at: 2 months ago - Pushed at: over 1 year ago - Stars: 8 - Forks: 0

amoeba/arrow-python-js-ipc-example

Example showing how to send Arrow RecordBatches from a Python backend to a web browser.

Language: JavaScript - Size: 26.4 KB - Last synced at: 5 days ago - Pushed at: over 2 years ago - Stars: 8 - Forks: 2

poopoothegorilla/fastframe

DataFrame project that utilizes Apache Arrow

Language: Go - Size: 218 KB - Last synced at: about 1 year ago - Pushed at: almost 5 years ago - Stars: 7 - Forks: 0

rupurt/zodbc

A blazing fast ODBC Zig client

Language: Zig - Size: 125 KB - Last synced at: 24 days ago - Pushed at: about 1 year ago - Stars: 6 - Forks: 2

Sebastiaan-Alvarez-Rodriguez/arrow-spark-publication

Implementation connecting Arrow to Spark, effectively making all code related to reading in Spark redundant.

Language: C++ - Size: 9.12 MB - Last synced at: about 2 years ago - Pushed at: about 4 years ago - Stars: 6 - Forks: 4

spaghettifunk/norman

Realtime distributed OLAP datastore written in Go, designed to answer OLAP queries with low latency. In active development.

Language: Go - Size: 370 KB - Last synced at: 9 days ago - Pushed at: 26 days ago - Stars: 5 - Forks: 0

lykmapipo/Python-Spark-Log-Analysis

Python scripts to process, and analyze log files using PySpark.

Language: Python - Size: 131 KB - Last synced at: 21 days ago - Pushed at: 9 months ago - Stars: 5 - Forks: 0

animeshtrivedi/benchmarking-arrow

Benchmarking Arrow/Java

Language: Java - Size: 226 KB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 5 - Forks: 1

mluttikh/xml2arrow

Efficiently convert XML data to Apache Arrow format for high-performance data processing

Language: Rust - Size: 224 KB - Last synced at: 1 day ago - Pushed at: about 1 month ago - Stars: 4 - Forks: 0

cupiddb/cupiddb

In-memory Columnar Database

Language: Rust - Size: 49.8 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 4 - Forks: 0

lykmapipo/NYC-TLC-Trip-Data

Python scripts to download, process, and analyze the New York City Taxi and Limousine Commission (TLC) Trip Record Data dataset

Language: Jupyter Notebook - Size: 100 MB - Last synced at: 21 days ago - Pushed at: 8 months ago - Stars: 4 - Forks: 1

roeap/flight-sql-client-node

A Flight SQL client for Node.js

Language: Rust - Size: 1.11 MB - Last synced at: 11 days ago - Pushed at: over 1 year ago - Stars: 4 - Forks: 2

alexkreidler/parquet2arrow

A fast and simple command-line (CLI) tool to convert a Parquet file to an Apache Arrow file

Language: Rust - Size: 11.7 KB - Last synced at: 20 days ago - Pushed at: about 3 years ago - Stars: 4 - Forks: 1

pachadotdev/tradestatistics-plumber-api

tradestatistics.io API, reads from PostgreSQL and provides tidy CSV and Apache Arrow data

Language: R - Size: 166 KB - Last synced at: 21 days ago - Pushed at: 8 months ago - Stars: 3 - Forks: 2

firelink-data/evolution

🦖 Evolve your fixed-length data files into Apache Arrow tables, fully parallelized!

Language: Rust - Size: 242 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 3 - Forks: 0

marwan116/aws-parquet

A toolkit that provides an object-oriented interface for working with Parquet datasets on AWS

Language: Python - Size: 43.9 KB - Last synced at: 3 months ago - Pushed at: almost 2 years ago - Stars: 3 - Forks: 0

tiwater/rerun-query

Query and extract entity data from Rerun data files.

Language: Rust - Size: 7.94 MB - Last synced at: 12 days ago - Pushed at: 13 days ago - Stars: 2 - Forks: 0

amoeba/arrow-opentelemetry-example

Example of using OpenTelemetry and Apache Arrow

Language: Python - Size: 115 KB - Last synced at: 5 days ago - Pushed at: 4 months ago - Stars: 2 - Forks: 0

tradestatistics/plumber-api Fork of pachadotdev/tradestatistics-plumber-api

tradestatistics.io API, reads from PostgreSQL and provides tidy CSV and Apache Arrow data

Language: R - Size: 166 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 2 - Forks: 0

droher/diachronic

Get daily historical snapshots of every article on any Wiki, formatted as Parquet files

Language: Python - Size: 52.7 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0

roeap/adx-arrow

Kusto client library optimized for data science workloads

Language: Rust - Size: 52.7 KB - Last synced at: about 1 month ago - Pushed at: almost 3 years ago - Stars: 2 - Forks: 0

dantrim/parquet-writer

A C++ library for easily writing Parquet files containing columns of (mostly) any type you wish.

Language: C++ - Size: 1.03 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 2 - Forks: 2

joewood/react-iceberg

React Components to visualize Apache Iceberg tables

Language: TypeScript - Size: 1.35 MB - Last synced at: about 1 month ago - Pushed at: over 3 years ago - Stars: 2 - Forks: 1

matsadler/bishop

Query MongoDB via Apache Arrow and DataFusion

Language: Rust - Size: 37.1 KB - Last synced at: about 2 months ago - Pushed at: about 4 years ago - Stars: 2 - Forks: 0

apache/arrow-dotnet

Official .NET implementation of Apache Arrow

Size: 0 Bytes - Last synced at: 6 days ago - Pushed at: 24 days ago - Stars: 1 - Forks: 0

amoeba/arrow-cpp-conan-example

Example using conan to package and use libarrow

Language: CMake - Size: 7.81 KB - Last synced at: 5 days ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

tradestatistics/database-postgresql Fork of pachadotdev/tradestatistics-database-postgresql

Tidy trade data from UN COMTRADE and also countries, commodities, units, and reporting system tables. Writes to PostgreSQL.

Language: R - Size: 51.4 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 1 - Forks: 0

voutilad/redpanda-flight-rs

An Apache Arrow Flight proxy for Redpanda

Language: Rust - Size: 225 KB - Last synced at: about 2 months ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 1

amoeba/arrow-flight-playground

Various examples related to Apache Arrow Flight.

Language: C++ - Size: 438 KB - Last synced at: 5 days ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 0

neo4j-field/dataflow-flex-pyarrow-to-gds

Google Dataflow Flex Templates (in Python) for large scale Graph Loading with GDS and Apache Arrow

Language: Python - Size: 216 KB - Last synced at: 14 days ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 2

ZhengqiaoWang/ArrowDocsZhCN

Apache Arrow documentation in Chinese (Apache Arrow 中文文档手册).

Language: C++ - Size: 130 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

antoniocachuan/gentle-introduction-apache-arrow

python

Language: Python - Size: 1.95 KB - Last synced at: almost 2 years ago - Pushed at: about 6 years ago - Stars: 1 - Forks: 1