An open API service providing repository metadata for many open source software ecosystems.

Topic: "data"

TanStack/query

🤖 Powerful asynchronous state management, server-state utilities and data fetching for the web. TS/JS, React Query, Solid Query, Svelte Query and Vue Query.

Language: TypeScript - Size: 92.3 MB - Last synced at: 2 days ago - Pushed at: 4 days ago - Stars: 47,881 - Forks: 3,620

run-llama/llama_index

LlamaIndex is the leading framework for building LLM-powered agents over your data.

Language: Python - Size: 362 MB - Last synced at: 4 days ago - Pushed at: 6 days ago - Stars: 45,922 - Forks: 6,651

metabase/metabase

The easy-to-use open source Business Intelligence and Embedded Analytics tool that lets everyone work with data 📊

Language: Clojure - Size: 1.38 GB - Last synced at: 18 days ago - Pushed at: 19 days ago - Stars: 44,855 - Forks: 6,065

DataExpert-io/data-engineer-handbook

This is a repo with links to everything you'd ever want to learn about data engineering

Language: Jupyter Notebook - Size: 59.5 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 38,528 - Forks: 7,412

SheetJS/sheetjs

📗 SheetJS Spreadsheet Data Toolkit -- New home https://git.sheetjs.com/SheetJS/sheetjs

Size: 101 MB - Last synced at: 6 days ago - Pushed at: over 1 year ago - Stars: 36,087 - Forks: 7,990

vercel/swr

React Hooks for Data Fetching

Language: TypeScript - Size: 7.95 MB - Last synced at: 10 days ago - Pushed at: 12 days ago - Stars: 32,211 - Forks: 1,304

sinaptik-ai/pandas-ai

Chat with your database or your datalake (SQL, CSV, parquet). PandasAI makes data analysis conversational using LLMs and RAG.

Language: Python - Size: 54.8 MB - Last synced at: 11 days ago - Pushed at: about 2 months ago - Stars: 22,808 - Forks: 2,234

PrefectHQ/prefect

Prefect is a workflow orchestration framework for building resilient data pipelines in Python.

Language: Python - Size: 187 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 21,142 - Forks: 2,039

airbytehq/airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.

Language: Python - Size: 756 MB - Last synced at: 3 days ago - Pushed at: 5 days ago - Stars: 20,303 - Forks: 4,973

fivethirtyeight/data

Data and code behind the articles and graphics at FiveThirtyEight

Language: Jupyter Notebook - Size: 155 MB - Last synced at: 7 months ago - Pushed at: 10 months ago - Stars: 17,060 - Forks: 11,127

prestodb/presto

The official home of the Presto distributed SQL query engine for big data

Language: Java - Size: 250 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 16,594 - Forks: 5,509

akfamily/akshare

AKShare is an elegant and simple financial data interface library for Python, built for human beings! (An open-source financial data interface library.)

Language: Python - Size: 4.85 MB - Last synced at: 3 days ago - Pushed at: 5 days ago - Stars: 14,899 - Forks: 2,659

faker-js/faker

Generate massive amounts of fake data in the browser and Node.js

Language: TypeScript - Size: 30.3 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 14,747 - Forks: 1,037

oxnr/awesome-bigdata

A curated list of awesome big data frameworks, resources and other awesomeness.

Size: 845 KB - Last synced at: 5 days ago - Pushed at: 28 days ago - Stars: 14,102 - Forks: 2,589

pwxcoo/chinese-xinhua

📙 Xinhua Dictionary database, including xiehouyu (two-part allegorical sayings), idioms, words, and Chinese characters.

Language: Python - Size: 34.6 MB - Last synced at: 7 months ago - Pushed at: almost 2 years ago - Stars: 11,204 - Forks: 2,621

apple/pkl

A configuration as code language with rich validation and tooling.

Language: Java - Size: 7.12 MB - Last synced at: 6 days ago - Pushed at: 9 days ago - Stars: 10,981 - Forks: 348

PRQL/prql

PRQL is a modern language for transforming data — a simple, powerful, pipelined SQL replacement

Language: Rust - Size: 22.9 MB - Last synced at: 5 days ago - Pushed at: 7 days ago - Stars: 10,572 - Forks: 247

bchavez/Bogus

📇 A simple fake data generator for C#, F#, and VB.NET. Based on and ported from the famed faker.js.

Language: C# - Size: 6.12 MB - Last synced at: 5 days ago - Pushed at: about 1 month ago - Stars: 9,540 - Forks: 536

rawgraphs/rawgraphs-app

A web interface to create custom vector-based visualizations on top of RAWGraphs core

Language: JavaScript - Size: 51 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 8,893 - Forks: 1,857

mage-ai/mage-ai

🧙 Build, run, and manage data pipelines for integrating and transforming data.

Language: Python - Size: 233 MB - Last synced at: 13 days ago - Pushed at: 14 days ago - Stars: 8,583 - Forks: 893

D4Vinci/Scrapling

🕷️ An undetectable, powerful, flexible, high-performance Python library that makes web scraping as easy and effortless as it should be!

Language: Python - Size: 4.02 MB - Last synced at: 6 days ago - Pushed at: 7 days ago - Stars: 8,318 - Forks: 475

mrdbourke/machine-learning-roadmap

A roadmap connecting many of the most important concepts in machine learning, how to learn them and what tools to use to perform them.

Size: 24.8 MB - Last synced at: 3 months ago - Pushed at: about 3 years ago - Stars: 7,740 - Forks: 1,168

olifolkerd/tabulator

Interactive Tables and Data Grids for JavaScript

Language: JavaScript - Size: 86 MB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 7,403 - Forks: 871

snowplow/snowplow

The leader in Next-Generation Customer Data Infrastructure

Language: Scala - Size: 25.5 MB - Last synced at: 7 months ago - Pushed at: 9 months ago - Stars: 6,924 - Forks: 1,189

flyteorg/flyte

Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.

Language: Go - Size: 331 MB - Last synced at: about 22 hours ago - Pushed at: 1 day ago - Stars: 6,643 - Forks: 767

cloudquery/cloudquery

Data pipelines for cloud config and security data. Build cloud asset inventory, CSPM, FinOps, and vulnerability management solutions. Extract from AWS, Azure, GCP, and 70+ cloud and SaaS sources.

Language: Go - Size: 179 MB - Last synced at: 6 days ago - Pushed at: 8 days ago - Stars: 6,277 - Forks: 544

dformoso/machine-learning-mindmap

A mindmap summarising Machine Learning concepts, from Data Analysis to Deep Learning.

Size: 14.8 MB - Last synced at: 7 months ago - Pushed at: over 5 years ago - Stars: 6,193 - Forks: 1,007

axa-group/Parsr

Transforms PDF, Documents and Images into Enriched Structured Data

Language: JavaScript - Size: 52.6 MB - Last synced at: 4 months ago - Pushed at: about 2 years ago - Stars: 6,001 - Forks: 318

cue-lang/cue

The home of the CUE language! Validate and define text-based and dynamic configuration

Language: Go - Size: 56.7 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 5,848 - Forks: 346

Countly/countly-server

Countly is a product analytics platform that helps teams track, analyze and act on their users' actions and behaviour on mobile, web and desktop applications.

Language: JavaScript - Size: 665 MB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 5,804 - Forks: 979

datajuicer/data-juicer

Data processing for and with foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷

Language: Python - Size: 723 MB - Last synced at: 3 days ago - Pushed at: 6 days ago - Stars: 5,643 - Forks: 304

airbnb/knowledge-repo

A next-generation curated knowledge-sharing platform for data scientists and other technical professionals.

Language: Python - Size: 74 MB - Last synced at: 2 months ago - Pushed at: over 1 year ago - Stars: 5,532 - Forks: 685

mdn/browser-compat-data

Browser compatibility data for Web technologies as displayed on MDN

Language: JSON - Size: 113 MB - Last synced at: 8 days ago - Pushed at: 9 days ago - Stars: 5,522 - Forks: 2,435

brianvoe/gofakeit

Random fake data generator written in Go

Language: Go - Size: 7.8 MB - Last synced at: 6 days ago - Pushed at: 21 days ago - Stars: 5,264 - Forks: 293

superduper-io/superduper

Superduper: End-to-end framework for building custom AI applications and agents.

Language: Python - Size: 73.8 MB - Last synced at: 9 days ago - Pushed at: 4 months ago - Stars: 5,233 - Forks: 532

cocoindex-io/cocoindex

Data transformation framework for AI. Ultra performant, with incremental processing. 🌟 Star if you like it!

Language: Rust - Size: 97.4 MB - Last synced at: 1 day ago - Pushed at: 2 days ago - Stars: 4,960 - Forks: 371

ckan/ckan

CKAN is an open-source DMS (data management system) for powering data hubs and data portals. CKAN makes it easy to publish, share and use data. It powers catalog.data.gov, open.canada.ca/data, data.humdata.org among many other sites.

Language: Python - Size: 215 MB - Last synced at: 4 days ago - Pushed at: 6 days ago - Stars: 4,912 - Forks: 2,071

tinyplex/tinybase

A reactive data store & sync engine.

Language: TypeScript - Size: 363 MB - Last synced at: 3 days ago - Pushed at: 5 days ago - Stars: 4,833 - Forks: 117

glideapps/glide-data-grid

🚀 Glide Data Grid is a no-compromise, outrageously fast React data grid with rich rendering, first-class accessibility, and full TypeScript support.

Language: TypeScript - Size: 95.4 MB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 4,714 - Forks: 368

dlt-hub/dlt

data load tool (dlt) is an open source Python library that makes data loading easy 🛠️

Language: Python - Size: 101 MB - Last synced at: 3 days ago - Pushed at: 5 days ago - Stars: 4,707 - Forks: 412

ArroyoSystems/arroyo

Distributed stream processing engine in Rust

Language: Rust - Size: 15.6 MB - Last synced at: 12 days ago - Pushed at: 13 days ago - Stars: 4,705 - Forks: 325

lk-geimfari/mimesis

Mimesis is a fast Python library for generating fake data in multiple languages.

Language: Python - Size: 33.9 MB - Last synced at: 10 days ago - Pushed at: about 1 month ago - Stars: 4,655 - Forks: 346

tensorflow/datasets

TFDS is a collection of datasets ready to use with TensorFlow, Jax, ...

Language: Python - Size: 952 MB - Last synced at: 18 days ago - Pushed at: 20 days ago - Stars: 4,507 - Forks: 1,592

StructuredLabs/preswald

Preswald is a WASM packager for Python-based interactive data apps: bundle full complex data workflows, particularly visualizations, into single files, runnable completely in-browser, using Pyodide, DuckDB, Pandas, and Plotly, Matplotlib, etc. Build dashboards, reports, and notebooks that run offline, load fast, and share like a document.

Language: Python - Size: 97.2 MB - Last synced at: 3 months ago - Pushed at: 5 months ago - Stars: 4,309 - Forks: 666

jonschlinkert/gray-matter

Smarter YAML front matter parser, used by metalsmith, Gatsby, Netlify, Assemble, mapbox-gl, phenomic, vuejs vitepress, TinaCMS, Shopify Polaris, Ant Design, Astro, hashicorp, garden, slidev, saber, sourcegraph, and many others. Simple to use, and battle tested. Parses YAML by default but can also parse JSON Front Matter, Coffee Front Matter, TOML Front Matter, and has support for custom parsers. Please follow gray-matter's author: https://github.com/jonschlinkert

Language: JavaScript - Size: 342 KB - Last synced at: about 1 month ago - Pushed at: 6 months ago - Stars: 4,293 - Forks: 151

truefoundry/cognita

RAG (Retrieval Augmented Generation) Framework for building modular, open source applications for production by TrueFoundry

Language: Python - Size: 50.3 MB - Last synced at: 28 days ago - Pushed at: 29 days ago - Stars: 4,289 - Forks: 361

speedyapply/2026-AI-College-Jobs

2026 AI/ML internship & new graduate job list updated daily

Size: 4.28 MB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 4,249 - Forks: 172

Quartz/bad-data-guide

An exhaustive reference to problems seen in real-world data along with suggestions on how to resolve them.

Size: 125 KB - Last synced at: 6 months ago - Pushed at: over 4 years ago - Stars: 4,071 - Forks: 403

quadratichq/quadratic

Spreadsheet with AI, Code, Connections

Language: TypeScript - Size: 1.33 GB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 3,898 - Forks: 254

mlabonne/llm-datasets

Curated list of datasets and tools for post-training.

Size: 103 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 3,850 - Forks: 318

LazyAGI/LazyLLM

The easiest and laziest way to build multi-agent LLM applications.

Language: Python - Size: 14.6 MB - Last synced at: about 15 hours ago - Pushed at: about 21 hours ago - Stars: 3,637 - Forks: 352

Belval/TextRecognitionDataGenerator

A synthetic data generator for text recognition

Language: Python - Size: 149 MB - Last synced at: 19 days ago - Pushed at: over 1 year ago - Stars: 3,612 - Forks: 1,017

dtinit/data-transfer-project

The Data Transfer Project makes it easy for platforms to build interoperable user data portability features. We are establishing a common framework, including data models and protocols, to enable direct transfer of data both into and out of participating online service providers.

Language: Java - Size: 10.7 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 3,594 - Forks: 487

jdorfman/awesome-json-datasets

A curated list of awesome JSON datasets that don't require authentication.

Language: JavaScript - Size: 238 KB - Last synced at: 5 days ago - Pushed at: about 1 year ago - Stars: 3,526 - Forks: 386

Docta-ai/docta

A Doctor for your data

Language: Python - Size: 27.8 MB - Last synced at: 3 months ago - Pushed at: 12 months ago - Stars: 3,478 - Forks: 258

heroku/react-refetch

A simple, declarative, and composable way to fetch data for React components

Language: JavaScript - Size: 1.04 MB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 3,422 - Forks: 141

superstreamlabs/memphis

Memphis.dev is a highly scalable and effortless data streaming platform

Language: Go - Size: 468 MB - Last synced at: 25 days ago - Pushed at: over 1 year ago - Stars: 3,417 - Forks: 229

ngneat/falso

All the Fake Data for All Your Real Needs 🙂

Language: TypeScript - Size: 11.4 MB - Last synced at: about 2 months ago - Pushed at: 6 months ago - Stars: 3,322 - Forks: 121

ucbepic/docetl

A system for agentic LLM-powered data processing and ETL

Language: Python - Size: 62.3 MB - Last synced at: 3 days ago - Pushed at: 6 days ago - Stars: 3,290 - Forks: 352

ruc-datalab/DeepAnalyze

DeepAnalyze is the first agentic LLM for autonomous data science. 🎈 Your AI data analyst: it automatically analyzes large volumes of data and generates professional analysis reports in one click!

Language: Python - Size: 22.5 MB - Last synced at: 7 days ago - Pushed at: 10 days ago - Stars: 3,191 - Forks: 472

pydata/pandas-datareader

Extract data from a wide range of Internet sources into a pandas DataFrame.

Language: Python - Size: 12.3 MB - Last synced at: 19 days ago - Pushed at: 9 months ago - Stars: 3,127 - Forks: 684

uber/aresdb

A GPU-powered real-time analytics storage and query engine.

Language: Go - Size: 12.4 MB - Last synced at: 4 months ago - Pushed at: over 1 year ago - Stars: 3,065 - Forks: 235

Kanaries/graphic-walker

An open source alternative to Tableau. Embeddable visual analytics.

Language: TypeScript - Size: 3.73 MB - Last synced at: 25 days ago - Pushed at: 28 days ago - Stars: 3,012 - Forks: 163

weld-project/weld

High-performance runtime for data analytics applications

Language: Rust - Size: 2.88 MB - Last synced at: about 1 month ago - Pushed at: over 3 years ago - Stars: 3,001 - Forks: 256

montanaflynn/stats

A well tested and comprehensive Golang statistics library package with no dependencies.

Language: Go - Size: 333 KB - Last synced at: 4 months ago - Pushed at: 9 months ago - Stars: 2,992 - Forks: 170

datafold/data-diff 📦

Compare tables within or across databases

Language: Python - Size: 3.98 MB - Last synced at: about 2 months ago - Pushed at: over 1 year ago - Stars: 2,987 - Forks: 295

apache/incubator-devlake

Apache DevLake is an open-source dev data platform to ingest, analyze, and visualize the fragmented data from DevOps tools, extracting insights for engineering excellence, developer experience, and community growth.

Language: Go - Size: 38.8 MB - Last synced at: 7 days ago - Pushed at: 10 days ago - Stars: 2,884 - Forks: 659

kayak/pypika

PyPika is a Python SQL query builder that exposes the full richness of the SQL language using a syntax that reflects the resulting query. PyPika excels at all sorts of SQL queries but is especially useful for data analysis.

Language: Python - Size: 1.27 MB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 2,753 - Forks: 319

spiceai/spiceai

A portable accelerated SQL query, search, and LLM-inference engine, written in Rust, for data-grounded AI apps and agents.

Language: Rust - Size: 66.9 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 2,645 - Forks: 150

spotify/scio

A Scala API for Apache Beam and Google Cloud Dataflow.

Language: Scala - Size: 89.5 MB - Last synced at: 7 days ago - Pushed at: 8 days ago - Stars: 2,614 - Forks: 526

mito-ds/mito

Jupyter extensions that help you write code faster: Context aware AI Chat, Autocomplete, and Spreadsheet

Language: Jupyter Notebook - Size: 278 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 2,601 - Forks: 205

unsplash/datasets

🎁 6,500,000+ Unsplash images made available for research and machine learning

Language: Jupyter Notebook - Size: 70.3 KB - Last synced at: 3 months ago - Pushed at: 8 months ago - Stars: 2,597 - Forks: 131

justinzm/gopup

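Data interfaces: Baidu, Google, Toutiao and Weibo indexes; macroeconomic data; interest rate data; currency exchange rates; high-growth ("qianlima") and unicorn companies; Xinwen Lianbo news broadcast transcripts; film and TV box-office data; university lists; epidemic data…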

Language: Python - Size: 689 KB - Last synced at: 4 months ago - Pushed at: over 2 years ago - Stars: 2,561 - Forks: 387

EntilZha/PyFunctional

Python library for creating data pipelines with chain functional programming

Language: Python - Size: 893 KB - Last synced at: about 3 hours ago - Pushed at: 10 months ago - Stars: 2,487 - Forks: 133

colour-science/colour

Colour Science for Python

Language: Python - Size: 124 MB - Last synced at: 3 days ago - Pushed at: 6 days ago - Stars: 2,469 - Forks: 279

deepnote/deepnote

Deepnote is a drop-in replacement for Jupyter with an AI-first design, sleek UI, new blocks, and native data integrations. Use Python, R, and SQL locally in your favorite IDE, then scale to Deepnote cloud for real-time collaboration, Deepnote agent, and deployable data apps. https://deepnote.com/

Language: TypeScript - Size: 20.7 MB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 2,453 - Forks: 157

rilldata/rill

Rill is a tool for effortlessly transforming data sets into powerful, opinionated dashboards using SQL. BI-as-code.

Language: Go - Size: 570 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 2,426 - Forks: 159

any4ai/AnyCrawl

AnyCrawl 🚀: A Node.js/TypeScript crawler that turns websites into LLM-ready data and extracts structured SERP results from Google/Bing/Baidu/etc. Native multi-threading for bulk processing.

Language: TypeScript - Size: 1.68 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 2,423 - Forks: 247

github/CodeSearchNet 📦

Datasets, tools, and benchmarks for representation learning of code.

Language: Jupyter Notebook - Size: 28.6 MB - Last synced at: 2 months ago - Pushed at: almost 4 years ago - Stars: 2,378 - Forks: 408

lukes/ISO-3166-Countries-with-Regional-Codes

ISO 3166-1 country lists merged with their UN Geoscheme regional codes in ready-to-use JSON, XML, CSV data sets

Language: Ruby - Size: 188 KB - Last synced at: 2 months ago - Pushed at: over 1 year ago - Stars: 2,375 - Forks: 3,322

Visualize-ML/Book6_First-Course-in-Data-Science

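Book_6_《数据有道》 | Iris Books series: from basic arithmetic to machine learning. Criticism and corrections are welcome! Readers who report the most errors will receive a free book as thanks!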

Language: Jupyter Notebook - Size: 169 MB - Last synced at: 7 months ago - Pushed at: over 1 year ago - Stars: 2,347 - Forks: 432

emirozer/fake2db

Create custom test databases that are populated with fake data

Language: Python - Size: 1020 KB - Last synced at: 2 months ago - Pushed at: about 6 years ago - Stars: 2,341 - Forks: 124

malloydata/malloy

Malloy is a modern open source language for describing data relationships and transformations.

Language: TypeScript - Size: 339 MB - Last synced at: 10 days ago - Pushed at: 12 days ago - Stars: 2,315 - Forks: 113

meltano/meltano

Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.

Language: Python - Size: 145 MB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 2,293 - Forks: 191

approximatelabs/sketch

AI code-writing assistant that understands data content

Language: Python - Size: 8.98 MB - Last synced at: about 2 months ago - Pushed at: almost 2 years ago - Stars: 2,287 - Forks: 119

benkeen/generatedata

A powerful, feature-rich, random test data generator.

Language: TypeScript - Size: 82.5 MB - Last synced at: 6 days ago - Pushed at: 7 days ago - Stars: 2,272 - Forks: 620

TigerResearch/TigerBot

TigerBot: A multi-language multi-task LLM

Language: Python - Size: 74.2 MB - Last synced at: 11 days ago - Pushed at: 12 months ago - Stars: 2,261 - Forks: 190

apache/gobblin

A distributed data integration framework that simplifies common aspects of big data integration such as data ingestion, replication, organization and lifecycle management for both streaming and batch data ecosystems.

Language: Java - Size: 128 MB - Last synced at: 5 days ago - Pushed at: 7 days ago - Stars: 2,257 - Forks: 749

MarcSkovMadsen/awesome-streamlit

The purpose of this project is to share knowledge on how awesome Streamlit is and can be

Language: HTML - Size: 115 MB - Last synced at: 9 days ago - Pushed at: over 2 years ago - Stars: 2,232 - Forks: 367

DeepInsight-AI/DeepBI

LLM-based data scientist, AI-native data application. AI-driven infinite thinking redefines BI.

Language: Python - Size: 134 MB - Last synced at: 5 months ago - Pushed at: 9 months ago - Stars: 2,227 - Forks: 357

GSA/data

Assorted data from the General Services Administration.

Language: HTML - Size: 10.9 MB - Last synced at: 6 months ago - Pushed at: over 1 year ago - Stars: 2,168 - Forks: 276

pretzelai/pretzelai

The modern replacement for Jupyter Notebooks

Language: TypeScript - Size: 264 MB - Last synced at: 4 months ago - Pushed at: about 1 year ago - Stars: 2,155 - Forks: 155

man-group/ArcticDB

ArcticDB is a high performance, serverless DataFrame database built for the Python Data Science ecosystem.

Language: C++ - Size: 203 MB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 2,117 - Forks: 155

mara/mara-pipelines

A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow

Language: Python - Size: 3.29 MB - Last synced at: 7 days ago - Pushed at: about 2 years ago - Stars: 2,086 - Forks: 99

mahmoud/glom

☄️ Python's nested data operator (and CLI), for all your declarative restructuring needs. Got data? Glom it! ☄️

Language: Python - Size: 1.27 MB - Last synced at: 2 months ago - Pushed at: 12 months ago - Stars: 2,080 - Forks: 68

onyx-platform/onyx 📦

Distributed, masterless, high performance, fault tolerant data processing

Language: Clojure - Size: 16.2 MB - Last synced at: 5 days ago - Pushed at: over 6 years ago - Stars: 2,044 - Forks: 202

keajs/kea

Batteries Included State Management for React

Language: JavaScript - Size: 7.33 MB - Last synced at: 7 days ago - Pushed at: 4 months ago - Stars: 1,986 - Forks: 51

illacceptanything/illacceptanything

The project where literally anything* goes.

Language: Ruby - Size: 1.47 GB - Last synced at: 23 days ago - Pushed at: 25 days ago - Stars: 1,961 - Forks: 591

brimdata/zui

Zui is a powerful desktop application for exploring and working with data. The official front-end to the Zed lake.

Language: TypeScript - Size: 222 MB - Last synced at: 5 days ago - Pushed at: 7 days ago - Stars: 1,912 - Forks: 136

baidu/tera

An Internet-Scale Database.

Language: C++ - Size: 15.7 MB - Last synced at: 6 months ago - Pushed at: over 1 year ago - Stars: 1,904 - Forks: 436