An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: dataframe

pmgraham/datagrunt

Datagrunt is a Python library designed to simplify the way you work with CSV files. It provides a streamlined approach to reading, processing, and transforming your data into various formats, making data manipulation efficient and intuitive.

Language: Python - Size: 6.51 MB - Last synced at: about 1 hour ago - Pushed at: about 2 hours ago - Stars: 9 - Forks: 1

velox4j/velox4j

Java bindings for https://github.com/facebookincubator/velox

Language: Java - Size: 25.5 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 29 - Forks: 8

Quantco/dataframely

A declarative, 🐻‍❄️-native data frame validation library.

Language: Python - Size: 792 KB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 355 - Forks: 12

CangyuanLi/checkedframe

Lightweight, engine-agnostic dataframe validation

Language: Python - Size: 2.57 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 10 - Forks: 0

graphframes/graphframes

GraphFrames is a package for Apache Spark which provides DataFrame-based Graphs

Language: Scala - Size: 3.83 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 1,065 - Forks: 250

NguyenDa18/Portland-Jail-Data-Crawler

Scraper used for recording changes to Portland jail database

Language: Jupyter Notebook - Size: 41.2 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 5 - Forks: 0

pola-rs/polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust

Language: Rust - Size: 191 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 34,357 - Forks: 2,297

man-group/ArcticDB

ArcticDB is a high performance, serverless DataFrame database built for the Python Data Science ecosystem.

Language: C++ - Size: 180 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 1,975 - Forks: 143

comet-ml/kangas

🦘 Explore multimedia datasets at scale

Language: Jupyter Notebook - Size: 40.3 MB - Last synced at: 3 days ago - Pushed at: 7 months ago - Stars: 1,060 - Forks: 52

apache/datafusion

Apache DataFusion SQL Query Engine

Language: Rust - Size: 146 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 7,431 - Forks: 1,538

databricks/koalas

Koalas: pandas API on Apache Spark

Language: Python - Size: 11.7 MB - Last synced at: 3 days ago - Pushed at: over 1 year ago - Stars: 3,362 - Forks: 365

miozilla/pandas

pandas 🐼🐼: Python Library # Data Analysis # Dataframe

Language: Jupyter Notebook - Size: 146 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

shramos/Awesome-Cybersecurity-Datasets

A curated list of amazingly awesome Cybersecurity datasets

Size: 26.4 KB - Last synced at: 3 days ago - Pushed at: over 1 year ago - Stars: 1,713 - Forks: 298

snowflakedb/snowpark-python

Snowflake Snowpark Python API

Language: Python - Size: 58.9 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 302 - Forks: 130

Conqxeror/veloxx

Veloxx: A high-performance, lightweight Rust library for in-memory data processing and analytics. Features DataFrames, Series, CSV/JSON I/O, powerful transformations, aggregations, and statistical functions for efficient data science and engineering.

Language: Rust - Size: 555 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 1 - Forks: 0

manzt/quak

a scalable data profiler

Language: TypeScript - Size: 2.48 MB - Last synced at: about 12 hours ago - Pushed at: 26 days ago - Stars: 367 - Forks: 15

hablapps/doric

Type safety for spark columns

Language: Scala - Size: 13.6 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 78 - Forks: 11

flow-php/flow

The most advanced data processing framework, letting you build scalable data processing pipelines and move data between various data sources and destinations.

Language: PHP - Size: 46 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 687 - Forks: 44

scipp/scipp

Multi-dimensional data arrays with labeled dimensions

Language: C++ - Size: 29.9 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 126 - Forks: 21

rapidsai/cudf

cuDF - GPU DataFrame Library

Language: C++ - Size: 159 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 9,029 - Forks: 956

esadek/polars-prompt

Command line interface for the Polars Python API

Language: Python - Size: 188 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0

Kotlin/dataframe

Structured data processing in Kotlin

Language: Kotlin - Size: 145 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 943 - Forks: 73

pdpipe/pdpipe

Easy pipelines for pandas DataFrames.

Language: Jupyter Notebook - Size: 2.78 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 720 - Forks: 45

hosseinmoein/DataFrame

C++ DataFrame for statistical, financial, and ML analysis -- in modern C++ using native types and contiguous memory storage

Language: C++ - Size: 47.5 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 2,738 - Forks: 336

freqtrade/technical

Various indicators developed or collected for the Freqtrade

Language: Python - Size: 7.53 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 887 - Forks: 233

apache/datafusion-ballista

Apache DataFusion Ballista Distributed Query Engine

Language: Rust - Size: 20.6 MB - Last synced at: 6 days ago - Pushed at: 13 days ago - Stars: 1,788 - Forks: 227

datisthq/dpkit

dpkit is a fast TypeScript data management framework built on top of the Data Package standard and Polars DataFrames

Language: TypeScript - Size: 1.13 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 5 - Forks: 0

adamerose/PandasGUI

A GUI for Pandas DataFrames

Language: Python - Size: 8.67 MB - Last synced at: 4 days ago - Pushed at: about 1 month ago - Stars: 3,232 - Forks: 240

mrpowers-io/spark-daria

Essential Spark extensions and helper methods ✨😲

Language: Scala - Size: 3.03 MB - Last synced at: 6 days ago - Pushed at: 7 days ago - Stars: 761 - Forks: 153

aryadhruv/LLMWorkbook

LLMWorkbook is a Python package that integrates Large Language Models (LLMs) with tabular datatypes - workbooks and dataframes for seamless data analysis and automation.

Language: Python - Size: 187 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 5 - Forks: 2

zeknown/Pandas_in_Python-Retail_Supermarket

Data wrangling with Python libraries such as pandas, using the retail_supermarket dataset from Kaggle.com 🚀

Language: Jupyter Notebook - Size: 184 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0

datavil/framex

A lightweight library for obtaining datasets, aimed at fast prototyping, tutorial creation, and experimenting.

Language: Python - Size: 2.81 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 1 - Forks: 0

jtablesaw/tablesaw

Java dataframe and visualization library

Language: Java - Size: 63.2 MB - Last synced at: 6 days ago - Pushed at: 16 days ago - Stars: 3,652 - Forks: 650

RaJharit77/Weather-Project

Repository for an exam project on the OpenWeatherMap API

Language: Jupyter Notebook - Size: 6.2 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 1 - Forks: 0

dflib/dflib

In-memory Java DataFrame library

Language: Java - Size: 5.74 MB - Last synced at: 7 days ago - Pushed at: 8 days ago - Stars: 268 - Forks: 25

sfu-db/connector-x

Fastest library to load data from DB to DataFrames in Rust and Python

Language: Rust - Size: 236 MB - Last synced at: 6 days ago - Pushed at: 27 days ago - Stars: 2,347 - Forks: 182

javascriptdata/danfojs

Danfo.js is an open source, JavaScript library providing high performance, intuitive, and easy to use data structures for manipulating and processing structured data.

Language: TypeScript - Size: 79.1 MB - Last synced at: 7 days ago - Pushed at: 19 days ago - Stars: 4,953 - Forks: 217

approximatelabs/sketch

AI code-writing assistant that understands data content

Language: Python - Size: 8.98 MB - Last synced at: 7 days ago - Pushed at: over 1 year ago - Stars: 2,275 - Forks: 119

lk-geimfari/mimesis

Mimesis is a robust data generator for Python that can produce a wide range of fake data in multiple languages.

Language: Python - Size: 33.8 MB - Last synced at: 6 days ago - Pushed at: 2 months ago - Stars: 4,594 - Forks: 340

alteryx/woodwork

Woodwork is a Python library that provides robust methods for managing and communicating data typing information.

Language: Python - Size: 3.2 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 154 - Forks: 22

Samba250/Mars

Explore Mars, the fourth planet from the Sun, known for its reddish surface and intriguing geological features. 🚀 Join the mission to uncover its secrets and pave the way for future human exploration! 🌌

Size: 19.3 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 0 - Forks: 0

apache/hamilton

Apache Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows, that encode lineage/tracing and metadata. Runs and scales everywhere python does.

Language: Jupyter Notebook - Size: 94.8 MB - Last synced at: 8 days ago - Pushed at: 19 days ago - Stars: 2,176 - Forks: 149

pyjanitor-devs/pyjanitor

Clean APIs for data cleaning. Python implementation of R package Janitor

Language: Python - Size: 11.7 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 1,428 - Forks: 173

vaexio/vaex

Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀

Language: Python - Size: 133 MB - Last synced at: 8 days ago - Pushed at: 9 months ago - Stars: 8,403 - Forks: 600

scicloj/tablecloth

Dataset manipulation library built on top of tech.ml.dataset

Language: Clojure - Size: 28.1 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 335 - Forks: 28

skrub-data/skrub

Machine learning with dataframes

Language: Python - Size: 12.2 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 1,422 - Forks: 149

antl3x/codeplot

▱ Codeplot is your infinity canvas for data exploration.

Language: TypeScript - Size: 13 MB - Last synced at: 5 days ago - Pushed at: 4 months ago - Stars: 28 - Forks: 6

modin-project/modin

Modin: Scale your Pandas workflows by changing a single line of code

Language: Python - Size: 51.1 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 10,206 - Forks: 664

oreilles/polars-st

Spatial extension for Polars DataFrames.

Language: Python - Size: 1.9 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 101 - Forks: 5

Peter-Opapa/pandas-data-manipulation

This project was created as part of my journey to master data engineering foundations, particularly focusing on data manipulation using pandas. It demonstrates my understanding of pandas syntax and real-world data transformation tasks that are crucial before building pipelines.

Language: Jupyter Notebook - Size: 511 KB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 0 - Forks: 0

tidyverse/duckplyr

A drop-in replacement for dplyr, powered by DuckDB for speed.

Language: R - Size: 15.6 MB - Last synced at: 11 days ago - Pushed at: 2 months ago - Stars: 333 - Forks: 20

DeepSpace2/StyleFrame

A library that wraps pandas and openpyxl and allows easy styling of dataframes in excel

Language: Python - Size: 571 KB - Last synced at: 10 days ago - Pushed at: about 1 year ago - Stars: 379 - Forks: 54

uwdata/arquero

Query processing and transformation of array-backed data tables.

Language: JavaScript - Size: 1.37 MB - Last synced at: 11 days ago - Pushed at: about 1 month ago - Stars: 1,415 - Forks: 68

caerbannogwhite/aargh

A library that helps you out of data nightmares in Go. 🧙‍♂️

Language: Go - Size: 33.7 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 5 - Forks: 0

mabel-dev/orso

Orso is a row-based Python DataFrame library

Language: Python - Size: 1.44 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 2 - Forks: 2

ranaroussi/pystore

Fast data store for Pandas time-series data

Language: Python - Size: 155 KB - Last synced at: 2 days ago - Pushed at: about 1 year ago - Stars: 579 - Forks: 102

mrjsj/msfabricutils

Spark-free Python utilities for Microsoft Fabric focused on Data Engineering using Polars and delta-rs

Language: Python - Size: 1.39 MB - Last synced at: 4 days ago - Pushed at: about 1 month ago - Stars: 25 - Forks: 5

hmz-23/Movie-Recommender-System

Language: Jupyter Notebook - Size: 3.87 MB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 1 - Forks: 0

elastic/eland

Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch

Language: Python - Size: 20.9 MB - Last synced at: 5 days ago - Pushed at: 20 days ago - Stars: 680 - Forks: 111

rocketlaunchr/dataframe-go

DataFrames for Go: For statistics, machine-learning, and data manipulation/exploration

Language: Go - Size: 1010 KB - Last synced at: 11 days ago - Pushed at: over 3 years ago - Stars: 1,255 - Forks: 99

bessarodrigo/dataviz_dashboard_revenue

Streamlit dashboard that calculates the monthly revenue variation of a telemedicine company.

Language: Jupyter Notebook - Size: 163 KB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 1 - Forks: 0

haifengl/smile

Statistical Machine Intelligence & Learning Engine

Language: Java - Size: 246 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 6,205 - Forks: 1,143

Axect/Peroxide

Rust numeric library with high performance and friendly syntax

Language: Rust - Size: 12.6 MB - Last synced at: 9 days ago - Pushed at: 20 days ago - Stars: 639 - Forks: 32

iakov-kaiumov/gsheet-pandas

Bridge between pandas and Google Sheets

Language: Python - Size: 55.7 KB - Last synced at: 4 days ago - Pushed at: 8 months ago - Stars: 8 - Forks: 1

SwellDB/SwellDB

The data system that answers anything.

Language: Python - Size: 2.25 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 3 - Forks: 0

areshytko/typedframe

Typed wrappers over pandas DataFrames with schema validation

Language: Python - Size: 318 KB - Last synced at: 9 days ago - Pushed at: over 1 year ago - Stars: 101 - Forks: 8

evetion/GeoDataFrames.jl

Simple geographical vector interaction built on top of ArchGDAL

Language: Julia - Size: 2.69 MB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 68 - Forks: 8

Mo7amed3bdelghany/Introduction-to-Pandas-Leetcode-

My Pandas practice solutions from LeetCode's official beginner study plan

Language: Jupyter Notebook - Size: 117 KB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 1 - Forks: 0

intake/akimbo

For when your data won't fit in your dataframe

Language: Python - Size: 419 KB - Last synced at: 8 days ago - Pushed at: 18 days ago - Stars: 47 - Forks: 6

rendner/py-plugin-dataframe-viewer

Plugin for JetBrains IDEs to view Python DataFrames when debugging.

Language: Python - Size: 5.27 MB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 15 - Forks: 1

Alex0x4b/akutils

High-level Python library for recurring data manipulation (Pandas, Python data structure, API, file manipulation, etc.).

Language: Python - Size: 69.3 KB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 0 - Forks: 0

chitralverma/scala-polars

Polars for Scala & Java projects!

Language: Scala - Size: 4.09 MB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 84 - Forks: 7

ThoughtWorksInc/daffy

Function decorators for Pandas Dataframe column name and data type validation

Language: Python - Size: 136 KB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 17 - Forks: 6

ivanildobarauna-dev/api-to-dataframe

Lightweight Python library that transforms REST API responses into well-structured Pandas DataFrames — with built-in retry logic, schema validation, and intelligent type inference.

Language: Python - Size: 669 KB - Last synced at: 9 days ago - Pushed at: 20 days ago - Stars: 1 - Forks: 0

michaelchu/optopsy

A nimble options backtesting library for Python

Language: Python - Size: 8.87 MB - Last synced at: 20 days ago - Pushed at: about 1 year ago - Stars: 1,130 - Forks: 178

janssenhenning/aiida-dataframe

AiiDA data plugin for pandas DataFrame objects

Language: Python - Size: 142 KB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 5 - Forks: 1

dmnfarrell/pandastable

Table analysis in Tkinter using pandas DataFrames.

Language: Python - Size: 8.99 MB - Last synced at: 20 days ago - Pushed at: 4 months ago - Stars: 651 - Forks: 125

techascent/tech.ml.dataset

A Clojure high performance data processing system

Language: Clojure - Size: 9.59 MB - Last synced at: 24 days ago - Pushed at: 24 days ago - Stars: 706 - Forks: 34

cognitum-octopus/cognipy

In-memory Graph Database and Knowledge Graph with Natural Language Interface, compatible with Pandas

Language: C# - Size: 133 MB - Last synced at: 17 days ago - Pushed at: 25 days ago - Stars: 54 - Forks: 10

abdenlab/oxbow

Oxbow makes genomic data ready for high-performance analytics.

Language: Rust - Size: 16.2 MB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 81 - Forks: 9

heronshoes/wisconsin-benchmark

Wisconsin Benchmark dataset generator

Language: Ruby - Size: 729 KB - Last synced at: 8 days ago - Pushed at: over 2 years ago - Stars: 7 - Forks: 0

adi-g15/kharcha

Tool to automate expense summary from SBI, HDFC, Credit Cards, Amazon Pay statements.

Language: Python - Size: 152 KB - Last synced at: 28 days ago - Pushed at: 28 days ago - Stars: 1 - Forks: 0

EdAbati/dataframes-haystack

Haystack custom components for your favourite dataframe library.

Language: Jupyter Notebook - Size: 258 KB - Last synced at: about 7 hours ago - Pushed at: 6 days ago - Stars: 3 - Forks: 0

kszucs/pandahouse

Pandas interface for Clickhouse database

Language: Python - Size: 61.5 KB - Last synced at: 18 days ago - Pushed at: over 4 years ago - Stars: 238 - Forks: 69

microsoft/Mobius

C# and F# language binding and extensions to Apache Spark

Language: C# - Size: 6.44 MB - Last synced at: 1 day ago - Pushed at: over 1 year ago - Stars: 940 - Forks: 211

alexhallam/tv

📺(tv) Tidy Viewer is a cross-platform CLI csv pretty printer that uses column styling to maximize viewer enjoyment.

Language: Rust - Size: 33.2 MB - Last synced at: 23 days ago - Pushed at: 6 months ago - Stars: 2,097 - Forks: 40

MrDataPsycho/data-pipelines-in-rust

Data pipeline example written in Rust with Polars and DataFusion DataFrame package

Language: Rust - Size: 142 KB - Last synced at: 8 days ago - Pushed at: over 2 years ago - Stars: 41 - Forks: 1

tidypyverse/tidypandas

A grammar of data manipulation for pandas inspired by tidyverse

Language: Python - Size: 5.75 MB - Last synced at: 4 days ago - Pushed at: over 1 year ago - Stars: 101 - Forks: 8

Kanaries/pygwalker

PyGWalker: Turn your dataframe into an interactive UI for visual analysis

Language: Python - Size: 62.8 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 14,916 - Forks: 796

atsyplenkov/pastum

VS Code extension to transform table from clipboard to R, Python or Julia dataframe

Language: JavaScript - Size: 50.3 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 42 - Forks: 0

bertrandmartel/tableau-scraping

Tableau scraper python library. R and Python scripts to scrape data from Tableau viz

Language: Python - Size: 485 KB - Last synced at: 18 days ago - Pushed at: about 1 year ago - Stars: 135 - Forks: 22

fsanaulla/chronicler-spark

InfluxDB connector to Apache Spark on top of Chronicler

Language: Scala - Size: 243 KB - Last synced at: about 14 hours ago - Pushed at: 12 months ago - Stars: 28 - Forks: 4

fphammerle/freesurfer-stats 📦

Python Library to Read FreeSurfer's Cortical Parcellation Anatomical Statistics

Language: Python - Size: 469 KB - Last synced at: 3 days ago - Pushed at: about 2 years ago - Stars: 15 - Forks: 1

CybercentreCanada/jupyterlab-sql-editor

A JupyterLab extension providing a SQL formatter, auto-completion, syntax highlighting, Spark SQL and Trino

Language: Jupyter Notebook - Size: 90.6 MB - Last synced at: 8 days ago - Pushed at: about 1 month ago - Stars: 88 - Forks: 14

mathijs81/java-dataframes

A quick test of a couple of data frame libraries for Java

Language: Jupyter Notebook - Size: 424 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 21 - Forks: 11

maxwellt23/SwiftFrames

A Swift-native DataFrame library inspired by pandas — load, view, transform, and export tabular data with ease.

Language: Swift - Size: 25.4 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

pascalr0410/mySQLTableHelper

Simple module to load a Julia DataFrame into a MySql DB

Language: Julia - Size: 13.7 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

Ashbyt/Python

Ashley Bythell - Python

Language: Jupyter Notebook - Size: 5.62 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

coding-kitties/PyIndicators

PyIndicators is a powerful and user-friendly Python library for technical analysis indicators, metrics and helper functions. Written entirely in Python, it requires no external dependencies, ensuring seamless integration and ease of use.

Language: Python - Size: 1.78 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 8 - Forks: 1

Zybulon/h5pandas

Dataframes from HDF5 instantaneously

Language: Python - Size: 612 KB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 4 - Forks: 0