GitHub topics: dataframe
pmgraham/datagrunt
Datagrunt is a Python library designed to simplify the way you work with CSV files. It provides a streamlined approach to reading, processing, and transforming your data into various formats, making data manipulation efficient and intuitive.
Language: Python - Size: 6.51 MB - Last synced at: about 1 hour ago - Pushed at: about 2 hours ago - Stars: 9 - Forks: 1

velox4j/velox4j
Java bindings for https://github.com/facebookincubator/velox
Language: Java - Size: 25.5 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 29 - Forks: 8

Quantco/dataframely
A declarative, 🐻‍❄️-native data frame validation library.
Language: Python - Size: 792 KB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 355 - Forks: 12

CangyuanLi/checkedframe
Lightweight, engine-agnostic dataframe validation
Language: Python - Size: 2.57 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 10 - Forks: 0

graphframes/graphframes
GraphFrames is a package for Apache Spark which provides DataFrame-based Graphs
Language: Scala - Size: 3.83 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 1,065 - Forks: 250

NguyenDa18/Portland-Jail-Data-Crawler
Scraper used for recording changes to Portland jail database
Language: Jupyter Notebook - Size: 41.2 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 5 - Forks: 0

pola-rs/polars
Dataframes powered by a multithreaded, vectorized query engine, written in Rust
Language: Rust - Size: 191 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 34,357 - Forks: 2,297

man-group/ArcticDB
ArcticDB is a high performance, serverless DataFrame database built for the Python Data Science ecosystem.
Language: C++ - Size: 180 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 1,975 - Forks: 143

comet-ml/kangas
🦘 Explore multimedia datasets at scale
Language: Jupyter Notebook - Size: 40.3 MB - Last synced at: 3 days ago - Pushed at: 7 months ago - Stars: 1,060 - Forks: 52

apache/datafusion
Apache DataFusion SQL Query Engine
Language: Rust - Size: 146 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 7,431 - Forks: 1,538

databricks/koalas
Koalas: pandas API on Apache Spark
Language: Python - Size: 11.7 MB - Last synced at: 3 days ago - Pushed at: over 1 year ago - Stars: 3,362 - Forks: 365

miozilla/pandas
pandas 🐼🐼 : Python Library # Data Analysis # Dataframe
Language: Jupyter Notebook - Size: 146 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

shramos/Awesome-Cybersecurity-Datasets
A curated list of amazingly awesome Cybersecurity datasets
Size: 26.4 KB - Last synced at: 3 days ago - Pushed at: over 1 year ago - Stars: 1,713 - Forks: 298

snowflakedb/snowpark-python
Snowflake Snowpark Python API
Language: Python - Size: 58.9 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 302 - Forks: 130

Conqxeror/veloxx
Veloxx: A high-performance, lightweight Rust library for in-memory data processing and analytics. Features DataFrames, Series, CSV/JSON I/O, powerful transformations, aggregations, and statistical functions for efficient data science and engineering.
Language: Rust - Size: 555 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 1 - Forks: 0

manzt/quak
a scalable data profiler
Language: TypeScript - Size: 2.48 MB - Last synced at: about 12 hours ago - Pushed at: 26 days ago - Stars: 367 - Forks: 15

hablapps/doric
Type safety for spark columns
Language: Scala - Size: 13.6 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 78 - Forks: 11

flow-php/flow
The most advanced data processing framework allowing to build scalable data processing pipelines and move data between various data sources and destinations.
Language: PHP - Size: 46 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 687 - Forks: 44

scipp/scipp
Multi-dimensional data arrays with labeled dimensions
Language: C++ - Size: 29.9 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 126 - Forks: 21

rapidsai/cudf
cuDF - GPU DataFrame Library
Language: C++ - Size: 159 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 9,029 - Forks: 956

esadek/polars-prompt
Command line interface for the Polars Python API
Language: Python - Size: 188 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0

Kotlin/dataframe
Structured data processing in Kotlin
Language: Kotlin - Size: 145 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 943 - Forks: 73

pdpipe/pdpipe
Easy pipelines for pandas DataFrames.
Language: Jupyter Notebook - Size: 2.78 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 720 - Forks: 45

hosseinmoein/DataFrame
C++ DataFrame for statistical, Financial, and ML analysis -- in modern C++ using native types and contiguous memory storage
Language: C++ - Size: 47.5 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 2,738 - Forks: 336

freqtrade/technical
Various indicators developed or collected for the Freqtrade
Language: Python - Size: 7.53 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 887 - Forks: 233

apache/datafusion-ballista
Apache DataFusion Ballista Distributed Query Engine
Language: Rust - Size: 20.6 MB - Last synced at: 6 days ago - Pushed at: 13 days ago - Stars: 1,788 - Forks: 227

datisthq/dpkit
dpkit is a fast TypeScript data management framework built on top of the Data Package standard and Polars DataFrames
Language: TypeScript - Size: 1.13 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 5 - Forks: 0

adamerose/PandasGUI
A GUI for Pandas DataFrames
Language: Python - Size: 8.67 MB - Last synced at: 4 days ago - Pushed at: about 1 month ago - Stars: 3,232 - Forks: 240

mrpowers-io/spark-daria
Essential Spark extensions and helper methods ✨😲
Language: Scala - Size: 3.03 MB - Last synced at: 6 days ago - Pushed at: 7 days ago - Stars: 761 - Forks: 153

aryadhruv/LLMWorkbook
LLMWorkbook is a Python package that integrates Large Language Models (LLMs) with tabular datatypes - workbooks and dataframes for seamless data analysis and automation.
Language: Python - Size: 187 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 5 - Forks: 2

zeknown/Pandas_in_Python-Retail_Supermarket
Data wrangling with Python libraries such as Pandas, using the retail_supermarket dataset from Kaggle.com 🚀
Language: Jupyter Notebook - Size: 184 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0

datavil/framex
A light-weight, dataset obtaining library for fast prototyping, tutorial creation, and experimenting.
Language: Python - Size: 2.81 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 1 - Forks: 0

jtablesaw/tablesaw
Java dataframe and visualization library
Language: Java - Size: 63.2 MB - Last synced at: 6 days ago - Pushed at: 16 days ago - Stars: 3,652 - Forks: 650

RaJharit77/Weather-Project
Repository for an exam project using the OpenWeatherMap API
Language: Jupyter Notebook - Size: 6.2 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 1 - Forks: 0

dflib/dflib
In-memory Java DataFrame library
Language: Java - Size: 5.74 MB - Last synced at: 7 days ago - Pushed at: 8 days ago - Stars: 268 - Forks: 25

sfu-db/connector-x
Fastest library to load data from DB to DataFrames in Rust and Python
Language: Rust - Size: 236 MB - Last synced at: 6 days ago - Pushed at: 27 days ago - Stars: 2,347 - Forks: 182

javascriptdata/danfojs
Danfo.js is an open source, JavaScript library providing high performance, intuitive, and easy to use data structures for manipulating and processing structured data.
Language: TypeScript - Size: 79.1 MB - Last synced at: 7 days ago - Pushed at: 19 days ago - Stars: 4,953 - Forks: 217

approximatelabs/sketch
AI code-writing assistant that understands data content
Language: Python - Size: 8.98 MB - Last synced at: 7 days ago - Pushed at: over 1 year ago - Stars: 2,275 - Forks: 119

lk-geimfari/mimesis
Mimesis is a robust data generator for Python that can produce a wide range of fake data in multiple languages.
Language: Python - Size: 33.8 MB - Last synced at: 6 days ago - Pushed at: 2 months ago - Stars: 4,594 - Forks: 340

alteryx/woodwork
Woodwork is a Python library that provides robust methods for managing and communicating data typing information.
Language: Python - Size: 3.2 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 154 - Forks: 22

Samba250/Mars
Explore Mars, the fourth planet from the Sun, known for its reddish surface and intriguing geological features. 🚀 Join the mission to uncover its secrets and pave the way for future human exploration! 🌌
Size: 19.3 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 0 - Forks: 0

apache/hamilton
Apache Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows, that encode lineage/tracing and metadata. Runs and scales everywhere python does.
Language: Jupyter Notebook - Size: 94.8 MB - Last synced at: 8 days ago - Pushed at: 19 days ago - Stars: 2,176 - Forks: 149

pyjanitor-devs/pyjanitor
Clean APIs for data cleaning. Python implementation of R package Janitor
Language: Python - Size: 11.7 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 1,428 - Forks: 173

vaexio/vaex
Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀
Language: Python - Size: 133 MB - Last synced at: 8 days ago - Pushed at: 9 months ago - Stars: 8,403 - Forks: 600

scicloj/tablecloth
Dataset manipulation library built on the top of tech.ml.dataset
Language: Clojure - Size: 28.1 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 335 - Forks: 28

skrub-data/skrub
Machine learning with dataframes
Language: Python - Size: 12.2 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 1,422 - Forks: 149

antl3x/codeplot
▱ Codeplot is your infinity canvas for data exploration.
Language: TypeScript - Size: 13 MB - Last synced at: 5 days ago - Pushed at: 4 months ago - Stars: 28 - Forks: 6

modin-project/modin
Modin: Scale your Pandas workflows by changing a single line of code
Language: Python - Size: 51.1 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 10,206 - Forks: 664

oreilles/polars-st
Spatial extension for Polars DataFrames.
Language: Python - Size: 1.9 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 101 - Forks: 5

Peter-Opapa/pandas-data-manipulation
This project was created as part of my journey to master data engineering foundations, particularly focusing on data manipulation using pandas. It demonstrates my understanding of pandas syntax and real-world data transformation tasks that are crucial before building pipelines.
Language: Jupyter Notebook - Size: 511 KB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 0 - Forks: 0

tidyverse/duckplyr
A drop-in replacement for dplyr, powered by DuckDB for speed.
Language: R - Size: 15.6 MB - Last synced at: 11 days ago - Pushed at: 2 months ago - Stars: 333 - Forks: 20

DeepSpace2/StyleFrame
A library that wraps pandas and openpyxl and allows easy styling of dataframes in excel
Language: Python - Size: 571 KB - Last synced at: 10 days ago - Pushed at: about 1 year ago - Stars: 379 - Forks: 54

uwdata/arquero
Query processing and transformation of array-backed data tables.
Language: JavaScript - Size: 1.37 MB - Last synced at: 11 days ago - Pushed at: about 1 month ago - Stars: 1,415 - Forks: 68

caerbannogwhite/aargh
A library that helps you out of data nightmares in Go. 🧙‍♂️
Language: Go - Size: 33.7 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 5 - Forks: 0

mabel-dev/orso
Orso is a row-based Python DataFrame library
Language: Python - Size: 1.44 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 2 - Forks: 2

ranaroussi/pystore
Fast data store for Pandas time-series data
Language: Python - Size: 155 KB - Last synced at: 2 days ago - Pushed at: about 1 year ago - Stars: 579 - Forks: 102

mrjsj/msfabricutils
Spark-free Python utilities for Microsoft Fabric focused on Data Engineering using Polars and delta-rs
Language: Python - Size: 1.39 MB - Last synced at: 4 days ago - Pushed at: about 1 month ago - Stars: 25 - Forks: 5

hmz-23/Movie-Recommender-System
Language: Jupyter Notebook - Size: 3.87 MB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 1 - Forks: 0

elastic/eland
Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch
Language: Python - Size: 20.9 MB - Last synced at: 5 days ago - Pushed at: 20 days ago - Stars: 680 - Forks: 111

rocketlaunchr/dataframe-go
DataFrames for Go: For statistics, machine-learning, and data manipulation/exploration
Language: Go - Size: 1010 KB - Last synced at: 11 days ago - Pushed at: over 3 years ago - Stars: 1,255 - Forks: 99

bessarodrigo/dataviz_dashboard_revenue
Streamlit dashboard that calculates the month-over-month revenue variation of a telemedicine company.
Language: Jupyter Notebook - Size: 163 KB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 1 - Forks: 0

haifengl/smile
Statistical Machine Intelligence & Learning Engine
Language: Java - Size: 246 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 6,205 - Forks: 1,143

Axect/Peroxide
Rust numeric library with high performance and friendly syntax
Language: Rust - Size: 12.6 MB - Last synced at: 9 days ago - Pushed at: 20 days ago - Stars: 639 - Forks: 32

iakov-kaiumov/gsheet-pandas
Bridge between pandas and Google Sheets
Language: Python - Size: 55.7 KB - Last synced at: 4 days ago - Pushed at: 8 months ago - Stars: 8 - Forks: 1

SwellDB/SwellDB
The data system that answers anything.
Language: Python - Size: 2.25 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 3 - Forks: 0

areshytko/typedframe
Typed wrappers over pandas DataFrames with schema validation
Language: Python - Size: 318 KB - Last synced at: 9 days ago - Pushed at: over 1 year ago - Stars: 101 - Forks: 8

evetion/GeoDataFrames.jl
Simple geographical vector interaction built on top of ArchGDAL
Language: Julia - Size: 2.69 MB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 68 - Forks: 8

Mo7amed3bdelghany/Introduction-to-Pandas-Leetcode-
My Pandas practice solutions from LeetCode's official beginner study plan
Language: Jupyter Notebook - Size: 117 KB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 1 - Forks: 0

intake/akimbo
For when your data won't fit in your dataframe
Language: Python - Size: 419 KB - Last synced at: 8 days ago - Pushed at: 18 days ago - Stars: 47 - Forks: 6

rendner/py-plugin-dataframe-viewer
Plugin for JetBrains IDEs to view Python DataFrames when debugging.
Language: Python - Size: 5.27 MB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 15 - Forks: 1

Alex0x4b/akutils
High-level Python library for recurring data manipulation (Pandas, Python data structure, API, file manipulation, etc.).
Language: Python - Size: 69.3 KB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 0 - Forks: 0

chitralverma/scala-polars
Polars for Scala & Java projects!
Language: Scala - Size: 4.09 MB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 84 - Forks: 7

ThoughtWorksInc/daffy
Function decorators for Pandas Dataframe column name and data type validation
Language: Python - Size: 136 KB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 17 - Forks: 6

ivanildobarauna-dev/api-to-dataframe
Lightweight Python library that transforms REST API responses into well-structured Pandas DataFrames — with built-in retry logic, schema validation, and intelligent type inference.
Language: Python - Size: 669 KB - Last synced at: 9 days ago - Pushed at: 20 days ago - Stars: 1 - Forks: 0

michaelchu/optopsy
A nimble options backtesting library for Python
Language: Python - Size: 8.87 MB - Last synced at: 20 days ago - Pushed at: about 1 year ago - Stars: 1,130 - Forks: 178

janssenhenning/aiida-dataframe
AiiDA data plugin for pandas DataFrame objects
Language: Python - Size: 142 KB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 5 - Forks: 1

dmnfarrell/pandastable
Table analysis in Tkinter using pandas DataFrames.
Language: Python - Size: 8.99 MB - Last synced at: 20 days ago - Pushed at: 4 months ago - Stars: 651 - Forks: 125

techascent/tech.ml.dataset
A Clojure high performance data processing system
Language: Clojure - Size: 9.59 MB - Last synced at: 24 days ago - Pushed at: 24 days ago - Stars: 706 - Forks: 34

cognitum-octopus/cognipy
In-memory Graph Database and Knowledge Graph with Natural Language Interface, compatible with Pandas
Language: C# - Size: 133 MB - Last synced at: 17 days ago - Pushed at: 25 days ago - Stars: 54 - Forks: 10

abdenlab/oxbow
Oxbow makes genomic data ready for high-performance analytics.
Language: Rust - Size: 16.2 MB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 81 - Forks: 9

heronshoes/wisconsin-benchmark
Wisconsin Benchmark dataset generator
Language: Ruby - Size: 729 KB - Last synced at: 8 days ago - Pushed at: over 2 years ago - Stars: 7 - Forks: 0

adi-g15/kharcha
Tool to automate expense summary from SBI, HDFC, Credit Cards, Amazon Pay statements.
Language: Python - Size: 152 KB - Last synced at: 28 days ago - Pushed at: 28 days ago - Stars: 1 - Forks: 0

EdAbati/dataframes-haystack
Haystack custom components for your favourite dataframe library.
Language: Jupyter Notebook - Size: 258 KB - Last synced at: about 7 hours ago - Pushed at: 6 days ago - Stars: 3 - Forks: 0

kszucs/pandahouse
Pandas interface for Clickhouse database
Language: Python - Size: 61.5 KB - Last synced at: 18 days ago - Pushed at: over 4 years ago - Stars: 238 - Forks: 69

microsoft/Mobius
C# and F# language binding and extensions to Apache Spark
Language: C# - Size: 6.44 MB - Last synced at: 1 day ago - Pushed at: over 1 year ago - Stars: 940 - Forks: 211

alexhallam/tv
📺(tv) Tidy Viewer is a cross-platform CLI csv pretty printer that uses column styling to maximize viewer enjoyment.
Language: Rust - Size: 33.2 MB - Last synced at: 23 days ago - Pushed at: 6 months ago - Stars: 2,097 - Forks: 40

MrDataPsycho/data-pipelines-in-rust
Data pipeline example written in Rust with Polars and DataFusion DataFrame package
Language: Rust - Size: 142 KB - Last synced at: 8 days ago - Pushed at: over 2 years ago - Stars: 41 - Forks: 1

tidypyverse/tidypandas
A grammar of data manipulation for pandas inspired by tidyverse
Language: Python - Size: 5.75 MB - Last synced at: 4 days ago - Pushed at: over 1 year ago - Stars: 101 - Forks: 8

Kanaries/pygwalker
PyGWalker: Turn your dataframe into an interactive UI for visual analysis
Language: Python - Size: 62.8 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 14,916 - Forks: 796

atsyplenkov/pastum
VS Code extension to transform table from clipboard to R, Python or Julia dataframe
Language: JavaScript - Size: 50.3 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 42 - Forks: 0

bertrandmartel/tableau-scraping
Tableau scraper python library. R and Python scripts to scrape data from Tableau viz
Language: Python - Size: 485 KB - Last synced at: 18 days ago - Pushed at: about 1 year ago - Stars: 135 - Forks: 22

fsanaulla/chronicler-spark
InfluxDB connector to Apache Spark on top of Chronicler
Language: Scala - Size: 243 KB - Last synced at: about 14 hours ago - Pushed at: 12 months ago - Stars: 28 - Forks: 4

fphammerle/freesurfer-stats 📦
Python Library to Read FreeSurfer's Cortical Parcellation Anatomical Statistics
Language: Python - Size: 469 KB - Last synced at: 3 days ago - Pushed at: about 2 years ago - Stars: 15 - Forks: 1

CybercentreCanada/jupyterlab-sql-editor
A JupyterLab extension providing a SQL formatter, auto-completion, and syntax highlighting for Spark SQL and Trino
Language: Jupyter Notebook - Size: 90.6 MB - Last synced at: 8 days ago - Pushed at: about 1 month ago - Stars: 88 - Forks: 14

mathijs81/java-dataframes
A quick test of a couple of data frame libraries for Java
Language: Jupyter Notebook - Size: 424 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 21 - Forks: 11

maxwellt23/SwiftFrames
A Swift-native DataFrame library inspired by pandas — load, view, transform, and export tabular data with ease.
Language: Swift - Size: 25.4 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

pascalr0410/mySQLTableHelper
Simple module to load a Julia DataFrame into a MySql DB
Language: Julia - Size: 13.7 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

Ashbyt/Python
Ashley Bythell - Python
Language: Jupyter Notebook - Size: 5.62 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

coding-kitties/PyIndicators
PyIndicators is a powerful and user-friendly Python library for technical analysis indicators, metrics and helper functions. Written entirely in Python, it requires no external dependencies, ensuring seamless integration and ease of use.
Language: Python - Size: 1.78 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 8 - Forks: 1

Zybulon/h5pandas
Dataframes from HDF5 instantaneously
Language: Python - Size: 612 KB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 4 - Forks: 0
