Ecosyste.ms: Repos

An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: parquet-files

strategicblue/parquet-floor

A lightweight Java library that facilitates reading and writing Apache Parquet files without Hadoop dependencies

Language: Java - Size: 95.7 KB - Last synced: 6 days ago - Pushed: 6 days ago - Stars: 37 - Forks: 3

mongodb-labs/mongo-arrow

MongoDB integrations for Apache Arrow. Export MongoDB documents to numpy array, parquet files, and pandas dataframes in one line of code.

Language: Python - Size: 424 KB - Last synced: 6 days ago - Pushed: 6 days ago - Stars: 77 - Forks: 10

Cinchoo/ChoETL

ETL framework for .NET (Parser / Writer for CSV, Flat, Xml, JSON, Key-Value, Parquet, Yaml, Avro formatted files)

Language: C# - Size: 39.6 MB - Last synced: 5 days ago - Pushed: about 1 month ago - Stars: 742 - Forks: 133

mjakubowski84/parquet4s

Read and write Parquet in Scala. Use Scala classes as schema. No need to start a cluster.

Language: Scala - Size: 2.03 MB - Last synced: 7 days ago - Pushed: 8 days ago - Stars: 276 - Forks: 68

masalinas/poc-minio-parquet-docker

PoC Minio Docker with parquet parser

Language: Python - Size: 16.6 KB - Last synced: 9 days ago - Pushed: 9 days ago - Stars: 0 - Forks: 0

Yo-mah-Ya/File_Creator

Create files in formats such as "orc", "parquet", "xlsx", and "json" with Python

Language: Python - Size: 25.4 KB - Last synced: 10 days ago - Pushed: 8 months ago - Stars: 0 - Forks: 0

dotM87/triaina

Big data project: information storage in HDFS

Size: 2.93 KB - Last synced: 10 days ago - Pushed: 10 days ago - Stars: 0 - Forks: 0

Ayushverma135/JSON-to-PARQUET-Parser

Easily convert JSON data into Parquet format for efficient storage and analysis. Simplify data processing and analysis pipelines by converting JSON objects into optimized Parquet files.

Language: Python - Size: 8.79 KB - Last synced: 16 days ago - Pushed: 16 days ago - Stars: 0 - Forks: 0

uber/petastorm

The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark and can be used from pure Python code.

Language: Python - Size: 2.69 MB - Last synced: 19 days ago - Pushed: 6 months ago - Stars: 1,754 - Forks: 281

PRQL/prql-query 📦

Query and transform data with PRQL

Language: Rust - Size: 1.32 MB - Last synced: 16 days ago - Pushed: 8 months ago - Stars: 123 - Forks: 7

parquet-go/parquet-go Fork of segmentio/parquet-go

Go library to read/write Parquet files. Developed at Twilio Segment

Language: Go - Size: 7.89 MB - Last synced: 25 days ago - Pushed: 26 days ago - Stars: 172 - Forks: 36

Matbbastos/epw-analysis

Processing and exporting data from EPW files into other formats.

Language: Python - Size: 2.12 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 1 - Forks: 0

hannes/miniparquet 📦

Library to read a subset of Parquet files

Language: C++ - Size: 485 KB - Last synced: 8 days ago - Pushed: over 4 years ago - Stars: 43 - Forks: 7

tee8z/noaa-data-pipeline

NOAA data pipeline, queryable from the browser

Language: Rust - Size: 1.36 MB - Last synced: about 1 month ago - Pushed: 2 months ago - Stars: 0 - Forks: 0

Ahbiels/FegTec

FegTec is a fictitious company that wants to transfer Parquet files containing customer data from the AWS cloud to Google Cloud

Language: Python - Size: 264 KB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 0 - Forks: 0

minio/spark-select

A library for Spark DataFrame using MinIO Select API

Language: Scala - Size: 65.4 KB - Last synced: about 1 month ago - Pushed: over 4 years ago - Stars: 96 - Forks: 18

Srking501/csc8101_coursework

A summative coursework for CSC8101 Engineering for AI

Language: Jupyter Notebook - Size: 168 KB - Last synced: 3 months ago - Pushed: 3 months ago - Stars: 0 - Forks: 0

mschermann/docker_apache_drill_datagrip

A docker image to read parquet files with drill in DataGrip

Language: Dockerfile - Size: 1.21 MB - Last synced: 3 months ago - Pushed: almost 5 years ago - Stars: 1 - Forks: 2

hrbrmstr/sergeant

💂 Tools to Transform and Query Data with 'Apache' 'Drill'

Language: R - Size: 17.8 MB - Last synced: 13 days ago - Pushed: about 2 years ago - Stars: 125 - Forks: 15

Dorianteffo/vg-sales-glue-spark-terraform

ETL job with AWS Glue

Language: Python - Size: 872 KB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 1 - Forks: 0

msigrupo/node-red-contrib-parquet

Node-Red contrib that converts between a PARQUET string and its JavaScript object representation, in either direction.

Language: HTML - Size: 21.5 KB - Last synced: 4 months ago - Pushed: about 2 years ago - Stars: 3 - Forks: 2

igor-suhorukov/openstreetmap_h3

High-performance data loader for OSM planet dumps. Transforms an OpenStreetMap world/region PBF dump into a PostGIS pgsnapshot (lossless) OSM schema representation partitioned by H3 regions and/or into ArrowIPC/Parquet dumps

Language: Java - Size: 6 MB - Last synced: 4 months ago - Pushed: 6 months ago - Stars: 73 - Forks: 6

cajuncoding/ParquetFiles.BlobHelpers

A simple library and console application to illustrate how to read and load data into class models from Parquet files saved to Azure Blob Storage using Parquet .Net (parquet-dotnet). This is useful for E-L-T processes whereby you need to load the data into Memory, Sql Server (e.g. Azure SQL), etc. or any other location where there is no built-in or default mechanism for working with Parquet data.

Language: C# - Size: 37.1 KB - Last synced: 30 days ago - Pushed: over 1 year ago - Stars: 1 - Forks: 0

johnbrandborg/s3-inventory-report 📦

Processes S3 Inventory Manifests and generates a report on folder size and average object size

Language: Python - Size: 34.2 KB - Last synced: 6 months ago - Pushed: 6 months ago - Stars: 1 - Forks: 0

DataTech-Solutions/Threat-Detection-and-Visualization

Threat Detection and Visualization

Language: TSQL - Size: 11.9 MB - Last synced: 6 months ago - Pushed: 6 months ago - Stars: 25 - Forks: 153

edwinpro/MLOps_videogames

MLOps project that consists of implementing an API for video games on the Steam platform.

Language: Jupyter Notebook - Size: 139 MB - Last synced: 4 months ago - Pushed: 7 months ago - Stars: 0 - Forks: 0

uhussain/WebCrawlerForOnlineInflation

Price Crawler - Tracking Price Inflation

Language: Python - Size: 387 KB - Last synced: 7 months ago - Pushed: almost 4 years ago - Stars: 155 - Forks: 47

JohannaRangel/DS-M4-Herramientas_Big_Data Fork of soyHenry/DS-M4-Herramientas_Big_Data

Integrative Project

Language: Jupyter Notebook - Size: 23.5 MB - Last synced: 7 months ago - Pushed: 7 months ago - Stars: 0 - Forks: 0

IgnacioMB/csvcli

A light-weight command-line tool to browse and query CSV, Excel and Apache Parquet files, regardless of their size.

Language: Python - Size: 147 KB - Last synced: 22 days ago - Pushed: over 3 years ago - Stars: 3 - Forks: 0

write4alive/Data-Engineering-Nano-Degree-Capstone-Project

Data Engineering Nano Degree Capstone Project

Language: Jupyter Notebook - Size: 23.8 MB - Last synced: 8 months ago - Pushed: over 2 years ago - Stars: 0 - Forks: 0

rupeshtiwari/kafka-spark-streaming-avro-in-python

Streaming Kafka events in Avro format using Spark and saving the events in Parquet format

Language: Python - Size: 38.1 KB - Last synced: 26 days ago - Pushed: about 2 years ago - Stars: 4 - Forks: 1

alexkreidler/parquet2arrow

A fast and simple command-line (CLI) tool to convert a Parquet file to an Apache Arrow file

Language: Rust - Size: 11.7 KB - Last synced: 23 days ago - Pushed: about 2 years ago - Stars: 4 - Forks: 0

SurajSomani14/Read-And-Filter-Datalake-Files-Data

This Azure function reads multiple files from a given data lake folder, deserializes the data, and merges the data from all files together. It can apply filters to the data and respond with the filtered data in the requested format.

Language: C# - Size: 159 KB - Last synced: 9 months ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0

RLesur/quarto-ojs-parquet-s3

A Quarto notebook requesting a parquet file stored in S3

Language: JavaScript - Size: 441 KB - Last synced: 10 months ago - Pushed: over 1 year ago - Stars: 1 - Forks: 0

ostrokach/uniparc_xml_parser

UniParc dataset describing ~300 million protein sequences converted into relational tables accessible through Google BigQuery (and as Parquet files).

Language: Rust - Size: 31.1 MB - Last synced: about 1 month ago - Pushed: over 2 years ago - Stars: 1 - Forks: 0

EnsleyEC/parquet-file-concepts

Language: Jupyter Notebook - Size: 23.4 KB - Last synced: 10 months ago - Pushed: over 3 years ago - Stars: 1 - Forks: 0

LouayMagdy/Weather-Stations-Monitoring Fork of basel-bytes/Weather-Stations-Monitoring

DDIA Course Project

Language: Java - Size: 46 MB - Last synced: 2 months ago - Pushed: 11 months ago - Stars: 3 - Forks: 1

mdarm/map-reduce-project

Project on MapReduce for the Μ111 - Big Data Management course, NKUA, Spring 2023.

Language: TeX - Size: 3.7 MB - Last synced: 10 months ago - Pushed: 10 months ago - Stars: 0 - Forks: 0

OtavioHenrique/parquimetro

Simple and small CLI to work with parquet files

Language: Go - Size: 189 KB - Last synced: 3 months ago - Pushed: 3 months ago - Stars: 2 - Forks: 0

milamarcan/etl_aws_s3_spark_datalake

ETL pipeline that transforms JSON files from an AWS S3 bucket into Parquet files, also in an S3 bucket

Language: Python - Size: 773 KB - Last synced: 10 months ago - Pushed: 10 months ago - Stars: 1 - Forks: 0

masum035/Bengali-Grapheme-Optical-Character-Recognition

Academic Machine Learning (6 months) Sessional Project

Language: Jupyter Notebook - Size: 99.6 KB - Last synced: about 1 month ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0

NarayanSchuetz/edf2parquet

Simple utility package to convert EDF/EDF+ files into Apache Parquet format.

Language: Jupyter Notebook - Size: 2.88 MB - Last synced: about 1 year ago - Pushed: about 1 year ago - Stars: 0 - Forks: 0

gr3gor1/Adv-DBs

Advanced Databases Project

Language: Python - Size: 960 KB - Last synced: about 1 year ago - Pushed: about 1 year ago - Stars: 0 - Forks: 0

samvel1024/csv2pq Fork of Parquet/parquet-compatibility

Converts csv to Parquet

Language: Java - Size: 125 MB - Last synced: about 1 year ago - Pushed: over 4 years ago - Stars: 1 - Forks: 0

adrianulbona/osm-parquetizer

A converter from OSM PBF files to Parquet files

Language: Java - Size: 75.2 KB - Last synced: about 1 year ago - Pushed: almost 4 years ago - Stars: 72 - Forks: 30

matt40k/rpistatsv2

Scrapes the data from rpi-imager-stats daily

Size: 14.1 MB - Last synced: 2 months ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0

domvwt/parquet-inspector

A command line tool for inspecting parquet files with PyArrow.

Language: Python - Size: 56.6 KB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0

Foroozani/BigData_PySpark

‼️ Handle Big Data for Machine Learning using Python and PySpark, Building ETL Pipelines with PySpark, MongoDB, and Bokeh

Language: Jupyter Notebook - Size: 35.1 MB - Last synced: about 1 year ago - Pushed: over 2 years ago - Stars: 6 - Forks: 4

trannguyenhan/tiki-data-analysis

Streaming Tiki data with Kafka, processing it with Spark, and visualizing it with Elasticsearch & Kibana.

Language: Java - Size: 36.1 KB - Last synced: about 1 year ago - Pushed: almost 3 years ago - Stars: 11 - Forks: 0

yaphet17/HDFS-File-Viewer

Helps you visualize Hadoop file formats.

Language: Java - Size: 113 KB - Last synced: about 1 year ago - Pushed: almost 2 years ago - Stars: 1 - Forks: 0

hrbrmstr/sergeant-caffeinated

💂 ☕️ Tools to Transform and Query Data with 'Apache' 'Drill'

Language: R - Size: 179 KB - Last synced: 7 days ago - Pushed: over 3 years ago - Stars: 7 - Forks: 1

renesugar/FileConvert

Converts between file formats such as CSV and Parquet

Language: C - Size: 3.65 MB - Last synced: about 1 year ago - Pushed: over 6 years ago - Stars: 14 - Forks: 1

m-kwiedor/lambda-merge-parquet

Merge Parquet Files on S3 with this AWS Lambda Function

Language: Python - Size: 263 KB - Last synced: 12 months ago - Pushed: over 3 years ago - Stars: 2 - Forks: 0

anjijava16/Multi_Cloud_DWH_Utils

Compare the Multi Cloud Data warehouse systems

Size: 14.6 KB - Last synced: 5 months ago - Pushed: 5 months ago - Stars: 0 - Forks: 0

slatawa/csv_parquet

Project showing the integration of upstream files into your data lake. We look at handling high-volume customized data formats and converting them into Parquet.

Language: Python - Size: 385 KB - Last synced: about 1 year ago - Pushed: almost 3 years ago - Stars: 0 - Forks: 0

13caroline/imdb-datasets

Projects on managing large data sets (Data Science)

Language: Java - Size: 1.02 MB - Last synced: about 1 year ago - Pushed: almost 3 years ago - Stars: 0 - Forks: 0

adrigrillo/NYCSparkTaxi

Apache Spark application to get the top ten frequent routes and profitable areas

Language: Jupyter Notebook - Size: 13.5 MB - Last synced: about 1 year ago - Pushed: almost 7 years ago - Stars: 3 - Forks: 0

ankhipaul/python_demos

Python skills practice

Language: Python - Size: 10.4 MB - Last synced: about 1 year ago - Pushed: over 3 years ago - Stars: 0 - Forks: 0

effulgenz-emp/data_pull

Cassandra database to Parquet file

Language: Python - Size: 16.6 KB - Last synced: about 1 year ago - Pushed: over 3 years ago - Stars: 0 - Forks: 1

bdnf/Building-DataLake-with-Spark-and-S3

Data engineering project on how to build a data lake on S3 using the Chicago Taxi dataset

Language: Jupyter Notebook - Size: 958 KB - Last synced: about 1 year ago - Pushed: almost 4 years ago - Stars: 0 - Forks: 0

alfredomartinm/parquetreader

Little demo project on how to read parquet files using the Avro libraries

Language: Java - Size: 5.86 KB - Last synced: about 1 year ago - Pushed: almost 6 years ago - Stars: 0 - Forks: 1