Ecosyste.ms: Repos
An open API service providing repository metadata for many open source software ecosystems.
GitHub topics: parquet-files
strategicblue/parquet-floor
A lightweight Java library that facilitates reading and writing Apache Parquet files without Hadoop dependencies
Language: Java - Size: 95.7 KB - Last synced: 6 days ago - Pushed: 6 days ago - Stars: 37 - Forks: 3
mongodb-labs/mongo-arrow
MongoDB integrations for Apache Arrow. Export MongoDB documents to NumPy arrays, Parquet files, and pandas DataFrames in one line of code.
Language: Python - Size: 424 KB - Last synced: 6 days ago - Pushed: 6 days ago - Stars: 77 - Forks: 10
Cinchoo/ChoETL
ETL framework for .NET (Parser / Writer for CSV, Flat, Xml, JSON, Key-Value, Parquet, Yaml, Avro formatted files)
Language: C# - Size: 39.6 MB - Last synced: 5 days ago - Pushed: about 1 month ago - Stars: 742 - Forks: 133
mjakubowski84/parquet4s
Read and write Parquet in Scala. Use Scala classes as schema. No need to start a cluster.
Language: Scala - Size: 2.03 MB - Last synced: 7 days ago - Pushed: 8 days ago - Stars: 276 - Forks: 68
masalinas/poc-minio-parquet-docker
PoC Minio Docker with parquet parser
Language: Python - Size: 16.6 KB - Last synced: 9 days ago - Pushed: 9 days ago - Stars: 0 - Forks: 0
Yo-mah-Ya/File_Creator
Create files in formats such as "orc", "parquet", "xlsx", and "json" with Python
Language: Python - Size: 25.4 KB - Last synced: 10 days ago - Pushed: 8 months ago - Stars: 0 - Forks: 0
dotM87/triaina
Big data project: information storage in HDFS
Size: 2.93 KB - Last synced: 10 days ago - Pushed: 10 days ago - Stars: 0 - Forks: 0
Ayushverma135/JSON-to-PARQUET-Parser
Easily convert JSON data into Parquet format for efficient storage and analysis. Simplify data processing and analysis pipelines by converting JSON objects into optimized Parquet files.
Language: Python - Size: 8.79 KB - Last synced: 16 days ago - Pushed: 16 days ago - Stars: 0 - Forks: 0
uber/petastorm
The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark and can be used from pure Python code.
Language: Python - Size: 2.69 MB - Last synced: 19 days ago - Pushed: 6 months ago - Stars: 1,754 - Forks: 281
PRQL/prql-query 📦
Query and transform data with PRQL
Language: Rust - Size: 1.32 MB - Last synced: 16 days ago - Pushed: 8 months ago - Stars: 123 - Forks: 7
parquet-go/parquet-go Fork of segmentio/parquet-go
Go library to read/write Parquet files. Developed at Twilio Segment
Language: Go - Size: 7.89 MB - Last synced: 25 days ago - Pushed: 26 days ago - Stars: 172 - Forks: 36
Matbbastos/epw-analysis
Processing and exporting data from EPW files into other formats.
Language: Python - Size: 2.12 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 1 - Forks: 0
hannes/miniparquet 📦
Library to read a subset of Parquet files
Language: C++ - Size: 485 KB - Last synced: 8 days ago - Pushed: over 4 years ago - Stars: 43 - Forks: 7
tee8z/noaa-data-pipeline
NOAA data pipeline, queryable from the browser
Language: Rust - Size: 1.36 MB - Last synced: about 1 month ago - Pushed: 2 months ago - Stars: 0 - Forks: 0
Ahbiels/FegTec
FegTec is a fictitious company that wants to transfer Parquet files containing customer data from the AWS cloud to Google Cloud
Language: Python - Size: 264 KB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 0 - Forks: 0
minio/spark-select
A library for Spark DataFrame using MinIO Select API
Language: Scala - Size: 65.4 KB - Last synced: about 1 month ago - Pushed: over 4 years ago - Stars: 96 - Forks: 18
Srking501/csc8101_coursework
A summative coursework for CSC8101 Engineering for AI
Language: Jupyter Notebook - Size: 168 KB - Last synced: 3 months ago - Pushed: 3 months ago - Stars: 0 - Forks: 0
mschermann/docker_apache_drill_datagrip
A docker image to read parquet files with drill in DataGrip
Language: Dockerfile - Size: 1.21 MB - Last synced: 3 months ago - Pushed: almost 5 years ago - Stars: 1 - Forks: 2
hrbrmstr/sergeant
💂 Tools to Transform and Query Data with 'Apache' 'Drill'
Language: R - Size: 17.8 MB - Last synced: 13 days ago - Pushed: about 2 years ago - Stars: 125 - Forks: 15
Dorianteffo/vg-sales-glue-spark-terraform
ETL job with AWS Glue
Language: Python - Size: 872 KB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 1 - Forks: 0
msigrupo/node-red-contrib-parquet
Node-Red contrib that converts between a PARQUET string and its JavaScript object representation, in either direction.
Language: HTML - Size: 21.5 KB - Last synced: 4 months ago - Pushed: about 2 years ago - Stars: 3 - Forks: 2
igor-suhorukov/openstreetmap_h3
High-performance data loader for OSM planet dumps. Transforms an OpenStreetMap world/region PBF dump into a lossless PostGIS pgsnapshot OSM schema representation partitioned by H3 regions, and/or into ArrowIPC/Parquet dumps
Language: Java - Size: 6 MB - Last synced: 4 months ago - Pushed: 6 months ago - Stars: 73 - Forks: 6
cajuncoding/ParquetFiles.BlobHelpers
A simple library and console application illustrating how to read Parquet files saved to Azure Blob Storage and load their data into class models using Parquet .Net (parquet-dotnet). This is useful for E-L-T processes in which you need to load the data into memory, SQL Server (e.g. Azure SQL), or any other location that has no built-in or default mechanism for working with Parquet data.
Language: C# - Size: 37.1 KB - Last synced: 30 days ago - Pushed: over 1 year ago - Stars: 1 - Forks: 0
johnbrandborg/s3-inventory-report 📦
Processes S3 Inventory Manifests and generates a report on folder sizes and average object size
Language: Python - Size: 34.2 KB - Last synced: 6 months ago - Pushed: 6 months ago - Stars: 1 - Forks: 0
DataTech-Solutions/Threat-Detection-and-Visualization
Threat Detection and Visualization
Language: TSQL - Size: 11.9 MB - Last synced: 6 months ago - Pushed: 6 months ago - Stars: 25 - Forks: 153
edwinpro/MLOps_videogames
MLOps project that implements an API for video games on the Steam platform.
Language: Jupyter Notebook - Size: 139 MB - Last synced: 4 months ago - Pushed: 7 months ago - Stars: 0 - Forks: 0
uhussain/WebCrawlerForOnlineInflation
Price Crawler - Tracking Price Inflation
Language: Python - Size: 387 KB - Last synced: 7 months ago - Pushed: almost 4 years ago - Stars: 155 - Forks: 47
JohannaRangel/DS-M4-Herramientas_Big_Data Fork of soyHenry/DS-M4-Herramientas_Big_Data
Integrative Project
Language: Jupyter Notebook - Size: 23.5 MB - Last synced: 7 months ago - Pushed: 7 months ago - Stars: 0 - Forks: 0
IgnacioMB/csvcli
A light-weight command-line tool to browse and query CSV, Excel and Apache Parquet files, regardless of their size.
Language: Python - Size: 147 KB - Last synced: 22 days ago - Pushed: over 3 years ago - Stars: 3 - Forks: 0
write4alive/Data-Engineering-Nano-Degree-Capstone-Project
Data Engineering Nano Degree Capstone Project
Language: Jupyter Notebook - Size: 23.8 MB - Last synced: 8 months ago - Pushed: over 2 years ago - Stars: 0 - Forks: 0
rupeshtiwari/kafka-spark-streaming-avro-in-python
Streaming Kafka events in Avro format using Spark and saving the events in Parquet format
Language: Python - Size: 38.1 KB - Last synced: 26 days ago - Pushed: about 2 years ago - Stars: 4 - Forks: 1
alexkreidler/parquet2arrow
A fast and simple command-line (CLI) tool to convert a Parquet file to an Apache Arrow file
Language: Rust - Size: 11.7 KB - Last synced: 23 days ago - Pushed: about 2 years ago - Stars: 4 - Forks: 0
SurajSomani14/Read-And-Filter-Datalake-Files-Data
This Azure Function reads multiple files from a given data lake folder, deserializes the data, and merges the data from all files. It can apply filters to the data and respond with the filtered data in the requested format.
Language: C# - Size: 159 KB - Last synced: 9 months ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0
RLesur/quarto-ojs-parquet-s3
A Quarto notebook requesting a parquet file stored in S3
Language: JavaScript - Size: 441 KB - Last synced: 10 months ago - Pushed: over 1 year ago - Stars: 1 - Forks: 0
ostrokach/uniparc_xml_parser
UniParc dataset describing ~300 million protein sequences converted into relational tables accessible through Google BigQuery (and as Parquet files).
Language: Rust - Size: 31.1 MB - Last synced: about 1 month ago - Pushed: over 2 years ago - Stars: 1 - Forks: 0
EnsleyEC/parquet-file-concepts
Language: Jupyter Notebook - Size: 23.4 KB - Last synced: 10 months ago - Pushed: over 3 years ago - Stars: 1 - Forks: 0
LouayMagdy/Weather-Stations-Monitoring Fork of basel-bytes/Weather-Stations-Monitoring
DDIA Course Project
Language: Java - Size: 46 MB - Last synced: 2 months ago - Pushed: 11 months ago - Stars: 3 - Forks: 1
mdarm/map-reduce-project
Project on MapReduce for the Μ111 - Big Data Management course, NKUA, Spring 2023.
Language: TeX - Size: 3.7 MB - Last synced: 10 months ago - Pushed: 10 months ago - Stars: 0 - Forks: 0
OtavioHenrique/parquimetro
Simple and small CLI to work with parquet files
Language: Go - Size: 189 KB - Last synced: 3 months ago - Pushed: 3 months ago - Stars: 2 - Forks: 0
milamarcan/etl_aws_s3_spark_datalake
ETL pipeline that transforms JSON files from AWS S3 bucket to Parquet files also in S3 bucket
Language: Python - Size: 773 KB - Last synced: 10 months ago - Pushed: 10 months ago - Stars: 1 - Forks: 0
masum035/Bengali-Grapheme-Optical-Character-Recognition
Academic Machine Learning (6 months) Sessional Project
Language: Jupyter Notebook - Size: 99.6 KB - Last synced: about 1 month ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0
NarayanSchuetz/edf2parquet
Simple utility package to convert EDF/EDF+ files into Apache Parquet format.
Language: Jupyter Notebook - Size: 2.88 MB - Last synced: about 1 year ago - Pushed: about 1 year ago - Stars: 0 - Forks: 0
gr3gor1/Adv-DBs
Advanced Databases Project
Language: Python - Size: 960 KB - Last synced: about 1 year ago - Pushed: about 1 year ago - Stars: 0 - Forks: 0
samvel1024/csv2pq Fork of Parquet/parquet-compatibility
Converts csv to Parquet
Language: Java - Size: 125 MB - Last synced: about 1 year ago - Pushed: over 4 years ago - Stars: 1 - Forks: 0
adrianulbona/osm-parquetizer
A converter for the OSM PBFs to Parquet files
Language: Java - Size: 75.2 KB - Last synced: about 1 year ago - Pushed: almost 4 years ago - Stars: 72 - Forks: 30
matt40k/rpistatsv2
Scrapes the data from rpi-imager-stats daily
Size: 14.1 MB - Last synced: 2 months ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0
domvwt/parquet-inspector
A command line tool for inspecting parquet files with PyArrow.
Language: Python - Size: 56.6 KB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0
Foroozani/BigData_PySpark
‼️ Handle Big Data for Machine Learning using Python and PySpark, Building ETL Pipelines with PySpark, MongoDB, and Bokeh
Language: Jupyter Notebook - Size: 35.1 MB - Last synced: about 1 year ago - Pushed: over 2 years ago - Stars: 6 - Forks: 4
trannguyenhan/tiki-data-analysis
Streams Tiki data with Kafka, processes it with Spark, and visualizes it with Elasticsearch & Kibana.
Language: Java - Size: 36.1 KB - Last synced: about 1 year ago - Pushed: almost 3 years ago - Stars: 11 - Forks: 0
yaphet17/HDFS-File-Viewer
Helps you visualize Hadoop file formats.
Language: Java - Size: 113 KB - Last synced: about 1 year ago - Pushed: almost 2 years ago - Stars: 1 - Forks: 0
hrbrmstr/sergeant-caffeinated
💂 ☕️ Tools to Transform and Query Data with 'Apache' 'Drill'
Language: R - Size: 179 KB - Last synced: 7 days ago - Pushed: over 3 years ago - Stars: 7 - Forks: 1
renesugar/FileConvert
Converts between file formats such as CSV and Parquet
Language: C - Size: 3.65 MB - Last synced: about 1 year ago - Pushed: over 6 years ago - Stars: 14 - Forks: 1
m-kwiedor/lambda-merge-parquet
Merge Parquet Files on S3 with this AWS Lambda Function
Language: Python - Size: 263 KB - Last synced: 12 months ago - Pushed: over 3 years ago - Stars: 2 - Forks: 0
anjijava16/Multi_Cloud_DWH_Utils
Compares multi-cloud data warehouse systems
Size: 14.6 KB - Last synced: 5 months ago - Pushed: 5 months ago - Stars: 0 - Forks: 0
slatawa/csv_parquet
Project showing the integration of upstream files into your data lake. We look at handling high-volume customized data formats and converting them to Parquet.
Language: Python - Size: 385 KB - Last synced: about 1 year ago - Pushed: almost 3 years ago - Stars: 0 - Forks: 0
13caroline/imdb-datasets
Projects on managing large data sets (Data Science)
Language: Java - Size: 1.02 MB - Last synced: about 1 year ago - Pushed: almost 3 years ago - Stars: 0 - Forks: 0
adrigrillo/NYCSparkTaxi
Apache Spark application to find the ten most frequent routes and the most profitable areas
Language: Jupyter Notebook - Size: 13.5 MB - Last synced: about 1 year ago - Pushed: almost 7 years ago - Stars: 3 - Forks: 0
ankhipaul/python_demos
Python skills practice
Language: Python - Size: 10.4 MB - Last synced: about 1 year ago - Pushed: over 3 years ago - Stars: 0 - Forks: 0
effulgenz-emp/data_pull
Cassandra database to Parquet file
Language: Python - Size: 16.6 KB - Last synced: about 1 year ago - Pushed: over 3 years ago - Stars: 0 - Forks: 1
bdnf/Building-DataLake-with-Spark-and-S3
Data engineering project on how to build a data lake on S3 using the Chicago Taxi dataset
Language: Jupyter Notebook - Size: 958 KB - Last synced: about 1 year ago - Pushed: almost 4 years ago - Stars: 0 - Forks: 0
alfredomartinm/parquetreader
Small demo project on how to read Parquet files using the Avro libraries
Language: Java - Size: 5.86 KB - Last synced: about 1 year ago - Pushed: almost 6 years ago - Stars: 0 - Forks: 1