Ecosyste.ms: Repos

An open API service providing repository metadata for many open source software ecosystems.

Package Usage: go: github.com/fraugster/parquet-go

Package goparquet is an implementation of the parquet file format in Go. It provides functionality to both read and write parquet files, as well as high-level functionality to manage the data schema of parquet files, to write Go objects directly to parquet files using automatic or custom marshalling, and to read records from parquet files into Go objects using automatic or custom marshalling.

Parquet is a file format for storing nested data structures in a flat columnar format. Because the data is stored column by column, individual columns can be read efficiently without having to read and decode complete rows. This makes processing faster when the file format is used in conjunction with distributed data processing frameworks like Apache Hadoop or distributed SQL query engines like Presto and AWS Athena.

This particular implementation is divided into several packages. The top-level package that you're currently viewing is the low-level implementation of the file format. It is accompanied by the sub-packages parquetschema and floor.

parquetschema provides functionality to parse textual schema definitions, as well as the data types needed to construct schema definitions manually or programmatically. The textual schema definition format is based on the barely documented schema definition format that is implemented in the parquet Java implementation. See the parquetschema sub-package for further documentation on how to use this package, the grammar of the schema definition format, and examples.

floor is a high-level wrapper around the low-level package. It provides functionality to open parquet files to read from them or to write to them. When reading from parquet files, floor automatically unmarshals the low-level data into the user-provided Go object.
When writing to parquet files, user-provided Go objects are first marshalled to a low-level data structure that is then written to the parquet file. These mechanisms let you read and write Go objects directly without having to deal with the details of the low-level parquet format. Alternatively, marshalling and unmarshalling can be implemented in a custom manner, giving the user maximum flexibility in case of disparities between the parquet schema definition and the actual Go data structure. For more information, please refer to the floor sub-package's documentation.

To aid in working with parquet files, this package also provides a command-line tool named "parquet-tool" that allows you to inspect a parquet file's schema, metadata, row count and content, as well as to merge and split parquet files.

Most users should be able to cover their regular use cases of reading and writing parquet files using just the high-level floor package together with the parquetschema package. Using this low-level package is advisable only if you have more specialized requirements for working with parquet files.

To write to a parquet file, this package provides the FileWriter type. Create a new *FileWriter object using the NewFileWriter function. A number of options are available to influence the FileWriter's behaviour. You can use these options to, for example, set metadata, the compression algorithm to use, the schema definition to use, or whether the data should be written in the V2 format. If you didn't set a schema definition, you need to create columns manually using the functions NewDataColumn, NewListColumn and NewMapColumn, and then add them to the FileWriter using the AddColumn method. To further structure your data into groups, use AddGroup to create groups. When you add columns to groups, you need to provide the full column name in dotted notation (e.g. "groupname.fieldname") to AddColumn.
Using the AddData method, you can then add records. The provided data is of type map[string]interface{}. This data can be nested: to provide data for a repeated field, the data type to use for the map value is []interface{}. When the provided data is a group, the data type for the group itself again needs to be map[string]interface{}.

The data within a parquet file is divided into row groups of a certain size. You can either set the desired row group size as a FileWriterOption, or you can manually check the estimated data size of the current row group using the CurrentRowGroupSize method, and use FlushRowGroup to write the data to disk and start a new row group. Please note that CurrentRowGroupSize only estimates the _uncompressed_ data size. If you've enabled compression, it is impossible to predict the compressed data size, so the row groups actually written to disk may be a lot smaller than the uncompressed estimate, depending on how efficiently your data can be compressed.

When you're done writing, always use the Close method to flush any remaining data and to write the file's footer.

To read from files, create a FileReader object using the NewFileReader function. You can optionally provide a list of columns to read. If these are set, only these columns are read from the file, while all other columns are ignored. If no columns are provided, then all columns are read.

With the FileReader, you can then go through the row groups (using PreLoad and SkipRowGroup) and iterate through the row data in each row group (using NextRow). To find out how many rows to expect in total and per row group, use the NumRows and RowGroupNumRows methods. The number of row groups can be determined using the RowGroupCount method.
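
The record shape and the manual row-group flushing described above can be sketched with the standard library alone. This is a minimal illustration, not the package's own code: rowGroupWriter is a hypothetical interface whose method names come from the text but whose signatures are assumptions, fakeWriter is an in-memory stand-in with a crude size estimate, and the field names and value types in exampleRecord are made up. A real *FileWriter may not satisfy this interface as written, so check the package documentation before adapting it.

```go
package main

import "fmt"

// rowGroupWriter is a hypothetical interface listing the FileWriter methods
// named in the text. The signatures are assumptions made for this sketch.
type rowGroupWriter interface {
	AddData(record map[string]interface{}) error
	CurrentRowGroupSize() int64
	FlushRowGroup() error
	Close() error
}

// fakeWriter is an in-memory stand-in so the sketch runs without the library.
type fakeWriter struct {
	pending   int64 // estimated uncompressed size of the current row group
	rowGroups int   // number of row groups flushed so far
	closed    bool
}

func (w *fakeWriter) AddData(record map[string]interface{}) error {
	// Crude estimate: length of the record's printed form.
	w.pending += int64(len(fmt.Sprint(record)))
	return nil
}

func (w *fakeWriter) CurrentRowGroupSize() int64 { return w.pending }

func (w *fakeWriter) FlushRowGroup() error {
	if w.pending > 0 {
		w.rowGroups++
		w.pending = 0
	}
	return nil
}

func (w *fakeWriter) Close() error {
	// Flush whatever is left, then mark the writer as finished.
	if err := w.FlushRowGroup(); err != nil {
		return err
	}
	w.closed = true
	return nil
}

// exampleRecord builds one nested record: a plain field, a repeated field
// as []interface{}, and a group as a nested map[string]interface{}.
// Field names and value types are illustrative only.
func exampleRecord(id int64) map[string]interface{} {
	return map[string]interface{}{
		"id":   id,
		"tags": []interface{}{[]byte("a"), []byte("b")},
		"address": map[string]interface{}{
			"city": []byte("Berlin"),
		},
	}
}

// writeAll adds every record and starts a new row group whenever the
// estimated uncompressed size reaches limit. It always calls Close at the end.
func writeAll(w rowGroupWriter, records []map[string]interface{}, limit int64) error {
	for _, rec := range records {
		if err := w.AddData(rec); err != nil {
			return err
		}
		if w.CurrentRowGroupSize() >= limit {
			if err := w.FlushRowGroup(); err != nil {
				return err
			}
		}
	}
	return w.Close()
}

func main() {
	w := &fakeWriter{}
	records := []map[string]interface{}{exampleRecord(1), exampleRecord(2), exampleRecord(3)}
	if err := writeAll(w, records, 1); err != nil {
		fmt.Println("write failed:", err)
		return
	}
	fmt.Println("row groups written:", w.rowGroups)
}
```

With a limit of 1, every record triggers a flush, so the stand-in ends up with one row group per record. In practice the limit would be much larger, and because the size is an uncompressed estimate, the row groups on disk may come out smaller when compression is enabled.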
15 versions
Latest release: almost 2 years ago
63 dependent packages

View more package details: https://packages.ecosyste.ms/registries/proxy.golang.org/packages/github.com/fraugster/parquet-go

View more repository details: https://repos.ecosyste.ms/hosts/GitHub/repositories/fraugster%2Fparquet-go

Dependent Repos 166

cockroachdb/cockroach-gen
CockroachDB with pre-generated Go code
  • v0.10.0 go.mod
  • v0.10.0 go.sum

Size: 4.49 GB - Last synced: 3 months ago - Pushed: 3 months ago

caraml-dev/merlin
Kubernetes-friendly ML model management, deployment, and serving.
  • v0.10.0 api/go.mod
  • v0.10.0 api/go.sum

Size: 16.9 MB - Last synced: 12 days ago - Pushed: 13 days ago

dileepdkumar/https-github.com-cockroachdb-cockroach2
  • v0.4.0 go.mod
  • v0.4.0 go.sum

Size: 998 MB - Last synced: about 2 months ago - Pushed: about 2 months ago

minio/minio
The Object Store for AI Data Infrastructure
  • v0.10.0 go.mod
  • v0.10.0 go.sum

Size: 124 MB - Last synced: about 8 hours ago - Pushed: about 8 hours ago

singlestore-labs/demo-realtime-digital-marketing
This application is a demo of how to use SingleStore to serve ads to users based on their behavior and realtime location.
  • v0.10.0 go.mod
  • v0.10.0 go.sum

Size: 18.2 MB - Last synced: 3 months ago - Pushed: 3 months ago

rail/test-cockroach2
  • v0.10.0 go.mod
  • v0.10.0 go.sum

Size: 1020 MB - Last synced: 5 months ago - Pushed: almost 2 years ago

cockroachdb/cockroach
CockroachDB - the open source, cloud-native distributed SQL database.
  • v0.10.0 go.mod
  • v0.10.0 go.sum

Size: 2.51 GB - Last synced: 12 days ago - Pushed: 12 days ago

bsm/feedx
  • v0.11.0 ext/parquet/go.mod
  • v0.11.0 ext/parquet/go.sum

Size: 202 KB - Last synced: 12 days ago - Pushed: 13 days ago

knz/cockroach Fork of cockroachdb/cockroach
A Scalable, Survivable, Strongly-Consistent SQL Database
  • v0.10.0 go.mod
  • v0.10.0 go.sum

Size: 1.32 GB - Last synced: about 1 month ago - Pushed: 8 months ago

jordanlewis/cockroach Fork of cockroachdb/cockroach
A Scalable, Survivable, Strongly-Consistent SQL Database
  • v0.10.0 go.sum
  • v0.10.0 go.mod

Size: 1.28 GB - Last synced: 2 months ago - Pushed: 2 months ago

tbg/cockroach Fork of cockroachdb/cockroach
A Scalable, Geo-Replicated, Transactional Datastore
  • v0.10.0 go.mod
  • v0.10.0 go.sum

Size: 1.3 GB - Last synced: 10 months ago - Pushed: 10 months ago

vadmeste/minio Fork of minio/minio
Minio is an open source object storage inspired by Amazon S3 and Facebook Haystack
  • v0.12.0 go.sum
  • v0.12.0 go.mod

Size: 127 MB - Last synced: 12 days ago - Pushed: 12 days ago

PretendoNetwork/minio Fork of minio/minio
Modified MinIO for Pretendo Network S3
  • v0.12.0 go.mod
  • v0.12.0 go.sum

Size: 109 MB - Last synced: 8 months ago - Pushed: 8 months ago

GuinsooLab/annastore
High performance OSS storage platform.
  • v0.10.0 go.mod
  • v0.10.0 go.sum

Size: 16.1 MB - Last synced: about 1 year ago - Pushed: over 1 year ago

Kiterepo/minio Fork of minio/minio
Minio is an open source object storage server compatible with Amazon S3 APIs
  • v0.12.0 go.mod
  • v0.12.0 go.sum

Size: 111 MB - Last synced: 2 months ago - Pushed: 2 months ago

kokizzu/minio Fork of minio/minio
MinIO is a high performance object storage server compatible with Amazon S3 APIs
  • v0.12.0 go.mod
  • v0.12.0 go.sum

Size: 123 MB - Last synced: about 19 hours ago - Pushed: about 21 hours ago

andyyang890/cockroach Fork of cockroachdb/cockroach
CockroachDB - the open source, cloud-native distributed SQL database.
  • v0.10.0 go.mod
  • v0.10.0 go.sum

Size: 1.48 GB - Last synced: 18 days ago - Pushed: 18 days ago

vdavalon01/minio Fork of minio/minio
High Performance, Kubernetes Native Object Storage
  • v0.12.0 go.mod
  • v0.12.0 go.sum

Size: 110 MB - Last synced: 10 months ago - Pushed: 10 months ago

surahman/cockroach Fork of cockroachdb/cockroach
CockroachDB - the open source, cloud-native distributed SQL database.
  • v0.10.0 go.mod
  • v0.10.0 go.sum

Size: 1.31 GB - Last synced: 6 months ago - Pushed: 6 months ago

dben/plex-go-sync
A command line tool to sync a main plex library with a backup remote one.
  • v0.10.0 go.sum
  • v0.11.0 go.sum

Size: 182 KB - Last synced: 10 months ago - Pushed: over 1 year ago

ZhouXing19/cockroach Fork of cockroachdb/cockroach
CockroachDB - the open source, cloud-native distributed SQL database.
  • v0.10.0 go.mod
  • v0.10.0 go.sum

Size: 1.51 GB - Last synced: 28 days ago - Pushed: 28 days ago

rluna-database/cloud/cockroach
CockroachDB - the open-source, cloud-native SQL database. https://www.cockroachlabs.com
  • v0.10.0 go.mod
  • v0.10.0 go.sum

Last synced: 11 months ago

wmutschl/minio Fork of minio/minio
High Performance, Kubernetes Native Object Storage
  • v0.12.0 go.mod
  • v0.12.0 go.sum

Size: 111 MB - Last synced: about 1 month ago - Pushed: 4 months ago

grovely/vendor/github.com/cockroachdb/cockroach
  • v0.10.0 go.mod
  • v0.10.0 go.sum

Last synced: 11 months ago

brimdata/zync
Kafka connector to sync Zed lakes to and from Kafka topics
  • v0.10.1-0.20220222153523-e6b70a8a7212 go.mod
  • v0.10.1-0.20220222153523-e6b70a8a7212 go.sum

Size: 312 KB - Last synced: 3 months ago - Pushed: 3 months ago

mimiro-io/objectstorage-datalayer
  • v0.11.0 go.mod
  • v0.11.0 go.sum

Size: 465 KB - Last synced: 20 days ago - Pushed: 21 days ago

advancemg/vimb-loader
Service for communicating with VIMB
  • v0.10.0 go.mod
  • v0.10.0 go.sum

Size: 905 KB - Last synced: 5 months ago - Pushed: about 1 year ago

mendelics/clinvar
Clinvar parser
  • v0.11.0 go.mod
  • v0.11.0 go.sum

Size: 48.9 MB - Last synced: 12 months ago - Pushed: 12 months ago

akrennmair/parquet-go-block-compressors 📦
Extension for github.com/fraugster/parquet-go to support more compression algorithms.
  • v0.10.0 brotli/go.mod
  • v0.10.0 brotli/go.sum
  • v0.10.0 lz4raw/go.mod
  • v0.10.0 lz4raw/go.sum
  • v0.10.0 lzo/go.mod
  • v0.8.0 lzo/go.sum
  • v0.10.0 lzo/go.sum
  • v0.10.0 zstd/go.mod
  • v0.8.0 zstd/go.sum
  • v0.10.0 zstd/go.sum

Size: 29.3 KB - Last synced: 8 months ago - Pushed: over 2 years ago

conorlynchgit/objectstoretest
Config files for my GitHub profile.
  • v0.10.0 test/minio/go.mod
  • v0.10.0 test/minio/go.sum

Size: 47.6 MB - Last synced: 11 months ago - Pushed: 11 months ago

JeffSwenson/cockroach Fork of cockroachdb/cockroach
CockroachDB - the open source, cloud-native distributed SQL database.
  • v0.10.0 go.mod
  • v0.10.0 go.sum

Size: 1.43 GB - Last synced: 8 months ago - Pushed: 8 months ago

LubyRuffy/goflow
just like pipeline for data
  • v0.11.0 go.mod

Size: 898 KB - Last synced: 10 months ago - Pushed: 10 months ago

security-l/logging-and-anallysis/zq
Command-line processor for structured logs
  • v0.3.0 go.mod

Last synced: over 1 year ago

kokizzu/cockroach Fork of cockroachdb/cockroach
CockroachDB - the open source, cloud-native distributed SQL database.
  • v0.10.0 go.mod
  • v0.10.0 go.sum

Size: 1.46 GB - Last synced: about 2 months ago - Pushed: about 2 months ago

mendelics/vcf2df
VCF to parquet converter
  • v0.10.0 go.mod
  • v0.10.0 go.sum

Size: 347 KB - Last synced: 3 months ago - Pushed: 6 months ago

automation555/cockroach
cockroach
  • v0.6.1 go.mod
  • v0.6.1 go.sum

Size: 76.6 MB - Last synced: about 1 year ago - Pushed: over 2 years ago

sinhaashish/minio Fork of minio/minio
Minio is an open source object storage server compatible with Amazon S3 APIs
  • v0.10.0 go.mod
  • v0.10.0 go.sum

Size: 118 MB - Last synced: over 1 year ago - Pushed: over 1 year ago

OdyseeTeam/transcoder
Transcoder server for Odysee media
  • v0.10.0 go.mod
  • v0.10.0 go.sum

Size: 959 KB - Last synced: about 2 months ago - Pushed: 3 months ago

libra-violet/MyminIO
  • v0.10.0 go.mod
  • v0.10.0 go.sum

Size: 11.9 MB - Last synced: about 1 year ago - Pushed: almost 2 years ago

t6085/minio
  • v0.10.0 go.mod
  • v0.10.0 go.sum

Last synced: 11 months ago

ybarney/cockroach Fork of cockroachdb/cockroach
CockroachDB - the open source, cloud-native distributed SQL database.
  • v0.10.0 go.mod
  • v0.10.0 go.sum

Size: 1.07 GB - Last synced: about 1 year ago - Pushed: over 1 year ago

gitlab-gold/secure-scanner-testbed/minio
Clone of https://github.com/minio/minio.git from 2022-05-27
  • v0.10.0 go.mod
  • v0.10.0 go.sum

Last synced: over 1 year ago

exactlylabs/parquet2csv
Simple tool to convert a Parquet file to a CSV written in Go/ Golang
  • v0.6.1 go.mod
  • v0.6.1 go.sum

Size: 7.81 KB - Last synced: 11 months ago - Pushed: over 2 years ago

wenyihu6/cockroach Fork of cockroachdb/cockroach
Forked copy of CockroachDB - the open source, cloud-native distributed SQL database.
  • v0.10.0 go.mod
  • v0.10.0 go.sum

Size: 1.47 GB - Last synced: about 23 hours ago - Pushed: about 23 hours ago

shralex/cockroach Fork of cockroachdb/cockroach
CockroachDB - the open source, cloud-native distributed SQL database.
  • v0.4.0 go.sum
  • v0.4.0 go.mod

Size: 1.49 GB - Last synced: 3 months ago - Pushed: 3 months ago

tardunge/talaria Fork of talariadb/talaria
TalariaDB is a distributed, highly available, and low latency time-series database for Presto
  • v0.3.0 go.mod
  • v0.3.0 go.sum

Size: 12.8 MB - Last synced: 11 months ago - Pushed: over 1 year ago

TDary/GoProject
GoServer-Development
  • v0.10.0 GoStudy/minio/go.mod
  • v0.10.0 GoStudy/minio/go.sum

Size: 26.1 MB - Last synced: about 1 year ago - Pushed: over 1 year ago