Ecosyste.ms: Repos
An open API service providing repository metadata for many open source software ecosystems.
Package Usage: go: github.com/fraugster/parquet-go
Package goparquet is an implementation of the parquet file format in Go. It provides
functionality to both read and write parquet files, as well as high-level functionality
to manage the data schema of parquet files, to directly write Go objects to parquet files
using automatic or custom marshalling and to read records from parquet files into
Go objects using automatic or custom marshalling.
parquet is a file format to store nested data structures in a flat columnar format. By
storing in a column-oriented way, it allows for efficient reading of individual columns
without having to read and decode complete rows. This allows for efficient reading and
faster processing when using the file format in conjunction with distributed data processing
frameworks like Apache Hadoop or distributed SQL query engines like Presto and AWS Athena.
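The column-oriented idea can be illustrated with a minimal sketch that transposes a few rows into per-column slices. It uses only the standard library; the Row and Columns types and the ToColumns function are hypothetical names for this illustration, not part of this package, and the sketch ignores nesting and encoding entirely.

```go
package main

import "fmt"

// Row is a hypothetical record type used only for this illustration.
type Row struct {
	ID   int64
	City string
}

// Columns holds the same data as a slice of Row, stored column by column.
type Columns struct {
	ID   []int64
	City []string
}

// ToColumns transposes row-oriented records into a column-oriented layout.
func ToColumns(rows []Row) Columns {
	cols := Columns{
		ID:   make([]int64, 0, len(rows)),
		City: make([]string, 0, len(rows)),
	}
	for _, r := range rows {
		cols.ID = append(cols.ID, r.ID)
		cols.City = append(cols.City, r.City)
	}
	return cols
}

func main() {
	cols := ToColumns([]Row{{1, "Berlin"}, {2, "Hamburg"}, {3, "Munich"}})
	// Reading a single column only touches that slice; the ID column is never visited.
	fmt.Println(cols.City)
}
```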
This particular implementation is divided into several packages. The top-level package
that you're currently viewing is the low-level implementation of the file format. It is
accompanied by the sub-packages parquetschema and floor.
parquetschema provides functionality to parse textual schema definitions as well as the
data types to manually or programmatically construct schema definitions by other means
that are open to the user. The textual schema definition format is based on the barely
documented schema definition format that is implemented in the parquet Java implementation.
See the parquetschema sub-package for further documentation on how to use this package
and the grammar of the schema definition format as well as examples.
floor is a high-level wrapper around the low-level package. It provides functionality
to open parquet files to read from them or to write to them. When reading from parquet files,
floor takes care of automatically unmarshalling the low-level data into the user-provided
Go object. When writing to parquet files, user-provided Go objects are first marshalled
to a low-level data structure that is then written to the parquet file. These mechanisms
let you read and write Go objects directly, without having to deal with the details of the
low-level parquet format. Alternatively, marshalling and unmarshalling can be implemented
in a custom manner, giving the user maximum flexibility in case of disparities between
the parquet schema definition and the actual Go data structure. For more information, please
refer to the floor sub-package's documentation.
To aid in working with parquet files, this package also provides a commandline tool named
"parquet-tool" that allows you to inspect a parquet file's schema, meta data, row count and
content as well as to merge and split parquet files.
When operating with parquet files, most users should be able to cover their regular use cases
of reading and writing files using just the high-level floor package as well as the
parquetschema package. Using this low-level package is advisable only for users with more
specialized requirements for working with parquet files.
To write to a parquet file, use the FileWriter type provided by this package. Create a
new *FileWriter object using the NewFileWriter function. You have a number of options available
with which you can influence the FileWriter's behaviour. You can use these options to e.g. set
meta data, the compression algorithm to use, the schema definition to use, or whether the
data should be written in the V2 format. If you don't set a schema definition, you need
to manually create columns using the functions NewDataColumn, NewListColumn and NewMapColumn,
and then add them to the FileWriter by using the AddColumn method. To further structure
your data into groups, use AddGroup to create groups. When you add columns to groups, you need
to provide the full column name using dotted notation (e.g. "groupname.fieldname") to AddColumn.
Using the AddData method, you can then add records. The provided data is of type map[string]interface{}.
This data can be nested: to provide data for a repeated field, the data type to use for the
map value is []interface{}. When the provided data is a group, the data type for the group itself
again needs to be map[string]interface{}.
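As a rough sketch of the record shape described above, the following builds one nested record and lists the dotted names of its leaf fields. It uses only the standard library and does not call this package; the field names, the concrete value types (int64, int32, []byte) and the helper names buildRecord and leafPaths are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// buildRecord returns one record in the nested shape described in the text.
// All field names and value types here are hypothetical.
func buildRecord() map[string]interface{} {
	return map[string]interface{}{
		"id": int64(42),
		// A repeated field is given as []interface{}.
		"tags": []interface{}{[]byte("a"), []byte("b")},
		// A group is given as a nested map[string]interface{}.
		"address": map[string]interface{}{
			"city": []byte("Berlin"),
			"zip":  int32(10115),
		},
	}
}

// leafPaths returns the sorted dotted names of all non-group fields,
// mirroring the "groupname.fieldname" notation used for column names.
func leafPaths(prefix string, rec map[string]interface{}) []string {
	var paths []string
	for name, value := range rec {
		full := name
		if prefix != "" {
			full = prefix + "." + name
		}
		if group, ok := value.(map[string]interface{}); ok {
			paths = append(paths, leafPaths(full, group)...)
			continue
		}
		paths = append(paths, full)
	}
	sort.Strings(paths)
	return paths
}

func main() {
	// prints: [address.city address.zip id tags]
	fmt.Println(leafPaths("", buildRecord()))
}
```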
The data within a parquet file is divided into row groups of a certain size. You can either set
the desired row group size as a FileWriterOption, or you can manually check the estimated data
size of the current row group using the CurrentRowGroupSize method, and use FlushRowGroup
to write the data to disk and start a new row group. Please note that CurrentRowGroupSize
only estimates the _uncompressed_ data size. If you've enabled compression, it is impossible
to predict the compressed data size, so the actual row groups written to disk may be a lot
smaller than this estimate, depending on how efficiently your data can be compressed.
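The manual approach can be sketched as a loop that checks the estimated size after each record and flushes once a threshold is reached. The method names AddData, CurrentRowGroupSize and FlushRowGroup come from the text above, but the signatures in the rowGroupWriter interface are assumptions and should be checked against the FileWriter documentation; writeWithLimit and fakeWriter are hypothetical helpers, and the fake stands in for a real writer so the sketch runs on its own.

```go
package main

import "fmt"

// rowGroupWriter lists the methods this sketch relies on.
// The signatures are assumptions, not copied from the package.
type rowGroupWriter interface {
	AddData(record map[string]interface{}) error
	CurrentRowGroupSize() int64
	FlushRowGroup() error
}

// writeWithLimit adds each record and starts a new row group whenever the
// estimated uncompressed size of the current one reaches maxBytes.
func writeWithLimit(w rowGroupWriter, records []map[string]interface{}, maxBytes int64) error {
	for _, rec := range records {
		if err := w.AddData(rec); err != nil {
			return fmt.Errorf("add record: %w", err)
		}
		if w.CurrentRowGroupSize() >= maxBytes {
			if err := w.FlushRowGroup(); err != nil {
				return fmt.Errorf("flush row group: %w", err)
			}
		}
	}
	return nil
}

// fakeWriter is an in-memory stand-in that pretends every record adds 100 bytes.
type fakeWriter struct {
	size    int64
	flushes int
}

func (f *fakeWriter) AddData(map[string]interface{}) error { f.size += 100; return nil }
func (f *fakeWriter) CurrentRowGroupSize() int64           { return f.size }
func (f *fakeWriter) FlushRowGroup() error                 { f.flushes++; f.size = 0; return nil }

func main() {
	w := &fakeWriter{}
	records := make([]map[string]interface{}, 10)
	if err := writeWithLimit(w, records, 300); err != nil {
		fmt.Println("error:", err)
		return
	}
	// prints: row groups flushed: 3
	fmt.Println("row groups flushed:", w.flushes)
}
```

In this sketch the tenth record stays in the unflushed row group, which is why the remaining data still has to be written out when the writer is closed.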
When you're done writing, always use the Close method to flush any remaining data and to
write the file's footer.
To read from files, create a FileReader object using the NewFileReader function. You can
optionally provide a list of columns to read. If these are set, only these columns are read
from the file, while all other columns are ignored. If no columns are provided, then all
columns are read.
With the FileReader, you can then go through the row groups (using PreLoad and SkipRowGroup)
and iterate through the row data in each row group (using NextRow). To find out how many rows
to expect in total and per row group, use the NumRows and RowGroupNumRows methods. The number
of row groups can be determined using the RowGroupCount method.
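Row iteration can be sketched as a loop that calls NextRow until the reader reports the end of the data. The method name NextRow comes from the text above; its signature in the rowReader interface and the use of io.EOF as the end marker are assumptions to verify against the FileReader documentation. readAll and sliceReader are hypothetical helpers, with sliceReader standing in for a real reader so the sketch runs on its own.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// rowReader lists the one method this sketch relies on.
// The signature and the io.EOF convention are assumptions.
type rowReader interface {
	NextRow() (map[string]interface{}, error)
}

// readAll collects rows until the reader signals the end with io.EOF.
func readAll(r rowReader) ([]map[string]interface{}, error) {
	var rows []map[string]interface{}
	for {
		row, err := r.NextRow()
		if errors.Is(err, io.EOF) {
			return rows, nil
		}
		if err != nil {
			return rows, fmt.Errorf("read row: %w", err)
		}
		rows = append(rows, row)
	}
}

// sliceReader is an in-memory stand-in that serves rows from a slice.
type sliceReader struct {
	rows []map[string]interface{}
	pos  int
}

func (s *sliceReader) NextRow() (map[string]interface{}, error) {
	if s.pos >= len(s.rows) {
		return nil, io.EOF
	}
	row := s.rows[s.pos]
	s.pos++
	return row, nil
}

func main() {
	r := &sliceReader{rows: []map[string]interface{}{
		{"id": int64(1)},
		{"id": int64(2)},
	}}
	rows, err := readAll(r)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// prints: rows read: 2
	fmt.Println("rows read:", len(rows))
}
```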
15 versions
Latest release: almost 2 years ago
63 dependent packages
View more package details: https://packages.ecosyste.ms/registries/proxy.golang.org/packages/github.com/fraugster/parquet-go
View more repository details: https://repos.ecosyste.ms/hosts/GitHub/repositories/fraugster%2Fparquet-go
Dependent Repos 166
cockroachdb/cockroach-gen
CockroachDB with pre-generated Go code
- v0.10.0 go.mod
- v0.10.0 go.sum
Size: 4.49 GB - Last synced: 3 months ago - Pushed: 3 months ago
caraml-dev/merlin
Kubernetes-friendly ML model management, deployment, and serving.
- v0.10.0 api/go.mod
- v0.10.0 api/go.sum
Size: 16.9 MB - Last synced: 12 days ago - Pushed: 13 days ago
dileepdkumar/https-github.com-cockroachdb-cockroach2
- v0.4.0 go.mod
- v0.4.0 go.sum
Size: 998 MB - Last synced: about 2 months ago - Pushed: about 2 months ago
minio/minio
The Object Store for AI Data Infrastructure
- v0.10.0 go.mod
- v0.10.0 go.sum
Size: 124 MB - Last synced: about 8 hours ago - Pushed: about 8 hours ago
singlestore-labs/demo-realtime-digital-marketing
This application is a demo of how to use SingleStore to serve ads to users based on their behavior and realtime location.
- v0.10.0 go.mod
- v0.10.0 go.sum
Size: 18.2 MB - Last synced: 3 months ago - Pushed: 3 months ago
rail/test-cockroach2
- v0.10.0 go.mod
- v0.10.0 go.sum
Size: 1020 MB - Last synced: 5 months ago - Pushed: almost 2 years ago
cockroachdb/cockroach
CockroachDB - the open source, cloud-native distributed SQL database.
- v0.10.0 go.mod
- v0.10.0 go.sum
Size: 2.51 GB - Last synced: 12 days ago - Pushed: 12 days ago
bsm/feedx
- v0.11.0 ext/parquet/go.mod
- v0.11.0 ext/parquet/go.sum
Size: 202 KB - Last synced: 12 days ago - Pushed: 13 days ago
knz/cockroach Fork of cockroachdb/cockroach
A Scalable, Survivable, Strongly-Consistent SQL Database
- v0.10.0 go.mod
- v0.10.0 go.sum
Size: 1.32 GB - Last synced: about 1 month ago - Pushed: 8 months ago
jordanlewis/cockroach Fork of cockroachdb/cockroach
A Scalable, Survivable, Strongly-Consistent SQL Database
- v0.10.0 go.sum
- v0.10.0 go.mod
Size: 1.28 GB - Last synced: 2 months ago - Pushed: 2 months ago
tbg/cockroach Fork of cockroachdb/cockroach
A Scalable, Geo-Replicated, Transactional Datastore
- v0.10.0 go.mod
- v0.10.0 go.sum
Size: 1.3 GB - Last synced: 10 months ago - Pushed: 10 months ago
vadmeste/minio Fork of minio/minio
Minio is an open source object storage inspired by Amazon S3 and Facebook Haystack
- v0.12.0 go.sum
- v0.12.0 go.mod
Size: 127 MB - Last synced: 12 days ago - Pushed: 12 days ago
PretendoNetwork/minio Fork of minio/minio
Modified MinIO for Pretendo Network S3
- v0.12.0 go.mod
- v0.12.0 go.sum
Size: 109 MB - Last synced: 8 months ago - Pushed: 8 months ago
GuinsooLab/annastore
High performance OSS storage platform.
- v0.10.0 go.mod
- v0.10.0 go.sum
Size: 16.1 MB - Last synced: about 1 year ago - Pushed: over 1 year ago
Kiterepo/minio Fork of minio/minio
Minio is an open source object storage server compatible with Amazon S3 APIs
- v0.12.0 go.mod
- v0.12.0 go.sum
Size: 111 MB - Last synced: 2 months ago - Pushed: 2 months ago
kokizzu/minio Fork of minio/minio
MinIO is a high performance object storage server compatible with Amazon S3 APIs
- v0.12.0 go.mod
- v0.12.0 go.sum
Size: 123 MB - Last synced: about 19 hours ago - Pushed: about 21 hours ago
andyyang890/cockroach Fork of cockroachdb/cockroach
CockroachDB - the open source, cloud-native distributed SQL database.
- v0.10.0 go.mod
- v0.10.0 go.sum
Size: 1.48 GB - Last synced: 18 days ago - Pushed: 18 days ago
vdavalon01/minio Fork of minio/minio
High Performance, Kubernetes Native Object Storage
- v0.12.0 go.mod
- v0.12.0 go.sum
Size: 110 MB - Last synced: 10 months ago - Pushed: 10 months ago
surahman/cockroach Fork of cockroachdb/cockroach
CockroachDB - the open source, cloud-native distributed SQL database.
- v0.10.0 go.mod
- v0.10.0 go.sum
Size: 1.31 GB - Last synced: 6 months ago - Pushed: 6 months ago
dben/plex-go-sync
A command line tool to sync a main plex library with a backup remote one.
- v0.10.0 go.sum
- v0.11.0 go.sum
Size: 182 KB - Last synced: 10 months ago - Pushed: over 1 year ago
ZhouXing19/cockroach Fork of cockroachdb/cockroach
CockroachDB - the open source, cloud-native distributed SQL database.
- v0.10.0 go.mod
- v0.10.0 go.sum
Size: 1.51 GB - Last synced: 28 days ago - Pushed: 28 days ago
rluna-database/cloud/cockroach
CockroachDB - the open-source, cloud-native SQL database. https://www.cockroachlabs.com
- v0.10.0 go.mod
- v0.10.0 go.sum
Last synced: 11 months ago
wmutschl/minio Fork of minio/minio
High Performance, Kubernetes Native Object Storage
- v0.12.0 go.mod
- v0.12.0 go.sum
Size: 111 MB - Last synced: about 1 month ago - Pushed: 4 months ago
grovely/vendor/github.com/cockroachdb/cockroach
- v0.10.0 go.mod
- v0.10.0 go.sum
Last synced: 11 months ago
brimdata/zync
Kafka connector to sync Zed lakes to and from Kafka topics
- v0.10.1-0.20220222153523-e6b70a8a7212 go.mod
- v0.10.1-0.20220222153523-e6b70a8a7212 go.sum
Size: 312 KB - Last synced: 3 months ago - Pushed: 3 months ago
mimiro-io/objectstorage-datalayer
- v0.11.0 go.mod
- v0.11.0 go.sum
Size: 465 KB - Last synced: 20 days ago - Pushed: 21 days ago
advancemg/vimb-loader
Service for communicating with VIMB
- v0.10.0 go.mod
- v0.10.0 go.sum
Size: 905 KB - Last synced: 5 months ago - Pushed: about 1 year ago
mendelics/clinvar
Clinvar parser
- v0.11.0 go.mod
- v0.11.0 go.sum
Size: 48.9 MB - Last synced: 12 months ago - Pushed: 12 months ago
akrennmair/parquet-go-block-compressors 📦
Extension for github.com/fraugster/parquet-go to support more compression algorithms.
- v0.10.0 brotli/go.mod
- v0.10.0 brotli/go.sum
- v0.10.0 lz4raw/go.mod
- v0.10.0 lz4raw/go.sum
- v0.10.0 lzo/go.mod
- v0.8.0 lzo/go.sum
- v0.10.0 lzo/go.sum
- v0.10.0 zstd/go.mod
- v0.8.0 zstd/go.sum
- v0.10.0 zstd/go.sum
Size: 29.3 KB - Last synced: 8 months ago - Pushed: over 2 years ago
conorlynchgit/objectstoretest
Config files for my GitHub profile.
- v0.10.0 test/minio/go.mod
- v0.10.0 test/minio/go.sum
Size: 47.6 MB - Last synced: 11 months ago - Pushed: 11 months ago
JeffSwenson/cockroach Fork of cockroachdb/cockroach
CockroachDB - the open source, cloud-native distributed SQL database.
- v0.10.0 go.mod
- v0.10.0 go.sum
Size: 1.43 GB - Last synced: 8 months ago - Pushed: 8 months ago
LubyRuffy/goflow
just like pipeline for data
- v0.11.0 go.mod
Size: 898 KB - Last synced: 10 months ago - Pushed: 10 months ago
security-l/logging-and-anallysis/zq
Command-line processor for structured logs
- v0.3.0 go.mod
Last synced: over 1 year ago
kokizzu/cockroach Fork of cockroachdb/cockroach
CockroachDB - the open source, cloud-native distributed SQL database.
- v0.10.0 go.mod
- v0.10.0 go.sum
Size: 1.46 GB - Last synced: about 2 months ago - Pushed: about 2 months ago
mendelics/vcf2df
VCF to parquet converter
- v0.10.0 go.mod
- v0.10.0 go.sum
Size: 347 KB - Last synced: 3 months ago - Pushed: 6 months ago
automation555/cockroach
cockroach
- v0.6.1 go.mod
- v0.6.1 go.sum
Size: 76.6 MB - Last synced: about 1 year ago - Pushed: over 2 years ago
sinhaashish/minio Fork of minio/minio
Minio is an open source object storage server compatible with Amazon S3 APIs
- v0.10.0 go.mod
- v0.10.0 go.sum
Size: 118 MB - Last synced: over 1 year ago - Pushed: over 1 year ago
OdyseeTeam/transcoder
Transcoder server for Odysee media
- v0.10.0 go.mod
- v0.10.0 go.sum
Size: 959 KB - Last synced: about 2 months ago - Pushed: 3 months ago
libra-violet/MyminIO
- v0.10.0 go.mod
- v0.10.0 go.sum
Size: 11.9 MB - Last synced: about 1 year ago - Pushed: almost 2 years ago
ybarney/cockroach Fork of cockroachdb/cockroach
CockroachDB - the open source, cloud-native distributed SQL database.
- v0.10.0 go.mod
- v0.10.0 go.sum
Size: 1.07 GB - Last synced: about 1 year ago - Pushed: over 1 year ago
gitlab-gold/secure-scanner-testbed/minio
Clone of https://github.com/minio/minio.git from 2022-05-27
- v0.10.0 go.mod
- v0.10.0 go.sum
Last synced: over 1 year ago
exactlylabs/parquet2csv
Simple tool to convert a Parquet file to a CSV written in Go/ Golang
- v0.6.1 go.mod
- v0.6.1 go.sum
Size: 7.81 KB - Last synced: 11 months ago - Pushed: over 2 years ago
wenyihu6/cockroach Fork of cockroachdb/cockroach
Forked copy of CockroachDB - the open source, cloud-native distributed SQL database.
- v0.10.0 go.mod
- v0.10.0 go.sum
Size: 1.47 GB - Last synced: about 23 hours ago - Pushed: about 23 hours ago
shralex/cockroach Fork of cockroachdb/cockroach
CockroachDB - the open source, cloud-native distributed SQL database.
- v0.4.0 go.sum
- v0.4.0 go.mod
Size: 1.49 GB - Last synced: 3 months ago - Pushed: 3 months ago
tardunge/talaria Fork of talariadb/talaria
TalariaDB is a distributed, highly available, and low latency time-series database for Presto
- v0.3.0 go.mod
- v0.3.0 go.sum
Size: 12.8 MB - Last synced: 11 months ago - Pushed: over 1 year ago
TDary/GoProject
GoServer-Development
- v0.10.0 GoStudy/minio/go.mod
- v0.10.0 GoStudy/minio/go.sum
Size: 26.1 MB - Last synced: about 1 year ago - Pushed: over 1 year ago