datapipelines-essentials-python

Simplified ETL processing on Hadoop using Apache Spark. Provides a complete ETL pipeline for a data lake. Includes SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations.

JSON API: http://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vim89%2Fdatapipelines-essentials-python
PURL: pkg:github/vim89/datapipelines-essentials-python
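
The two identifiers above encode the same owner/name pair differently: the JSON API URL percent-encodes the slash, while the PURL keeps it literal. A minimal sketch of how they might be composed, assuming the URL layout shown in the JSON API line; the helper names `repo_api_url` and `github_purl` are hypothetical and not part of any ecosyste.ms client:

```python
from urllib.parse import quote

# Base taken from the JSON API line above; treated here as an assumption.
API_BASE = "http://repos.ecosyste.ms/api/v1"


def repo_api_url(host: str, full_name: str) -> str:
    """Build the repository JSON API URL (hypothetical helper).

    safe="" makes quote() percent-encode the owner/name slash as %2F.
    """
    return (
        f"{API_BASE}/hosts/{quote(host, safe='')}"
        f"/repositories/{quote(full_name, safe='')}"
    )


def github_purl(full_name: str) -> str:
    """Build a pkg:github PURL (hypothetical helper); the slash stays literal."""
    return f"pkg:github/{full_name}"


if __name__ == "__main__":
    name = "vim89/datapipelines-essentials-python"
    print(repo_api_url("GitHub", name))
    print(github_purl(name))
```

The sketch only builds strings and makes no network request; fetching the URL and the shape of the response are outside what this listing shows.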

Stars: 53
Forks: 34
Open issues: 1

License: apache-2.0
Language: Python
Size: 1.76 MB
Dependencies parsed at: Pending

Created at: about 6 years ago
Updated at: over 1 year ago
Pushed at: over 2 years ago
Last synced at: over 1 year ago

Topics: apache-spark, big-data, data-pipeline, datalake, etl, etl-components, etl-framework, etl-pipeline, hadoop, hadoop-hdfs, hadoop-mapreduce, pyspark, python, python3, spark, spark-sql, xml, xml-parsing
