An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: aws-glue

aws/aws-sdk-pandas

pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).

Language: Python - Size: 16.1 MB - Last synced at: about 1 hour ago - Pushed at: 5 days ago - Stars: 4,016 - Forks: 706

gps31320779/insightflow-retail-economic-pipeline

A data engineering portfolio project using AWS cloud services to analyze correlations between Malaysian retail performance and fuel prices. Features Terraform IaC, ETL/ELT with AWS S3, Glue, SQL analytics via Athena coupled with data transformation via dbt, and workflow orchestration with Kestra.

Size: 8.79 KB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0

ev2900/Iceberg_update_metadata_script

Python script that will update S3 file paths in Iceberg metadata files (metadata.json + AVRO)

Language: Python - Size: 735 KB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 3 - Forks: 0

ev2900/Iceberg_Glue_register_table

Example using the Iceberg register_table command with AWS Glue and Glue Data Catalog

Language: Python - Size: 549 KB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 0 - Forks: 1

ev2900/Glue_Examples

PySpark code samples designed for AWS Glue

Language: Python - Size: 51.8 KB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 1 - Forks: 0

schubergphilis/terraform-aws-mcaf-glue-job

A Terraform module that creates a Glue job

Language: HCL - Size: 43.9 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

data-dot-all/dataall

A modern data marketplace that makes collaboration among diverse users (like business, analysts and engineers) easier, increasing efficiency and agility in data projects on AWS.

Language: Python - Size: 97.9 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 242 - Forks: 82

akshay6991/data-engineer1

End to End Data Engineering Projects

Language: Python - Size: 1.25 MB - Last synced at: about 12 hours ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

shmokmt/awscrondoc

List cron expressions registered in Amazon Web Services.

Language: Go - Size: 84 KB - Last synced at: 2 days ago - Pushed at: 8 days ago - Stars: 1 - Forks: 0

lindsaygelle/AWSComprehend

AWS Comprehend is an event-driven, serverless data processing pipeline that leverages AWS services to perform natural language processing and analysis on user-submitted text files.

Language: HCL - Size: 1.74 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 0 - Forks: 0

vsingh55/NBA-Analytics-Data-Lake

A sports analytics data lake leveraging AWS S3 for storage, AWS Glue for data cataloging, and AWS Athena for querying. Python scripts handle data ingestion and manage the infrastructure.

Language: Python - Size: 2.93 KB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 0 - Forks: 0

pizofreude/insightflow-retail-economic-pipeline

A data engineering portfolio project using AWS cloud services to analyze correlations between Malaysian retail performance and fuel prices. Features Terraform IaC, ETL/ELT with AWS S3, Glue, SQL analytics via Athena coupled with data transformation via dbt, and workflow orchestration with Kestra.

Language: HCL - Size: 672 KB - Last synced at: 16 days ago - Pushed at: 17 days ago - Stars: 1 - Forks: 0

cloudposse/terraform-aws-glue

Terraform modules for provisioning and managing AWS Glue resources

Language: HCL - Size: 3.93 MB - Last synced at: 12 days ago - Pushed at: 3 months ago - Stars: 30 - Forks: 33

ev2900/MongoDB_Streams_Glue_Iceberg

Process MongoDB change streams via AWS Glue with Iceberg to keep a copy of a collection in S3 up to date

Language: Python - Size: 27.3 MB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 1 - Forks: 0

agnostk/nasa-space-activity

A scalable, cloud-native ETL pipeline that extracts, transforms, and enriches data from NASA APIs using AWS Glue, Lightsail, and RDS — all orchestrated with Terraform. Features modular design, medallion architecture (bronze, silver, gold), image metadata extraction and classification with PyTorch, and a bonus Mosaic Generator app.

Language: Python - Size: 2.52 MB - Last synced at: 22 days ago - Pushed at: 22 days ago - Stars: 0 - Forks: 0

hackolade/glue

Hackolade (https://hackolade.com) plugin for AWS Glue Data Catalog

Language: JavaScript - Size: 22.9 MB - Last synced at: 11 days ago - Pushed at: 24 days ago - Stars: 0 - Forks: 8

tokern/lakecli

A CLI to manage and monitor permissions in AWS Lake Formation

Language: Python - Size: 729 KB - Last synced at: 13 days ago - Pushed at: over 2 years ago - Stars: 26 - Forks: 8

dgomesbr/awesome-aws-workshops

(Unofficial) curated list of awesome workshops found around the internet. We have all been there: finding that workshop you just attended shouldn't be hard. The idea is to provide an easy central repository, in a collaborative way.

Language: HTML - Size: 1.49 MB - Last synced at: about 19 hours ago - Pushed at: over 3 years ago - Stars: 411 - Forks: 113

sjlewis25/pizza-delivery-pipeline

Simulates a real-world data pipeline for a pizza delivery service using AWS services and Terraform. Ingests and processes delivery data with S3, triggers Lambda functions for processing, and stores structured data in DynamoDB. Highlights use of automation, event-driven triggers, and real-time cloud-based data workflows.

Language: HCL - Size: 25.4 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

dashmug/glue-utils

Python library designed to enhance the developer experience when working with AWS Glue ETL and Python Shell jobs by reducing boilerplate code, increasing type safety, and improving IDE auto-completion.

Language: Python - Size: 697 KB - Last synced at: 23 days ago - Pushed at: 25 days ago - Stars: 6 - Forks: 2

vaxdata22/Customer-Churn-Data-Analytics-ETL-Pipeline-by-Airflow-on-EC2

This is an end-to-end AWS Cloud ETL project. The orchestration uses Apache Airflow on AWS EC2 as well as AWS Glue. It demonstrates how to build an ETL pipeline that transforms data using a Glue job/crawler and loads it into a Redshift table. It also shows how to connect Amazon Athena to the Glue Data Catalog, and Power BI to Redshift.

Language: Python - Size: 7.2 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

tokern/piicatcher

Scan databases and data warehouses for PII data. Tag tables and columns in data catalogs like Amundsen and Datahub

Language: Python - Size: 1.38 MB - Last synced at: 29 days ago - Pushed at: over 1 year ago - Stars: 306 - Forks: 99

awslabs/amazon-athena-cross-account-catalog 📦

🌉 Reference implementation for granting cross-account AWS Glue Data Catalog access from Amazon Athena

Language: Python - Size: 150 KB - Last synced at: 2 days ago - Pushed at: almost 3 years ago - Stars: 30 - Forks: 19

averemee-si/ora2iceberg

Transfer data from Oracle database tables, views, and query results to Apache Iceberg tables

Language: Java - Size: 392 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 2 - Forks: 2

Ditectrev/Amazon-Web-Services-Certified-AWS-Certified-Machine-Learning-MLS-C01-Practice-Tests-Exams-Question

⛳️ PASS: Amazon Web Services Certified (AWS Certified) Machine Learning Specialty (MLS-C01) by learning based on our Questions & Answers (Q&A) Practice Tests Exams.

Size: 6.88 MB - Last synced at: about 1 month ago - Pushed at: 7 months ago - Stars: 59 - Forks: 38

aws-samples/monitoring-apache-iceberg-table-metadata-layer

Sample code to collect Apache Iceberg metrics for table monitoring

Language: Python - Size: 787 KB - Last synced at: about 1 month ago - Pushed at: 9 months ago - Stars: 26 - Forks: 4

amzn/rheoceros

Cloud-based AI / ML workflow and data application development framework

Language: Python - Size: 2.49 MB - Last synced at: 13 days ago - Pushed at: 9 months ago - Stars: 17 - Forks: 9

Wolf-nord/Customer-Churn-Data-Analytics-ETL-Pipeline-by-Airflow-on-EC2

This is an end-to-end AWS Cloud ETL project. The orchestration uses Apache Airflow on AWS EC2 as well as AWS Glue. It demonstrates how to build an ETL pipeline that transforms data using a Glue job/crawler and loads it into a Redshift table. It also shows how to connect Amazon Athena to the Glue Data Catalog, and Power BI to Redshift.

Language: Python - Size: 7.17 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

ricardolsmendes/aws-glue-ci-cd-blueprint

Companion repository for the "Streamlining AWS Glue CI/CD — A Comprehensive Blueprint" blog post

Language: HCL - Size: 518 KB - Last synced at: 18 days ago - Pushed at: 6 months ago - Stars: 12 - Forks: 3

jibbs1703/Tickit-Data-Pipeline

This repository demonstrates the creation of a robust data pipeline using an orchestrator together with on-prem and cloud resources. It collects data from on-premises SQL and NoSQL databases and loads it into a SQL database in the cloud.

Language: Python - Size: 50.8 KB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 0 - Forks: 0

awslabs/athena-glue-service-logs 📦

Glue scripts for converting AWS Service Logs for use in Athena

Language: Python - Size: 381 KB - Last synced at: 2 days ago - Pushed at: over 1 year ago - Stars: 141 - Forks: 46

vitalibo/glue-pyspark-skeleton

AWS Glue PySpark project skeleton

Language: Python - Size: 90.8 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 1

spe-uob/2020-HealthcareLakeETL

FHIR to OMOP using PySpark on AWS Glue

Language: Python - Size: 1.65 MB - Last synced at: 3 days ago - Pushed at: about 4 years ago - Stars: 14 - Forks: 4

enchant3dmango/esdiel

Esdiel (SDL) stands for serverless data lake. In this project, I'm learning to deploy a simple serverless data lake on AWS using Terraform.

Language: HCL - Size: 544 KB - Last synced at: 1 day ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

hq969/Youtube-Data-Pipeline-AWS

Leveraging AWS Cloud Services, an ETL pipeline transforms YouTube video statistics data. Data is downloaded from Kaggle, uploaded to an S3 bucket, and cataloged using AWS Glue for querying with Athena. AWS Lambda and Glue convert the data to Parquet format and store it in a cleansed S3 bucket. AWS QuickSight then visualizes the materialised data.

Language: Python - Size: 1.69 MB - Last synced at: 1 day ago - Pushed at: 3 months ago - Stars: 2 - Forks: 0

Tejesvani/End-to-End-Smart-City-Data-Streaming-Pipeline

The Smart City Data Streaming Pipeline processes real-time data from IoT devices using Apache Kafka for ingestion and Apache Spark for processing. Data is stored in AWS S3 and analyzed with Glue, Athena, and Redshift. It enhances traffic management, predictive analytics, and urban planning, making cities smarter and more efficient.

Language: Python - Size: 14.6 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

ritesh-ojha/Data-Engineering

End to End Data Engineering Projects

Language: Python - Size: 32.5 MB - Last synced at: 2 months ago - Pushed at: 12 months ago - Stars: 2 - Forks: 2

dforsber/glue-table-cache

Query AWS Glue Tables efficiently with DuckDB

Language: TypeScript - Size: 1.14 MB - Last synced at: 5 days ago - Pushed at: 3 months ago - Stars: 5 - Forks: 0

kafbat/ui-serde-glue

AWS Glue Serde for kafka-ui

Language: Java - Size: 64.5 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 2

aws-samples/bring-your-own-data-labs 📦

Bring your own data Labs: Build a serverless data pipeline based on your own data

Language: HTML - Size: 31.1 MB - Last synced at: 6 days ago - Pushed at: almost 2 years ago - Stars: 44 - Forks: 24

aws-samples/transactional-datalake-using-apache-iceberg-on-aws-glue

Stream CDC into an Amazon S3 data lake in Apache Iceberg table format with AWS Glue Streaming and DMS

Language: Python - Size: 725 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 27 - Forks: 2

CloudFay/Sports-Data-Lake

This repository houses the setup_nba_data_lake.py script, which automates the entire process of building a cloud-based data lake for NBA analytics. With this script, you can seamlessly integrate Amazon S3, AWS Glue, and Amazon Athena to store, process, and query NBA-related data—all in a fully scalable and serverless environment!

Language: Python - Size: 7.81 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

patheard/aws-rds-glue-connection

Connect to a private RDS cluster from an AWS Glue job

Language: HCL - Size: 13.7 KB - Last synced at: about 2 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

BrianWangila/Sports-Data-Lake-AWS

Automates building an NBA sports data lake with AWS S3, AWS Glue, and AWS Athena, and sets up the infrastructure to store and query NBA-related data.

Language: Python - Size: 470 KB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

dominique-jacque/NBA-Data-Lake

NBA Data Lake Repository contains the setup_nba_data_lake.py script, which automates the creation of a data lake for NBA analytics using AWS services. The script integrates Amazon S3, AWS Glue, and Amazon Athena, and sets up the infrastructure needed to store and query NBA-related data.

Language: Python - Size: 9.77 KB - Last synced at: about 2 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

dmschauer/wap-pattern-iceberg-pyspark-aws-glue

This repository shows how to implement the Write-Audit-Publish (WAP) pattern using Apache Spark and Apache Iceberg. It's aimed at Data Engineers who want to get started quickly.

Language: Jupyter Notebook - Size: 70.3 KB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

BhawnaMehbubani/Ingest-daily-flight-data-in-Redshift-fact-table

End-to-end ETL pipeline for flight data analytics using AWS Glue, Redshift, S3, PySpark, and Athena, with data transformation, enrichment, and reporting capabilities.

Language: Python - Size: 5.41 MB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

deept-agl/Youtube-data-ETL-Analysis-using-AWS

This project creates a scalable data pipeline to analyze YouTube data from Kaggle using AWS services: S3, Glue, Lambda, Athena, and QuickSight. It processes raw JSON and CSV files into cleansed, partitioned datasets, integrates them with ETL workflows, and catalogs data for querying. Final insights are visualized in QuickSight dashboards.

Language: Python - Size: 177 MB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

goamegah/spark-handson

Spark hands-on

Language: Python - Size: 3.08 MB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 2 - Forks: 1

siconge/Tencent-HQ-BIM-Data-Pipeline-with-AWS

This project delivers an end-to-end data pipeline that uses a comprehensive ETL process to move BIM data from the Autodesk Revit model of the Tencent Global Headquarters into cloud storage for processing and analytics. The pipeline leverages tools and services such as Apache Airflow, Amazon S3, AWS Glue, and Amazon Redshift.

Language: Python - Size: 10.2 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

minhduc29/leetcode-contest-analytics

A data engineering project to extract, transform, and load LeetCode contest ranking and contest problems data

Language: Python - Size: 5.66 MB - Last synced at: about 2 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

kanwal-kingra/SpotifySync-ETL

Extracting Data from Spotify 'Best Hindi Songs' playlist, Transforming Data and Loading Into Snowflake Data Warehouse, using data modeling to make data more accessible

Language: Jupyter Notebook - Size: 148 KB - Last synced at: about 2 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

zablon-oigo/nba-data-lake

This project automates the creation of a data lake for NBA analytics using AWS services

Language: Python - Size: 12.7 KB - Last synced at: 2 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

Ditectrev/Amazon-Web-Services-Certified-AWS-Certified-Data-Analytics-DAS-C01-Practice-Tests-Exams-Question

⛳️ PASS: Amazon Web Services Certified (AWS Certified) Data Analytics Specialty (DAS-C01) by learning based on our Questions & Answers (Q&A) Practice Tests Exams.

Size: 9.44 MB - Last synced at: 16 days ago - Pushed at: 6 months ago - Stars: 7 - Forks: 7

SWO-GS/athena-cloudtrail-partitioner 📦

Automate the daily partitioning of your CloudTrail bucket in Athena

Language: JavaScript - Size: 671 KB - Last synced at: 4 months ago - Pushed at: over 1 year ago - Stars: 28 - Forks: 7

vadgamabansari/aws-spotify-insights-data-pipeline

Language: Python - Size: 467 KB - Last synced at: about 2 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

cch0/price-transparency-data

Source code for processing insurance price transparency data

Language: Python - Size: 17.6 KB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

ndomah/AWS-YouTube-Data-Analysis

Analyzed YouTube trending video data using AWS services to build a scalable pipeline for data ingestion, ETL, and storage in a centralized data lake. Created QuickSight dashboards highlighting video views by country, category, and region. Workflow included ingestion, preprocessing, cataloging, and analysis.

Language: Python - Size: 968 KB - Last synced at: 2 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

dmrhimali/terraform

Tutorial on how to create and run Terraform scripts for the AWS and New Relic providers

Language: HCL - Size: 20.6 MB - Last synced at: 2 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

gautamgc17/YouTube-Data-Analytics-AWS-Pipeline

The project aims to build a data engineering pipeline on AWS for analysis of YouTube data based on video categories and trending metrics.

Language: Python - Size: 54.7 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

dmschauer/wap-pattern-pyspark-aws-glue

This repository shows how to implement the Write-Audit-Publish (WAP) pattern using Apache Spark and Apache Iceberg. It's aimed at Data Engineers who want to get started quickly.

Language: Python - Size: 59.6 KB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

zhiweio/data-engineer-scripts

A curated collection of streamlined and effective scripts and tools designed specifically for data engineering tasks.

Language: Python - Size: 43 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 8 - Forks: 0

webysther/aws-glue-docker 📦

🐋 Docker image for AWS Glue Spark/Python

Language: Dockerfile - Size: 56.6 KB - Last synced at: 6 days ago - Pushed at: over 1 year ago - Stars: 23 - Forks: 8

ev2900/Glue_Hudi

Apache Hudi examples designed to be run on AWS Glue via Glue jobs

Language: Python - Size: 20.5 KB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 0 - Forks: 1

MateusJordao/automacao-jobs-aws-glue

Automation of AWS tasks

Language: Python - Size: 9.77 KB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

Kirolos00Daniel/AWS-Store-Orders-Analysis

AWS Orders Analysis

Size: 1.88 MB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

lasyakonduru/superstore-sales-data-analysis

Analysis of sales performance and operational efficiency in a superstore using AWS Athena and QuickSight

Size: 2.25 MB - Last synced at: about 2 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

wednesday-solutions/Data-Engineering-Onboarding-Starter

This repository contains a 10-step program to enter the world of Data Engineering

Language: Python - Size: 6.42 MB - Last synced at: 28 days ago - Pushed at: 10 months ago - Stars: 14 - Forks: 1

aruadecarvalho/deftunes-pipeline-aws

An end-to-end data pipeline for DeFtunes' music purchase analytics, designed to ingest, transform, and model data for efficient analysis of song purchases, user behavior, and service trends. Utilizes AWS Glue, S3, Redshift Spectrum, Apache Airflow, DBT, Superset, and Terraform.

Language: Python - Size: 294 KB - Last synced at: 2 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

aws-samples/aws-glue-jobs-unit-testing

Demo code to illustrate the execution of PyTest unit test cases for AWS Glue jobs in AWS CodePipeline using AWS CodeBuild projects

Language: Python - Size: 402 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 39 - Forks: 22

shahidmalik4/aws-glue-stepfunctions-etl

This project automates an ETL pipeline using AWS Glue, S3, Athena, and Step Functions to transform raw Airbnb data. It cleanses, enriches, and organizes the data into separate raw and transformed databases, enabling efficient querying and analysis via Athena, with automated notifications through SNS.

Language: Python - Size: 3.47 MB - Last synced at: about 2 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

wednesday-solutions/aws-glue-jupyter-notebook-starter

A starter repository for your next AWS Glue project. This comes with complete IaC, a CD pipeline and a reusable common SDK. Set up Jupyter Notebook for AWS Glue locally.

Language: Jupyter Notebook - Size: 43 KB - Last synced at: 28 days ago - Pushed at: over 1 year ago - Stars: 6 - Forks: 1

DisasterAWARE/aws-glue-schema-registry-python

Use the AWS Glue Schema Registry in Python projects.

Language: Python - Size: 61.5 KB - Last synced at: 27 days ago - Pushed at: 6 months ago - Stars: 32 - Forks: 15

moritzkoerber/covid-19-data-engineering-pipeline

A Covid-19 data pipeline on AWS featuring PySpark/Glue, Docker, Great Expectations, Airflow, and Redshift, templated in CloudFormation and CDK, deployable via GitHub Actions.

Language: Python - Size: 1.31 MB - Last synced at: 14 days ago - Pushed at: over 1 year ago - Stars: 23 - Forks: 5

monisha-anila/Data-Analyst-hacks

A beginner's guide to doing your best with data!

Language: Jupyter Notebook - Size: 137 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 2 - Forks: 0

aryan4codes/StockIO

StockIO is a real-time data streaming solution designed to process and analyze stock market data using Apache Kafka and AWS services.

Language: Jupyter Notebook - Size: 2.62 MB - Last synced at: about 1 month ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

evanmathew/Reddit_ETL_DE

This project demonstrates a complete data pipeline for extracting, transforming, and loading (ETL) Reddit data into an Amazon Redshift data warehouse. The pipeline uses various AWS services and tools including Apache Airflow, PostgreSQL, AWS S3, AWS Glue, AWS Athena, and Amazon Redshift. The project is orchestrated using Docker and Apache Airflow.

Language: Python - Size: 137 KB - Last synced at: 2 months ago - Pushed at: 10 months ago - Stars: 1 - Forks: 0

j3-signalroom/apache_flink-kickstarter

Examples of Apache Flink® applications showcasing the DataStream API and Table API in Java and Python, featuring AWS, GitHub, Terraform, and Apache Iceberg.

Language: Java - Size: 17.9 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 1 - Forks: 0

1oglop1/aws-glue-monorepo-style

Example of AWS Glue Jobs and workflow deployment with Terraform in monorepo style. Code here supports the miniseries of articles about AWS Glue and Python.

Language: Python - Size: 488 KB - Last synced at: 5 months ago - Pushed at: over 4 years ago - Stars: 30 - Forks: 10

imsanjoykb/AWSBootcamp

AWS Bootcamp | Resource | Document | Materials

Language: Jupyter Notebook - Size: 10.4 MB - Last synced at: about 1 month ago - Pushed at: about 3 years ago - Stars: 10 - Forks: 2

aws-samples/data-lake-as-code

Data Lake as Code, featuring ChEMBL and OpenTargets

Language: TypeScript - Size: 1.26 MB - Last synced at: 5 months ago - Pushed at: over 1 year ago - Stars: 166 - Forks: 44

AWS-Big-Data-Projects/front-line-concussion-monitoring-system-using-AWS-IoT-and-serverless-data-lakes

A simple, practical, and affordable system for measuring head trauma within the sports environment, subject to the absence of trained medical personnel, built using Amazon Kinesis Data Streams, Kinesis Data Analytics, Kinesis Data Firehose, and AWS Lambda

Language: Shell - Size: 30.3 KB - Last synced at: 7 days ago - Pushed at: over 4 years ago - Stars: 12 - Forks: 0

desininja/Airline-Data-Ingestion-Pipeline

ETL pipeline using AWS services.

Language: Python - Size: 4.33 MB - Last synced at: 3 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

desininja/Quality-Movie-Data-Pipeline

ETL pipeline using AWS services

Language: Python - Size: 727 KB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

aws-samples/aws-glue-streaming-etl-with-delta-lake

Streaming ETL job cases in AWS Glue to integrate Delta Lake and create an in-place updatable data lake on Amazon S3

Language: Python - Size: 314 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 8 - Forks: 0

aws-samples/aws-glue-streaming-etl-with-apache-iceberg

Streaming ETL job cases in AWS Glue to integrate Iceberg and create an in-place updatable data lake on Amazon S3

Language: Python - Size: 465 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 17 - Forks: 2

StatAziz/Ames-Weather-Data-ETL-Pipeline

This project builds a serverless ETL pipeline using the Open-Meteo weather API.

Language: Jupyter Notebook - Size: 24.2 MB - Last synced at: about 2 months ago - Pushed at: 8 months ago - Stars: 1 - Forks: 0

Tyriek-cloud/NYC-Mobility-Survey-Analysis

An end-to-end data engineering project in which five NYC DOT datasets were modified in an ETL process and analyzed for insights.

Language: Python - Size: 2.75 MB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

SeaBebop/TekkenSubreddit-ETL-Pipeline

AWS Glue ETL transformation of Tekken subreddit data

Language: Python - Size: 56.6 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

aws-samples/amazon-deequ-glue

Automated data quality suggestions and analysis with Deequ on AWS Glue

Language: Scala - Size: 2.1 MB - Last synced at: 5 months ago - Pushed at: over 2 years ago - Stars: 83 - Forks: 23

jhole89/aws-glue-sbt-quickstart

Example of how to set SBT up for local development of AWS Glue Scripts

Language: Scala - Size: 30.3 KB - Last synced at: 5 months ago - Pushed at: over 4 years ago - Stars: 16 - Forks: 3

TimKong21/AWS-Batch-Processing

Big data analysis with AWS services, filtering the Wikiticker dataset with Apache Spark on Amazon EMR, storing data in S3, cataloging with AWS Glue, and querying with Amazon Athena. This end-to-end pipeline exemplifies handling and analyzing big data in the cloud.

Language: Python - Size: 8.01 MB - Last synced at: 2 months ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

rashmishreev/atm-analytics-bigaata-aws

Analyze over 2.5 million ATM transaction records from Spar Nord Bank to optimize ATM usage patterns and enhance customer service using AWS Services and Big Data Analytics.

Language: Jupyter Notebook - Size: 42.4 MB - Last synced at: about 1 month ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

DenysGonzaga/glue-athena-cdk-example

A small walkthrough of how to create an AWS Glue Job Pipeline with AWS CDK

Language: Python - Size: 10.7 MB - Last synced at: 10 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

rashmishreev/Skillshift-Trend-Analysis

SkillShift uses advanced DBMS to analyze job listings, providing insights into evolving skill requirements across industries. It offers detailed analysis on skill demand, workplace culture, and industry trends, empowering professionals to make informed decisions about career development in a dynamic job market.

Language: Jupyter Notebook - Size: 33.4 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

pgrarchives/AWS_DATA_PIPELINE

End to End Data Engineering Pipeline using AWS Cloud Services

Language: Jupyter Notebook - Size: 2.03 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

aws-samples/cloud-experiments 📦

Open innovation with 60 minute cloud experiments on AWS

Language: Jupyter Notebook - Size: 22.8 MB - Last synced at: 6 months ago - Pushed at: about 1 year ago - Stars: 88 - Forks: 56

flemm0/capitol-trades

Politician stock market activity web scraping project

Language: Python - Size: 2.26 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

DivineSamOfficial/Banking-Data-Warehouse-Pipeline

Banking Data Warehouse Pipeline

Language: Python - Size: 52.1 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

taupirho/read-big-file-aws-athena-glue

Continuing with my case study on reading a big data file, this is the fifth part of my trilogy :-) on how I got on reading a big'ish file with C, Python, spark-python and spark-scala, AWS Elastic MapReduce and AWS Athena.

Language: Python - Size: 45.9 KB - Last synced at: 11 months ago - Pushed at: almost 7 years ago - Stars: 0 - Forks: 1