GitHub topics: aws-glue | Ecosyste.ms: Repos

streamthoughts/jikkou

The open source Resource as Code framework for Apache Kafka. Jikkou helps you implement GitOps for Kafka at scale!

Language: Java - Size: 32.2 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 234 - Forks: 19

gps31320779/insightflow-retail-economic-pipeline

A data engineering portfolio project using AWS cloud services to analyze correlations between Malaysian retail performance and fuel prices. Features Terraform IaC, ETL/ELT with AWS S3, Glue, SQL analytics via Athena coupled with data transformation via dbt, and workflow orchestration with Kestra.

Size: 8.79 KB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

saifuzzuhdi123/apache_kafka_stock_market_data_streaming

This repository provides a clear guide on using Apache Kafka for real-time stock market data streaming. 📈 Explore how to set up producers and consumers, and see practical applications in financial data processing. 🛠️

Language: Jupyter Notebook - Size: 2.46 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

pizofreude/insightflow-retail-economic-pipeline

A data engineering portfolio project using AWS cloud services to analyze correlations between Malaysian retail performance and fuel prices. Features Terraform IaC, ETL/ELT with AWS S3, Glue, SQL analytics via Athena coupled with data transformation via dbt, and workflow orchestration with Kestra.

Language: HCL - Size: 2.38 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 1 - Forks: 0

ev2900/Iceberg_update_metadata_script

Python script that will update S3 file paths in Iceberg metadata files (metadata.json + AVRO)

Language: Python - Size: 694 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 4 - Forks: 0

ev2900/Iceberg_Glue_register_table

Example using the Iceberg register_table command with AWS Glue and Glue Data Catalog

Language: Python - Size: 565 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 1 - Forks: 2

aws/aws-sdk-pandas

pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).

Language: Python - Size: 15.9 MB - Last synced at: 2 days ago - Pushed at: 6 days ago - Stars: 4,030 - Forks: 707

ev2900/MongoDB_Streams_Glue_Iceberg

Process MongoDB change streams via AWS Glue with Iceberg to keep a copy of a collection in S3 up to date

Language: Python - Size: 27.3 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 1 - Forks: 0

ev2900/Glue_Examples

PySpark code samples designed for AWS Glue

Language: Python - Size: 57.6 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 1 - Forks: 0

lanafrenzel/aws-etl-pipeline

ETL pipeline on AWS with Lambda, Glue, and S3 for data ingestion and processing - in progress

Language: Python - Size: 82 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

schubergphilis/terraform-aws-mcaf-glue-job

A Terraform module that creates a Glue job

Language: HCL - Size: 54.7 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

shmokmt/awscrondoc

List cron expressions registered in Amazon Web Services.

Language: Go - Size: 103 KB - Last synced at: 1 day ago - Pushed at: 7 days ago - Stars: 1 - Forks: 0

umbertocicciaa/aws-mlops

Another AWS MLOps pipeline with Glue and SageMaker, automated with Terraform and CI/CD

Language: HCL - Size: 1.88 MB - Last synced at: 11 days ago - Pushed at: 12 days ago - Stars: 0 - Forks: 0

data-dot-all/dataall

A modern data marketplace that makes collaboration among diverse users (like business, analysts and engineers) easier, increasing efficiency and agility in data projects on AWS.

Language: Python - Size: 97 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 242 - Forks: 82

dashmug/glue-utils

Python library designed to enhance the developer experience when working with AWS Glue ETL and Python Shell jobs by reducing boilerplate code, increasing type safety, and improving IDE auto-completion.

Language: Python - Size: 830 KB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 8 - Forks: 2

averemee-si/ora2iceberg

Transfer data from Oracle database tables, views, and query results to Apache Iceberg tables

Language: Java - Size: 412 KB - Last synced at: 19 days ago - Pushed at: 20 days ago - Stars: 2 - Forks: 1

akhilpatlolla/Generic_ETL_Utility_AWS_GLUE

AWS Glue - Incremental Pull Script

Language: Python - Size: 351 KB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 0 - Forks: 0

aws-samples/amazon-rds-export-to-s3-automation

This repository contains source code for the AWS Database Blog post "Reduce data archiving costs for compliance by automating RDS snapshot exports to Amazon S3"

Size: 235 KB - Last synced at: 19 days ago - Pushed at: about 2 years ago - Stars: 17 - Forks: 2

aws-samples/aws-glue-jobs-unit-testing

Demo code to illustrate the execution of PyTest unit test cases for AWS Glue jobs in AWS CodePipeline using AWS CodeBuild projects

Language: Python - Size: 404 KB - Last synced at: 19 days ago - Pushed at: 7 months ago - Stars: 47 - Forks: 22

priyanshubiswas-tech/AWS-ETL-Pipeline-on-Cloud-using-Glue-Athena-Lambda-and-Redshift

Serverless ETL pipeline on AWS using Glue, Lambda, Athena, and Redshift — automates data ingestion, transformation, and analytics with scalable, event-driven architecture.

Language: Python - Size: 20.5 KB - Last synced at: 16 days ago - Pushed at: 24 days ago - Stars: 1 - Forks: 0

DivitMittal/Datathon-BigData

Efficient Data Processing ETL Pipeline for Event Records

Language: Jupyter Notebook - Size: 4.1 MB - Last synced at: 20 days ago - Pushed at: 24 days ago - Stars: 0 - Forks: 0

lindsaygelle/AWSComprehend

AWS Comprehend is an event-driven, serverless data processing pipeline that leverages AWS services to perform natural language processing and analysis on user-submitted text files.

Language: HCL - Size: 1.73 MB - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 0 - Forks: 0

cloudposse/terraform-aws-glue

Terraform modules for provisioning and managing AWS Glue resources

Language: HCL - Size: 3.93 MB - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 31 - Forks: 34

ShreyasShende3/reddit-data-engineering

Built an ETL pipeline using Airflow, then used AWS tools such as S3, Glue, Athena, and Redshift for further processing, storage, and visualization

Language: Python - Size: 119 KB - Last synced at: 27 days ago - Pushed at: 27 days ago - Stars: 0 - Forks: 0

Subhajit-Chowdhury/RAW-Spotify-Data-into-Insights-with-AWS

Unlocking Spotify insights with an AWS data pipeline: S3 data lake -> Glue ETL -> Athena queries -> QuickSight dashboard

Language: Python - Size: 253 MB - Last synced at: 27 days ago - Pushed at: 27 days ago - Stars: 1 - Forks: 0

aws-samples/analyzing-reddit-sentiment-with-aws

Learn how to use Kinesis Firehose, AWS Glue, S3, and Amazon Athena by streaming and analyzing Reddit comments in real time. 100-200 level tutorial.

Language: Python - Size: 3.48 MB - Last synced at: 19 days ago - Pushed at: about 4 years ago - Stars: 44 - Forks: 15

aws-samples/streamlit-application-deployment-on-aws

Streamlit EDA Dashboard Powered by AWS Cloud

Language: Python - Size: 3.99 MB - Last synced at: 19 days ago - Pushed at: about 1 month ago - Stars: 82 - Forks: 33

quixoticmonk/terraform-aws-glue

Terraform module for AWS Glue related infrastructure

Language: HCL - Size: 147 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

hackolade/glue

Hackolade (https://hackolade.com) plugin for AWS Glue Data Catalog

Language: JavaScript - Size: 22.9 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 8

tokern/piicatcher

Scan databases and data warehouses for PII data. Tag tables and columns in data catalogs like Amundsen and Datahub

Language: Python - Size: 1.38 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 311 - Forks: 99

sjlewis25/pizza-delivery-pipeline

Simulates a real-world data pipeline for a pizza delivery service using AWS services and Terraform. Ingests and processes delivery data with S3, triggers Lambda functions for processing, and stores structured data in DynamoDB. Highlights use of automation, event-driven triggers, and real-time cloud-based data workflows.

Language: HCL - Size: 30.3 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

Ditectrev/Amazon-Web-Services-Certified-AWS-Certified-Machine-Learning-MLS-C01-Practice-Tests-Exams-Question

⛳️ PASS: Amazon Web Services Certified (AWS Certified) Machine Learning Specialty (MLS-C01) by learning based on our Questions & Answers (Q&A) Practice Tests Exams.

Size: 6.88 MB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 63 - Forks: 38

aws-samples/cloud-experiments 📦

Open innovation with 60 minute cloud experiments on AWS

Language: Jupyter Notebook - Size: 22.8 MB - Last synced at: 19 days ago - Pushed at: about 1 year ago - Stars: 88 - Forks: 56

aws-samples/amazon-deequ-glue

Automated data quality suggestions and analysis with Deequ on AWS Glue

Language: Scala - Size: 2.1 MB - Last synced at: 19 days ago - Pushed at: over 2 years ago - Stars: 85 - Forks: 24

aws-samples/monitoring-apache-iceberg-table-metadata-layer

Sample code to collect Apache Iceberg metrics for table monitoring

Language: Python - Size: 787 KB - Last synced at: 19 days ago - Pushed at: 10 months ago - Stars: 27 - Forks: 4

akshay6991/data-engineer1

End to End Data Engineering Projects

Language: Python - Size: 1.25 MB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

vsingh55/NBA-Analytics-Data-Lake

A sports analytics data lake leveraging AWS S3 for storage, AWS Glue for data cataloging, and AWS Athena for querying. Python scripts handle data ingestion and manage the infrastructure.

Language: Python - Size: 1.32 MB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

aws-samples/aws-glue-streaming-etl-with-delta-lake

Streaming ETL job cases in AWS Glue to integrate Delta Lake and create an in-place updatable data lake on Amazon S3

Language: Python - Size: 314 KB - Last synced at: 19 days ago - Pushed at: 10 months ago - Stars: 9 - Forks: 0

agnostk/nasa-space-activity

A scalable, cloud-native ETL pipeline that extracts, transforms, and enriches data from NASA APIs using AWS Glue, Lightsail, and RDS — all orchestrated with Terraform. Features modular design, medallion architecture (bronze, silver, gold), image metadata extraction and classification with PyTorch, and a bonus Mosaic Generator app.

Language: Python - Size: 2.52 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

aws-samples/transactional-datalake-using-apache-iceberg-on-aws-glue

Stream CDC into an Amazon S3 data lake in Apache Iceberg table format with AWS Glue Streaming and DMS

Language: Python - Size: 727 KB - Last synced at: 19 days ago - Pushed at: 4 months ago - Stars: 32 - Forks: 2

tokern/lakecli

A CLI to manage and monitor permissions in AWS Lake Formation

Language: Python - Size: 729 KB - Last synced at: 20 days ago - Pushed at: over 2 years ago - Stars: 26 - Forks: 8

Danitilahun/Reddit-Data-Engineering

This project automates the extraction, transformation, and loading (ETL) of Reddit data into a Redshift data warehouse using Airflow. Key technologies include Celery, PostgreSQL, S3, Glue, Athena, and Redshift, providing a complete data pipeline solution.

Size: 119 KB - Last synced at: 11 days ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

aws-samples/data-lake-as-code

Data Lake as Code, featuring ChEMBL and OpenTargets

Language: TypeScript - Size: 1.26 MB - Last synced at: 19 days ago - Pushed at: over 1 year ago - Stars: 170 - Forks: 45

dgomesbr/awesome-aws-workshops

(Unofficial) curated list of awesome workshops found around the internet. We have all been there: finding that workshop you have just attended shouldn't be hard. The idea is to provide an easy central repository, maintained collaboratively.

Language: HTML - Size: 1.49 MB - Last synced at: 7 days ago - Pushed at: almost 4 years ago - Stars: 411 - Forks: 113

vaxdata22/Customer-Churn-Data-Analytics-ETL-Pipeline-by-Airflow-on-EC2

This is an end-to-end AWS Cloud ETL project. The orchestration uses Apache Airflow on AWS EC2 as well as AWS Glue. It demonstrates how to build an ETL pipeline that transforms data using a Glue job and crawler and loads it into a Redshift table. It also shows how to connect Amazon Athena to Glue Data Catalog, and Power BI to Redshift.

Language: Python - Size: 7.2 MB - Last synced at: 5 days ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

awslabs/amazon-athena-cross-account-catalog 📦

🌉 Reference implementation for granting cross-account AWS Glue Data Catalog access from Amazon Athena

Language: Python - Size: 150 KB - Last synced at: about 1 month ago - Pushed at: almost 3 years ago - Stars: 30 - Forks: 19

amzn/rheoceros

Cloud-based AI / ML workflow and data application development framework

Language: Python - Size: 2.49 MB - Last synced at: about 2 months ago - Pushed at: 10 months ago - Stars: 17 - Forks: 9

Wolf-nord/Customer-Churn-Data-Analytics-ETL-Pipeline-by-Airflow-on-EC2

This is an end-to-end AWS Cloud ETL project. The orchestration uses Apache Airflow on AWS EC2 as well as AWS Glue. It demonstrates how to build an ETL pipeline that transforms data using a Glue job and crawler and loads it into a Redshift table. It also shows how to connect Amazon Athena to Glue Data Catalog, and Power BI to Redshift.

Language: Python - Size: 7.17 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

ricardolsmendes/aws-glue-ci-cd-blueprint

Companion repository for the "Streamlining AWS Glue CI/CD — A Comprehensive Blueprint" blog post

Language: HCL - Size: 518 KB - Last synced at: 2 months ago - Pushed at: 8 months ago - Stars: 12 - Forks: 3

jibbs1703/Tickit-Data-Pipeline

This repository demonstrates the creation of a robust data pipeline using an orchestrator and both on-prem and cloud resources. It collects data from on-premises SQL and NoSQL databases and loads it into a SQL database in the cloud.

Language: Python - Size: 54.7 KB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 0 - Forks: 0

awslabs/athena-glue-service-logs 📦

Glue scripts for converting AWS Service Logs for use in Athena

Language: Python - Size: 381 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 141 - Forks: 46

aws-samples/aws-glue-streaming-etl-with-apache-iceberg

Streaming ETL job cases in AWS Glue to integrate Iceberg and create an in-place updatable data lake on Amazon S3

Language: Python - Size: 465 KB - Last synced at: 19 days ago - Pushed at: 10 months ago - Stars: 23 - Forks: 2

vitalibo/glue-pyspark-skeleton

AWS Glue PySpark project skeleton

Language: Python - Size: 90.8 KB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 1 - Forks: 1

spe-uob/2020-HealthcareLakeETL

FHIR to OMOP using PySpark on AWS Glue

Language: Python - Size: 1.65 MB - Last synced at: about 15 hours ago - Pushed at: about 4 years ago - Stars: 14 - Forks: 4

enchant3dmango/esdiel

Esdiel (SDL) stands for serverless data lake. In this project, I'm learning to deploy a simple serverless data lake on AWS using Terraform.

Language: HCL - Size: 544 KB - Last synced at: 7 days ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

hq969/Youtube-Data-Pipeline-AWS

Leveraging AWS Cloud Services, an ETL pipeline transforms YouTube video statistics data. Data is downloaded from Kaggle, uploaded to an S3 bucket, and cataloged using AWS Glue for querying with Athena. AWS Lambda and Glue convert the data to Parquet format and store it in a cleansed S3 bucket. AWS QuickSight then visualizes the materialised data.

Language: Python - Size: 1.69 MB - Last synced at: 7 days ago - Pushed at: 4 months ago - Stars: 2 - Forks: 0

Tejesvani/IoT-Data-Streaming-and-Analytics

The Smart City Data Streaming Pipeline processes real-time data from IoT devices using Apache Kafka for ingestion and Apache Spark for processing. Data is stored in AWS S3 and analyzed with Glue, Athena, and Redshift. It enhances traffic management, predictive analytics, and urban planning, making cities smarter and more efficient.

Language: Python - Size: 18.6 KB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

ritesh-ojha/Data-Engineering

End to End Data Engineering Projects

Language: Python - Size: 32.5 MB - Last synced at: 4 months ago - Pushed at: about 1 year ago - Stars: 2 - Forks: 2

dforsber/glue-table-cache

Query AWS Glue Tables efficiently with DuckDB

Language: TypeScript - Size: 1.14 MB - Last synced at: 15 days ago - Pushed at: 4 months ago - Stars: 5 - Forks: 0

kafbat/ui-serde-glue

AWS Glue Serde for kafka-ui

Language: Java - Size: 64.5 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 2

aws-samples/bring-your-own-data-labs 📦

Bring your own data Labs: Build a serverless data pipeline based on your own data

Language: HTML - Size: 31.1 MB - Last synced at: 19 days ago - Pushed at: about 2 years ago - Stars: 44 - Forks: 24

CloudFay/Sports-Data-Lake

This repository houses the setup_nba_data_lake.py script, which automates the entire process of building a cloud-based data lake for NBA analytics. With this script, you can seamlessly integrate Amazon S3, AWS Glue, and Amazon Athena to store, process, and query NBA-related data—all in a fully scalable and serverless environment!

Language: Python - Size: 7.81 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

patheard/aws-rds-glue-connection

Connect to a private RDS cluster from an AWS Glue job

Language: HCL - Size: 13.7 KB - Last synced at: 3 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

BrianWangila/Sports-Data-Lake-AWS

Automates building an NBA sports data lake with AWS S3, AWS Glue, and AWS Athena, and sets up the infrastructure to store and query NBA-related data.

Language: Python - Size: 470 KB - Last synced at: 3 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

dominique-jacque/NBA-Data-Lake

NBA Data Lake Repository contains the setup_nba_data_lake.py script, which automates the creation of a data lake for NBA analytics using AWS services. The script integrates Amazon S3, AWS Glue, and Amazon Athena, and sets up the infrastructure needed to store and query NBA-related data.

Language: Python - Size: 9.77 KB - Last synced at: 3 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

dmschauer/wap-pattern-iceberg-pyspark-aws-glue

This repository shows how to implement the Write-Audit-Publish (WAP) pattern using Apache Spark and Apache Iceberg. It's aimed at Data Engineers who want to get started quickly.

Language: Jupyter Notebook - Size: 70.3 KB - Last synced at: 23 days ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

BhawnaMehbubani/Ingest-daily-flight-data-in-Redshift-fact-table

End-to-end ETL pipeline for flight data analytics using AWS Glue, Redshift, S3, PySpark, and Athena, with data transformation, enrichment, and reporting capabilities.

Language: Python - Size: 5.41 MB - Last synced at: 3 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

deept-agl/Youtube-data-ETL-Analysis-using-AWS

This project creates a scalable data pipeline to analyze YouTube data from Kaggle using AWS services: S3, Glue, Lambda, Athena, and QuickSight. It processes raw JSON and CSV files into cleansed, partitioned datasets, integrates them with ETL workflows, and catalogs data for querying. Final insights are visualized in QuickSight dashboards.

Language: Python - Size: 177 MB - Last synced at: 3 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

goamegah/spark-handson

Spark hands-on

Language: Python - Size: 3.08 MB - Last synced at: 3 months ago - Pushed at: 5 months ago - Stars: 2 - Forks: 1

siconge/Tencent-HQ-BIM-Data-Pipeline-with-AWS

This project delivers an end-to-end data pipeline that uses a comprehensive ETL process to move BIM data from the Autodesk Revit model of Tencent Global Headquarters into cloud storage for processing and analytics. The pipeline leverages tools and services such as Apache Airflow, Amazon S3, AWS Glue, and Amazon Redshift.

Language: Python - Size: 10.2 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

minhduc29/leetcode-contest-analytics

A data engineering project to extract, transform, and load LeetCode contest ranking and contest problems data

Language: Python - Size: 5.66 MB - Last synced at: 3 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

kanwal-kingra/SpotifySync-ETL

Extracting Data from Spotify 'Best Hindi Songs' playlist, Transforming Data and Loading Into Snowflake Data Warehouse, using data modeling to make data more accessible

Language: Jupyter Notebook - Size: 148 KB - Last synced at: 3 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

zablon-oigo/nba-data-lake

This project automates the creation of a data lake for NBA analytics using AWS services

Language: Python - Size: 12.7 KB - Last synced at: 20 days ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

Ditectrev/Amazon-Web-Services-Certified-AWS-Certified-Data-Analytics-DAS-C01-Practice-Tests-Exams-Question

⛳️ PASS: Amazon Web Services Certified (AWS Certified) Data Analytics Specialty (DAS-C01) by learning based on our Questions & Answers (Q&A) Practice Tests Exams.

Size: 9.44 MB - Last synced at: about 2 months ago - Pushed at: 7 months ago - Stars: 7 - Forks: 7

SWO-GS/athena-cloudtrail-partitioner 📦

Automate the daily partitioning of your CloudTrail bucket in Athena

Language: JavaScript - Size: 671 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 28 - Forks: 7

vadgamabansari/aws-spotify-insights-data-pipeline

Language: Python - Size: 467 KB - Last synced at: 3 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

cch0/price-transparency-data

Source code for processing insurance price transparency data

Language: Python - Size: 17.6 KB - Last synced at: 4 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

ndomah/AWS-YouTube-Data-Analysis

Analyzed YouTube trending video data using AWS services to build a scalable pipeline for data ingestion, ETL, and storage in a centralized data lake. Created QuickSight dashboards highlighting video views by country, category, and region. Workflow included ingestion, preprocessing, cataloging, and analysis.

Language: Python - Size: 968 KB - Last synced at: 4 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

dmrhimali/terraform

Tutorial on how to create and run Terraform scripts for the AWS and New Relic providers

Language: HCL - Size: 20.6 MB - Last synced at: 4 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

gautamgc17/YouTube-Data-Analytics-AWS-Pipeline

The project aims to build a data engineering pipeline on AWS for analysis of YouTube data based on video categories and trending metrics.

Language: Python - Size: 54.7 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

dmschauer/wap-pattern-pyspark-aws-glue

This repository shows how to implement the Write-Audit-Publish (WAP) pattern using Apache Spark and Apache Iceberg. It's aimed at Data Engineers who want to get started quickly.

Language: Python - Size: 59.6 KB - Last synced at: 23 days ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

zhiweio/data-engineer-scripts

A curated collection of streamlined and effective scripts and tools designed specifically for data engineering tasks.

Language: Python - Size: 43 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 8 - Forks: 0

webysther/aws-glue-docker 📦

🐋 Docker image for AWS Glue Spark/Python

Language: Dockerfile - Size: 56.6 KB - Last synced at: about 2 months ago - Pushed at: almost 2 years ago - Stars: 23 - Forks: 8

aws-samples/aws-security-hub-glue-aggregator-terraform

These Terraform modules aggregate Security Hub findings to a centralized account using Amazon Kinesis Firehose and AWS Glue

Language: HCL - Size: 146 KB - Last synced at: 19 days ago - Pushed at: almost 3 years ago - Stars: 9 - Forks: 5

ev2900/Glue_Hudi

Apache Hudi examples designed to be run on AWS Glue via Glue Jobs

Language: Python - Size: 20.5 KB - Last synced at: 3 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 1

MateusJordao/automacao-jobs-aws-glue

Automation of AWS tasks

Language: Python - Size: 9.77 KB - Last synced at: 3 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

Kirolos00Daniel/AWS-Store-Orders-Analysis

AWS Orders Analysis

Size: 1.88 MB - Last synced at: 3 months ago - Pushed at: 7 months ago - Stars: 1 - Forks: 0

lasyakonduru/superstore-sales-data-analysis

Analysis of sales performance and operational efficiency in a superstore using AWS Athena and QuickSight

Size: 2.25 MB - Last synced at: 3 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

wednesday-solutions/Data-Engineering-Onboarding-Starter

This repository contains a 10-step program to enter the world of Data Engineering

Language: Python - Size: 6.42 MB - Last synced at: 2 months ago - Pushed at: 12 months ago - Stars: 14 - Forks: 1

aruadecarvalho/deftunes-pipeline-aws

An end-to-end data pipeline for De Ftunes’ music purchase analytics, designed to ingest, transform, and model data for efficient analysis of song purchases, user behavior, and service trends. Utilizes AWS Glue, S3, Redshift Spectrum, Apache Airflow, DBT, Superset, and Terraform.

Language: Python - Size: 294 KB - Last synced at: 4 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

shahidmalik4/aws-glue-stepfunctions-etl

This project automates an ETL pipeline using AWS Glue, S3, Athena, and Step Functions to transform raw Airbnb data. It cleanses, enriches, and organizes the data into separate raw and transformed databases, enabling efficient querying and analysis via Athena, with automated notifications through SNS.

Language: Python - Size: 3.47 MB - Last synced at: 3 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

wednesday-solutions/aws-glue-jupyter-notebook-starter

A starter repository for your next AWS Glue project. This comes with complete IaC, a CD pipeline and a reusable common SDK. Set up Jupyter Notebook for AWS Glue locally

Language: Jupyter Notebook - Size: 43 KB - Last synced at: 2 months ago - Pushed at: almost 2 years ago - Stars: 6 - Forks: 1

aws-samples/aws-glue-crawler-utilities

This repository has a collection of utilities for Glue Crawlers. These utilities come in the form of AWS CloudFormation templates or AWS CDK applications.

Language: Python - Size: 107 KB - Last synced at: 19 days ago - Pushed at: over 3 years ago - Stars: 19 - Forks: 11

DisasterAWARE/aws-glue-schema-registry-python

Use the AWS Glue Schema Registry in Python projects.

Language: Python - Size: 61.5 KB - Last synced at: 6 days ago - Pushed at: 8 months ago - Stars: 32 - Forks: 16

moritzkoerber/covid-19-data-engineering-pipeline

A Covid-19 data pipeline on AWS featuring PySpark/Glue, Docker, Great Expectations, Airflow, and Redshift, templated in CloudFormation and CDK, deployable via GitHub Actions.

Language: Python - Size: 1.31 MB - Last synced at: about 2 months ago - Pushed at: over 1 year ago - Stars: 23 - Forks: 5

monisha-anila/Data-Analyst-hacks

A beginner's guide to doing your best with data!

Language: Jupyter Notebook - Size: 137 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 2 - Forks: 0

aryan4codes/StockIO

StockIO is a real-time data streaming solution designed to process and analyze stock market data using Apache Kafka and AWS services.

Language: Jupyter Notebook - Size: 2.62 MB - Last synced at: 3 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

evanmathew/Reddit_ETL_DE

This project demonstrates a complete data pipeline for extracting, transforming, and loading (ETL) Reddit data into an Amazon Redshift data warehouse. The pipeline uses various AWS services and tools including Apache Airflow, PostgreSQL, AWS S3, AWS Glue, AWS Athena, and Amazon Redshift. The project is orchestrated using Docker and Apache Airflow.

Language: Python - Size: 137 KB - Last synced at: 23 days ago - Pushed at: 11 months ago - Stars: 1 - Forks: 0

1oglop1/aws-glue-monorepo-style

Example of AWS Glue Jobs and workflow deployment with Terraform in monorepo style. Code here supports the miniseries of articles about AWS Glue and Python.

Language: Python - Size: 488 KB - Last synced at: 7 months ago - Pushed at: over 4 years ago - Stars: 30 - Forks: 10

imsanjoykb/AWSBootcamp

AWS Bootcamp | Resource | Document | Materials

Language: Jupyter Notebook - Size: 10.4 MB - Last synced at: 3 months ago - Pushed at: about 3 years ago - Stars: 10 - Forks: 2