GitHub topics: aws-glue
aws/aws-sdk-pandas
pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
Language: Python - Size: 16.1 MB - Last synced at: about 1 hour ago - Pushed at: 5 days ago - Stars: 4,016 - Forks: 706

gps31320779/insightflow-retail-economic-pipeline
A data engineering portfolio project using AWS cloud services to analyze correlations between Malaysian retail performance and fuel prices. Features Terraform IaC, ETL/ELT with AWS S3, Glue, SQL analytics via Athena coupled with data transformation via dbt, and workflow orchestration with Kestra.
Size: 8.79 KB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0

ev2900/Iceberg_update_metadata_script
Python script that will update S3 file paths in Iceberg metadata files (metadata.json + AVRO)
Language: Python - Size: 735 KB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 3 - Forks: 0

ev2900/Iceberg_Glue_register_table
Example using the Iceberg register_table command with AWS Glue and Glue Data Catalog
Language: Python - Size: 549 KB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 0 - Forks: 1

ev2900/Glue_Examples
PySpark code samples designed for AWS Glue
Language: Python - Size: 51.8 KB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 1 - Forks: 0

schubergphilis/terraform-aws-mcaf-glue-job
A Terraform module that creates a Glue job
Language: HCL - Size: 43.9 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

data-dot-all/dataall
A modern data marketplace that makes collaboration among diverse users (like business, analysts and engineers) easier, increasing efficiency and agility in data projects on AWS.
Language: Python - Size: 97.9 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 242 - Forks: 82

akshay6991/data-engineer1
End to End Data Engineering Projects
Language: Python - Size: 1.25 MB - Last synced at: about 12 hours ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

shmokmt/awscrondoc
List cron expressions registered in Amazon Web Services.
Language: Go - Size: 84 KB - Last synced at: 2 days ago - Pushed at: 8 days ago - Stars: 1 - Forks: 0

lindsaygelle/AWSComprehend
AWS Comprehend is an event-driven, serverless data processing pipeline that leverages AWS services to perform natural language processing and analysis on user-submitted text files.
Language: HCL - Size: 1.74 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 0 - Forks: 0

vsingh55/NBA-Analytics-Data-Lake
A sports analytics data lake leveraging AWS S3 for storage, AWS Glue for data cataloging, and AWS Athena for querying. Python scripts handle data ingestion and manage the infrastructure.
Language: Python - Size: 2.93 KB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 0 - Forks: 0

pizofreude/insightflow-retail-economic-pipeline
A data engineering portfolio project using AWS cloud services to analyze correlations between Malaysian retail performance and fuel prices. Features Terraform IaC, ETL/ELT with AWS S3, Glue, SQL analytics via Athena coupled with data transformation via dbt, and workflow orchestration with Kestra.
Language: HCL - Size: 672 KB - Last synced at: 16 days ago - Pushed at: 17 days ago - Stars: 1 - Forks: 0

cloudposse/terraform-aws-glue
Terraform modules for provisioning and managing AWS Glue resources
Language: HCL - Size: 3.93 MB - Last synced at: 12 days ago - Pushed at: 3 months ago - Stars: 30 - Forks: 33

ev2900/MongoDB_Streams_Glue_Iceberg
Process MongoDB change streams via AWS Glue with Iceberg to keep a copy of a collection in S3 up to date
Language: Python - Size: 27.3 MB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 1 - Forks: 0

agnostk/nasa-space-activity
A scalable, cloud-native ETL pipeline that extracts, transforms, and enriches data from NASA APIs using AWS Glue, Lightsail, and RDS — all orchestrated with Terraform. Features modular design, medallion architecture (bronze, silver, gold), image metadata extraction and classification with PyTorch, and a bonus Mosaic Generator app.
Language: Python - Size: 2.52 MB - Last synced at: 22 days ago - Pushed at: 22 days ago - Stars: 0 - Forks: 0

hackolade/glue
Hackolade (https://hackolade.com) plugin for AWS Glue Data Catalog
Language: JavaScript - Size: 22.9 MB - Last synced at: 11 days ago - Pushed at: 24 days ago - Stars: 0 - Forks: 8

tokern/lakecli
A CLI to manage and monitor permissions in AWS Lake Formation
Language: Python - Size: 729 KB - Last synced at: 13 days ago - Pushed at: over 2 years ago - Stars: 26 - Forks: 8

dgomesbr/awesome-aws-workshops
(Unofficial) curated list of awesome workshops found around the internet. As we have all been there, finding that workshop you have just attended shouldn't be hard. The idea is to provide an easy central repository, in a collaborative way.
Language: HTML - Size: 1.49 MB - Last synced at: about 19 hours ago - Pushed at: over 3 years ago - Stars: 411 - Forks: 113

sjlewis25/pizza-delivery-pipeline
Simulates a real-world data pipeline for a pizza delivery service using AWS services and Terraform. Ingests and processes delivery data with S3, triggers Lambda functions for processing, and stores structured data in DynamoDB. Highlights use of automation, event-driven triggers, and real-time cloud-based data workflows.
Language: HCL - Size: 25.4 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

dashmug/glue-utils
Python library designed to enhance the developer experience when working with AWS Glue ETL and Python Shell jobs by reducing boilerplate code, increasing type safety, and improving IDE auto-completion.
Language: Python - Size: 697 KB - Last synced at: 23 days ago - Pushed at: 25 days ago - Stars: 6 - Forks: 2

vaxdata22/Customer-Churn-Data-Analytics-ETL-Pipeline-by-Airflow-on-EC2
This is an end-to-end AWS Cloud ETL project. The orchestration uses Apache Airflow on AWS EC2 as well as AWS Glue. It demonstrates how to build an ETL pipeline that transforms data using a Glue job/crawler and loads it into a Redshift table. It also shows how to connect Amazon Athena to Glue Data Catalog, and Power BI to Redshift.
Language: Python - Size: 7.2 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

tokern/piicatcher
Scan databases and data warehouses for PII data. Tag tables and columns in data catalogs like Amundsen and Datahub
Language: Python - Size: 1.38 MB - Last synced at: 29 days ago - Pushed at: over 1 year ago - Stars: 306 - Forks: 99

awslabs/amazon-athena-cross-account-catalog 📦
🌉 Reference implementation for granting cross-account AWS Glue Data Catalog access from Amazon Athena
Language: Python - Size: 150 KB - Last synced at: 2 days ago - Pushed at: almost 3 years ago - Stars: 30 - Forks: 19

averemee-si/ora2iceberg
Transfer data from Oracle database tables, views, and query results to Apache Iceberg tables
Language: Java - Size: 392 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 2 - Forks: 2

Ditectrev/Amazon-Web-Services-Certified-AWS-Certified-Machine-Learning-MLS-C01-Practice-Tests-Exams-Question
⛳️ PASS: Amazon Web Services Certified (AWS Certified) Machine Learning Specialty (MLS-C01) by learning based on our Questions & Answers (Q&A) Practice Tests Exams.
Size: 6.88 MB - Last synced at: about 1 month ago - Pushed at: 7 months ago - Stars: 59 - Forks: 38

aws-samples/monitoring-apache-iceberg-table-metadata-layer
Sample code to collect Apache Iceberg metrics for table monitoring
Language: Python - Size: 787 KB - Last synced at: about 1 month ago - Pushed at: 9 months ago - Stars: 26 - Forks: 4

amzn/rheoceros
Cloud-based AI / ML workflow and data application development framework
Language: Python - Size: 2.49 MB - Last synced at: 13 days ago - Pushed at: 9 months ago - Stars: 17 - Forks: 9

Wolf-nord/Customer-Churn-Data-Analytics-ETL-Pipeline-by-Airflow-on-EC2
This is an end-to-end AWS Cloud ETL project. The orchestration uses Apache Airflow on AWS EC2 as well as AWS Glue. It demonstrates how to build an ETL pipeline that transforms data using a Glue job/crawler and loads it into a Redshift table. It also shows how to connect Amazon Athena to Glue Data Catalog, and Power BI to Redshift.
Language: Python - Size: 7.17 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

ricardolsmendes/aws-glue-ci-cd-blueprint
Companion repository for the "Streamlining AWS Glue CI/CD — A Comprehensive Blueprint" blog post
Language: HCL - Size: 518 KB - Last synced at: 18 days ago - Pushed at: 6 months ago - Stars: 12 - Forks: 3

jibbs1703/Tickit-Data-Pipeline
This repository demonstrates the creation of a robust data pipeline using an orchestrator with on-prem and cloud resources. It collects data from on-premises SQL and NoSQL databases and loads it into a SQL database in the cloud.
Language: Python - Size: 50.8 KB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 0 - Forks: 0

awslabs/athena-glue-service-logs 📦
Glue scripts for converting AWS Service Logs for use in Athena
Language: Python - Size: 381 KB - Last synced at: 2 days ago - Pushed at: over 1 year ago - Stars: 141 - Forks: 46

vitalibo/glue-pyspark-skeleton
AWS Glue PySpark project skeleton
Language: Python - Size: 90.8 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 1

spe-uob/2020-HealthcareLakeETL
FHIR to OMOP using PySpark on AWS Glue
Language: Python - Size: 1.65 MB - Last synced at: 3 days ago - Pushed at: about 4 years ago - Stars: 14 - Forks: 4

enchant3dmango/esdiel
Esdiel (SDL) stands for serverless data lake. In this project, I'm learning to deploy a simple serverless data lake on AWS using Terraform.
Language: HCL - Size: 544 KB - Last synced at: 1 day ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

hq969/Youtube-Data-Pipeline-AWS
Leveraging AWS Cloud Services, an ETL pipeline transforms YouTube video statistics data. Data is downloaded from Kaggle, uploaded to an S3 bucket, and cataloged using AWS Glue for querying with Athena. AWS Lambda and Glue convert the data to Parquet format and store it in a cleansed S3 bucket. AWS QuickSight then visualizes the materialised data.
Language: Python - Size: 1.69 MB - Last synced at: 1 day ago - Pushed at: 3 months ago - Stars: 2 - Forks: 0

Tejesvani/End-to-End-Smart-City-Data-Streaming-Pipeline
The Smart City Data Streaming Pipeline processes real-time data from IoT devices using Apache Kafka for ingestion and Apache Spark for processing. Data is stored in AWS S3 and analyzed with Glue, Athena, and Redshift. It enhances traffic management, predictive analytics, and urban planning, making cities smarter and more efficient.
Language: Python - Size: 14.6 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

ritesh-ojha/Data-Engineering
End to End Data Engineering Projects
Language: Python - Size: 32.5 MB - Last synced at: 2 months ago - Pushed at: 12 months ago - Stars: 2 - Forks: 2

dforsber/glue-table-cache
Query AWS Glue Tables efficiently with DuckDB
Language: TypeScript - Size: 1.14 MB - Last synced at: 5 days ago - Pushed at: 3 months ago - Stars: 5 - Forks: 0

kafbat/ui-serde-glue
AWS Glue Serde for kafka-ui
Language: Java - Size: 64.5 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 2

aws-samples/bring-your-own-data-labs 📦
Bring your own data Labs: Build a serverless data pipeline based on your own data
Language: HTML - Size: 31.1 MB - Last synced at: 6 days ago - Pushed at: almost 2 years ago - Stars: 44 - Forks: 24

aws-samples/transactional-datalake-using-apache-iceberg-on-aws-glue
Stream CDC into an Amazon S3 data lake in Apache Iceberg table format with AWS Glue Streaming and DMS
Language: Python - Size: 725 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 27 - Forks: 2

CloudFay/Sports-Data-Lake
This repository houses the setup_nba_data_lake.py script, which automates the entire process of building a cloud-based data lake for NBA analytics. With this script, you can seamlessly integrate Amazon S3, AWS Glue, and Amazon Athena to store, process, and query NBA-related data—all in a fully scalable and serverless environment!
Language: Python - Size: 7.81 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

patheard/aws-rds-glue-connection
Connect to a private RDS cluster from an AWS Glue job
Language: HCL - Size: 13.7 KB - Last synced at: about 2 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

BrianWangila/Sports-Data-Lake-AWS
Automates the building of an NBA sports data lake by leveraging AWS S3, AWS Glue, and AWS Athena, and sets up the infrastructure to store and query NBA-related data.
Language: Python - Size: 470 KB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

dominique-jacque/NBA-Data-Lake
NBA Data Lake Repository contains the setup_nba_data_lake.py script, which automates the creation of a data lake for NBA analytics using AWS services. The script integrates Amazon S3, AWS Glue, and Amazon Athena, and sets up the infrastructure needed to store and query NBA-related data.
Language: Python - Size: 9.77 KB - Last synced at: about 2 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

dmschauer/wap-pattern-iceberg-pyspark-aws-glue
This repository shows how to implement the Write-Audit-Publish (WAP) pattern using Apache Spark and Apache Iceberg. It's aimed at Data Engineers who want to get started quickly.
Language: Jupyter Notebook - Size: 70.3 KB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

BhawnaMehbubani/Ingest-daily-flight-data-in-Redshift-fact-table
End-to-end ETL pipeline for flight data analytics using AWS Glue, Redshift, S3, PySpark, and Athena, with data transformation, enrichment, and reporting capabilities.
Language: Python - Size: 5.41 MB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

deept-agl/Youtube-data-ETL-Analysis-using-AWS
This project creates a scalable data pipeline to analyze YouTube data from Kaggle using AWS services: S3, Glue, Lambda, Athena, and QuickSight. It processes raw JSON and CSV files into cleansed, partitioned datasets, integrates them with ETL workflows, and catalogs data for querying. Final insights are visualized in QuickSight dashboards.
Language: Python - Size: 177 MB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

goamegah/spark-handson
Spark hands-on
Language: Python - Size: 3.08 MB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 2 - Forks: 1

siconge/Tencent-HQ-BIM-Data-Pipeline-with-AWS
This project delivers an end-to-end data pipeline that uses a comprehensive ETL process to move BIM data from the Autodesk Revit model of Tencent Global Headquarters into cloud storage for processing and analytics. The pipeline leverages tools and services such as Apache Airflow, Amazon S3, AWS Glue, and Amazon Redshift.
Language: Python - Size: 10.2 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

minhduc29/leetcode-contest-analytics
A data engineering project to extract, transform, and load LeetCode contest ranking and contest problems data
Language: Python - Size: 5.66 MB - Last synced at: about 2 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

kanwal-kingra/SpotifySync-ETL
Extracts data from the Spotify 'Best Hindi Songs' playlist, transforms it, and loads it into a Snowflake data warehouse, using data modeling to make the data more accessible
Language: Jupyter Notebook - Size: 148 KB - Last synced at: about 2 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

zablon-oigo/nba-data-lake
This project automates the creation of a data lake for NBA analytics using AWS services
Language: Python - Size: 12.7 KB - Last synced at: 2 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

Ditectrev/Amazon-Web-Services-Certified-AWS-Certified-Data-Analytics-DAS-C01-Practice-Tests-Exams-Question
⛳️ PASS: Amazon Web Services Certified (AWS Certified) Data Analytics Specialty (DAS-C01) by learning based on our Questions & Answers (Q&A) Practice Tests Exams.
Size: 9.44 MB - Last synced at: 16 days ago - Pushed at: 6 months ago - Stars: 7 - Forks: 7

SWO-GS/athena-cloudtrail-partitioner 📦
Automate the daily partitioning of your CloudTrail bucket in Athena
Language: JavaScript - Size: 671 KB - Last synced at: 4 months ago - Pushed at: over 1 year ago - Stars: 28 - Forks: 7

vadgamabansari/aws-spotify-insights-data-pipeline
Language: Python - Size: 467 KB - Last synced at: about 2 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

cch0/price-transparency-data
Source code for processing insurance price transparency data
Language: Python - Size: 17.6 KB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

ndomah/AWS-YouTube-Data-Analysis
Analyzed YouTube trending video data using AWS services to build a scalable pipeline for data ingestion, ETL, and storage in a centralized data lake. Created QuickSight dashboards highlighting video views by country, category, and region. Workflow included ingestion, preprocessing, cataloging, and analysis.
Language: Python - Size: 968 KB - Last synced at: 2 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

dmrhimali/terraform
Tutorial on how to create and run Terraform scripts for the AWS and New Relic providers
Language: HCL - Size: 20.6 MB - Last synced at: 2 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

gautamgc17/YouTube-Data-Analytics-AWS-Pipeline
The project aims to build a data engineering pipeline on AWS for analysis of YouTube data based on video categories and trending metrics.
Language: Python - Size: 54.7 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

dmschauer/wap-pattern-pyspark-aws-glue
This repository shows how to implement the Write-Audit-Publish (WAP) pattern using Apache Spark and Apache Iceberg. It's aimed at Data Engineers who want to get started quickly.
Language: Python - Size: 59.6 KB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

zhiweio/data-engineer-scripts
A curated collection of streamlined and effective scripts and tools designed specifically for data engineering tasks.
Language: Python - Size: 43 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 8 - Forks: 0

webysther/aws-glue-docker 📦
🐋 Docker image for AWS Glue Spark/Python
Language: Dockerfile - Size: 56.6 KB - Last synced at: 6 days ago - Pushed at: over 1 year ago - Stars: 23 - Forks: 8

ev2900/Glue_Hudi
Apache Hudi examples designed to be run on AWS Glue via Glue Jobs
Language: Python - Size: 20.5 KB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 0 - Forks: 1

MateusJordao/automacao-jobs-aws-glue
Automation of AWS tasks
Language: Python - Size: 9.77 KB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

Kirolos00Daniel/AWS-Store-Orders-Analysis
AWS Orders Analysis
Size: 1.88 MB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

lasyakonduru/superstore-sales-data-analysis
Analysis of sales performance and operational efficiency in a superstore using AWS Athena and QuickSight
Size: 2.25 MB - Last synced at: about 2 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

wednesday-solutions/Data-Engineering-Onboarding-Starter
This repository contains a 10-step program to enter the world of Data Engineering
Language: Python - Size: 6.42 MB - Last synced at: 28 days ago - Pushed at: 10 months ago - Stars: 14 - Forks: 1

aruadecarvalho/deftunes-pipeline-aws
An end-to-end data pipeline for DeFtunes’ music purchase analytics, designed to ingest, transform, and model data for efficient analysis of song purchases, user behavior, and service trends. Utilizes AWS Glue, S3, Redshift Spectrum, Apache Airflow, dbt, Superset, and Terraform.
Language: Python - Size: 294 KB - Last synced at: 2 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

aws-samples/aws-glue-jobs-unit-testing
Demo code to illustrate the execution of PyTest unit test cases for AWS Glue jobs in AWS CodePipeline using AWS CodeBuild projects
Language: Python - Size: 402 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 39 - Forks: 22

shahidmalik4/aws-glue-stepfunctions-etl
This project automates an ETL pipeline using AWS Glue, S3, Athena, and Step Functions to transform raw Airbnb data. It cleanses, enriches, and organizes the data into separate raw and transformed databases, enabling efficient querying and analysis via Athena, with automated notifications through SNS.
Language: Python - Size: 3.47 MB - Last synced at: about 2 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

wednesday-solutions/aws-glue-jupyter-notebook-starter
A starter repository for your next AWS Glue project. It comes with complete IaC, a CD pipeline, and a reusable common SDK. Set up Jupyter Notebook for AWS Glue locally.
Language: Jupyter Notebook - Size: 43 KB - Last synced at: 28 days ago - Pushed at: over 1 year ago - Stars: 6 - Forks: 1

DisasterAWARE/aws-glue-schema-registry-python
Use the AWS Glue Schema Registry in Python projects.
Language: Python - Size: 61.5 KB - Last synced at: 27 days ago - Pushed at: 6 months ago - Stars: 32 - Forks: 15

moritzkoerber/covid-19-data-engineering-pipeline
A Covid-19 data pipeline on AWS featuring PySpark/Glue, Docker, Great Expectations, Airflow, and Redshift, templated in CloudFormation and CDK, deployable via Github Actions.
Language: Python - Size: 1.31 MB - Last synced at: 14 days ago - Pushed at: over 1 year ago - Stars: 23 - Forks: 5

monisha-anila/Data-Analyst-hacks
A beginner guide to do your best with data!
Language: Jupyter Notebook - Size: 137 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 2 - Forks: 0

aryan4codes/StockIO
StockIO is a real-time data streaming solution designed to process and analyze stock market data using Apache Kafka and AWS services.
Language: Jupyter Notebook - Size: 2.62 MB - Last synced at: about 1 month ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

evanmathew/Reddit_ETL_DE
This project demonstrates a complete data pipeline for extracting, transforming, and loading (ETL) Reddit data into an Amazon Redshift data warehouse. The pipeline uses various services and tools, including Apache Airflow, PostgreSQL, AWS S3, AWS Glue, AWS Athena, and Amazon Redshift. The project is orchestrated using Docker and Apache Airflow.
Language: Python - Size: 137 KB - Last synced at: 2 months ago - Pushed at: 10 months ago - Stars: 1 - Forks: 0

j3-signalroom/apache_flink-kickstarter
Examples of Apache Flink® applications showcasing the DataStream API and Table API in Java and Python, featuring AWS, GitHub, Terraform, and Apache Iceberg.
Language: Java - Size: 17.9 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 1 - Forks: 0

1oglop1/aws-glue-monorepo-style
Example of AWS Glue Jobs and workflow deployment with Terraform in monorepo style. The code here supports a miniseries of articles about AWS Glue and Python.
Language: Python - Size: 488 KB - Last synced at: 5 months ago - Pushed at: over 4 years ago - Stars: 30 - Forks: 10

imsanjoykb/AWSBootcamp
AWS Bootcamp | Resource | Document | Materials
Language: Jupyter Notebook - Size: 10.4 MB - Last synced at: about 1 month ago - Pushed at: about 3 years ago - Stars: 10 - Forks: 2

aws-samples/data-lake-as-code
Data Lake as Code, featuring ChEMBL and OpenTargets
Language: TypeScript - Size: 1.26 MB - Last synced at: 5 months ago - Pushed at: over 1 year ago - Stars: 166 - Forks: 44

AWS-Big-Data-Projects/front-line-concussion-monitoring-system-using-AWS-IoT-and-serverless-data-lakes
A simple, practical, and affordable system for measuring head trauma in sports environments where trained medical personnel are absent, built using Amazon Kinesis Data Streams, Kinesis Data Analytics, Kinesis Data Firehose, and AWS Lambda
Language: Shell - Size: 30.3 KB - Last synced at: 7 days ago - Pushed at: over 4 years ago - Stars: 12 - Forks: 0

desininja/Airline-Data-Ingestion-Pipeline
ETL pipeline using AWS services.
Language: Python - Size: 4.33 MB - Last synced at: 3 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

desininja/Quality-Movie-Data-Pipeline
ETL pipeline using AWS services
Language: Python - Size: 727 KB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

aws-samples/aws-glue-streaming-etl-with-delta-lake
Streaming ETL job examples in AWS Glue that integrate Delta Lake and create an in-place updatable data lake on Amazon S3
Language: Python - Size: 314 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 8 - Forks: 0

aws-samples/aws-glue-streaming-etl-with-apache-iceberg
Streaming ETL job examples in AWS Glue that integrate Iceberg and create an in-place updatable data lake on Amazon S3
Language: Python - Size: 465 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 17 - Forks: 2

StatAziz/Ames-Weather-Data-ETL-Pipeline
This project is about building a serverless ETL pipeline using open-meteo weather API.
Language: Jupyter Notebook - Size: 24.2 MB - Last synced at: about 2 months ago - Pushed at: 8 months ago - Stars: 1 - Forks: 0

Tyriek-cloud/NYC-Mobility-Survey-Analysis
An end-to-end data engineering project in which five NYC DOT datasets were modified in an ETL process and analyzed for insights.
Language: Python - Size: 2.75 MB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

SeaBebop/TekkenSubreddit-ETL-Pipeline
AWS Glue ETL transformation of Tekken subreddit data
Language: Python - Size: 56.6 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

aws-samples/amazon-deequ-glue
Automated data quality suggestions and analysis with Deequ on AWS Glue
Language: Scala - Size: 2.1 MB - Last synced at: 5 months ago - Pushed at: over 2 years ago - Stars: 83 - Forks: 23

jhole89/aws-glue-sbt-quickstart
Example of how to set SBT up for local development of AWS Glue Scripts
Language: Scala - Size: 30.3 KB - Last synced at: 5 months ago - Pushed at: over 4 years ago - Stars: 16 - Forks: 3

TimKong21/AWS-Batch-Processing
Big data analysis with AWS services, filtering the Wikiticker dataset with Apache Spark on Amazon EMR, storing data in S3, cataloging with AWS Glue, and querying with Amazon Athena. This end-to-end pipeline exemplifies handling and analyzing big data in the cloud.
Language: Python - Size: 8.01 MB - Last synced at: 2 months ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

rashmishreev/atm-analytics-bigaata-aws
Analyze over 2.5 million ATM transaction records from Spar Nord Bank to optimize ATM usage patterns and enhance customer service using AWS Services and Big Data Analytics.
Language: Jupyter Notebook - Size: 42.4 MB - Last synced at: about 1 month ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

DenysGonzaga/glue-athena-cdk-example
A small walkthrough on how to create an AWS Glue Job Pipeline with AWS CDK
Language: Python - Size: 10.7 MB - Last synced at: 10 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

rashmishreev/Skillshift-Trend-Analysis
SkillShift uses advanced DBMS to analyze job listings, providing insights into evolving skill requirements across industries. It offers detailed analysis on skill demand, workplace culture, and industry trends, empowering professionals to make informed decisions about career development in a dynamic job market.
Language: Jupyter Notebook - Size: 33.4 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

pgrarchives/AWS_DATA_PIPELINE
End to End Data Engineering Pipeline using AWS Cloud Services
Language: Jupyter Notebook - Size: 2.03 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

aws-samples/cloud-experiments 📦
Open innovation with 60 minute cloud experiments on AWS
Language: Jupyter Notebook - Size: 22.8 MB - Last synced at: 6 months ago - Pushed at: about 1 year ago - Stars: 88 - Forks: 56

flemm0/capitol-trades
Politician stock market activity web scraping project
Language: Python - Size: 2.26 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

DivineSamOfficial/Banking-Data-Warehouse-Pipeline
Banking Data Warehouse Pipeline
Language: Python - Size: 52.1 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

taupirho/read-big-file-aws-athena-glue
Continuing with my case study on reading a big data file, this is the fifth part of my trilogy :-) on how I got on reading a big'ish file with C, Python, spark-python and spark-scala, AWS Elastic MapReduce and AWS Athena.
Language: Python - Size: 45.9 KB - Last synced at: 11 months ago - Pushed at: almost 7 years ago - Stars: 0 - Forks: 1
