Topic: "aws-glue"
aws/aws-sdk-pandas
pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
Language: Python - Size: 16.4 MB - Last synced at: about 16 hours ago - Pushed at: about 18 hours ago - Stars: 4,017 - Forks: 706

dgomesbr/awesome-aws-workshops
(Unofficial) curated list of awesome workshops found around the internet. As we all have been there, finding that workshop that you have just attended shouldn't be hard. The idea is to provide an easy central repository, in a collaborative way.
Language: HTML - Size: 1.49 MB - Last synced at: 2 days ago - Pushed at: over 3 years ago - Stars: 411 - Forks: 113

tokern/piicatcher
Scan databases and data warehouses for PII data. Tag tables and columns in data catalogs like Amundsen and Datahub
Language: Python - Size: 1.38 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 306 - Forks: 99

data-dot-all/dataall
A modern data marketplace that makes collaboration among diverse users (like business, analysts and engineers) easier, increasing efficiency and agility in data projects on AWS.
Language: Python - Size: 97.9 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 242 - Forks: 82

aws-samples/data-lake-as-code
Data Lake as Code, featuring ChEMBL and OpenTargets
Language: TypeScript - Size: 1.26 MB - Last synced at: 5 months ago - Pushed at: over 1 year ago - Stars: 166 - Forks: 44

awslabs/athena-glue-service-logs 📦
Glue scripts for converting AWS Service Logs for use in Athena
Language: Python - Size: 381 KB - Last synced at: 3 days ago - Pushed at: over 1 year ago - Stars: 141 - Forks: 46

aws-samples/cloud-experiments 📦
Open innovation with 60 minute cloud experiments on AWS
Language: Jupyter Notebook - Size: 22.8 MB - Last synced at: 6 months ago - Pushed at: about 1 year ago - Stars: 88 - Forks: 56

aws-samples/amazon-deequ-glue
Automated data quality suggestions and analysis with Deequ on AWS Glue
Language: Scala - Size: 2.1 MB - Last synced at: 5 months ago - Pushed at: over 2 years ago - Stars: 83 - Forks: 23

aws-samples/streamlit-application-deployment-on-aws
Streamlit EDA Dashboard Powered by AWS Cloud
Language: Python - Size: 4 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 71 - Forks: 28

Ditectrev/Amazon-Web-Services-Certified-AWS-Certified-Machine-Learning-MLS-C01-Practice-Tests-Exams-Question
✳️ PASS: Amazon Web Services Certified (AWS Certified) Machine Learning Specialty (MLS-C01) by learning based on our Questions & Answers (Q&A) Practice Tests Exams.
Size: 6.88 MB - Last synced at: about 1 month ago - Pushed at: 7 months ago - Stars: 59 - Forks: 38

aws-samples/bring-your-own-data-labs 📦
Bring your own data Labs: Build a serverless data pipeline based on your own data
Language: HTML - Size: 31.1 MB - Last synced at: 7 days ago - Pushed at: almost 2 years ago - Stars: 44 - Forks: 24

aws-samples/analyzing-reddit-sentiment-with-aws
Learn how to use Kinesis Firehose, AWS Glue, S3, and Amazon Athena by streaming and analyzing Reddit comments in real time. 100-200 level tutorial.
Language: Python - Size: 3.48 MB - Last synced at: about 2 years ago - Pushed at: about 4 years ago - Stars: 41 - Forks: 16

aws-samples/aws-glue-jobs-unit-testing
Demo code to illustrate the execution of PyTest unit test cases for AWS Glue jobs in AWS CodePipeline using AWS CodeBuild projects
Language: Python - Size: 402 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 39 - Forks: 22

DisasterAWARE/aws-glue-schema-registry-python
Use the AWS Glue Schema Registry in Python projects.
Language: Python - Size: 61.5 KB - Last synced at: 28 days ago - Pushed at: 6 months ago - Stars: 32 - Forks: 15

cloudposse/terraform-aws-glue
Terraform modules for provisioning and managing AWS Glue resources
Language: HCL - Size: 3.93 MB - Last synced at: 13 days ago - Pushed at: 3 months ago - Stars: 30 - Forks: 33

awslabs/amazon-athena-cross-account-catalog 📦
Reference implementation for granting cross-account AWS Glue Data Catalog access from Amazon Athena
Language: Python - Size: 150 KB - Last synced at: 3 days ago - Pushed at: almost 3 years ago - Stars: 30 - Forks: 19

1oglop1/aws-glue-monorepo-style
Example of AWS Glue Jobs and workflow deployment with Terraform in monorepo style. Code here supports the miniseries of articles about AWS Glue and Python.
Language: Python - Size: 488 KB - Last synced at: 5 months ago - Pushed at: over 4 years ago - Stars: 30 - Forks: 10

SWO-GS/athena-cloudtrail-partitioner 📦
Automate the daily partitioning of your CloudTrail bucket in Athena
Language: JavaScript - Size: 671 KB - Last synced at: 4 months ago - Pushed at: over 1 year ago - Stars: 28 - Forks: 7

aws-samples/transactional-datalake-using-apache-iceberg-on-aws-glue
Stream CDC into an Amazon S3 data lake in Apache Iceberg table format with AWS Glue Streaming and DMS
Language: Python - Size: 725 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 27 - Forks: 2

vincentclaes/serverless_data_pipeline_example
Build and Deploy A Serverless Data Pipeline on AWS
Language: Python - Size: 466 KB - Last synced at: 5 months ago - Pushed at: over 2 years ago - Stars: 27 - Forks: 13

aws-samples/monitoring-apache-iceberg-table-metadata-layer
Sample code to collect Apache Iceberg metrics for table monitoring
Language: Python - Size: 787 KB - Last synced at: about 1 month ago - Pushed at: 9 months ago - Stars: 26 - Forks: 4

tokern/lakecli
A CLI to manage and monitor permissions in AWS Lake Formation
Language: Python - Size: 729 KB - Last synced at: 14 days ago - Pushed at: over 2 years ago - Stars: 26 - Forks: 8

moritzkoerber/covid-19-data-engineering-pipeline
A Covid-19 data pipeline on AWS featuring PySpark/Glue, Docker, Great Expectations, Airflow, and Redshift, templated in CloudFormation and CDK, deployable via Github Actions.
Language: Python - Size: 1.31 MB - Last synced at: 15 days ago - Pushed at: over 1 year ago - Stars: 23 - Forks: 5

webysther/aws-glue-docker 📦
Docker image for AWS Glue Spark/Python
Language: Dockerfile - Size: 56.6 KB - Last synced at: 7 days ago - Pushed at: over 1 year ago - Stars: 23 - Forks: 8

chgasparoto/terraform-aws-glue
Terraform module which creates Glue resources on AWS
Language: HCL - Size: 7.81 KB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 20 - Forks: 16

aws-samples/aws-glue-streaming-etl-with-apache-iceberg
Streaming ETL job cases in AWS Glue to integrate Iceberg and create an in-place updatable data lake on Amazon S3
Language: Python - Size: 465 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 17 - Forks: 2

amzn/rheoceros
Cloud-based AI / ML workflow and data application development framework
Language: Python - Size: 2.49 MB - Last synced at: 14 days ago - Pushed at: 9 months ago - Stars: 17 - Forks: 9

jhole89/aws-glue-sbt-quickstart
Example of how to set SBT up for local development of AWS Glue Scripts
Language: Scala - Size: 30.3 KB - Last synced at: 5 months ago - Pushed at: over 4 years ago - Stars: 16 - Forks: 3

aws-samples/aws-glue-crawler-utilities
This repository has a collection of utilities for Glue Crawlers. These utilities come in the form of AWS CloudFormation templates or AWS CDK applications.
Language: Python - Size: 107 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 15 - Forks: 10

wednesday-solutions/Data-Engineering-Onboarding-Starter
This repository contains a 10 step program to enter the world of Data Engineering
Language: Python - Size: 6.42 MB - Last synced at: 29 days ago - Pushed at: 10 months ago - Stars: 14 - Forks: 1

spe-uob/2020-HealthcareLakeETL
FHIR to OMOP using PySpark on AWS Glue
Language: Python - Size: 1.65 MB - Last synced at: 4 days ago - Pushed at: about 4 years ago - Stars: 14 - Forks: 4

jonrau1/AWS-ComplianceMachineDontStop
Proof of Value Terraform Scripts to utilize Amazon Web Services (AWS) Security, Identity & Compliance Services to Support your AWS Account Security Posture.
Language: HCL - Size: 95.7 KB - Last synced at: about 2 years ago - Pushed at: about 5 years ago - Stars: 13 - Forks: 12

ricardolsmendes/aws-glue-ci-cd-blueprint
Companion repository for the "Streamlining AWS Glue CI/CD - A Comprehensive Blueprint" blog post
Language: HCL - Size: 518 KB - Last synced at: 19 days ago - Pushed at: 6 months ago - Stars: 12 - Forks: 3

mincloud1501/DevOps
Understanding DevOps concepts, plus hands-on practice and research using AWS developer tools
Language: Java - Size: 3.21 MB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 12 - Forks: 1

AWS-Big-Data-Projects/front-line-concussion-monitoring-system-using-AWS-IoT-and-serverless-data-lakes
A simple, practical, and affordable system for measuring head trauma within the sports environment, subject to the absence of trained medical personnel made using Amazon Kinesis Data Streams, Kinesis Data Analytics, Kinesis Data Firehose, and AWS Lambda
Language: Shell - Size: 30.3 KB - Last synced at: about 20 hours ago - Pushed at: over 4 years ago - Stars: 12 - Forks: 0

TrainingByPackt/Serverless-Architectures-with-AWS
Discover how you can migrate from traditional deployments to serverless architectures with AWS
Language: JavaScript - Size: 8.61 MB - Last synced at: about 1 month ago - Pushed at: over 6 years ago - Stars: 12 - Forks: 11

vincentclaes/glue-devcontainer
Glue VSCode devcontainer setup
Language: Python - Size: 2.97 MB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 11 - Forks: 0

ahmadalibagheri/terraform-aws-glue
Create terraform module for AWS Glue
Language: HCL - Size: 2.93 KB - Last synced at: about 1 month ago - Pushed at: about 3 years ago - Stars: 11 - Forks: 1

imsanjoykb/AWSBootcamp
AWS Bootcamp | Resource | Document | Materials |
Language: Jupyter Notebook - Size: 10.4 MB - Last synced at: about 1 month ago - Pushed at: about 3 years ago - Stars: 10 - Forks: 2

canyousayyes/aws-real-time-data-collection
Demo for building Real Time Data Collection Pipeline on AWS
Language: JavaScript - Size: 2.53 MB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 9 - Forks: 1

zhiweio/data-engineer-scripts
A curated collection of streamlined and effective scripts and tools designed specifically for data engineering tasks.
Language: Python - Size: 43 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 8 - Forks: 0

aws-samples/aws-glue-streaming-etl-with-delta-lake
Streaming ETL job cases in AWS Glue to integrate Delta Lake and create an in-place updatable data lake on Amazon S3
Language: Python - Size: 314 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 8 - Forks: 0

andreichiro/data_engineer_end2end
End-to-end data engineer project
Language: HTML - Size: 20.4 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 8 - Forks: 3

miztiik/stream-etl-with-glue
Serverless streaming ETL with a Glue job and querying with Athena
Language: Python - Size: 2.93 MB - Last synced at: about 1 year ago - Pushed at: about 4 years ago - Stars: 8 - Forks: 6

bdoepf/aws-etl-example
AWS ETL example via AWS DMS & AWS Glue
Language: HCL - Size: 69.3 KB - Last synced at: about 2 years ago - Pushed at: about 6 years ago - Stars: 8 - Forks: 2

Ditectrev/Amazon-Web-Services-Certified-AWS-Certified-Data-Analytics-DAS-C01-Practice-Tests-Exams-Question
✳️ PASS: Amazon Web Services Certified (AWS Certified) Data Analytics Specialty (DAS-C01) by learning based on our Questions & Answers (Q&A) Practice Tests Exams.
Size: 9.44 MB - Last synced at: 17 days ago - Pushed at: 6 months ago - Stars: 7 - Forks: 7

dashmug/glue-utils
Python library designed to enhance the developer experience when working with AWS Glue ETL and Python Shell jobs by reducing boilerplate code, increasing type safety, and improving IDE auto-completion.
Language: Python - Size: 697 KB - Last synced at: 24 days ago - Pushed at: 26 days ago - Stars: 6 - Forks: 2

wednesday-solutions/aws-glue-jupyter-notebook-starter
A starter repository for your next AWS Glue project. This comes with complete IaC, a CD pipeline and a reusable common SDK. Set up jupyter notebook for AWS Glue locally
Language: Jupyter Notebook - Size: 43 KB - Last synced at: 29 days ago - Pushed at: over 1 year ago - Stars: 6 - Forks: 1

NeutrinoCorp/streams
Streams is a toolkit crafted for data-in-motion ecosystems written in Go.
Language: Go - Size: 760 KB - Last synced at: 11 months ago - Pushed at: over 1 year ago - Stars: 6 - Forks: 3

geeknam/aws-neptune-aml
Personal take on GraphDB + AML with AWS Neptune + Glue + Lambda.
Language: Python - Size: 87.9 KB - Last synced at: about 1 year ago - Pushed at: over 6 years ago - Stars: 6 - Forks: 1

dforsber/glue-table-cache
Query AWS Glue Tables efficiently with DuckDB
Language: TypeScript - Size: 1.14 MB - Last synced at: 6 days ago - Pushed at: 3 months ago - Stars: 5 - Forks: 0

WinterYukky/cdk-glue-job-builder
A construct library that builds Glue Job Script as if it were Glue Studio.
Language: TypeScript - Size: 329 KB - Last synced at: 3 days ago - Pushed at: over 1 year ago - Stars: 5 - Forks: 0

aws-samples/amazon-rds-export-to-s3-automation
This repository contains source code for the AWS Database Blog post "Reduce data archiving costs for compliance by automating RDS snapshot exports to Amazon S3"
Size: 235 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 4 - Forks: 2

san99tiago/aws-cdk-athena-s3-workflow
AWS CDK-TypeScript project to showcase an Athena-based solution for S3 data analysis.
Language: TypeScript - Size: 3.85 MB - Last synced at: 4 days ago - Pushed at: over 2 years ago - Stars: 4 - Forks: 2

aws-samples/aws-security-hub-glue-aggregator-terraform
These Terraform modules aggregate Security Hub findings to centralized account using Amazon Kinesis Firehose and AWS Glue
Language: HCL - Size: 146 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 4 - Forks: 3

alicjamazur/data-engineering-case
ETL Redshift-based workflow automated with AWS Step Functions.
Language: Python - Size: 109 MB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 4 - Forks: 6

mlnrt/pexip-logs-in-aws
Pexip Infinity log analysis on the AWS cloud
Size: 1.91 MB - Last synced at: about 2 years ago - Pushed at: almost 6 years ago - Stars: 4 - Forks: 1

ev2900/Iceberg_update_metadata_script
Python script that will update S3 file paths in Iceberg metadata files (metadata.json + AVRO)
Language: Python - Size: 735 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 3 - Forks: 0

somanathkshirsagar/Practical_Data_Science_on_the_AWS-Cloud-Specialization
The Practical Data Science Specialization brings together these disciplines using purpose-built ML tools in the AWS cloud. It helps you develop the practical skills to effectively deploy your data science projects and overcome challenges at each step of the ML workflow using Amazon SageMaker. This Specialization is designed for data-focused develop
Language: Jupyter Notebook - Size: 11.5 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 3 - Forks: 0

marwan116/aws-parquet
A toolkit that provides an object-oriented interface for working with Parquet datasets on AWS
Language: Python - Size: 43.9 KB - Last synced at: 4 months ago - Pushed at: almost 2 years ago - Stars: 3 - Forks: 0

essraahmed/Data-Deduplication-using-AWS-Lake-Formation-FindMatches
Data Deduplication using AWS Lake Formation FindMatches
Language: Jupyter Notebook - Size: 31.3 KB - Last synced at: about 1 year ago - Pushed at: almost 3 years ago - Stars: 3 - Forks: 0

gborn/Serverless-ETL-Pipeline-on-AWS
Design of an ETL Pipeline to process and transform incrementally loaded data in datalake using AWS Lambda, Glue Jobs, EMR, and Athena.
Language: Python - Size: 445 KB - Last synced at: 27 days ago - Pushed at: about 3 years ago - Stars: 3 - Forks: 1

m-radzikowski/aws-creating-athena-tables
Example of different ways to create Amazon Athena tables
Language: JavaScript - Size: 86.9 KB - Last synced at: 6 months ago - Pushed at: over 4 years ago - Stars: 3 - Forks: 1

ksmin23/aws-glue-etl-pyspark-cheatsheet
AWS Glue PySpark ETL examples that can be run locally using Docker
Language: Jupyter Notebook - Size: 24.4 KB - Last synced at: almost 2 years ago - Pushed at: over 4 years ago - Stars: 3 - Forks: 3

jhole89/serverless-data-pipelines-demo
Language: HCL - Size: 207 KB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 3 - Forks: 1

averemee-si/ora2iceberg
Transfer data from Oracle database tables, views, and query results to Apache Iceberg tables
Language: Java - Size: 392 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 2 - Forks: 2

hq969/Youtube-Data-Pipeline-AWS
Leveraging AWS cloud services, an ETL pipeline transforms YouTube video statistics data. Data is downloaded from Kaggle, uploaded to an S3 bucket, and cataloged using AWS Glue for querying with Athena. AWS Lambda and Glue convert the data to Parquet format and store it in a cleansed S3 bucket. AWS QuickSight then visualizes the materialised data.
Language: Python - Size: 1.69 MB - Last synced at: 2 days ago - Pushed at: 3 months ago - Stars: 2 - Forks: 0

goamegah/spark-handson
Spark hands-on
Language: Python - Size: 3.08 MB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 2 - Forks: 1

monisha-anila/Data-Analyst-hacks
A beginner guide to do your best with data!
Language: Jupyter Notebook - Size: 137 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 2 - Forks: 0

ccao-data/model-sales-val
Heuristics for detecting outlier and non-arms-length sales
Language: Python - Size: 3.86 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 2 - Forks: 1

ritesh-ojha/Data-Engineering
End to End Data Engineering Projects
Language: Python - Size: 32.5 MB - Last synced at: 2 months ago - Pushed at: 12 months ago - Stars: 2 - Forks: 2

masood2iq/AWS-Athena-Glue-S3-Bucket-Deployment-Through-AWSConsole
AWS Athena, Glue Database, Glue Crawler and S3 buckets deployment through AWS GUI console.
Size: 3.18 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0

aws-samples/aws-trusted-advisor-glue-aggregator-terraform
These Terraform modules aggregate the AWS Trusted Advisor results from different accounts to a centralised account, using AWS Lambda, AWS IAM, Amazon S3 and Amazon SQS
Language: HCL - Size: 155 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 1

tosh2230/aws-glue-crawlflow
Run an AWS Glue Crawler and check its status with AWS Step Functions.
Language: Python - Size: 105 KB - Last synced at: about 1 month ago - Pushed at: over 3 years ago - Stars: 2 - Forks: 3

spdns/terraform-aws-shepherd
This module is used to configure AWS resources to work with the Shepherd Protective DNS records.
Language: Python - Size: 1.18 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 2 - Forks: 2

AuFeld/AWS_MWAA_With_Step_Functions
Build modern workflows with AWS MWAA, AWS Step Functions, AWS Glue, and AWS EMR
Language: Python - Size: 437 KB - Last synced at: about 1 year ago - Pushed at: almost 4 years ago - Stars: 2 - Forks: 1

ricardo-farias/CovidDataProduct
This repository will be used to understand data science and data engineering concepts
Language: Scala - Size: 641 KB - Last synced at: 11 months ago - Pushed at: over 4 years ago - Stars: 2 - Forks: 1

ev2900/Glue_Examples
PySpark code samples designed for AWS Glue
Language: Python - Size: 51.8 KB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 1 - Forks: 0

shmokmt/awscrondoc
List cron expressions registered in Amazon Web Services.
Language: Go - Size: 84 KB - Last synced at: 3 days ago - Pushed at: 9 days ago - Stars: 1 - Forks: 0

pizofreude/insightflow-retail-economic-pipeline
A data engineering portfolio project using AWS cloud services to analyze correlations between Malaysian retail performance and fuel prices. Features Terraform IaC, ETL/ELT with AWS S3, Glue, SQL analytics via Athena coupled with data transformation via dbt, and workflow orchestration with Kestra.
Language: HCL - Size: 672 KB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 1 - Forks: 0

ev2900/MongoDB_Streams_Glue_Iceberg
Process DynamoDB change streams via AWS Glue with Iceberg to keep a copy of a collection in S3 up to date
Language: Python - Size: 27.3 MB - Last synced at: 22 days ago - Pushed at: 22 days ago - Stars: 1 - Forks: 0

vitalibo/glue-pyspark-skeleton
AWS Glue PySpark project skeleton
Language: Python - Size: 90.8 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 1

kafbat/ui-serde-glue
AWS Glue Serde for kafka-ui
Language: Java - Size: 64.5 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 2

siconge/Tencent-HQ-BIM-Data-Pipeline-with-AWS
This project delivers an end-to-end data pipeline solution designed to employ a comprehensive ETL process to move BIM data from the Autodesk Revit model of Tencent Global Headquarters into cloud storage for processing and analytics. The pipeline leverages tools and services such as Apache Airflow, Amazon S3, AWS Glue, and Amazon Redshift.
Language: Python - Size: 10.2 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

Kirolos00Daniel/AWS-Store-Orders-Analysis
AWS Orders Analysis
Size: 1.88 MB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

StatAziz/Ames-Weather-Data-ETL-Pipeline
This project is about building a serverless ETL pipeline using open-meteo weather API.
Language: Jupyter Notebook - Size: 24.2 MB - Last synced at: about 2 months ago - Pushed at: 8 months ago - Stars: 1 - Forks: 0

evanmathew/Reddit_ETL_DE
This project demonstrates a complete data pipeline for extracting, transforming, and loading (ETL) Reddit data into an Amazon Redshift data warehouse. The pipeline uses various AWS services and tools including Apache Airflow, PostgreSQL, AWS S3, AWS Glue, AWS Athena, and Amazon Redshift. The project is orchestrated using Docker and Apache Airflow
Language: Python - Size: 137 KB - Last synced at: 2 months ago - Pushed at: 10 months ago - Stars: 1 - Forks: 0

shubhamjais40/AWS-Data-Pipeline-Project-Implementing-Data-Validation-Using-Lambda-based-Gluecrawler-v1.0
This project demonstrates a technology shift at an automobile firm to resolve the data engineering challenge of manual data ops. The AWS cloud services used are: an S3 bucket for lake storage of incoming batches, a Lambda Python script for automating the validation function call, and a Glue Crawler to generate a relational table after successful testing.
Language: Python - Size: 347 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

DimaKuriptya/RedditETL
This project provides a comprehensive data pipeline solution to extract, transform, and load (ETL) Reddit data into a Redshift data warehouse. The pipeline leverages a combination of tools and services including Apache Airflow, Celery, PostgreSQL, Amazon S3, AWS Glue, Amazon Athena, and Amazon Redshift.
Language: Python - Size: 14.6 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

dashmug/glue-devtools
Glue Development Tools
Language: Python - Size: 241 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

danrbueno/airflow_aws_justwatch_pipeline
Data pipeline using Airflow, GraphQL, AWS S3, AWS Glue Jobs and AWS Redshift
Language: Python - Size: 11.5 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

ilichDataEngineer/DataEngineerIO-CapstoneProject-DE-BTC2024
Based on Zack Wilson's Data Engineering Bootcamp
Language: Python - Size: 107 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

harinik05/cleanflo-infra
Project that incorporates Terraform to create AWS infrastructure using S3, Lambda, and DynamoDB tables for ocean and river data
Language: HCL - Size: 90.8 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

Dorianteffo/vg-sales-glue-spark-terraform
ETL job with AWS Glue
Language: Python - Size: 872 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

kyopark2014/aws-analytics
Shows what AWS Glue is and how to use it.
Size: 55.7 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

RuFerdZ/Medical-X
US Insurance cost predicting linear regression model. Mainly used to learn about Machine Learning tools in Amazon Web Services (AWS)
Language: Jupyter Notebook - Size: 25.1 MB - Last synced at: 2 months ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 1

gavin-crowley/PySpark-AWS-Glue
PySpark For AWS Glue Demo
Language: Jupyter Notebook - Size: 741 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

AdityaMehra23/youtube-data-pipeline-aws
The project aims to utilize YouTube video stats (likes, views, comments) for in-depth insights into the target audience's behavior and preferences.
Language: Python - Size: 177 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

kingyiusuen/udacity-data-engineering-nanodegree
Projects for Udacity's Data Engineering Nanodegree
Language: Jupyter Notebook - Size: 1.17 MB - Last synced at: about 2 months ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 0

ali-habibzadeh/serverless-crawler
A serverless crawler with Lambda, DynamoDB and Kinesis Firehose
Language: TypeScript - Size: 1.77 MB - Last synced at: 6 days ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0
