An open API service providing repository metadata for many open source software ecosystems.

Topic: "aws-glue"

aws/aws-sdk-pandas

pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).

Language: Python - Size: 16.4 MB - Last synced at: about 16 hours ago - Pushed at: about 18 hours ago - Stars: 4,017 - Forks: 706

dgomesbr/awesome-aws-workshops

(Unofficial) curated list of awesome workshops found around the internet. We have all been there: finding a workshop you have just attended shouldn't be hard. The idea is to provide an easy central repository, built collaboratively.

Language: HTML - Size: 1.49 MB - Last synced at: 2 days ago - Pushed at: over 3 years ago - Stars: 411 - Forks: 113

tokern/piicatcher

Scan databases and data warehouses for PII data. Tag tables and columns in data catalogs like Amundsen and Datahub

Language: Python - Size: 1.38 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 306 - Forks: 99

data-dot-all/dataall

A modern data marketplace that makes collaboration among diverse users (like business, analysts and engineers) easier, increasing efficiency and agility in data projects on AWS.

Language: Python - Size: 97.9 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 242 - Forks: 82

aws-samples/data-lake-as-code

Data Lake as Code, featuring ChEMBL and OpenTargets

Language: TypeScript - Size: 1.26 MB - Last synced at: 5 months ago - Pushed at: over 1 year ago - Stars: 166 - Forks: 44

awslabs/athena-glue-service-logs 📦

Glue scripts for converting AWS Service Logs for use in Athena

Language: Python - Size: 381 KB - Last synced at: 3 days ago - Pushed at: over 1 year ago - Stars: 141 - Forks: 46

aws-samples/cloud-experiments 📦

Open innovation with 60 minute cloud experiments on AWS

Language: Jupyter Notebook - Size: 22.8 MB - Last synced at: 6 months ago - Pushed at: about 1 year ago - Stars: 88 - Forks: 56

aws-samples/amazon-deequ-glue

Automated data quality suggestions and analysis with Deequ on AWS Glue

Language: Scala - Size: 2.1 MB - Last synced at: 5 months ago - Pushed at: over 2 years ago - Stars: 83 - Forks: 23

aws-samples/streamlit-application-deployment-on-aws

Streamlit EDA Dashboard Powered by AWS Cloud

Language: Python - Size: 4 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 71 - Forks: 28

Ditectrev/Amazon-Web-Services-Certified-AWS-Certified-Machine-Learning-MLS-C01-Practice-Tests-Exams-Question

⛳️ PASS: Amazon Web Services Certified (AWS Certified) Machine Learning Specialty (MLS-C01) by learning based on our Questions & Answers (Q&A) Practice Tests Exams.

Size: 6.88 MB - Last synced at: about 1 month ago - Pushed at: 7 months ago - Stars: 59 - Forks: 38

aws-samples/bring-your-own-data-labs 📦

Bring your own data Labs: Build a serverless data pipeline based on your own data

Language: HTML - Size: 31.1 MB - Last synced at: 7 days ago - Pushed at: almost 2 years ago - Stars: 44 - Forks: 24

aws-samples/analyzing-reddit-sentiment-with-aws

Learn how to use Kinesis Firehose, AWS Glue, S3, and Amazon Athena by streaming and analyzing reddit comments in realtime. 100-200 level tutorial.

Language: Python - Size: 3.48 MB - Last synced at: about 2 years ago - Pushed at: about 4 years ago - Stars: 41 - Forks: 16

aws-samples/aws-glue-jobs-unit-testing

Demo code to illustrate the execution of PyTest unit test cases for AWS Glue jobs in AWS CodePipeline using AWS CodeBuild projects

Language: Python - Size: 402 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 39 - Forks: 22

DisasterAWARE/aws-glue-schema-registry-python

Use the AWS Glue Schema Registry in Python projects.

Language: Python - Size: 61.5 KB - Last synced at: 28 days ago - Pushed at: 6 months ago - Stars: 32 - Forks: 15

cloudposse/terraform-aws-glue

Terraform modules for provisioning and managing AWS Glue resources

Language: HCL - Size: 3.93 MB - Last synced at: 13 days ago - Pushed at: 3 months ago - Stars: 30 - Forks: 33

awslabs/amazon-athena-cross-account-catalog 📦

🌉 Reference implementation for granting cross-account AWS Glue Data Catalog access from Amazon Athena

Language: Python - Size: 150 KB - Last synced at: 3 days ago - Pushed at: almost 3 years ago - Stars: 30 - Forks: 19

1oglop1/aws-glue-monorepo-style

Example of AWS Glue Jobs and workflow deployment with Terraform in monorepo style. Code here supports the miniseries of articles about AWS Glue and Python.

Language: Python - Size: 488 KB - Last synced at: 5 months ago - Pushed at: over 4 years ago - Stars: 30 - Forks: 10

SWO-GS/athena-cloudtrail-partitioner 📦

Automate the daily partitioning of your CloudTrail bucket in Athena

Language: JavaScript - Size: 671 KB - Last synced at: 4 months ago - Pushed at: over 1 year ago - Stars: 28 - Forks: 7

aws-samples/transactional-datalake-using-apache-iceberg-on-aws-glue

Stream CDC into an Amazon S3 data lake in Apache Iceberg table format with AWS Glue Streaming and DMS

Language: Python - Size: 725 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 27 - Forks: 2

vincentclaes/serverless_data_pipeline_example

Build and Deploy A Serverless Data Pipeline on AWS

Language: Python - Size: 466 KB - Last synced at: 5 months ago - Pushed at: over 2 years ago - Stars: 27 - Forks: 13

aws-samples/monitoring-apache-iceberg-table-metadata-layer

Sample code to collect Apache Iceberg metrics for table monitoring

Language: Python - Size: 787 KB - Last synced at: about 1 month ago - Pushed at: 9 months ago - Stars: 26 - Forks: 4

tokern/lakecli

A CLI to manage and monitor permissions in AWS Lake Formation

Language: Python - Size: 729 KB - Last synced at: 14 days ago - Pushed at: over 2 years ago - Stars: 26 - Forks: 8

moritzkoerber/covid-19-data-engineering-pipeline

A Covid-19 data pipeline on AWS featuring PySpark/Glue, Docker, Great Expectations, Airflow, and Redshift, templated in CloudFormation and CDK, deployable via GitHub Actions.

Language: Python - Size: 1.31 MB - Last synced at: 15 days ago - Pushed at: over 1 year ago - Stars: 23 - Forks: 5

webysther/aws-glue-docker 📦

🐋 Docker image for AWS Glue Spark/Python

Language: Dockerfile - Size: 56.6 KB - Last synced at: 7 days ago - Pushed at: over 1 year ago - Stars: 23 - Forks: 8

chgasparoto/terraform-aws-glue

Terraform module which creates Glue resources on AWS

Language: HCL - Size: 7.81 KB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 20 - Forks: 16

aws-samples/aws-glue-streaming-etl-with-apache-iceberg

Streaming ETL job cases in AWS Glue that integrate Iceberg and create an in-place updatable data lake on Amazon S3

Language: Python - Size: 465 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 17 - Forks: 2

amzn/rheoceros

Cloud-based AI / ML workflow and data application development framework

Language: Python - Size: 2.49 MB - Last synced at: 14 days ago - Pushed at: 9 months ago - Stars: 17 - Forks: 9

jhole89/aws-glue-sbt-quickstart

Example of how to set SBT up for local development of AWS Glue Scripts

Language: Scala - Size: 30.3 KB - Last synced at: 5 months ago - Pushed at: over 4 years ago - Stars: 16 - Forks: 3

aws-samples/aws-glue-crawler-utilities

This repository has a collection of utilities for Glue Crawlers. These utilities come in the form of AWS CloudFormation templates or AWS CDK applications.

Language: Python - Size: 107 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 15 - Forks: 10

wednesday-solutions/Data-Engineering-Onboarding-Starter

This repository contains a 10 step program to enter the world of Data Engineering

Language: Python - Size: 6.42 MB - Last synced at: 29 days ago - Pushed at: 10 months ago - Stars: 14 - Forks: 1

spe-uob/2020-HealthcareLakeETL

FHIR to OMOP using PySpark on AWS Glue

Language: Python - Size: 1.65 MB - Last synced at: 4 days ago - Pushed at: about 4 years ago - Stars: 14 - Forks: 4

jonrau1/AWS-ComplianceMachineDontStop

Proof of Value Terraform Scripts to utilize Amazon Web Services (AWS) Security, Identity & Compliance Services to Support your AWS Account Security Posture.

Language: HCL - Size: 95.7 KB - Last synced at: about 2 years ago - Pushed at: about 5 years ago - Stars: 13 - Forks: 12

ricardolsmendes/aws-glue-ci-cd-blueprint

Companion repository for the "Streamlining AWS Glue CI/CD - A Comprehensive Blueprint" blog post

Language: HCL - Size: 518 KB - Last synced at: 19 days ago - Pushed at: 6 months ago - Stars: 12 - Forks: 3

mincloud1501/DevOps

Understanding DevOps concepts, with hands-on practice and research using AWS developer tools

Language: Java - Size: 3.21 MB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 12 - Forks: 1

AWS-Big-Data-Projects/front-line-concussion-monitoring-system-using-AWS-IoT-and-serverless-data-lakes

A simple, practical, and affordable system for measuring head trauma within the sports environment, subject to the absence of trained medical personnel made using Amazon Kinesis Data Streams, Kinesis Data Analytics, Kinesis Data Firehose, and AWS Lambda

Language: Shell - Size: 30.3 KB - Last synced at: about 20 hours ago - Pushed at: over 4 years ago - Stars: 12 - Forks: 0

TrainingByPackt/Serverless-Architectures-with-AWS

Discover how you can migrate from traditional deployments to serverless architectures with AWS

Language: JavaScript - Size: 8.61 MB - Last synced at: about 1 month ago - Pushed at: over 6 years ago - Stars: 12 - Forks: 11

vincentclaes/glue-devcontainer

Glue VSCode devcontainer setup

Language: Python - Size: 2.97 MB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 11 - Forks: 0

ahmadalibagheri/terraform-aws-glue

Create terraform module for AWS Glue

Language: HCL - Size: 2.93 KB - Last synced at: about 1 month ago - Pushed at: about 3 years ago - Stars: 11 - Forks: 1

imsanjoykb/AWSBootcamp

AWS Bootcamp | Resource | Document | Materials |

Language: Jupyter Notebook - Size: 10.4 MB - Last synced at: about 1 month ago - Pushed at: about 3 years ago - Stars: 10 - Forks: 2

canyousayyes/aws-real-time-data-collection

Demo for building Real Time Data Collection Pipeline on AWS

Language: JavaScript - Size: 2.53 MB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 9 - Forks: 1

zhiweio/data-engineer-scripts

A curated collection of streamlined and effective scripts and tools designed specifically for data engineering tasks.

Language: Python - Size: 43 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 8 - Forks: 0

aws-samples/aws-glue-streaming-etl-with-delta-lake

Streaming ETL job cases in AWS Glue that integrate Delta Lake and create an in-place updatable data lake on Amazon S3

Language: Python - Size: 314 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 8 - Forks: 0

andreichiro/data_engineer_end2end

End-to-end data engineer project

Language: HTML - Size: 20.4 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 8 - Forks: 3

miztiik/stream-etl-with-glue

Serverless streaming ETL with a Glue job, queried with Athena

Language: Python - Size: 2.93 MB - Last synced at: about 1 year ago - Pushed at: about 4 years ago - Stars: 8 - Forks: 6

bdoepf/aws-etl-example

AWS ETL example via AWS DMS & AWS Glue

Language: HCL - Size: 69.3 KB - Last synced at: about 2 years ago - Pushed at: about 6 years ago - Stars: 8 - Forks: 2

Ditectrev/Amazon-Web-Services-Certified-AWS-Certified-Data-Analytics-DAS-C01-Practice-Tests-Exams-Question

⛳️ PASS: Amazon Web Services Certified (AWS Certified) Data Analytics Specialty (DAS-C01) by learning based on our Questions & Answers (Q&A) Practice Tests Exams.

Size: 9.44 MB - Last synced at: 17 days ago - Pushed at: 6 months ago - Stars: 7 - Forks: 7

dashmug/glue-utils

Python library designed to enhance the developer experience when working with AWS Glue ETL and Python Shell jobs by reducing boilerplate code, increasing type safety, and improving IDE auto-completion.

Language: Python - Size: 697 KB - Last synced at: 24 days ago - Pushed at: 26 days ago - Stars: 6 - Forks: 2

wednesday-solutions/aws-glue-jupyter-notebook-starter

A starter repository for your next AWS Glue project. This comes with complete IaC, a CD pipeline and a reusable common SDK. Set up jupyter notebook for AWS Glue locally

Language: Jupyter Notebook - Size: 43 KB - Last synced at: 29 days ago - Pushed at: over 1 year ago - Stars: 6 - Forks: 1

NeutrinoCorp/streams

✉️ Streams is a toolkit crafted for data-in-motion ecosystems written in Go.

Language: Go - Size: 760 KB - Last synced at: 11 months ago - Pushed at: over 1 year ago - Stars: 6 - Forks: 3

geeknam/aws-neptune-aml

Personal take on GraphDB + AML with AWS Neptune + Glue + Lambda.

Language: Python - Size: 87.9 KB - Last synced at: about 1 year ago - Pushed at: over 6 years ago - Stars: 6 - Forks: 1

dforsber/glue-table-cache

Query AWS Glue Tables efficiently with DuckDB

Language: TypeScript - Size: 1.14 MB - Last synced at: 6 days ago - Pushed at: 3 months ago - Stars: 5 - Forks: 0

WinterYukky/cdk-glue-job-builder

A construct library that builds Glue Job Script as if it were Glue Studio.

Language: TypeScript - Size: 329 KB - Last synced at: 3 days ago - Pushed at: over 1 year ago - Stars: 5 - Forks: 0

aws-samples/amazon-rds-export-to-s3-automation

This repository contains source code for the AWS Database Blog Post Reduce data archiving costs for compliance by automating RDS snapshot exports to Amazon S3

Size: 235 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 4 - Forks: 2

san99tiago/aws-cdk-athena-s3-workflow

AWS CDK-TypeScript project to showcase an Athena-based solution for S3 data analysis.

Language: TypeScript - Size: 3.85 MB - Last synced at: 4 days ago - Pushed at: over 2 years ago - Stars: 4 - Forks: 2

aws-samples/aws-security-hub-glue-aggregator-terraform

These Terraform modules aggregate Security Hub findings to centralized account using Amazon Kinesis Firehose and AWS Glue

Language: HCL - Size: 146 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 4 - Forks: 3

alicjamazur/data-engineering-case

Redshift-based ETL workflow automated with AWS Step Functions.

Language: Python - Size: 109 MB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 4 - Forks: 6

mlnrt/pexip-logs-in-aws

Pexip Infinity log analysis on the AWS cloud

Size: 1.91 MB - Last synced at: about 2 years ago - Pushed at: almost 6 years ago - Stars: 4 - Forks: 1

ev2900/Iceberg_update_metadata_script

Python script that will update S3 file paths in Iceberg metadata files (metadata.json + AVRO)

Language: Python - Size: 735 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 3 - Forks: 0

somanathkshirsagar/Practical_Data_Science_on_the_AWS-Cloud-Specialization

The Practical Data Science Specialization brings together these disciplines using purpose-built ML tools in the AWS cloud. It helps you develop the practical skills to effectively deploy your data science projects and overcome challenges at each step of the ML workflow using Amazon SageMaker. This Specialization is designed for data-focused develop

Language: Jupyter Notebook - Size: 11.5 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 3 - Forks: 0

marwan116/aws-parquet

a toolkit that provides an object-oriented interface for working with parquet datasets on AWS

Language: Python - Size: 43.9 KB - Last synced at: 4 months ago - Pushed at: almost 2 years ago - Stars: 3 - Forks: 0

essraahmed/Data-Deduplication-using-AWS-Lake-Formation-FindMatches

Data Deduplication using AWS Lake Formation FindMatches

Language: Jupyter Notebook - Size: 31.3 KB - Last synced at: about 1 year ago - Pushed at: almost 3 years ago - Stars: 3 - Forks: 0

gborn/Serverless-ETL-Pipeline-on-AWS

Design of an ETL Pipeline to process and transform incrementally loaded data in datalake using AWS Lambda, Glue Jobs, EMR, and Athena.

Language: Python - Size: 445 KB - Last synced at: 27 days ago - Pushed at: about 3 years ago - Stars: 3 - Forks: 1

m-radzikowski/aws-creating-athena-tables

Example of different ways to create Amazon Athena tables

Language: JavaScript - Size: 86.9 KB - Last synced at: 6 months ago - Pushed at: over 4 years ago - Stars: 3 - Forks: 1

ksmin23/aws-glue-etl-pyspark-cheatsheet

AWS Glue PySpark ETL examples that can be run locally using Docker

Language: Jupyter Notebook - Size: 24.4 KB - Last synced at: almost 2 years ago - Pushed at: over 4 years ago - Stars: 3 - Forks: 3

jhole89/serverless-data-pipelines-demo

Language: HCL - Size: 207 KB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 3 - Forks: 1

averemee-si/ora2iceberg

Transfer data from Oracle database tables, views, and query results to Apache Iceberg tables

Language: Java - Size: 392 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 2 - Forks: 2

hq969/Youtube-Data-Pipeline-AWS

Leveraging AWS cloud services, an ETL pipeline transforms YouTube video statistics data. Data is downloaded from Kaggle, uploaded to an S3 bucket, and cataloged using AWS Glue for querying with Athena. AWS Lambda and Glue convert it to Parquet format and store it in a cleansed S3 bucket. AWS QuickSight then visualizes the materialised data.

Language: Python - Size: 1.69 MB - Last synced at: 2 days ago - Pushed at: 3 months ago - Stars: 2 - Forks: 0

goamegah/spark-handson

Spark hands-on

Language: Python - Size: 3.08 MB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 2 - Forks: 1

monisha-anila/Data-Analyst-hacks

A beginner guide to do your best with data!

Language: Jupyter Notebook - Size: 137 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 2 - Forks: 0

ccao-data/model-sales-val

Heuristics for detecting outlier and non-arms-length sales

Language: Python - Size: 3.86 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 2 - Forks: 1

ritesh-ojha/Data-Engineering

End to End Data Engineering Projects

Language: Python - Size: 32.5 MB - Last synced at: 2 months ago - Pushed at: 12 months ago - Stars: 2 - Forks: 2

masood2iq/AWS-Athena-Glue-S3-Bucket-Deployment-Through-AWSConsole

AWS Athena, Glue Database, Glue Crawler and S3 buckets deployment through AWS GUI console.

Size: 3.18 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0

aws-samples/aws-trusted-advisor-glue-aggregator-terraform

These Terraform modules aggregate the AWS Trusted Advisor results from different accounts to a centralised account, using AWS Lambda, AWS IAM, Amazon S3 and Amazon SQS

Language: HCL - Size: 155 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 1

tosh2230/aws-glue-crawlflow

Run an AWS Glue Crawler and check its status with AWS Step Functions.

Language: Python - Size: 105 KB - Last synced at: about 1 month ago - Pushed at: over 3 years ago - Stars: 2 - Forks: 3

spdns/terraform-aws-shepherd

This module is used to configure AWS resources to work with the Shepherd Protective DNS records.

Language: Python - Size: 1.18 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 2 - Forks: 2

AuFeld/AWS_MWAA_With_Step_Functions

Build modern workflows with AWS MWAA, AWS Step Functions, AWS Glue, and AWS EMR

Language: Python - Size: 437 KB - Last synced at: about 1 year ago - Pushed at: almost 4 years ago - Stars: 2 - Forks: 1

ricardo-farias/CovidDataProduct

This repository will be used to understand data science and data engineering concepts

Language: Scala - Size: 641 KB - Last synced at: 11 months ago - Pushed at: over 4 years ago - Stars: 2 - Forks: 1

ev2900/Glue_Examples

PySpark code samples designed for AWS Glue

Language: Python - Size: 51.8 KB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 1 - Forks: 0

shmokmt/awscrondoc

List cron expressions registered in Amazon Web Services.

Language: Go - Size: 84 KB - Last synced at: 3 days ago - Pushed at: 9 days ago - Stars: 1 - Forks: 0

pizofreude/insightflow-retail-economic-pipeline

A data engineering portfolio project using AWS cloud services to analyze correlations between Malaysian retail performance and fuel prices. Features Terraform IaC, ETL/ELT with AWS S3, Glue, SQL analytics via Athena coupled with data transformation via dbt, and workflow orchestration with Kestra.

Language: HCL - Size: 672 KB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 1 - Forks: 0

ev2900/MongoDB_Streams_Glue_Iceberg

Process DynamoDB change streams via AWS Glue with Iceberg to keep a copy of a collection in S3 up to date

Language: Python - Size: 27.3 MB - Last synced at: 22 days ago - Pushed at: 22 days ago - Stars: 1 - Forks: 0

vitalibo/glue-pyspark-skeleton

AWS Glue PySpark project skeleton

Language: Python - Size: 90.8 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 1

kafbat/ui-serde-glue

AWS Glue Serde for kafka-ui

Language: Java - Size: 64.5 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 2

siconge/Tencent-HQ-BIM-Data-Pipeline-with-AWS

This project delivers an end-to-end data pipeline solution designed to employ a comprehensive ETL process to move BIM data from the Autodesk Revit model of Tencent Global Headquarters into cloud storage for processing and analytics. The pipeline leverages tools and services such as Apache Airflow, Amazon S3, AWS Glue, and Amazon Redshift.

Language: Python - Size: 10.2 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

Kirolos00Daniel/AWS-Store-Orders-Analysis

AWS Orders Analysis

Size: 1.88 MB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

StatAziz/Ames-Weather-Data-ETL-Pipeline

This project is about building a serverless ETL pipeline using open-meteo weather API.

Language: Jupyter Notebook - Size: 24.2 MB - Last synced at: about 2 months ago - Pushed at: 8 months ago - Stars: 1 - Forks: 0

evanmathew/Reddit_ETL_DE

This project demonstrates a complete data pipeline for extracting, transforming, and loading (ETL) Reddit data into an Amazon Redshift data warehouse. The pipeline uses various AWS services and tools including Apache Airflow, PostgreSQL, AWS S3, AWS Glue, AWS Athena, and Amazon Redshift. The project is orchestrated using Docker and Apache Airflow

Language: Python - Size: 137 KB - Last synced at: 2 months ago - Pushed at: 10 months ago - Stars: 1 - Forks: 0

shubhamjais40/AWS-Data-Pipeline-Project-Implementing-Data-Validation-Using-Lambda-based-Gluecrawler-v1.0

This project demonstrates a technology shift at an automobile firm to resolve the data engineering challenge of manual data ops. The AWS services used are: an S3 bucket as lake storage for incoming batches, a Lambda Python script that automates the validation function call, and a Glue Crawler that generates a relational table once testing succeeds.

Language: Python - Size: 347 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

DimaKuriptya/RedditETL

This project provides a comprehensive data pipeline solution to extract, transform, and load (ETL) Reddit data into a Redshift data warehouse. The pipeline leverages a combination of tools and services including Apache Airflow, Celery, PostgreSQL, Amazon S3, AWS Glue, Amazon Athena, and Amazon Redshift.

Language: Python - Size: 14.6 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

dashmug/glue-devtools

Glue Development Tools

Language: Python - Size: 241 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

danrbueno/airflow_aws_justwatch_pipeline

Data pipeline using Airflow, GraphQL, AWS S3, AWS Glue Jobs and AWS Redshift

Language: Python - Size: 11.5 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

ilichDataEngineer/DataEngineerIO-CapstoneProject-DE-BTC2024

Based on Zack Wilson's Data Engineering Bootcamp

Language: Python - Size: 107 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

harinik05/cleanflo-infra

Project that incorporates Terraform to create AWS infrastructure using S3, Lambda, and DynamoDB tables for ocean and river data

Language: HCL - Size: 90.8 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

Dorianteffo/vg-sales-glue-spark-terraform

ETL job with AWS Glue

Language: Python - Size: 872 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

kyopark2014/aws-analytics

Shows what AWS Glue is and how to use it.

Size: 55.7 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

RuFerdZ/Medical-X

US Insurance cost predicting linear regression model. Mainly used to learn about Machine Learning tools in Amazon Web Services (AWS)

Language: Jupyter Notebook - Size: 25.1 MB - Last synced at: 2 months ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 1

gavin-crowley/PySpark-AWS-Glue

PySpark For AWS Glue Demo

Language: Jupyter Notebook - Size: 741 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

AdityaMehra23/youtube-data-pipeline-aws

The project aims to utilize YouTube video stats (likes, views, comments) for in-depth insights into the target audience's behavior and preferences.

Language: Python - Size: 177 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

kingyiusuen/udacity-data-engineering-nanodegree

Projects for Udacity's Data Engineering Nanodegree

Language: Jupyter Notebook - Size: 1.17 MB - Last synced at: about 2 months ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 0

ali-habibzadeh/serverless-crawler

A serverless crawler with Lambda, Dynamodb and Kinesis Firehose

Language: TypeScript - Size: 1.77 MB - Last synced at: 6 days ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0