An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: aws-glue-crawler

harika-majji/aws-stock-market-analysis

Language: Jupyter Notebook - Size: 2.38 MB - Last synced at: 22 days ago - Pushed at: 22 days ago - Stars: 0 - Forks: 0

jibbs1703/Tickit-Data-Pipeline

This repository demonstrates how to build a robust data pipeline using an orchestrator along with on-premises and cloud resources. It collects data from on-premises SQL and NoSQL databases and loads it into a SQL database in the cloud.

Language: Python - Size: 5.86 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

ShubhamMohanty680/Spotify_end_to_end_data_engineering

A project that builds an ETL (Extract, Transform, Load) pipeline on AWS using the Spotify API.

Language: Jupyter Notebook - Size: 1.44 MB - Last synced at: 8 days ago - Pushed at: 3 months ago - Stars: 2 - Forks: 0

shahidmalik4/aws-glue-stepfunctions-etl

This project automates an ETL pipeline using AWS Glue, S3, Athena, and Step Functions to transform raw Airbnb data. It cleanses, enriches, and organizes the data into separate raw and transformed databases, enabling efficient querying and analysis via Athena, with automated notifications through SNS.

Language: Python - Size: 3.47 MB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

VvEK-Hiremath/Airlines-Data-Pipeline-Project-AWS

Implements a data pipeline for airline data using AWS services.

Language: Python - Size: 195 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

SadafAsad/LinkedIn-Jobs-Analysis

Unveiling job market trends with Scrapy and AWS

Language: Python - Size: 562 KB - Last synced at: about 2 months ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

desininja/Quality-Movie-Data-Pipeline

ETL pipeline using AWS services

Language: Python - Size: 727 KB - Last synced at: 19 days ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

Saurabhkhandebharad/BigData-SK

Analyzes a Kaggle dataset from a multi-category e-commerce store using big data techniques, with the help of AWS EC2, AWS S3, PySpark, AWS Glue ETL, AWS Athena, AWS CloudFormation, AWS Lambda, and Power BI.

Language: Python - Size: 8.79 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 1 - Forks: 0

Tyriek-cloud/NYC-Mobility-Survey-Analysis

An end-to-end data engineering project in which five NYC DOT datasets were modified in an ETL process and analyzed for insights.

Language: Python - Size: 2.75 MB - Last synced at: 15 days ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

TravelXML/KAFKA-PYTHON-AWS-CRAWLER-AMAZON-ATHENA

Comprehensive tutorials, steps, and scripts for setting up Apache Kafka on an Amazon EC2 instance, streaming logs to S3, and querying the data with AWS Glue and Amazon Athena. Includes ZooKeeper configuration, producer and consumer setup, and automated Data Catalog creation.

Language: Jupyter Notebook - Size: 2.46 MB - Last synced at: about 1 month ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

sumanthmalipeddi/spotify_trending_telugu

Collects song, album, and artist details from the Spotify music application at specific intervals using the spotipy API, and performs ETL operations using Amazon cloud services.

Language: Jupyter Notebook - Size: 630 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

h-fuzzy-logic/data-analytics-spring

Open data and cloud computing to answer the question: Are we losing our spring days?

Language: Jupyter Notebook - Size: 390 KB - Last synced at: 20 days ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

mihirkudale/Stock-Market-Real-Time-Data-Engineering-Project

In this project, you will execute an end-to-end data engineering project on real-time stock market data using Kafka. It uses technologies such as Python, Amazon Web Services (AWS), Apache Kafka, Glue, Athena, and SQL.

Language: Jupyter Notebook - Size: 2.46 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

fermat01/ETL-Data-Pipeline-using-AWS-EMR-Spark-Glue-Athena

ETL data pipeline using AWS services

Language: Python - Size: 4.07 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 3 - Forks: 0

sarah-zhan/data_pipeline_amazon_products

An end-to-end data pipeline built with AWS S3, Glue, Glue Crawler, Athena, and Tableau for visualization

Language: Jupyter Notebook - Size: 1.74 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

Kartik-Banga/Automated-ETL-Pipeline-for-Playstore-Data

Implemented ETL pipeline on AWS for Playstore data using Lambda, Glue Crawlers, and Glue ETL Jobs. Orchestrated workflow with Step Functions and achieved seamless integration, optimal data merging, and enhanced data quality/accessibility.

Size: 2.97 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

imverma/DataEngineering-YouTube-Analysis-Project

An end-to-end solution for managing and analyzing YouTube video data from Kaggle, leveraging AWS services and visualized through QuickSight and Tableau

Language: Python - Size: 61.5 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

GabrielDan92/AWS_Terraform_PySpark-ETL_Job

Terraform configuration that creates several AWS services, uploads data in S3 and starts the Glue Crawler and Glue Job.

Language: Python - Size: 22.5 KB - Last synced at: over 1 year ago - Pushed at: about 3 years ago - Stars: 3 - Forks: 0

dhvani-k/YouTrend_Insights_Analyzing_YouTube_Video_Landscape

An end-to-end solution for managing and analyzing YouTube video data from Kaggle, leveraging AWS services and visualized through QuickSight and Tableau

Language: Python - Size: 59.6 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

AirtonLira/aws-bigdata-glue-athena

This project aims to carry out data collection, cataloging, governance, processing, and visualization.

Size: 3.76 MB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

thedatanerdz/DEP-7

AWS Covid data engineering project

Size: 8.79 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

rahulrajan15/Stock_Market_Kafka

Real-Time Stock Market Data Science Project using Apache Kafka: Analyzing and predicting stock market trends in real-time for informed decision-making. Scalable and low-latency data processing.

Language: Jupyter Notebook - Size: 2.48 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

aws-samples/amazon-rds-export-to-s3-automation

This repository contains source code for the AWS Database Blog post "Reduce data archiving costs for compliance by automating RDS snapshot exports to Amazon S3".

Size: 235 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 4 - Forks: 2

aws-samples/aws-glue-crawler-utilities

This repository has a collection of utilities for Glue Crawlers. These utilities come in the form of AWS CloudFormation templates or AWS CDK applications.

Language: Python - Size: 107 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 15 - Forks: 10

masood2iq/AWS-Athena-Glue-S3-Bucket-Deployment-Through-AWSConsole

AWS Athena, Glue Database, Glue Crawler and S3 buckets deployment through AWS GUI console.

Size: 3.18 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0

productiveAnalytics/aws-cdk-constructs-sandbox

AWS Cloud Development Kit (AWS CDK) using TypeScript, Python, and Java

Language: Java - Size: 5.49 MB - Last synced at: almost 2 years ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0